07. DevOps about Operate

Posted Apr 20, 2026 Updated Apr 21, 2026

Prerequisites

1. What is `Operate` in DevOps?

The Operate phase is where a deployed system is continuously run, managed, and maintained in production to ensure stability, performance, and reliability.

Deployment is not the end. Operate is about what happens after the system is live.

2. Why Operate Matters

A system can:

build successfully
pass all tests
deploy correctly

and still fail in production

❌ Without proper operation

system crashes remain unnoticed
performance gradually degrades
memory leaks accumulate
queues overflow silently
users experience failures

✔ With proper operation

issues are detected early
performance remains stable
failures are handled quickly
system runs continuously (24/7)

3. Goals of the Operate Phase

The Operate phase ensures:

the system keeps running without interruption
performance remains within target limits
failures are detected and handled quickly
resources are used efficiently
the system can recover when issues occur

4. What Happens During Operation?

4-1. Process Management

The system must:

start correctly
restart automatically if it crashes
run continuously

Example

Process starts → runs loop → crash → auto-restart

Tools (examples):

systemd (Linux)
Docker restart policies
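
The start → crash → auto-restart cycle above can be sketched in a few lines. This is a minimal illustration, not a replacement for systemd or a Docker restart policy: the `supervise` function, its `max_restarts` and `backoff_s` parameters, and the `run_worker` command are all hypothetical names chosen for this example.

```python
import subprocess
import sys
import time


def supervise(run_once, max_restarts=3, backoff_s=0.1):
    """Call run_once() until it returns exit code 0.

    run_once is any callable that runs the worker to completion and
    returns its exit code. Returns the number of restarts performed.
    max_restarts and backoff_s are illustrative knobs (assumptions).
    """
    restarts = 0
    while True:
        exit_code = run_once()
        if exit_code == 0:
            return restarts                  # clean exit: stop supervising
        if restarts >= max_restarts:
            raise RuntimeError(f"gave up after {restarts} restarts")
        restarts += 1
        time.sleep(backoff_s * restarts)     # linear backoff avoids a tight crash loop


def run_worker():
    # Hypothetical worker: a child Python process that exits immediately.
    return subprocess.call([sys.executable, "-c", "print('worker ran')"])
```

Calling `supervise(run_worker)` runs the child process and restarts it on a non-zero exit code. The restart cap is a design choice: a process that crashes on every start usually signals a configuration problem that restarting will not fix.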

4-2. Runtime Monitoring (Basic Level)

Even before full monitoring systems, the application should:

log important events
report errors
expose basic metrics

Example logs:

[INFO] Frame processed in 5ms
[WARN] Queue size increasing
[ERROR] Frame dropped
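
Log lines in this shape can be produced with Python's standard `logging` module. The sketch below is one possible setup: the logger name `pipeline`, the `report_frame` helper, and the `QUEUE_WARN_SIZE` threshold are assumptions made for illustration.

```python
import logging
import sys

# Print records as "[LEVEL] message", matching the example logs.
logging.basicConfig(stream=sys.stdout, format="[%(levelname)s] %(message)s")
logging.addLevelName(logging.WARNING, "WARN")   # Python's default name is "WARNING"

log = logging.getLogger("pipeline")   # hypothetical logger name
log.setLevel(logging.INFO)

QUEUE_WARN_SIZE = 50                  # assumed threshold, tune per system


def report_frame(elapsed_ms, queue_size, dropped):
    """Log one frame's outcome at the appropriate severity."""
    log.info("Frame processed in %dms", elapsed_ms)
    if queue_size > QUEUE_WARN_SIZE:
        log.warning("Queue size increasing (%d)", queue_size)
    if dropped:
        log.error("Frame dropped")
```

Using severity levels consistently matters more than the exact format, because it lets operators filter for warnings and errors without reading every info line.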

4-3. Resource Management

The system must not exhaust resources.

Monitor:

CPU usage
memory usage
thread count
queue size

❌ Example problem

Queue size keeps increasing → memory grows → crash
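
A growing queue is easiest to catch as a trend rather than a single reading. The following sketch uses only the standard library; `GrowthDetector`, `snapshot`, and the `window` parameter are hypothetical names, and the memory figure covers only Python-level allocations (and only after `tracemalloc.start()` has been called).

```python
import queue
import threading
import time
import tracemalloc
from collections import deque


class GrowthDetector:
    """Flags a metric that rose on each of the last `window` samples."""

    def __init__(self, window=5):
        # window increases need window + 1 samples
        self.samples = deque(maxlen=window + 1)

    def add(self, value):
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False                     # not enough history yet
        values = list(self.samples)
        return all(b > a for a, b in zip(values, values[1:]))


def snapshot(frame_queue):
    """Collect the metrics listed above for the current process."""
    heap_bytes, _peak = tracemalloc.get_traced_memory()  # 0 unless tracing is on
    return {
        "cpu_seconds": time.process_time(),
        "python_heap_bytes": heap_bytes,
        "threads": threading.active_count(),
        "queue_size": frame_queue.qsize(),
    }


frame_queue = queue.Queue()
detector = GrowthDetector(window=3)
```

Feeding `snapshot(frame_queue)["queue_size"]` into `detector.add(...)` on a timer gives an early warning well before memory is exhausted.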

4-4. Failure Handling

Failures will happen.

The system must:

handle errors gracefully
avoid crashing when possible
recover automatically

Example

Frame processing fails → skip frame → continue

Never stop the whole system for one bad frame
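
In code, the skip-and-continue rule amounts to catching the failure per frame instead of around the whole loop. This is a minimal sketch: `process_frame` is a hypothetical stand-in for the real processing step, and `run` is an illustrative name.

```python
import logging

log = logging.getLogger("pipeline")   # hypothetical logger name


def process_frame(frame):
    # Hypothetical stand-in for real work; rejects empty frames.
    if frame is None:
        raise ValueError("empty frame")
    return frame * 2


def run(frames):
    """Process every frame, skipping the ones that fail."""
    results, skipped = [], 0
    for index, frame in enumerate(frames):
        try:
            results.append(process_frame(frame))
        except Exception:
            skipped += 1
            # Record the traceback, then move on to the next frame.
            log.exception("Frame %d failed, skipping", index)
    return results, skipped
```

For example, `run([1, None, 3])` returns `([2, 6], 1)`. Counting skipped frames is deliberate: a skip that is never counted or logged turns into a silent failure.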

4-5. Continuous Operation (24/7)

Unlike test environments, production systems:

run indefinitely
must handle long-term stability

This connects directly to:

soak testing
memory leak prevention
resource cleanup

4-6. Key Concepts

✔ Idempotency (important)

Running the same operation multiple times should leave the system in the same state as running it once, so a retry or a repeated restart command does not break anything.
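
As a small illustration, an idempotent `start` can simply check the current state before acting. The `Pipeline` class and its attributes are hypothetical and exist only for this sketch.

```python
class Pipeline:
    """Toy pipeline whose start() and stop() are safe to repeat."""

    def __init__(self):
        self.running = False
        self.start_count = 0      # how many times work was actually started

    def start(self):
        if self.running:
            return False          # already running: do nothing
        self.running = True
        self.start_count += 1
        return True

    def stop(self):
        if not self.running:
            return False          # already stopped: do nothing
        self.running = False
        return True
```

Calling `start()` twice in a row starts the work once, which is what makes it safe for automation to retry.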

✔ Fault tolerance

The system should continue running even when parts fail.

✔ Backpressure

If input is faster than processing:

slow down input
drop frames
limit queue size
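
One way to combine a size limit with frame dropping is a bounded `queue.Queue` and a drop-oldest policy, sketched below. The `offer` helper is a hypothetical name, and dropping the oldest frame (rather than the newest) is an assumption that suits live video, where fresh frames matter most.

```python
import queue


def offer(frame_queue, frame):
    """Enqueue frame without blocking; if the queue is full, drop the oldest.

    Returns the number of frames dropped to make room.
    """
    dropped = 0
    while True:
        try:
            frame_queue.put_nowait(frame)
            return dropped
        except queue.Full:
            try:
                frame_queue.get_nowait()     # discard the oldest frame
                dropped += 1
            except queue.Empty:
                pass                         # a consumer drained it first; retry


frame_queue = queue.Queue(maxsize=2)         # maxsize=0 would mean unbounded
```

To slow the producer down instead of dropping, a blocking `frame_queue.put(frame, timeout=...)` applies the same limit while making the input wait.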

5. Real Example: Image Pipeline Operation

Camera
   ↓
Frame Queue
   ↓
Preprocessor (C++)
   ↓
Output

During operation, you must ensure:

queue does not overflow
processing stays within latency limits
system recovers from temporary failures
CPU usage remains stable

6. Common Problems in Operation

❌ Memory leaks

→ system crashes after hours or days

❌ Performance drift

→ latency increases over time

❌ Deadlocks

→ system freezes

❌ Resource exhaustion

→ no memory / threads left

❌ Silent failures

→ system runs but produces wrong output

7. Restart & Recovery Strategy

Operation must include recovery.

Example strategies

auto-restart process
watchdog monitoring
fallback modes
restart pipeline stage only

Crash → restart → resume processing

No manual intervention required
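
Auto-restart covers crashes, but a frozen process never exits, so a watchdog needs a separate signal. A common pattern is a heartbeat with a timeout, sketched here; the `Watchdog` class and its injectable `clock` parameter are assumptions made for illustration and testing.

```python
import time


class Watchdog:
    """Reports expiry when no heartbeat arrives within timeout_s seconds."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock            # monotonic clock is immune to wall-clock jumps
        self.last_beat = clock()

    def beat(self):
        """Called by the worker each time it completes a unit of work."""
        self.last_beat = self.clock()

    def expired(self):
        """True if the worker has been silent for longer than the timeout."""
        return self.clock() - self.last_beat > self.timeout_s
```

The worker calls `beat()` after each frame, and a supervising thread or process polls `expired()` and restarts the stalled stage when it returns true.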

8. Automation in Operation

Manual operation is risky. Automation should handle:

process restart
log collection
health checks
scaling (if needed)

Example:

Container crashes → auto-restart
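
A health check is the piece automation usually polls to decide whether to restart or alert. The sketch below is one possible shape: the `health_check` function and its default limits (`max_queue`, `max_latency_ms`) are hypothetical values, not figures from a real system.

```python
def health_check(queue_size, latency_ms, worker_alive,
                 max_queue=100, max_latency_ms=50.0):
    """Return a summary that an orchestrator or script can act on."""
    problems = []
    if not worker_alive:
        problems.append("worker not running")
    if queue_size >= max_queue:
        problems.append(f"queue full ({queue_size}/{max_queue})")
    if latency_ms > max_latency_ms:
        problems.append(
            f"latency {latency_ms:.1f}ms over {max_latency_ms:.1f}ms limit"
        )
    return {"healthy": not problems, "problems": problems}
```

Returning the list of problems, not just a boolean, keeps the automated decision and the log message for a human operator in one place.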

This post is licensed under CC BY 4.0 by the author.
