07. DevOps about Monitor


1. What is Monitor in DevOps?

The Monitor phase is where you continuously observe the system to understand its behavior in production.

Monitoring answers a critical question:

“Is the system actually running as expected right now?”

In short, the Monitor phase continuously observes system metrics, logs, and behavior to detect issues, verify performance, and maintain reliability.

2. Why Monitoring Matters

Even a perfectly built and deployed system can fail in production.

❌ Without monitoring
  • issues go unnoticed
  • performance degradation is invisible
  • crashes are detected too late
  • debugging becomes difficult
  • users experience problems first
✔ With monitoring
  • issues are detected early
  • performance is visible
  • alerts can trigger automatically
  • root cause analysis becomes easier

3. Goals of the Monitor Phase

Monitoring should confirm that:

  1. the system is alive (health)
  2. performance meets expectations
  3. resources are not exhausted
  4. errors are visible
  5. abnormal behavior is detected early

4. What to Monitor

4-1. Latency (Critical)

Time taken per frame

Target: < 10 ms per frame

If latency keeps increasing → performance drift
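
One way to collect per-frame latency is to time each frame with a monotonic clock and compare the samples against the 10 ms target. This is a minimal sketch: `process_frame` is a hypothetical stand-in for the real work, and `frame_timer` and `over_target` are names invented for this example.

```python
import time
from contextlib import contextmanager

TARGET_MS = 10.0  # per-frame target from the text above


@contextmanager
def frame_timer(samples):
    """Append the wall-clock time spent inside the with-block, in ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        samples.append((time.perf_counter() - start) * 1000.0)


def over_target(samples, target_ms=TARGET_MS):
    """Return the samples that exceeded the target."""
    return [s for s in samples if s > target_ms]


def process_frame(frame):
    # Hypothetical stand-in for the real per-frame work.
    return sum(frame)


latencies_ms = []
for _ in range(5):
    with frame_timer(latencies_ms):
        process_frame(range(1000))

slow_frames = over_target(latencies_ms)
```

`time.perf_counter()` is used because it is monotonic, so clock adjustments cannot produce negative durations.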

4-2. Throughput (FPS)

Number of frames processed per second

Target: ≥ 100 FPS

If FPS drops → system overloaded
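
FPS can be estimated from the timestamps of the most recent frames. The sketch below assumes a sliding window of 100 frames; the `FpsMeter` class and the window size are illustrative choices, not something the pipeline prescribes.

```python
from collections import deque


class FpsMeter:
    """Estimate FPS from the timestamps of the most recent frames."""

    def __init__(self, window=100):
        self.stamps = deque(maxlen=window)

    def tick(self, now):
        """Record that a frame finished at time `now` (in seconds)."""
        self.stamps.append(now)

    def fps(self):
        """Frames per second over the window, or 0.0 with too little data."""
        if len(self.stamps) < 2:
            return 0.0
        span = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / span if span > 0 else 0.0


meter = FpsMeter()
for i in range(11):
    meter.tick(i * 0.01)  # simulated: one frame every 10 ms
```

In a real loop you would call `meter.tick(time.monotonic())` after each frame instead of passing simulated timestamps.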

4-3. CPU Usage

Sustained high CPU → bottleneck or overload
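
A rough way to see how much CPU the process itself uses is to compare CPU time with wall-clock time over the same interval. This sketch uses only the standard library and measures the current process, not the whole machine; the function names are made up for illustration.

```python
import time


def cpu_utilization(cpu_delta_s, wall_delta_s):
    """Fraction of wall-clock time this process spent on the CPU."""
    if wall_delta_s <= 0:
        return 0.0
    return cpu_delta_s / wall_delta_s


def sample_process_cpu(work):
    """Run work() and return this process's CPU utilization while it ran."""
    cpu0, wall0 = time.process_time(), time.perf_counter()
    work()
    cpu1, wall1 = time.process_time(), time.perf_counter()
    return cpu_utilization(cpu1 - cpu0, wall1 - wall0)


# A CPU-bound task should report a value close to 1.0 on an idle machine.
busy = sample_process_cpu(lambda: sum(i * i for i in range(200_000)))
```

`time.process_time()` adds up CPU time across all threads of the process, so a multi-threaded workload can report values above 1.0.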

4-4. Memory Usage

Memory steadily increasing under constant load → possible memory leak
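
A single high reading says little; what matters is the trend. One simple approach is to fit a straight line through recent memory samples and look at its slope. The sample values and the 1 MB-per-sample threshold below are assumptions chosen for illustration.

```python
def slope(values):
    """Least-squares slope of values against their sample index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(samples_mb, max_growth_mb_per_sample=1.0):
    """True if memory trends upward faster than the allowed rate."""
    return slope(samples_mb) > max_growth_mb_per_sample


stable = [512, 515, 510, 514, 511, 513]   # fluctuates around one level
leaking = [512, 520, 531, 538, 549, 560]  # climbs with every sample
```

Using the slope rather than "last value greater than first" makes the check less sensitive to a single noisy sample.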

4-5. Queue Size

Queue growing → processing slower than input

Early sign of failure
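
The arithmetic behind this is simple: if frames arrive faster than they are processed, the backlog grows by the difference every second. Assuming constant rates (a simplification), the time left before a bounded queue fills up can be estimated as below. The function name and the numbers in the usage line are illustrative.

```python
import math


def seconds_until_full(current, capacity, arrival_fps, service_fps):
    """Seconds until the queue reaches capacity; math.inf if it is not growing."""
    growth_per_second = arrival_fps - service_fps
    if growth_per_second <= 0:
        return math.inf
    return max(0.0, (capacity - current) / growth_per_second)


# 120 frames/s arriving, 100 frames/s processed, 100 of 1000 slots used:
# the backlog grows by 20 frames/s, so 900 free slots last 45 seconds.
remaining = seconds_until_full(100, 1000, 120, 100)
```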

4-6. Error Logs

Frame processing failed
Invalid input

Detect correctness issues
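
Individual error messages are easier to act on when they are also turned into a number. The sketch below tracks the share of failed frames over a sliding window; `decode` is a hypothetical function that rejects empty input, and the window size is arbitrary.

```python
from collections import deque


class ErrorRate:
    """Share of failed frames among the most recent `window` frames."""

    def __init__(self, window=1000):
        self.outcomes = deque(maxlen=window)

    def record(self, ok):
        self.outcomes.append(bool(ok))

    def rate(self):
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)


def decode(frame):
    # Hypothetical decoder: rejects empty input.
    if not frame:
        raise ValueError("Invalid input")
    return len(frame)


errors = ErrorRate(window=4)
for frame in [b"abc", b"", b"de", b""]:
    try:
        decode(frame)
        errors.record(True)
    except ValueError:
        errors.record(False)
```

Here two of the four frames fail, so `errors.rate()` is 0.5.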

5. Types of Monitoring Data

✔ Metrics

Numerical data

  • latency
  • CPU
  • memory
  • FPS

✔ Logs

Text-based records

[INFO] Frame processed
[ERROR] Failed to decode image
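
Log lines in this format can be produced with Python's standard `logging` module. This is a small sketch: the logger name `frame_pipeline` is invented, and an in-memory stream stands in for a log file or stderr.

```python
import io
import logging

stream = io.StringIO()  # stands in for a log file or stderr
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))

log = logging.getLogger("frame_pipeline")  # hypothetical logger name
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False  # keep the example output in `stream` only

log.info("Frame processed")
log.error("Failed to decode image")

lines = stream.getvalue().splitlines()
```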

✔ Traces (advanced)

Track a request's flow through the system
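
The core idea of a trace is that one ID follows a unit of work through every stage, and each stage records a timed span under that ID. The sketch below shows the idea with the standard library only; it is not a real tracing library, and the stage names are made up.

```python
import time
import uuid
from contextlib import contextmanager

spans = []  # collected spans; a real tracer would export these


@contextmanager
def span(trace_id, name):
    """Record how long one named stage took for a given trace."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({
            "trace_id": trace_id,
            "name": name,
            "duration_ms": (time.perf_counter() - start) * 1000.0,
        })


def handle_frame(frame):
    trace_id = uuid.uuid4().hex  # one ID follows the frame through every stage
    with span(trace_id, "decode"):
        data = list(frame)
    with span(trace_id, "process"):
        total = sum(data)
    return trace_id, total


trace_id, total = handle_frame(range(10))
```

Filtering `spans` by `trace_id` then shows which stage of a slow frame took the time.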

6. Monitoring Tools

Common tools:

  • Prometheus → metrics collection
  • Grafana → dashboards
  • Nagios → alerts
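
Prometheus typically collects metrics by scraping an HTTP endpoint that serves them as plain text. To show what that text looks like, the sketch below renders two gauges by hand. The metric names are hypothetical, and in practice a Prometheus client library would generate this output for you.

```python
def render_gauge(name, help_text, value):
    """Render one gauge in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )


# Hypothetical metric names for the frame pipeline in this post.
body = (
    render_gauge("frame_latency_seconds", "Time taken per frame.", 0.005)
    + render_gauge("frame_queue_size", "Frames waiting to be processed.", 3)
)
```

Grafana can then chart the stored series, for example latency over the last hour.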

7. Monitoring Flow in DevOps

System running
   ↓
Metrics collected
   ↓
Data visualized
   ↓
Alerts triggered if abnormal

Alerts (Very Important)

Without alerts, monitoring only helps when someone happens to be watching the dashboard.

Example conditions
Latency > 20 ms
Queue size > threshold
Memory continuously increasing
Trigger a notification via:
  • email
  • Slack
  • pager
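
The conditions and channels above can be sketched as a list of rules checked against the latest metrics, with a pluggable `notify` callback. The queue threshold of 500 is an assumed value, and the `sent` list stands in for a real email, Slack, or pager integration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Rule:
    name: str
    condition: Callable[[Dict[str, float]], bool]


RULES = [
    Rule("Latency > 20 ms", lambda m: m["latency_ms"] > 20),
    # 500 is an assumed threshold; pick one that fits your queue capacity.
    Rule("Queue size > threshold", lambda m: m["queue_size"] > 500),
]


def evaluate(metrics, rules, notify):
    """Call notify(rule_name) for every rule whose condition holds."""
    fired = [r.name for r in rules if r.condition(metrics)]
    for name in fired:
        notify(name)
    return fired


sent = []  # stand-in for an email, Slack, or pager integration
fired = evaluate({"latency_ms": 25.0, "queue_size": 120}, RULES, sent.append)
```

"Memory continuously increasing" does not fit this snapshot model because it needs a history of samples, so a trend check is the better tool for it.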

Example Monitoring Scenario

Normal
Latency: 5ms
FPS: 100
Memory: stable
Queue: small
Problem (early warning)
Latency: 8ms → 12ms → 20ms
Queue: increasing

Action needed before crash
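
In this scenario the hard limit (latency above 20 ms) has not been crossed yet, but the trend is already clear. A minimal sketch of a two-level check follows, assuming non-empty sample lists. The queue readings in the usage note are invented, since the scenario only says the queue is increasing.

```python
def rising(values, min_points=3):
    """True if the last `min_points` values are strictly increasing."""
    recent = values[-min_points:]
    return len(recent) == min_points and all(
        b > a for a, b in zip(recent, recent[1:])
    )


def status(latency_ms, queue_sizes, hard_limit_ms=20):
    """Classify the current state as 'alert', 'warning', or 'ok'."""
    if latency_ms[-1] > hard_limit_ms:
        return "alert"  # hard limit already exceeded
    if rising(latency_ms) and rising(queue_sizes):
        return "warning"  # still under the limit, but heading toward it
    return "ok"
```

With latency readings of 8, 12, 20 and a growing queue such as 4, 9, 17, `status` returns "warning": 20 ms does not exceed the limit yet, but both signals are rising.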

8. Common Issues Detected by Monitoring

❌ Performance drift

Gradual slowdown over time

❌ Memory leak

Memory usage continuously increases

❌ Queue overflow

Backlog grows → latency explosion

❌ CPU saturation

System cannot keep up

❌ Silent failure

System runs but produces wrong results
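
Silent failures do not show up in latency or CPU graphs, so they need a correctness check of their own. One common approach is a canary: periodically feed a known input through the pipeline and compare the result with the known answer. Everything below (the `process` function, the golden values, the tolerance) is hypothetical.

```python
def process(frame):
    # Hypothetical stand-in for the real pipeline: mean pixel value.
    return sum(frame) / len(frame)


GOLDEN_INPUT = [10, 20, 30, 40]
GOLDEN_OUTPUT = 25.0


def canary_ok(fn, tolerance=1e-6):
    """Run the known input through fn and compare with the known answer."""
    try:
        return abs(fn(GOLDEN_INPUT) - GOLDEN_OUTPUT) <= tolerance
    except Exception:
        return False  # a crash on the golden input also counts as a failure
```

The result of `canary_ok(process)` can be exported as a metric and alerted on like any other signal.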

This post is licensed under CC BY 4.0 by the author.