07. DevOps: The Monitor Phase
1. What Is the Monitor Phase in DevOps?
The Monitor phase is where you continuously observe the system to understand its behavior in production.
Monitoring answers a critical question:
“Is the system actually running as expected right now?”
In short, Monitor is the phase where metrics, logs, and system behavior are continuously observed to detect issues, verify performance, and maintain reliability.
2. Why Monitoring Matters
Even a perfectly built and deployed system can fail in production.
❌ Without monitoring
- issues go unnoticed
- performance degradation is invisible
- crashes are detected too late
- debugging becomes difficult
- users experience problems first
✔ With monitoring
- issues are detected early
- performance is visible
- alerts can trigger automatically
- root cause analysis becomes easier
3. Goals of the Monitor Phase
Monitoring ensures:
- the system is alive (health)
- performance meets expectations
- resources are not exhausted
- errors are visible
- abnormal behavior is detected early
4. What to Monitor
4-1. Latency (Critical)
Time taken per frame
Target: < 10 ms per frame
If latency rises steadily over time → performance drift
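A minimal sketch of how per-frame latency could be measured and compared with the 10 ms target. The names `timed_ms`, `summarize`, and `process_frame` are hypothetical; the original text does not describe the processing function.

```python
import time
from statistics import mean

TARGET_MS = 10.0  # target from the text: < 10 ms per frame


def timed_ms(process_frame, frame):
    """Run process_frame(frame) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = process_frame(frame)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def summarize(latencies_ms, target_ms=TARGET_MS):
    """Summarize a batch of latency samples against the target."""
    return {
        "mean_ms": mean(latencies_ms),
        "max_ms": max(latencies_ms),
        # Samples that missed the "< target" goal.
        "over_target": sum(1 for v in latencies_ms if v >= target_ms),
    }
```

Tracking the maximum as well as the mean matters here, because a few slow frames can hide behind a healthy average.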
4-2. Throughput (FPS)
How many frames processed per second
Target: > 100 FPS
If FPS drops below the target → the system may be overloaded
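Throughput can be estimated from the timestamps of finished frames. The sketch below uses a sliding one-second window; `FpsMeter` is a hypothetical helper, and the caller is assumed to pass a monotonic clock reading such as `time.monotonic()`.

```python
from collections import deque


class FpsMeter:
    """Sliding-window FPS estimate from frame completion timestamps."""

    def __init__(self, window_s=1.0):
        self.window_s = window_s
        self._stamps = deque()

    def tick(self, now_s):
        """Record one finished frame at time now_s (seconds) and return FPS."""
        self._stamps.append(now_s)
        # Keep only timestamps inside the half-open window (now - window, now].
        while self._stamps and now_s - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        return len(self._stamps) / self.window_s
```

During the first window after startup the estimate is lower than the true rate, so a real check would probably ignore readings until one full window has passed.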
4-3. CPU Usage
Sustained high CPU → bottleneck or overload
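One way to approximate the CPU usage of the current process with only the standard library is to compare CPU time with wall-clock time between two samples. `CpuMeter` is a hypothetical name, and this is a sketch: it covers only the current process, not the whole machine.

```python
import time


class CpuMeter:
    """Approximate CPU usage of this process between consecutive samples."""

    def __init__(self):
        self._wall = time.perf_counter()
        self._cpu = time.process_time()

    def sample(self):
        """Return CPU time used since the last sample as a percentage of wall time."""
        wall, cpu = time.perf_counter(), time.process_time()
        d_wall = wall - self._wall
        d_cpu = cpu - self._cpu
        self._wall, self._cpu = wall, cpu
        # Can exceed 100 when several threads burn CPU in parallel.
        return 100.0 * d_cpu / d_wall if d_wall > 0 else 0.0
```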
4-4. Memory Usage
Memory growing steadily under constant load → possible memory leak
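Steady growth can be told apart from normal fluctuation by fitting a line to recent memory samples and looking at its slope. The sketch below assumes the samples (in MB) are collected elsewhere, for example from `tracemalloc.get_traced_memory()`; the function names and the 0.5 MB-per-sample threshold are illustrative assumptions.

```python
def slope_per_sample(samples):
    """Least-squares slope of samples against their index."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(samples_mb, min_slope_mb=0.5):
    """True when memory grows faster than min_slope_mb per sample on average."""
    return slope_per_sample(samples_mb) > min_slope_mb
```

A positive slope is only a hint: caches that are still warming up also grow, so the signal is most useful over a long window at steady load.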
4-5. Queue Size
Queue growing → processing slower than input
Early sign of failure
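A growing backlog can be spotted by sampling the queue depth at regular intervals and checking whether the recent readings keep rising. The sketch below simulates a producer that outpaces the consumer; `simulate_depths` and `backlog_growing` are hypothetical helpers.

```python
import queue


def simulate_depths(steps, produced_per_step=3, consumed_per_step=2):
    """Simulate input arriving faster than it is processed; return sampled depths."""
    q = queue.Queue()
    depths = []
    for _ in range(steps):
        for _ in range(produced_per_step):
            q.put(object())
        for _ in range(consumed_per_step):
            q.get()
        # qsize() is approximate when other threads use the queue concurrently.
        depths.append(q.qsize())
    return depths


def backlog_growing(depths, min_samples=3):
    """True when the last min_samples depth readings strictly increase."""
    recent = depths[-min_samples:]
    return len(recent) == min_samples and all(
        a < b for a, b in zip(recent, recent[1:])
    )
```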
4-6. Error Logs
Frame processing failed
Invalid input
Error logs like these help detect correctness issues
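Error logs become easier to act on when they are also counted, so that an error rate can be watched like any other metric. A minimal sketch using the standard `logging` module; `ErrorCounter` and the logger name `frame_pipeline` are assumptions for illustration.

```python
import logging


class ErrorCounter(logging.Handler):
    """Logging handler that counts records at ERROR level or above."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.count = 0

    def emit(self, record):
        self.count += 1


logger = logging.getLogger("frame_pipeline")  # hypothetical logger name
counter = ErrorCounter()
logger.addHandler(counter)

# The two messages from the text; each one increments the counter.
logger.error("Frame processing failed")
logger.error("Invalid input")
```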
5. Types of Monitoring Data
✔ Metrics
Numerical data
- latency
- CPU
- memory
- FPS
✔ Logs
Text-based records
[INFO] Frame processed
[ERROR] Failed to decode image
✔ Traces (advanced)
Track how a single request flows through the system
6. Monitoring Tools
Common tools:
- Prometheus → metrics collection
- Grafana → dashboards
- Nagios → host and service checks with alerting
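To give a feel for what Prometheus collects, the sketch below renders gauge values in its text exposition format (`# HELP`, `# TYPE`, then `name value`). The metric name `frame_latency_ms` and the helper `render_gauges` are hypothetical; a real service would normally use the official `prometheus_client` library instead of formatting the text by hand.

```python
def render_gauges(gauges):
    """Render {name: (help_text, value)} in Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in gauges.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

Prometheus scrapes text like this from an HTTP endpoint at a fixed interval, and Grafana then queries Prometheus to draw dashboards.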
7. Monitoring Flow in DevOps
System running
↓
Metrics collected
↓
Data visualized
↓
Alerts triggered if abnormal
Alerts (Very Important)
Monitoring without alerts only helps if someone happens to be watching the dashboard.
Example conditions
Latency > 20ms
Queue size > threshold
Memory continuously increasing
When a condition is met, notify via:
- Slack
- pager
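The threshold conditions above can be sketched as a small rule table evaluated against the latest metrics. `Rule`, `evaluate`, `notify`, and the value of `QUEUE_THRESHOLD` are assumptions (the text leaves the queue threshold unspecified), and `send` stands in for a real Slack or pager integration. The "memory continuously increasing" condition is a trend rather than a threshold, so it is not covered here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

QUEUE_THRESHOLD = 100  # assumed value; the text does not give one


@dataclass
class Rule:
    name: str
    breached: Callable[[Dict[str, float]], bool]


RULES = [
    Rule("latency_high", lambda m: m["latency_ms"] > 20.0),
    Rule("queue_backlog", lambda m: m["queue_size"] > QUEUE_THRESHOLD),
]


def evaluate(metrics, rules=RULES) -> List[str]:
    """Return the names of all rules breached by the current metrics."""
    return [r.name for r in rules if r.breached(metrics)]


def notify(alerts, send=print):
    """Send one message per alert; `send` is a placeholder for Slack or a pager."""
    for name in alerts:
        send(f"ALERT: {name}")
```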
Example Monitoring Scenario
Normal
Latency: 5ms
FPS: 100
Memory: stable
Queue: small
Problem (early warning)
Latency: 8ms → 12ms → 20ms
Queue: increasing
Action is needed before the system crashes
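The early-warning case can be expressed as a check that reacts to a rising trend before the hard alert threshold is crossed. A minimal sketch, assuming the 10 ms target and the 20 ms alert condition from the text; the function name `classify` and the three-sample minimum are illustrative choices.

```python
def classify(latencies_ms, target_ms=10.0, alert_ms=20.0):
    """Classify recent latency samples as 'ok', 'warning', or 'alert'."""
    latest = latencies_ms[-1]
    if latest > alert_ms:
        return "alert"
    # Strictly rising samples that have reached the target: warn early.
    rising = all(a < b for a, b in zip(latencies_ms, latencies_ms[1:]))
    if len(latencies_ms) >= 3 and rising and latest >= target_ms:
        return "warning"
    return "ok"
```

With the series from the scenario, `classify([8, 12, 20])` returns `"warning"`: the 20 ms alert condition has not been exceeded yet, but the trend already calls for action.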
8. Common Issues Detected by Monitoring
❌ Performance drift
Gradual slowdown over time
❌ Memory leak
Memory usage continuously increases
❌ Queue overflow
Backlog grows → latency explosion
❌ CPU saturation
System cannot keep up
❌ Silent failure
System runs but produces wrong results
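Silent failures do not show up in latency or CPU graphs, so one possible approach is to feed the system a known input periodically and compare the result with a known-good output. This is a sketch of that idea, not something the text prescribes; `canary_ok`, `golden_input`, and `expected_output` are hypothetical names.

```python
def canary_ok(process_frame, golden_input, expected_output):
    """Run a known input through the system and check the result."""
    try:
        return process_frame(golden_input) == expected_output
    except Exception:
        # A crash on the canary input also counts as a failed check.
        return False
```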