TL;DR — The USE method (Utilization, Saturation, Errors) gives you a three‑dimensional view of every resource. By instrumenting each dimension with the right metrics and coupling them to concrete alerts, you can pinpoint CPU, memory, disk, and network bottlenecks before they cascade into outages.
Production systems are a mosaic of CPUs, RAM, SSDs, network interfaces, and the software that stitches them together. When one piece stalls, the whole service can feel the pain. The USE method, originally coined by Brendan Gregg, provides a pragmatic checklist that maps directly onto the observability stack most enterprises already run—Prometheus, Grafana, OpenTelemetry, and the like. This post walks through each leg of the method, shows you how to instrument it with real‑world tooling, and ties the metrics into architectural patterns that survive at scale.
Understanding the USE Method
The acronym stands for Utilization, Saturation, and Errors. Each dimension answers a specific question:
| Dimension | What it tells you | Typical metric |
|---|---|---|
| Utilization | How much of the resource is in use? | cpu_usage_percent, mem_used_bytes |
| Saturation | How backed up is the resource? | run_queue_length, disk_io_wait |
| Errors | Are there faults that prevent normal operation? | cpu_hardware_errors, disk_read_errors |
When you evaluate a resource through all three lenses, you can differentiate between “busy but healthy” (high utilization, low saturation, no errors) and “over‑committed” (high saturation, rising errors).
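The distinction above can be made concrete with a small classifier. This is a minimal sketch, not part of the USE method itself: the `UseSample` type, the `classify` function, and both thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class UseSample:
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    saturation: float   # queued work per unit of capacity (e.g. run-queue per core)
    errors: int         # new error events seen in the observation window


def classify(sample, util_high=0.8, sat_high=1.0):
    """Map one USE reading to a coarse state. Thresholds are illustrative only."""
    if sample.saturation > sat_high:
        return "over-committed"      # work is queuing, with or without errors
    if sample.errors > 0:
        return "faulting"            # errors without queuing: look at config or code
    if sample.utilization > util_high:
        return "busy but healthy"    # high use, nothing waiting, nothing failing
    return "idle or normal"
```

Checking saturation first reflects the idea that queued work, rather than raw utilization, is what separates a busy resource from an over-committed one.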
Utilization
Utilization is the easiest to measure. For CPUs, the metric is usually a percentage of total cycles spent executing non‑idle threads. For memory, it’s the fraction of RAM that is allocated or actively used.
# Quick CPU utilization check on Linux
top -b -n1 | grep "Cpu(s)" | \
awk '{print 100 - $8"%"}' # $8 is the idle column
In a Kubernetes cluster, you can pull the same data from the metrics server:
kubectl top pod my-service-abc123
If you prefer a programmatic approach, Python’s psutil library can emit both CPU and memory utilization in a single call:
import json
import time

import psutil


def snapshot():
    """Return CPU and memory utilization as a JSON-serializable dict."""
    vm = psutil.virtual_memory()  # read once so the memory fields are consistent
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),  # blocks for 1 s to sample
        "mem_percent": vm.percent,
        "mem_used": vm.used,
        "mem_total": vm.total,
    }


if __name__ == "__main__":
    while True:
        print(json.dumps(snapshot()))
        time.sleep(5)
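On Linux hosts where installing psutil is not an option, the same CPU figure can be derived from two snapshots of /proc/stat. The sketch below assumes the standard field order of the aggregate `cpu` line and treats iowait as idle time; `parse_cpu_line` and `cpu_utilization` are hypothetical helper names.

```python
def parse_cpu_line(stat_text):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            fields = [int(x) for x in line.split()[1:]]
            # Field order: user nice system idle iowait irq softirq steal guest guest_nice
            idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
            # Guest time is already counted inside user/nice, so sum only the first 8.
            total = sum(fields[:8])
            return idle, total
    raise ValueError("no aggregate cpu line found")


def cpu_utilization(before_text, after_text):
    """Percent of non-idle CPU time between two /proc/stat snapshots."""
    idle0, total0 = parse_cpu_line(before_text)
    idle1, total1 = parse_cpu_line(after_text)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 100.0 * (1 - (idle1 - idle0) / delta_total)
```

In practice you would read /proc/stat, sleep for the sampling interval, read it again, and pass both strings to `cpu_utilization`.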
Note: Utilization alone never tells you if a resource is a bottleneck. A web server can sit at 85 % CPU utilization during a normal traffic surge and still have headroom, but the same number on a batch‑processing node might indicate imminent throttling.
Saturation
Saturation captures the queue depth—how many requests are waiting for the resource. On the CPU side, this is the run‑queue length; on disks, it’s the average I/O wait time; on networks, it’s the number of packets queued in the NIC driver.
# Linux run‑queue length (saturation proxy for CPU)
awk '{split($4, q, "/"); print q[1]}' /proc/loadavg  # field 4 is "runnable/total"; keep the runnable count
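A raw run-queue number is hard to judge without the core count. The following sketch normalizes it, assuming the usual five-field layout of /proc/loadavg; `run_queue_per_core` is a hypothetical helper. Note that Linux load averages also count tasks in uninterruptible sleep, so they are only an approximation of CPU saturation.

```python
def run_queue_per_core(loadavg_text, cores):
    """Return (runnable_now, load1_per_core) from the contents of /proc/loadavg."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    fields = loadavg_text.split()
    load1 = float(fields[0])
    # The fourth field looks like "3/412": runnable entities / total entities.
    runnable = int(fields[3].split("/")[0])
    return runnable, load1 / cores
```

On a live host the inputs could come from `open("/proc/loadavg").read()` and `os.cpu_count()`.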
For storage, iostat gives you the crucial await metric (average wait time per I/O request):
iostat -x 1 3 | awk 'NR>6 {print $1, $10}' # $10 is await (ms) on older sysstat; newer releases print r_await/w_await instead, so check the header row
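To see where these numbers come from, the sketch below derives await, device busy fraction, and average queue depth from two samples of /proc/diskstats. It assumes the classic field layout (at least 11 counters after the device name); `parse_diskstats` and `disk_use` are hypothetical helpers, not iostat's actual code.

```python
def parse_diskstats(text, device):
    """Return selected counters for one device from /proc/diskstats text."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 14 and parts[2] == device:
            nums = [int(x) for x in parts[3:]]
            return {
                "reads": nums[0], "read_ms": nums[3],      # completed reads, ms spent reading
                "writes": nums[4], "write_ms": nums[7],    # completed writes, ms spent writing
                "io_ms": nums[9], "weighted_ms": nums[10], # busy time, queue-weighted busy time
            }
    raise KeyError(device)


def disk_use(before, after, interval_s):
    """Compute await, busy fraction and average queue depth between two samples."""
    d = {k: after[k] - before[k] for k in before}
    ios = d["reads"] + d["writes"]
    interval_ms = interval_s * 1000.0
    return {
        "await_ms": (d["read_ms"] + d["write_ms"]) / ios if ios else 0.0,
        "util": d["io_ms"] / interval_ms,            # fraction of time the device was busy
        "avg_queue": d["weighted_ms"] / interval_ms, # average number of requests in flight
    }
```

Of the three outputs, `avg_queue` is the closest to a pure saturation signal, while `util` is closer to utilization.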
In Prometheus, a classic saturation query for a PostgreSQL instance looks like:
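sum by (instance) (pg_stat_activity_count) / on (instance) pg_settings_max_connections
If this ratio approaches 1, you are saturating the DB connection pool. pg_stat_activity_count is a gauge, so it is read directly rather than wrapped in rate(); the metric names assume postgres_exporter.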
Errors
Errors are the most urgent signal: they indicate that the resource is not just slow but failing. Hardware error counters are exposed via smartctl for disks, ethtool -S for NICs, and dmesg for kernel‑level faults.
# SMART read error count for /dev/sda
smartctl -A /dev/sda | grep -i "Raw_Read_Error_Rate"
Application‑level errors are typically surfaced as HTTP 5xx responses, exception counters, or circuit‑breaker trips. In an OpenTelemetry‑enabled service, you can export these as a counter metric:
# otelcol config snippet (YAML)
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  prometheus:
    endpoint: "0.0.0.0:9464"
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
Caution: A spike in error counts without a corresponding rise in utilization or saturation often points to a configuration or code issue rather than a resource shortage.
Instrumenting Production Systems
The USE method only becomes actionable when you have reliable, low‑latency observability data. Below is a reference stack that works well for most cloud‑native workloads:
- Metrics collection – Prometheus (scrape), StatsD, or OpenTelemetry Collector.
- Log aggregation – Loki or Elastic Stack, enriched with structured fields.
- Tracing – Jaeger or Zipkin, feeding back into latency analysis.
- Alerting – Alertmanager with routing to PagerDuty, Slack, or Opsgenie.
Prometheus Scrape Config for Host‑Level Metrics
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['10.2.3.4:9100', '10.2.3.5:9100']
This pulls CPU, memory, and disk metrics that map directly to the three USE dimensions. Pair it with the node_exporter textfile collector to expose custom error counters.
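The textfile collector reads `*.prom` files from the directory passed to node_exporter, so a cron job or script only has to write a file in the Prometheus text format. The sketch below shows one way to do that atomically; `write_textfile_metric` is a hypothetical helper and the directory is whatever you configured for the collector.

```python
import os
import tempfile


def write_textfile_metric(directory, name, help_text, value, metric_type="counter"):
    """Atomically write one metric in Prometheus text exposition format."""
    body = (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name} {value}\n"
    )
    final_path = os.path.join(directory, f"{name}.prom")
    # Write to a temp file in the same directory, then rename it into place,
    # so the collector never reads a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; the exporter may run as another user
    os.replace(tmp_path, final_path)
    return final_path
```

The temporary file uses a `.tmp` suffix so that it is ignored if a scrape happens between the write and the rename.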
Exporting Custom Error Counters
# Using prometheus_client in Python
import random
import time

from prometheus_client import Counter, start_http_server

disk_read_errors = Counter('disk_read_errors_total', 'Total number of disk read errors')


def simulate_errors():
    if random.random() < 0.01:  # 1% chance per loop
        disk_read_errors.inc()


if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics on port 8000
    while True:
        simulate_errors()
        time.sleep(1)
Once scraped, you can create a saturation‑aware alert:
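increase(disk_read_errors_total{job="myservice"}[5m]) > 5 and on (instance) avg by (instance) (rate(node_disk_io_time_seconds_total[5m])) > 0.8
The alert fires only when new errors accumulate (more than five in five minutes) and the disk has been busy more than 80 % of the time (a saturation proxy), reducing false positives. Comparing the raw counter to 5 would stay true forever once crossed, hence increase(). The on (instance) match assumes both jobs carry the same instance label; because the application and node_exporter listen on different ports, that usually requires relabeling.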
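Because disk_read_errors_total is a counter, what matters for alerting is how much it grew inside a window, not its absolute value. The sketch below illustrates that calculation, including the usual treatment of a counter reset after a restart. `counter_increase` is a hypothetical helper; Prometheus's own increase() additionally extrapolates to the window edges, which this simplification omits.

```python
def counter_increase(samples):
    """Total growth across time-ordered counter samples, tolerating resets to zero."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # The counter went backwards: assume the process restarted from 0,
            # so everything up to the current value is new growth.
            total += curr
    return total
```

With samples `[10, 12, 1, 4]` the raw difference between last and first is negative, while the reset-aware growth is positive, which is why alerting on the raw value is misleading.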
Patterns in Production
1. The “Three‑Tier” Monitoring Pyramid
┌───────────────────────────────┐
│      Business‑level SLOs      │
├───────────────────────────────┤
│  Service‑level Latency/Error  │
├───────────────────────────────┤
│  Resource‑level (CPU, Disk…)  │
└───────────────────────────────┘
At the base, you collect raw USE metrics. The middle tier aggregates them into service‑level objectives (SLOs) like “99.9 % of requests < 200 ms”. The top tier translates those SLOs into business outcomes (e.g., “order‑completion rate”). This hierarchy ensures that a high‑CPU alert doesn’t cause alarm fatigue unless it jeopardizes an SLO.
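As a rough illustration of how the tiers interact, the sketch below computes compliance with the "requests under 200 ms" objective and uses it to decide whether a resource-level alert should page anyone. `slo_compliance` and `should_page` are hypothetical helpers, and the 0.999 target simply mirrors the example above.

```python
def slo_compliance(latencies_ms, threshold_ms=200.0):
    """Fraction of requests that completed faster than the latency threshold."""
    if not latencies_ms:
        return 1.0  # no traffic means nothing violated the objective
    good = sum(1 for value in latencies_ms if value < threshold_ms)
    return good / len(latencies_ms)


def should_page(resource_alert_firing, compliance, target=0.999):
    """Page only when a resource alert coincides with an SLO that is being missed."""
    return resource_alert_firing and compliance < target
```

A resource alert that fires while compliance stays above the target can still be recorded on a dashboard or ticket queue without waking anyone up.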
2. “Dual‑Threshold” Alerts
A classic mistake is to alert on a single metric (e.g., CPU > 80 %). Dual‑threshold alerts combine utilization with saturation or errors:
# CPU utilisation > 80% AND run‑queue > 2 per core
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.8
and
node_load5 / on (instance) count by (instance) (node_cpu_seconds_total{mode="system"}) > 2
Only when both conditions hold does the alert fire, reflecting true pressure on the scheduler.
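The same two-condition check is easy to express outside PromQL, for example when unit-testing alert logic before deploying a rule. This is a minimal sketch; `cpu_pressure` is a hypothetical function, and its default thresholds are the 80 % and 2-per-core values from the rule above.

```python
def cpu_pressure(utilization, load5, cores, util_threshold=0.8, queue_threshold=2.0):
    """True only when CPU utilization AND per-core load both exceed their thresholds."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return utilization > util_threshold and (load5 / cores) > queue_threshold
```

An 8-core host at 90 % utilization with a load of 8 would not fire here, because one runnable task per core means little is actually waiting.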
3. “Self‑Healing” Loops
When you detect a saturation condition, an automated remediation can be triggered. For example, autoscaling a Kafka consumer group when consumer_fetch_rate drops and fetch_latency_ms spikes:
# KEDA ScaledObject (YAML)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: kafka-consumer  # placeholder; the Kafka trigger needs the group whose lag it measures
        topic: events
        lagThreshold: "5000"
The lag threshold is a saturation metric: lag is work queued for the consumers. When it exceeds the threshold, the scaler adds pods, which drains the backlog; the gain stops once there is one consumer per partition.
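To get a feel for how a lag threshold translates into pod counts, here is a simplified model of the scaling arithmetic. It is an approximation for reasoning about thresholds, not KEDA's actual implementation; `desired_consumers` and the `min_replicas`/`max_replicas` defaults are assumptions made for this sketch.

```python
import math


def desired_consumers(total_lag, lag_threshold, partitions, min_replicas=1, max_replicas=10):
    """Estimate the replica count for a given consumer-group lag."""
    if lag_threshold <= 0:
        raise ValueError("lag_threshold must be positive")
    by_lag = math.ceil(total_lag / lag_threshold)
    # Consumers beyond the partition count would sit idle in a Kafka consumer group.
    capped = min(by_lag, partitions, max_replicas)
    return max(capped, min_replicas)
```

With `lagThreshold: "5000"` and a 6-partition topic, a backlog of 12,000 messages maps to three consumers, while a backlog of 100,000 is capped at six.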
Architecture of a Monitoring Stack
A robust USE‑centric stack must survive network partitions, storage outages, and version upgrades. The following diagram (described in text) outlines a resilient design:
- Sidecar Exporters – Each service pod runs a Prometheus sidecar that scrapes both the application’s /metrics endpoint and the host’s node_exporter. This isolates collector failure from the main process.
- Federated Prometheus – A regional Prometheus scrapes the sidecars, while a global Prometheus federates the regional instances. This reduces scrape load and provides a natural hierarchy for SLO roll‑ups.
- Long‑Term Storage – Thanos or Cortex stores historic data, enabling trend analysis of saturation over weeks. This is crucial for capacity planning.
- Alert Routing – Alertmanager clusters with peer‑to‑peer gossip ensure no single point of failure. Alerts are deduplicated and routed based on severity and on‑call schedules.
- Dashboard Layer – Grafana dashboards expose three panels per resource: Utilization (gauge), Saturation (heat‑map), Errors (log stream). Using templating, you can switch a dashboard from “CPU” to “Disk” in a single click.
Example Grafana Dashboard JSON Snippet (CPU Panel)
{
  "type": "graph",
  "title": "CPU Utilization & Saturation",
  "targets": [
    {
      "expr": "100 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100",
      "legendFormat": "{{instance}} Utilization"
    },
    {
      "expr": "node_load5 / on (instance) count by (instance) (node_cpu_seconds_total{mode=\"idle\"})",
      "legendFormat": "{{instance}} Run‑Queue per Core"
    }
  ],
  "yaxes": [{ "format": "percent" }, { "format": "short" }]
}
When the run‑queue line crosses the “2 per core” threshold while utilization is already > 85 %, you have a classic CPU saturation scenario.
Diagnosing Common Bottlenecks
CPU Bottleneck
Symptoms: High utilization (> 90 %), run‑queue > 2 per core, occasional CPU_HARDWARE_ERROR counters.
Root Causes:
- Inefficient code paths (e.g., busy‑wait loops).
- Over‑provisioned containers without proper CPU limits, causing “noisy neighbor” contention.
- Kernel scheduler mis‑configuration on VMs.
Remediation Steps:
- Profile hot functions with perf record -g and visualize with perf report.
- Enforce cgroup CPU quotas (cpu_quota and cpu_period) to isolate workloads.
- If errors persist, schedule a hardware health check (mcelog).
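The quota and period settings are related by simple arithmetic: the quota is the CPU time a cgroup may consume per period, so a limit in cores is quota divided by period. The sketch below shows that conversion plus a throttling ratio based on the `nr_periods` and `nr_throttled` counters in the cgroup's cpu.stat file. `cfs_quota_us` and `throttled_fraction` are hypothetical helpers, and the 100,000 µs period is the common default, assumed here.

```python
def cfs_quota_us(cpu_limit_cores, period_us=100_000):
    """CPU quota in microseconds per period for a limit expressed in cores."""
    if cpu_limit_cores <= 0:
        raise ValueError("cpu_limit_cores must be positive")
    return int(round(cpu_limit_cores * period_us))


def throttled_fraction(nr_periods, nr_throttled):
    """Share of enforcement periods in which the cgroup hit its quota."""
    return nr_throttled / nr_periods if nr_periods else 0.0
```

A container limited to 1.5 cores gets a quota of 150,000 µs per 100,000 µs period; a rising throttled fraction is a saturation signal even when host-level CPU utilization looks moderate.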
Memory Bottleneck
Symptoms: RAM utilization > 80 %, swap usage rising, oom_kill events in dmesg.
Root Causes:
- Memory leaks in Java heap or Go goroutine accumulation.
- Over‑commit settings (vm.overcommit_memory=1) masking real pressure.
- Large page fragmentation causing high latency for hugepage allocations.
Remediation Steps:
- Enable cAdvisor memory RSS metrics and set alerts on container_memory_working_set_bytes.
- Use jemalloc with profiling (MALLOC_CONF=prof:true) to locate leaks.
- Adjust sysctl vm.overcommit_memory=2 and set per‑process ulimit -v.
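To check the host-level symptoms listed above without extra tooling, the sketch below reads memory utilization and swap usage from /proc/meminfo text. It assumes a kernel that reports `MemAvailable`; `parse_meminfo` and `memory_use` are hypothetical helpers.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integer values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_use(text):
    """Return memory utilization (0.0 to 1.0) and swap in use (kB)."""
    info = parse_meminfo(text)
    total = info["MemTotal"]
    available = info["MemAvailable"]  # kernel estimate of memory usable without swapping
    return {
        "utilization": (total - available) / total,
        "swap_used_kb": info.get("SwapTotal", 0) - info.get("SwapFree", 0),
    }
```

Using `MemAvailable` rather than `MemFree` avoids counting reclaimable page cache as pressure.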
Disk I/O Bottleneck
Symptoms: await > 20 ms, the per‑second rate of node_disk_io_time_seconds_total approaching 1.0 (device busy nearly all the time), SMART error counters rising.
Root Causes:
- Log‑heavy workloads on the same SSD as the database.
- Excessive write amplification on NVMe drives.
- Filesystem mis‑tuned (noatime missing, ext4 journal mode = ordered).
Remediation Steps:
- Separate logs to a dedicated volume (e.g., EBS gp3 vs. io2).
- Enable writeback journaling (tune2fs -o journal_data_writeback), accepting weaker crash consistency for file data.
- Deploy block‑level caching (e.g., bcache) for hot data.
Network Bottleneck
Symptoms: Growing interface queue backlog or drops (tc -s qdisc show dev eth0; the txqueuelen of 1000 that ifconfig eth0 prints is only the configured limit), increased retransmissions, latency spikes in http_request_duration_seconds.
Root Causes:
- TCP congestion window mis‑tuned on virtual NICs.
- Oversubscribed VPC peering links.
- Micro‑service mesh (e.g., Istio) adding per‑call overhead.
Remediation Steps:
- Enable TCP BBR (sysctl -w net.ipv4.tcp_congestion_control=bbr).
- Use the cilium eBPF dataplane to reduce per‑packet processing overhead.
- Apply rate_limit policies in the service mesh to smooth bursts.
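Before and after applying any of these changes, it helps to quantify the retransmission symptom. The sketch below computes the share of retransmitted TCP segments between two snapshots of /proc/net/snmp; `tcp_counters` and `retransmit_ratio` are hypothetical helpers.

```python
def tcp_counters(snmp_text):
    """Return the Tcp counters from /proc/net/snmp text as a name -> value dict."""
    rows = [line.split() for line in snmp_text.splitlines() if line.startswith("Tcp:")]
    if len(rows) < 2:
        raise ValueError("expected a Tcp header row and a Tcp value row")
    header, values = rows[0][1:], rows[1][1:]
    return dict(zip(header, (int(v) for v in values)))


def retransmit_ratio(before_text, after_text):
    """Fraction of sent TCP segments that were retransmissions between two snapshots."""
    before, after = tcp_counters(before_text), tcp_counters(after_text)
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0
```

The counters are host-wide, so a rising ratio points at the node or its network path rather than at a single service.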
Key Takeaways
- Three‑dimensional view: Always evaluate a resource through Utilization, Saturation, and Errors; missing any dimension blinds you to real bottlenecks.
- Metric coupling: Pair utilization metrics with saturation and error counters in alerts to cut noise and surface true incidents.
- Instrumentation matters: Use sidecar exporters, custom counters, and hardware‑level SMART data to capture the full USE picture.
- Architectural patterns: Adopt a hierarchical monitoring stack (sidecars → regional Prometheus → global Prometheus, with Thanos or Cortex for long‑term storage) to keep latency low and retention high.
- Automation: Dual‑threshold alerts and self‑healing autoscalers turn detection into remediation, reducing MTTR.
- Production validation: Regularly run chaos experiments (e.g., chaos-mesh) that intentionally saturate a resource to verify your USE‑based alerts fire as expected.
Further Reading
- Brendan Gregg’s original USE Method article
- Prometheus documentation – Best practices for alerting
- Kubernetes Horizontal Pod Autoscaler (HPA) and custom metrics