TL;DR — The USE method remains the most actionable performance framework when you tie each metric to a concrete resource (CPU, disk, network, queue). Deploy Prometheus exporters, watch saturation spikes, and correlate errors to capacity limits to keep latency under control in production.

Modern services are a mash-up of containers, managed databases, and event streams. When latency spikes or error rates climb, engineers scramble for a signal. The Utilization, Saturation, Errors (USE) method, developed by Brendan Gregg, offers a three-pronged lens that cuts through noise. This post shows how to implement the method end-to-end: instrumenting the right exporters, wiring them into a Grafana dashboard, and embedding the insights into runbooks for Kafka, PostgreSQL, and Kubernetes workloads.

Understanding the USE Method

The USE method is deliberately simple:

| Dimension | What it measures | Typical source |
|---|---|---|
| Utilization | How busy a resource is relative to its capacity | CPU % (node_exporter), disk busy time (node_disk_io_time_seconds_total) |
| Saturation | How much extra work is queued because the resource can't keep up | Run-queue length, disk queue depth, Kafka consumer lag |
| Errors | How many operations on the resource fail (error events) | TCP retransmits, DB connection errors, HTTP 5xx |

When all three are low for a resource, that resource is unlikely to be your bottleneck. When any one climbs, you have a focused troubleshooting path for that resource.
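The sketch below turns the three questions into a per-resource check in Python. The `UseSample` type and the thresholds (utilization above 80 %, any non-zero saturation or errors) are hypothetical placeholders. The USE method does not prescribe them.

```python
from dataclasses import dataclass

@dataclass
class UseSample:
    """One USE reading for a single resource (hypothetical structure)."""
    resource: str
    utilization: float  # fraction of capacity in use, 0.0-1.0
    saturation: float   # queued work, e.g. waiting requests or queue depth
    errors: int         # error events in the sampling window

def use_findings(samples, util_threshold=0.8):
    """Return (resource, dimension) pairs that deserve investigation.

    The thresholds are illustrative assumptions; tune them per resource.
    """
    findings = []
    for s in samples:
        if s.utilization > util_threshold:
            findings.append((s.resource, "utilization"))
        if s.saturation > 0:
            findings.append((s.resource, "saturation"))
        if s.errors > 0:
            findings.append((s.resource, "errors"))
    return findings

samples = [
    UseSample("cpu", utilization=0.65, saturation=0.0, errors=0),
    UseSample("disk:sda", utilization=0.55, saturation=12.0, errors=0),
    UseSample("nic:eth0", utilization=0.30, saturation=0.0, errors=3),
]
print(use_findings(samples))
# [('disk:sda', 'saturation'), ('nic:eth0', 'errors')]
```

Each flagged pair names both a resource and a dimension, which is the "focused troubleshooting path" the method promises.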

“If you can’t measure it, you can’t manage it.” – a principle that underpins the USE method and modern observability stacks like Prometheus + Grafana.

Utilization in Practice

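Utilization is the most familiar metric, but it's easy to misinterpret. A node averaging 25 % CPU across four cores can still be CPU-bound if a single-threaded workload pins one core at 100 %, so look at per-core figures as well as the average. Likewise, a network interface at 30 % of its rated bandwidth can already be saturated if its effective capacity is lower. That happens, for example, when the link negotiated a slower speed or the NIC driver is throttling throughput.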

Concrete rule: Always normalize utilization against effective capacity, not raw hardware limits.

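# prometheus.yml – scrape config for node_exporter
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
    # Keep only the series the USE dashboard and alerts use; a "keep" rule
    # drops every metric whose name does not match the regex.
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'node_(cpu_seconds_total|load1|disk_.*|network_.*|filesystem_avail_bytes)'
        action: keep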

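node_cpu_seconds_total counts the cumulative seconds each core spends in each mode. Per-core utilization as a fraction is therefore 1 - rate(node_cpu_seconds_total{mode="idle"}[1m]). Averaging that across the cpu label gives node-wide utilization.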
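The arithmetic behind `rate()` is easy to check by hand. This sketch assumes two scrapes of `node_cpu_seconds_total` for one core, taken 60 seconds apart. Each scrape is represented as a dict keyed by the `mode` label; the values are made up. Utilization is the share of elapsed CPU time not spent idle.

```python
def cpu_utilization(prev, curr):
    """Fraction of CPU time spent outside the idle mode between two samples.

    prev/curr map mode label -> cumulative seconds (node_cpu_seconds_total).
    """
    deltas = {mode: curr[mode] - prev[mode] for mode in curr}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("counters did not advance (or were reset)")
    return 1.0 - deltas.get("idle", 0.0) / total

# Two hypothetical scrapes of cpu="0", 60 s apart
t0 = {"idle": 1000.0, "user": 300.0, "system": 100.0, "iowait": 20.0}
t1 = {"idle": 1018.0, "user": 336.0, "system": 105.0, "iowait": 21.0}
print(f"{cpu_utilization(t0, t1):.0%}")  # 70%
```

Note that iowait counts as busy here, just as it does in a `mode!="idle"` selector.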

Saturation in Practice

Saturation surfaces the queue that builds up when a resource can't keep up. For a database, this is the number of client connections waiting. For a Kafka consumer group, it's the consumer lag. For a CPU, it's the run-queue length; inside a Kubernetes pod it also shows up as CFS throttling.

# Bash snippet: read the 1-minute load average (a run-queue proxy) from /proc/loadavg
# Fields: 1-, 5- and 15-minute load averages, running/total tasks, last PID
load1=$(awk '{print $1}' /proc/loadavg)
echo "Load average (last minute): $load1 across $(nproc) cores"

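A run queue that stays above the core count means threads are waiting for CPU. A sustained value above 2 per core usually signals serious saturation. On Linux the load average also counts tasks blocked in uninterruptible I/O, so confirm with CPU utilization before blaming the CPU. In production, we alert earlier, on node_load1 > (cpu_cores × 1.5).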
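The sketch below parses a `/proc/loadavg` line and applies the node_load1 > (cpu_cores × 1.5) rule per core. The sample line is made up; on a real host you would read the file and get the core count from `os.cpu_count()`.

```python
def load_per_core(loadavg_line, cores):
    """Return the 1-minute load average divided by the number of cores.

    Linux load average counts runnable *and* uninterruptible (D-state)
    tasks, so treat it as a run-queue proxy, not an exact queue length.
    """
    load1 = float(loadavg_line.split()[0])
    return load1 / cores

def cpu_saturated(loadavg_line, cores, factor=1.5):
    """True when node_load1 > cores * factor."""
    return load_per_core(loadavg_line, cores) > factor

line = "13.20 9.85 6.40 14/812 40321"  # hypothetical /proc/loadavg content
print(load_per_core(line, 8))  # 1.65
print(cpu_saturated(line, 8))  # True
```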

Errors in Practice

Errors are the ultimate symptom: requests that never completed. However, raw error counts are noisy; they become meaningful when correlated with utilization and saturation.

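# Python: expose a custom 5xx error counter for a Flask app
from flask import Flask
from prometheus_client import Counter, start_http_server

app = Flask(__name__)
start_http_server(8000)  # serve metrics on :8000/metrics, separate from the app port
error_counter = Counter('flask_http_5xx_total', '5xx responses')

@app.errorhandler(500)  # only 500s reach this handler; other 5xx codes need separate counting
def internal_error(e):
    error_counter.inc()
    return "Internal Server Error", 500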

By exposing a dedicated error counter, you can plot it alongside CPU utilization and see whether spikes align with high load or with a downstream dependency failure.

Architecture Patterns for Applying USE

Kafka as a Saturation Frontier

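Kafka's consumer lag is the classic saturation indicator for streaming pipelines. When producers outpace consumers, lag (kafka_consumergroup_lag in kafka_exporter) grows. Kafka deletes data according to its retention settings whether or not it has been read, so lag that outgrows the retention window means unconsumed messages are lost.

Pattern: Run kafka_exporter next to the cluster and aggregate lag across consumer groups. The exporter reads cluster-wide state through the Kafka protocol, so one instance per cluster is usually enough. Running one per broker would report the same lag series several times.

# kafka_exporter scrape config (kafka-exporter is a placeholder hostname)
- job_name: 'kafka'
  static_configs:
    - targets: ['kafka-exporter:9308']

Pair lag alerts with broker disk headroom (node_filesystem_avail_bytes). Rising lag should trigger a "scale-out consumer" runbook before retention deletes unread data. Shrinking disk headroom calls for revisiting retention settings or broker capacity.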
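Lag per partition is the partition's log-end offset minus the group's committed offset. This sketch uses hypothetical offsets and assumes a partition with no committed offset counts as fully lagging (some tools report that case differently). It also shows why per-partition lag matters: one hot partition accounts for almost all of the total.

```python
def consumer_group_lag(log_end_offsets, committed_offsets):
    """Per-partition and total lag for one consumer group on one topic.

    Both arguments map partition id -> offset. A missing committed offset
    is treated as 0 (an assumption made for this sketch).
    """
    per_partition = {
        p: end - committed_offsets.get(p, 0)
        for p, end in log_end_offsets.items()
    }
    return per_partition, sum(per_partition.values())

end = {0: 15_200, 1: 14_950, 2: 16_010}       # latest offsets on the broker
committed = {0: 15_100, 1: 9_950, 2: 15_990}  # group's committed offsets
per_part, total = consumer_group_lag(end, committed)
print(per_part)  # {0: 100, 1: 5000, 2: 20}
print(total)     # 5120
```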

PostgreSQL: Utilization vs. Saturation

PostgreSQL exposes the pg_stat_activity view (one row per backend, with its state) and pg_stat_bgwriter (checkpoint and buffer-write activity). A high count of active backends combined with a fast-rising pg_stat_bgwriter_buffers_checkpoint suggests the write path is under pressure.

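-- Example: count active connections, the kind of query postgres_exporter turns into a metric
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
-- Compare total client connections (any state) with the configured ceiling
SELECT count(*) AS connections, current_setting('max_connections')::int AS max_connections FROM pg_stat_activity WHERE backend_type = 'client backend';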

When total connections (active plus idle) approach max_connections, you have two options. You can raise the limit, if memory allows, since each connection is a separate backend process. Or you can put a connection pooler like PgBouncer in front to flatten the saturation curve.
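A pooler makes the utilization/saturation split explicit. It lets at most `pool_size` server connections run queries, and any extra clients wait. PgBouncer reports those waiting clients as `cl_waiting` in `SHOW POOLS`. The model below is a deliberately simplified sketch with made-up demand figures, not PgBouncer's actual scheduling.

```python
def pool_use(clients_wanting_query, pool_size):
    """Utilization and saturation of a fixed-size connection pool.

    At most pool_size clients run at once; the rest wait in line.
    """
    busy = min(clients_wanting_query, pool_size)
    waiting = max(0, clients_wanting_query - pool_size)
    return busy / pool_size, waiting

for demand in (10, 20, 35):
    util, waiting = pool_use(demand, pool_size=20)
    print(f"demand={demand:>2} utilization={util:.0%} waiting={waiting}")
```

Utilization caps at 100 % once demand reaches the pool size, but saturation keeps growing. That is why waiting clients, not the connection count alone, should drive alerts.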

Kubernetes Pods: The Triple Threat

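Kubernetes abstracts resources, but the underlying node still exhibits USE characteristics. kube-state-metrics exposes what each container requested and is capped at: kube_pod_container_resource_requests and kube_pod_container_resource_limits with resource="cpu". Releases before v2 used names such as kube_pod_container_resource_requests_cpu_cores. cAdvisor, built into the kubelet, provides container_cpu_usage_seconds_total (utilization) and CFS throttling counters such as container_cpu_cfs_throttled_periods_total (saturation).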

# Prometheus rule: alert when a container is throttled in > 5% of CFS periods over 5m
- alert: CpuThrottlingHigh
  expr: |
    sum by (namespace, pod, container) (rate(container_cpu_cfs_throttled_periods_total[5m]))
      / sum by (namespace, pod, container) (rate(container_cpu_cfs_periods_total[5m]))
    > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Pod {{ $labels.pod }} is being throttled"
    description: "CPU throttling indicates saturation; consider raising the CPU limit or scaling horizontally."
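A throttling percentage is most naturally the share of CFS scheduling periods in which the container was throttled. It comes from the `nr_periods` and `nr_throttled` fields of the cgroup's `cpu.stat`, which cAdvisor exports as counters. The samples below are hypothetical. With the default 100 ms CFS period, five minutes contain at most about 3000 periods.

```python
def throttle_ratio(prev, curr):
    """Share of CFS periods in which the container was throttled.

    prev/curr are (nr_periods, nr_throttled) counter pairs, i.e. the values
    behind container_cpu_cfs_periods_total and
    container_cpu_cfs_throttled_periods_total.
    """
    periods = curr[0] - prev[0]
    throttled = curr[1] - prev[1]
    return throttled / periods if periods > 0 else 0.0

# Two hypothetical samples 5 minutes apart
before = (120_000, 4_000)
after = (123_000, 4_450)
ratio = throttle_ratio(before, after)
print(f"throttled in {ratio:.1%} of periods")  # throttled in 15.0% of periods
print(ratio > 0.05)  # True
```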

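Errors surface via kube_pod_container_status_restarts_total. A spike in restarts concurrent with throttling often means a starved container is failing its liveness probe, which points to saturation rather than a code bug. Check kube_pod_container_status_last_terminated_reason for OOMKilled to separate memory exhaustion from CPU starvation.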

Implementing Metrics Collection

1. Choose the Right Exporters

| System | Exporter | Key USE metrics |
|---|---|---|
| Linux node | node_exporter | node_cpu_seconds_total, node_disk_io_time_seconds_total, node_network_receive_errs_total |
| Kafka | kafka_exporter | kafka_consumergroup_lag, kafka_topic_partition_current_offset |
| PostgreSQL | postgres_exporter | pg_stat_activity_count, pg_stat_database_xact_commit, pg_stat_bgwriter_buffers_checkpoint |
| Kubernetes | kube-state-metrics + cAdvisor | container_cpu_usage_seconds_total, container_cpu_cfs_throttled_periods_total, container_memory_working_set_bytes, kube_pod_container_status_restarts_total |

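Run node_exporter as a DaemonSet so every node is covered. Cluster-scoped exporters such as kube-state-metrics and kafka_exporter typically run as a single Deployment. postgres_exporter fits as a side-car or as one instance per database.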

2. Define a Unified Dashboard

A Grafana dashboard that groups metrics by resource type makes the USE method instantly visual:

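  1. CPU Panel – Utilization (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m]))), Saturation (node_load1 divided by core count), Errors (rarely exported; machine-check events usually surface in the kernel log).
  2. Disk Panel – Utilization (rate(node_disk_io_time_seconds_total[1m])), Saturation (average queue length, rate(node_disk_io_time_weighted_seconds_total[1m])), Errors (diskstats has no error counter; use kernel logs or a SMART exporter).
  3. Network Panel – Utilization (rate(node_network_receive_bytes_total[1m]) and rate(node_network_transmit_bytes_total[1m]) relative to node_network_speed_bytes), Saturation (drops, rate(node_network_receive_drop_total[1m])), Errors (node_network_receive_errs_total, node_network_transmit_errs_total).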
  4. Application Panel – Custom error counters, request latency histograms, and queue depth (e.g., Kafka lag).
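Disk utilization and saturation both come from the kernel's diskstats counters. Busy time divided by elapsed time gives utilization. The weighted I/O time divided by elapsed time gives the average number of requests in flight, the same figure iostat reports as `avgqu-sz` / `aqu-sz`. The sample values below are hypothetical.

```python
def disk_use(prev, curr, interval_s):
    """Utilization and average queue length for one block device.

    prev/curr are (io_time_s, weighted_io_time_s) samples of
    node_disk_io_time_seconds_total and
    node_disk_io_time_weighted_seconds_total.
    """
    busy = curr[0] - prev[0]
    weighted = curr[1] - prev[1]
    utilization = busy / interval_s    # share of time with I/O in flight
    avg_queue = weighted / interval_s  # average requests in flight
    return utilization, avg_queue

util, queue = disk_use((500.0, 2_000.0), (533.0, 3_800.0), interval_s=60)
print(f"utilization={util:.0%} avg_queue={queue:.1f}")
# utilization=55% avg_queue=30.0
```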

3. Alerting Rules Aligned with USE

# Example: CPU utilization > 80% AND load per core > 2 => alert (evaluated per instance)
- alert: CpuSaturation
  expr: |
    (sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[1m]))
      / sum by (instance) (rate(node_cpu_seconds_total[1m]))) > 0.8
    and
    (sum by (instance) (node_load1)
      / count by (instance) (node_cpu_seconds_total{mode="idle"})) > 2
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "CPU saturation on {{ $labels.instance }}"
    description: "CPU utilization is {{ $value | humanizePercentage }} and load per core exceeds 2, so runnable threads are queuing."

Separate alerts for errors keep noise down. For example, fire an error alert only when the error rate exceeds 0.5 % and utilization is above 70 %. That combination suggests the errors are load-related rather than code-path anomalies.
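The gating logic is small enough to unit-test before you encode it in a Prometheus rule. This sketch assumes error and request counts come from the same window and that utilization is a 0.0-1.0 fraction. The function name is hypothetical.

```python
def load_related_error_alert(errors, requests, cpu_util,
                             max_error_rate=0.005, min_util=0.70):
    """Fire only when errors exceed 0.5 % of requests AND utilization > 70 %."""
    if requests == 0:
        return False
    error_rate = errors / requests
    return error_rate > max_error_rate and cpu_util > min_util

print(load_related_error_alert(errors=80, requests=10_000, cpu_util=0.85))  # True
print(load_related_error_alert(errors=80, requests=10_000, cpu_util=0.40))  # False
```

The second call is the case worth routing elsewhere: errors without load usually mean a bug or a failing dependency.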

Common Pitfalls and Failure Modes

Pitfall 1: Treating Utilization as a Binary Health Indicator

Many teams set a hard “CPU > 70 % = alert” rule. In reality, a well‑tuned service can safely run at 90 % utilization if saturation stays low. The fix: correlate utilization with saturation and error metrics before escalating.

Pitfall 2: Ignoring Multi‑Tenancy Interference

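On shared nodes, one noisy pod can drive saturation for unrelated workloads. Use per-container cgroup metrics to attribute it. container_cpu_usage_seconds_total shows which pod is consuming the node's CPU, and container_cpu_cfs_throttled_periods_total shows which pods are hitting their own CPU limits. Then set requests and limits (which determine each pod's QoS class) or pod priority to contain the offender.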

Pitfall 3: Over‑Aggregating Metrics

Aggregating all disk I/O into a single series hides per-device saturation. node_exporter already labels disk metrics by device, so keep the device label in your queries and alert on the busiest or highest-latency device.
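A quick illustration with hypothetical per-device utilization shows how much an average can hide.

```python
# Hypothetical per-device utilization for one node (fraction of time busy)
device_util = {"nvme0n1": 0.12, "nvme1n1": 0.10, "sda": 0.06, "sdb": 0.96}

mean_util = sum(device_util.values()) / len(device_util)
hottest, peak = max(device_util.items(), key=lambda kv: kv[1])

print(f"node-wide mean: {mean_util:.0%}")          # node-wide mean: 31%
print(f"hottest device: {hottest} at {peak:.0%}")  # hottest device: sdb at 96%
```

A 31 % average would pass most thresholds while `sdb` is effectively pegged.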

Failure Mode: “Silent Saturation”

A classic scenario: disk I/O utilization hovers at 55 % while the average I/O queue length climbs to 30. Because utilization never crosses a threshold, the alerting system stays quiet, yet latency doubles. The remedy is to monitor queue depth (rate(node_disk_io_time_weighted_seconds_total[1m]), the average queue length) as a primary saturation metric.

Failure Mode: “Error Flood After Saturation”

When a queue backs up, downstream services often start returning 5xx errors, and client retries add even more load, creating a feedback loop. An alert that triggers on error-rate spikes coupled with high saturation lets you cut the loop early, for example by shedding load or scaling out the consumers that drain the queue.
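Load shedding can be as simple as a bounded queue that rejects new work instead of growing without limit. The sketch below uses the standard library's `queue.Queue`; the class name and depth limit are hypothetical. The `shed` counter is the kind of value you would export as an error metric, and a real service would map a rejection to something like HTTP 503.

```python
import queue

class SheddingQueue:
    """Bounded work queue that rejects new items once max_depth is reached."""

    def __init__(self, max_depth):
        self._q = queue.Queue(maxsize=max_depth)
        self.shed = 0  # rejected items; export this as an error metric

    def offer(self, item):
        """Enqueue without blocking; return False (and count it) if full."""
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            self.shed += 1
            return False

    def depth(self):
        """Current queue depth, i.e. the saturation signal."""
        return self._q.qsize()

q = SheddingQueue(max_depth=3)
accepted = [q.offer(i) for i in range(5)]
print(accepted)           # [True, True, True, False, False]
print(q.depth(), q.shed)  # 3 2
```

Capping depth trades a few fast, explicit rejections for a backlog that would otherwise keep feeding timeouts and retries.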

Key Takeaways

  • Utilization tells you how busy a resource is; Saturation tells you how much work is waiting; Errors tell you how many operations are failing, often because of that wait.
  • Instrument each layer (node, service, queue) with a dedicated exporter; avoid generic “system‑wide” metrics that mask per‑resource behavior.
  • Correlate the three dimensions in alerts and dashboards; a single high‑utilization alarm is rarely actionable without saturation context.
  • Apply the method to concrete systems—Kafka lag, PostgreSQL connection pools, Kubernetes pod throttling—to surface production‑grade insights.
  • Regularly review alert thresholds against real‑world incidents; adjust to keep “silent saturation” from slipping through.

Further Reading