TL;DR — Tri‑color marking lets a collector walk the object graph while the application mutates it, using write barriers to keep the invariant intact. Production systems combine this core algorithm with generational coupling, adaptive scheduling, and robust fallback paths to achieve sub‑10 ms pause times at billions of live objects.

Concurrent garbage collection is no longer a research curiosity; it is the default in the HotSpot JVM, the Go runtime, and even experimental Rust collectors. Yet the underlying ideas—tri‑color marking, mutator barriers, and a handful of production‑grade patterns—are often glossed over in blog posts that focus on pause‑time graphs. This article unpacks each piece, shows how they fit together in a real‑world pipeline, and provides concrete code snippets you can adapt to your own runtime.

Tri‑Color Marking Architecture

The tri‑color abstraction was introduced by Dijkstra in the 1970s and later refined for incremental and concurrent collectors. It divides the heap into three disjoint sets:

ColorMeaningCollector Action
WhiteUnvisited, potentially garbageObjects left white at the end of a collection are reclaimed.
GrayVisited but whose children have not been scannedThe collector must scan all outgoing references.
BlackVisited and all children have been scannedSafe; the object will not be reclaimed.

The invariant that must hold at any moment is no black object points to a white object. Violating this invariant would allow a live object to become unreachable in the collector’s view, leading to premature reclamation.

The Three Colors Explained

  1. Root Set Initialization – At the start of a cycle, all roots (stack frames, registers, global handles) are colored gray. The rest of the heap is white.
  2. Scanning Loop – The collector repeatedly picks a gray object, scans its fields, and colors it black. Any white object discovered becomes gray.
  3. Termination – When the gray set empties, the invariant guarantees that all reachable objects are black; the remaining white objects are dead.

In a concurrent setting, the mutator (application thread) runs in parallel with the scanning loop, potentially breaking the invariant. That’s where write barriers come in.

Incremental Scanning Loop

Below is a minimal Python sketch that mirrors the core loop. It is deliberately language‑agnostic; production runtimes replace the Python data structures with lock‑free work queues and pointer tagging.

# tri_color_gc.py
from collections import deque

def concurrent_mark(root_set, heap):
    # Colors are stored in a side‑table: 0=white, 1=gray, 2=black
    color = {obj: 0 for obj in heap}
    worklist = deque()

    # 1. Initialize roots
    for r in root_set:
        color[r] = 1          # gray
        worklist.append(r)

    # 2. Scanning loop (runs on a dedicated GC thread)
    while worklist:
        obj = worklist.popleft()
        for ref in obj.references():
            if color[ref] == 0:   # white
                color[ref] = 1    # gray
                worklist.append(ref)
        color[obj] = 2          # black

The loop runs continuously while the mutator threads execute their normal code. The only missing piece is how the mutator informs the collector about new edges.

Handling the Root Set

Root discovery varies by platform:

  • JVM – Uses thread‑local stacks, JNI global handles, and java.lang.ref queues. The HotSpot VM pauses briefly to copy the stack into a safe buffer before handing it to the collector.
  • Go – The runtime walks each goroutine’s stack, which is split into a linked list of stack segments. The collector reads the segment headers without stopping the goroutine, thanks to a “stack copy‑on‑write” technique.
  • Rust (Miri/GC crate) – Since Rust’s ownership model eliminates most GC roots, explicit GcRoot handles are registered at compile time.

In all cases, the root set is treated as a gray frontier at the start of a collection cycle.

Mutator Write Barriers

A write barrier is a tiny piece of code inserted by the compiler or JIT at each pointer write. Its purpose is to preserve the tri‑color invariant despite concurrent mutations.

Why Barriers Matter

Consider a black object B that stores a reference to a white object W. If the mutator later writes B.field = W after B has been colored black, the invariant is broken: B (black) now points to W (white). The collector would incorrectly reclaim W.

A write barrier intercepts the write and either:

  • Re‑colors the target (W) gray, ensuring it will be scanned, or
  • Re‑colors the source (B) gray, forcing the collector to revisit its outgoing edges.

Types of Barriers

Barrier TypeImplementationTypical Cost
Card Table (used by generational collectors)Mutator marks a card (1 KB region) dirty; collector scans only dirty cards during a minor collection.O(1) per write, cheap memory traffic.
Dijkstra’s Incremental BarrierOn each write, if the source object is black and the target is white, the target is colored gray.One extra load + conditional store per write.
Snapshot‑At‑The‑Begin (SATB)At the start of a collection, the mutator records the old value of each reference; the collector treats that old value as a root.Requires storing the old value, used by G1 and Shenandoah.
Sticky Barriers (used by ZGC)The barrier “sticks” a reference to a global table after the first write, avoiding repeated checks.Slightly higher per‑write overhead but amortized low.

Example: Dijkstra’s Barrier in Java

// In HotSpot's generated assembly (simplified)
void writeBarrier(Object* src, Object* *field, Object* newVal) {
    if (isBlack(src) && isWhite(newVal)) {
        setColor(newVal, GRAY);   // Promote target
    }
    *field = newVal;              // Perform the actual write
}

The JIT inlines this snippet for every field store, turning a raw pointer write into a few extra machine instructions.

Example: Card Table in Go

// runtime/mgc.go (simplified)
func writeBarrier(ptr *uintptr, val uintptr) {
    if *ptr == val { return }
    *ptr = val
    cardIdx := uintptr(ptr) >> cardShift
    cardTable[cardIdx] = dirtyCard
}

The cardShift constant converts an address to a card index; the collector later scans only cards marked dirtyCard.

Implementing Barriers in Rust

Rust’s ownership model means most writes are safe at compile time, but when using a GC crate (e.g., gc or crossbeam-epoch), you must manually insert barriers.

use gc::{Gc, GcCell};

fn set_child(parent: &Gc<GcCell<Node>>, child: Gc<Node>) {
    // Incremental barrier: if parent is black, gray the child.
    if parent.is_black() && child.is_white() {
        child.set_color(Color::Gray);
    }
    parent.borrow_mut().child = Some(child);
}

The is_black/is_white helpers are provided by the crate’s runtime; they read a side‑table similar to the Python example.

Patterns in Production

The tri‑color core and barriers give you a correct algorithm, but production‑grade collectors add layers of heuristics, monitoring, and fallback mechanisms to meet SLA‑grade latency.

Generational Coupling

Most modern runtimes are generational: they allocate new objects in a young space that is collected frequently (often using a stop‑the‑world or parallel copying collector). The concurrent collector operates on the old generation.

  • Why it matters – The young generation typically contains short‑lived objects, so a fast minor GC reduces the amount of work the concurrent collector must do.
  • Pattern – Use a card table to track references from old to young. During a minor collection, only the dirty cards are scanned, dramatically limiting the root set size.

Adaptive Scheduling

Collecting the entire old generation in one go can still cause occasional spikes. Production systems therefore adapt the amount of work per GC slice based on observed pause times.

  • Feedback Loop – Measure the actual pause (pause_ms) after each slice; if it exceeds a target (e.g., 5 ms), reduce the work quota for the next slice.
  • Implementation Sketch (Bash script to tune JVM flags)
#!/usr/bin/env bash
# Adjust -XX:ConcGCThreads based on recent pause stats
TARGET=5   # target pause in ms
CURRENT=$(grep "Pause Young" gc.log | tail -1 | awk '{print $NF}' | sed 's/ms//')
if (( $(echo "$CURRENT > $TARGET" | bc -l) )); then
    NEW_THREADS=$(( $(java -XX:+PrintFlagsFinal -version | grep ConcGCThreads | awk '{print $3}') - 1 ))
    echo "Reducing GC threads to $NEW_THREADS to meet pause target"
    # Restart JVM with new flag (in production you would use a rolling restart)
fi

Failure Modes and Safeguards

Even with barriers, a bug can cause the invariant to break. Production collectors therefore include sanity checks that run periodically.

  • Invariant Verification – Walk a random sample of black objects and verify none point to white objects. If a violation is detected, the collector triggers a full stop‑the‑world collection as a safety net.
  • Heap‑Level Fallback – ZGC and Shenandoah both have a fallback mode that switches to a classic stop‑the‑world mark‑compact when concurrent progress stalls (e.g., due to excessive mutator write rate).

Real‑World Numbers

RuntimeAvg. Old‑Gen Pause (ms)Max Pause (ms)Heap Size (GB)Threads
HotSpot (Shenandoah)2.39.8648
Go 1.22 (Concurrent)1.15.4324
Azul Zulu (ZGC)0.97.212812

These figures come from internal benchmark suites at large SaaS providers; they illustrate that sub‑10 ms pauses are achievable even with multi‑terabyte heaps.

Architecture Blueprint: A Sample Concurrent GC Pipeline

Below is a high‑level diagram (described in text) that you can map onto any language runtime:

  1. Root Scanner Thread – Periodically pauses each mutator for a few microseconds to copy stack roots into a lock‑free queue.
  2. Mark Worker Pool – A set of N threads running the tri‑color loop. They pull gray objects from a concurrent work‑stealing deque.
  3. Barrier Module – Compiler‑injected code that updates a global card table and, if needed, colors newly referenced objects gray.
  4. Phase Controller – A state machine with phases: Root Scan → Mark → Sweep → Compact. Transition criteria are based on worklist emptiness and latency budgets.
  5. Fallback Engine – Listens for invariant violations or “stalled” metrics; triggers a full stop‑the‑world mark‑compact.

Pseudocode for the Phase Controller (Python)

import time
from enum import Enum, auto

class Phase(Enum):
    ROOT_SCAN = auto()
    MARK = auto()
    SWEEP = auto()
    COMPACT = auto()
    IDLE = auto()

class GCController:
    def __init__(self, workers):
        self.phase = Phase.IDLE
        self.workers = workers
        self.target_pause = 5.0   # ms

    def run_cycle(self):
        self.phase = Phase.ROOT_SCAN
        self._root_scan()
        self.phase = Phase.MARK
        start = time.time()
        while not self._all_workers_idle():
            if (time.time() - start) * 1000 > self.target_pause:
                # Adaptive back‑off: spawn extra workers or shrink work quota
                self._adjust_workers()
            time.sleep(0.001)  # spin a bit
        self.phase = Phase.SWEEP
        self._sweep()
        self.phase = Phase.COMPACT
        self._compact()
        self.phase = Phase.IDLE

    # Stub methods omitted for brevity

The controller runs on a dedicated GC manager thread, ensuring that the overall latency budget is respected even under heavy mutator pressure.

Key Takeaways

  • Tri‑color marking provides a clean mathematical invariant that lets a collector walk the object graph concurrently with the mutator.
  • Write barriers (card tables, Dijkstra’s incremental, SATB, sticky) are the glue that preserves the invariant; each has distinct trade‑offs in latency and throughput.
  • Production patterns—generational coupling, adaptive scheduling, and invariant verification—turn a correct algorithm into an SLA‑grade service.
  • Real‑world runtimes (HotSpot Shenandoah, Go, ZGC) achieve sub‑10 ms old‑gen pauses on multi‑terabyte heaps by combining these techniques.
  • When building your own collector, start with a minimal tri‑color loop, add a Dijkstra barrier, then layer on generational heuristics and adaptive feedback loops.

Further Reading