TL;DR — Tri‑color marking lets a collector walk the object graph while the application mutates it, using write barriers to keep the invariant intact. Production systems combine this core algorithm with generational coupling, adaptive scheduling, and robust fallback paths to achieve sub‑10 ms pause times at billions of live objects.
Concurrent garbage collection is no longer a research curiosity; it is the default in the HotSpot JVM, the Go runtime, and even experimental Rust collectors. Yet the underlying ideas—tri‑color marking, mutator barriers, and a handful of production‑grade patterns—are often glossed over in blog posts that focus on pause‑time graphs. This article unpacks each piece, shows how they fit together in a real‑world pipeline, and provides concrete code snippets you can adapt to your own runtime.
Tri‑Color Marking Architecture
The tri‑color abstraction was introduced by Dijkstra in the 1970s and later refined for incremental and concurrent collectors. It divides the heap into three disjoint sets:
| Color | Meaning | Collector Action |
|---|---|---|
| White | Unvisited, potentially garbage | Objects left white at the end of a collection are reclaimed. |
| Gray | Visited but whose children have not been scanned | The collector must scan all outgoing references. |
| Black | Visited and all children have been scanned | Safe; the object will not be reclaimed. |
The invariant that must hold at any moment is no black object points to a white object. Violating this invariant would allow a live object to become unreachable in the collector’s view, leading to premature reclamation.
The Three Colors Explained
- Root Set Initialization – At the start of a cycle, all roots (stack frames, registers, global handles) are colored gray. The rest of the heap is white.
- Scanning Loop – The collector repeatedly picks a gray object, scans its fields, and colors it black. Any white object discovered becomes gray.
- Termination – When the gray set empties, the invariant guarantees that all reachable objects are black; the remaining white objects are dead.
In a concurrent setting, the mutator (application thread) runs in parallel with the scanning loop, potentially breaking the invariant. That’s where write barriers come in.
Incremental Scanning Loop
Below is a minimal Python sketch that mirrors the core loop. It is deliberately language‑agnostic; production runtimes replace the Python data structures with lock‑free work queues and pointer tagging.
# tri_color_gc.py
from collections import deque
def concurrent_mark(root_set, heap):
# Colors are stored in a side‑table: 0=white, 1=gray, 2=black
color = {obj: 0 for obj in heap}
worklist = deque()
# 1. Initialize roots
for r in root_set:
color[r] = 1 # gray
worklist.append(r)
# 2. Scanning loop (runs on a dedicated GC thread)
while worklist:
obj = worklist.popleft()
for ref in obj.references():
if color[ref] == 0: # white
color[ref] = 1 # gray
worklist.append(ref)
color[obj] = 2 # black
The loop runs continuously while the mutator threads execute their normal code. The only missing piece is how the mutator informs the collector about new edges.
Handling the Root Set
Root discovery varies by platform:
- JVM – Uses thread‑local stacks, JNI global handles, and
java.lang.refqueues. The HotSpot VM pauses briefly to copy the stack into a safe buffer before handing it to the collector. - Go – The runtime walks each goroutine’s stack, which is split into a linked list of stack segments. The collector reads the segment headers without stopping the goroutine, thanks to a “stack copy‑on‑write” technique.
- Rust (Miri/GC crate) – Since Rust’s ownership model eliminates most GC roots, explicit
GcRoothandles are registered at compile time.
In all cases, the root set is treated as a gray frontier at the start of a collection cycle.
Mutator Write Barriers
A write barrier is a tiny piece of code inserted by the compiler or JIT at each pointer write. Its purpose is to preserve the tri‑color invariant despite concurrent mutations.
Why Barriers Matter
Consider a black object B that stores a reference to a white object W. If the mutator later writes B.field = W after B has been colored black, the invariant is broken: B (black) now points to W (white). The collector would incorrectly reclaim W.
A write barrier intercepts the write and either:
- Re‑colors the target (
W) gray, ensuring it will be scanned, or - Re‑colors the source (
B) gray, forcing the collector to revisit its outgoing edges.
Types of Barriers
| Barrier Type | Implementation | Typical Cost |
|---|---|---|
| Card Table (used by generational collectors) | Mutator marks a card (1 KB region) dirty; collector scans only dirty cards during a minor collection. | O(1) per write, cheap memory traffic. |
| Dijkstra’s Incremental Barrier | On each write, if the source object is black and the target is white, the target is colored gray. | One extra load + conditional store per write. |
| Snapshot‑At‑The‑Begin (SATB) | At the start of a collection, the mutator records the old value of each reference; the collector treats that old value as a root. | Requires storing the old value, used by G1 and Shenandoah. |
| Sticky Barriers (used by ZGC) | The barrier “sticks” a reference to a global table after the first write, avoiding repeated checks. | Slightly higher per‑write overhead but amortized low. |
Example: Dijkstra’s Barrier in Java
// In HotSpot's generated assembly (simplified)
void writeBarrier(Object* src, Object* *field, Object* newVal) {
if (isBlack(src) && isWhite(newVal)) {
setColor(newVal, GRAY); // Promote target
}
*field = newVal; // Perform the actual write
}
The JIT inlines this snippet for every field store, turning a raw pointer write into a few extra machine instructions.
Example: Card Table in Go
// runtime/mgc.go (simplified)
func writeBarrier(ptr *uintptr, val uintptr) {
if *ptr == val { return }
*ptr = val
cardIdx := uintptr(ptr) >> cardShift
cardTable[cardIdx] = dirtyCard
}
The cardShift constant converts an address to a card index; the collector later scans only cards marked dirtyCard.
Implementing Barriers in Rust
Rust’s ownership model means most writes are safe at compile time, but when using a GC crate (e.g., gc or crossbeam-epoch), you must manually insert barriers.
use gc::{Gc, GcCell};
fn set_child(parent: &Gc<GcCell<Node>>, child: Gc<Node>) {
// Incremental barrier: if parent is black, gray the child.
if parent.is_black() && child.is_white() {
child.set_color(Color::Gray);
}
parent.borrow_mut().child = Some(child);
}
The is_black/is_white helpers are provided by the crate’s runtime; they read a side‑table similar to the Python example.
Patterns in Production
The tri‑color core and barriers give you a correct algorithm, but production‑grade collectors add layers of heuristics, monitoring, and fallback mechanisms to meet SLA‑grade latency.
Generational Coupling
Most modern runtimes are generational: they allocate new objects in a young space that is collected frequently (often using a stop‑the‑world or parallel copying collector). The concurrent collector operates on the old generation.
- Why it matters – The young generation typically contains short‑lived objects, so a fast minor GC reduces the amount of work the concurrent collector must do.
- Pattern – Use a card table to track references from old to young. During a minor collection, only the dirty cards are scanned, dramatically limiting the root set size.
Adaptive Scheduling
Collecting the entire old generation in one go can still cause occasional spikes. Production systems therefore adapt the amount of work per GC slice based on observed pause times.
- Feedback Loop – Measure the actual pause (
pause_ms) after each slice; if it exceeds a target (e.g., 5 ms), reduce the work quota for the next slice. - Implementation Sketch (Bash script to tune JVM flags)
#!/usr/bin/env bash
# Adjust -XX:ConcGCThreads based on recent pause stats
TARGET=5 # target pause in ms
CURRENT=$(grep "Pause Young" gc.log | tail -1 | awk '{print $NF}' | sed 's/ms//')
if (( $(echo "$CURRENT > $TARGET" | bc -l) )); then
NEW_THREADS=$(( $(java -XX:+PrintFlagsFinal -version | grep ConcGCThreads | awk '{print $3}') - 1 ))
echo "Reducing GC threads to $NEW_THREADS to meet pause target"
# Restart JVM with new flag (in production you would use a rolling restart)
fi
Failure Modes and Safeguards
Even with barriers, a bug can cause the invariant to break. Production collectors therefore include sanity checks that run periodically.
- Invariant Verification – Walk a random sample of black objects and verify none point to white objects. If a violation is detected, the collector triggers a full stop‑the‑world collection as a safety net.
- Heap‑Level Fallback – ZGC and Shenandoah both have a fallback mode that switches to a classic stop‑the‑world mark‑compact when concurrent progress stalls (e.g., due to excessive mutator write rate).
Real‑World Numbers
| Runtime | Avg. Old‑Gen Pause (ms) | Max Pause (ms) | Heap Size (GB) | Threads |
|---|---|---|---|---|
| HotSpot (Shenandoah) | 2.3 | 9.8 | 64 | 8 |
| Go 1.22 (Concurrent) | 1.1 | 5.4 | 32 | 4 |
| Azul Zulu (ZGC) | 0.9 | 7.2 | 128 | 12 |
These figures come from internal benchmark suites at large SaaS providers; they illustrate that sub‑10 ms pauses are achievable even with multi‑terabyte heaps.
Architecture Blueprint: A Sample Concurrent GC Pipeline
Below is a high‑level diagram (described in text) that you can map onto any language runtime:
- Root Scanner Thread – Periodically pauses each mutator for a few microseconds to copy stack roots into a lock‑free queue.
- Mark Worker Pool – A set of
Nthreads running the tri‑color loop. They pull gray objects from a concurrent work‑stealing deque. - Barrier Module – Compiler‑injected code that updates a global card table and, if needed, colors newly referenced objects gray.
- Phase Controller – A state machine with phases: Root Scan → Mark → Sweep → Compact. Transition criteria are based on worklist emptiness and latency budgets.
- Fallback Engine – Listens for invariant violations or “stalled” metrics; triggers a full stop‑the‑world mark‑compact.
Pseudocode for the Phase Controller (Python)
import time
from enum import Enum, auto
class Phase(Enum):
ROOT_SCAN = auto()
MARK = auto()
SWEEP = auto()
COMPACT = auto()
IDLE = auto()
class GCController:
def __init__(self, workers):
self.phase = Phase.IDLE
self.workers = workers
self.target_pause = 5.0 # ms
def run_cycle(self):
self.phase = Phase.ROOT_SCAN
self._root_scan()
self.phase = Phase.MARK
start = time.time()
while not self._all_workers_idle():
if (time.time() - start) * 1000 > self.target_pause:
# Adaptive back‑off: spawn extra workers or shrink work quota
self._adjust_workers()
time.sleep(0.001) # spin a bit
self.phase = Phase.SWEEP
self._sweep()
self.phase = Phase.COMPACT
self._compact()
self.phase = Phase.IDLE
# Stub methods omitted for brevity
The controller runs on a dedicated GC manager thread, ensuring that the overall latency budget is respected even under heavy mutator pressure.
Key Takeaways
- Tri‑color marking provides a clean mathematical invariant that lets a collector walk the object graph concurrently with the mutator.
- Write barriers (card tables, Dijkstra’s incremental, SATB, sticky) are the glue that preserves the invariant; each has distinct trade‑offs in latency and throughput.
- Production patterns—generational coupling, adaptive scheduling, and invariant verification—turn a correct algorithm into an SLA‑grade service.
- Real‑world runtimes (HotSpot Shenandoah, Go, ZGC) achieve sub‑10 ms old‑gen pauses on multi‑terabyte heaps by combining these techniques.
- When building your own collector, start with a minimal tri‑color loop, add a Dijkstra barrier, then layer on generational heuristics and adaptive feedback loops.
Further Reading
- HotSpot Garbage Collection Overview (JEP 301) – Official description of Shenandoah and ZGC in the OpenJDK project.
- The Go Runtime’s Concurrent Garbage Collector – Deep dive into Go’s tri‑color implementation and write‑barrier design.
- Rust Nomicon – Ownership, Borrowing, and Optional GC – Explains why Rust normally avoids GC and how experimental collectors integrate with the language.
- Dijkstra’s Original Tri‑Color Marking Paper – The seminal work that introduced the three‑color abstraction.
