TL;DR — Generational garbage collection splits the heap into young and old regions, enabling most objects to be reclaimed quickly. Both the HotSpot JVM and the .NET CLR expose well‑documented tuning knobs that let production teams balance pause‑time, throughput, and memory footprint.
Modern back‑end services run for months, handling millions of requests per second while staying within tight latency SLAs. Under that pressure, a pause‑heavy GC can become a silent outage. This post unpacks the internals of generational GC in the two most widely deployed managed runtimes—OpenJDK/HotSpot and Microsoft’s .NET CLR—then translates those internals into concrete patterns you can apply today.
Generational GC Fundamentals
Generational GC is built on a simple statistical observation: most objects die young. By allocating short‑lived objects in a young generation and promoting survivors to an old generation, the collector can run frequent, cheap collections on the young space while only occasionally scanning the larger, long‑lived old space.
Young Generation Mechanics
The young generation is typically divided into:
| Sub‑region | Purpose | Typical Size |
|---|---|---|
| Eden | Primary allocation point for new objects | 60–80 % of young space |
| Survivor 0 (S0) | Holds objects that survived the most recent minor GC | 10–20 % |
| Survivor 1 (S1) | Alternates with S0 on each minor GC | 10–20 % |
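As a rough worked example of the split in the table above, the sketch below applies HotSpot's -XX:SurvivorRatio arithmetic, where the ratio is Eden's size relative to one survivor space (documented default 8). The class name and the 1000 MiB young generation are illustrative assumptions; collectors that resize spaces adaptively, and region-based collectors such as G1, will not follow this fixed layout exactly.

```java
/** Sketch: how -XX:SurvivorRatio divides a fixed-size young generation. */
final class YoungGenLayout {
    /** Bytes for ONE survivor space: young / (ratio + 2). */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden gets whatever the two survivor spaces leave over. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long young = 1000L * 1024 * 1024; // hypothetical 1000 MiB young generation
        int ratio = 8;                    // HotSpot's documented default SurvivorRatio
        System.out.printf("eden=%d MiB, each survivor=%d MiB%n",
                edenBytes(young, ratio) >> 20, survivorBytes(young, ratio) >> 20);
    }
}
```

With a ratio of 8, Eden takes 80 % of the young generation and each survivor space takes 10 %, which sits at the edges of the ranges in the table.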
When the Eden space fills, a minor GC copies live objects into the empty survivor space, then promotes any that have survived a configurable number of minor collections (the tenuring threshold) to the old generation. Because this is a copying collector, its cost is O(N) in the number of live objects, not in the total amount allocated.
In HotSpot, the maximum tenuring threshold defaults to 15 minor GCs, and you can override it with -XX:MaxTenuringThreshold=<n>. .NET has no tunable tenuring threshold: its young space is split into generation 0 and generation 1, and an object that survives a collection of its generation is normally promoted to the next one (gen 0 to gen 1, gen 1 to gen 2). Settings such as GCHeapHardLimit and GCHeapCount cap the heap size and set the number of GC heaps; they do not control promotion.
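To make the aging rule concrete, here is a toy model of survivor-space aging. It is a sketch under simplifying assumptions, not HotSpot's implementation: all objects are allocated before the first minor GC, survivor spaces never overflow, and `lifetimes[i]` (a made-up input) is the number of minor GCs object `i` stays reachable for.

```java
/** Toy model of survivor-space aging; not HotSpot's real implementation. */
final class TenuringModel {
    /** Returns how many objects are promoted after the given number of minor GCs. */
    static int promotedAfter(int[] lifetimes, int tenuringThreshold, int minorGcs) {
        int[] age = new int[lifetimes.length];
        boolean[] gone = new boolean[lifetimes.length]; // reclaimed or already promoted
        int promoted = 0;
        for (int gc = 1; gc <= minorGcs; gc++) {
            for (int i = 0; i < lifetimes.length; i++) {
                if (gone[i]) {
                    continue;
                }
                if (lifetimes[i] < gc) {   // unreachable: never copied, costs nothing
                    gone[i] = true;
                    continue;
                }
                age[i]++;                  // copied into the other survivor space
                if (age[i] >= tenuringThreshold) {
                    gone[i] = true;        // leaves the young generation
                    promoted++;
                }
            }
        }
        return promoted;
    }

    public static void main(String[] args) {
        int[] lifetimes = {0, 0, 0, 1, 2, 20}; // mostly short-lived, one long-lived
        System.out.println(promotedAfter(lifetimes, 15, 20));
        System.out.println(promotedAfter(lifetimes, 2, 20));
    }
}
```

In this model a lower threshold promotes medium-lived objects earlier, which saves survivor copying but adds to old-generation occupancy.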
Old Generation & Promotion
The old generation (also called tenured or gen2) holds objects that have survived enough young‑generation cycles. Because it is larger and less frequently collected, its algorithm is more sophisticated:
- HotSpot offers several collectors that manage the old generation: Parallel Old (paired with the Parallel Scavenge young collector), G1, ZGC, and Shenandoah, each with distinct pause-time vs. throughput trade-offs.
- .NET provides two flavors, Workstation and Server, and either can run with background (concurrent) GC enabled. Background GC uses dedicated threads to collect gen 2 concurrently with the application, reducing pause times at the cost of extra CPU.
Promotion is not free: moving an object from young to old incurs a copy and, if the old generation is already near capacity, can trigger a full GC (also called a major GC). G1's mixed collections are different: they evacuate a subset of old regions together with the young regions and are not full collections. Understanding when promotion happens is key to avoiding “promotion-induced” latency spikes.
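One way to watch old-generation occupancy from inside a JVM is the standard java.lang.management API. The sketch below lists used bytes per heap memory pool; the class and method names are illustrative, and the pool names you see (for example an Eden pool and an old-generation pool) depend on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: report used bytes for each heap memory pool of the running JVM. */
final class HeapPools {
    static Map<String, Long> heapPoolUsage() {
        Map<String, Long> usage = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, and other non-heap pools
            }
            MemoryUsage current = pool.getUsage(); // null if the pool is no longer valid
            if (current != null) {
                usage.put(pool.getName(), current.getUsed());
            }
        }
        return usage;
    }

    public static void main(String[] args) {
        heapPoolUsage().forEach((name, used) ->
                System.out.printf("%-25s %,d bytes%n", name, used));
    }
}
```

Sampling this periodically and watching the old-generation pool grow between collections gives an early signal of promotion pressure.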
Architecture in the JVM
The HotSpot JVM’s GC architecture is modular. At a high level:
- Young collector (e.g., Parallel Scavenge or G1 Young) handles minor collections.
- Old collector (e.g., Parallel Old, G1 Mixed, ZGC) manages major collections.
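- Policy and heuristics (e.g., G1's -XX:InitiatingHeapOccupancyPercent) decide when to start concurrent marking and mixed collections, based on heap occupancy thresholds. Flags such as -XX:+UseStringDeduplication and -XX:+UseCompressedOops affect memory footprint rather than collection scheduling.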
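The young/old split is visible at runtime through the collector MXBeans. The following sketch (class name is illustrative) prints each registered collector with its collection count, accumulated time, and the memory pools it manages. The bean names vary by collector and JDK version, so treat any specific name as an assumption about your setup.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Sketch: list the collectors the running JVM has registered. */
final class CollectorInventory {
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(String.format("%s: %d collections, %d ms total, pools=%s",
                    gc.getName(),
                    gc.getCollectionCount(),   // -1 if undefined for this collector
                    gc.getCollectionTime(),    // accumulated milliseconds, -1 if undefined
                    String.join(",", gc.getMemoryPoolNames())));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```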
G1 GC: A Production Favorite
The Garbage‑First (G1) collector was introduced in Java 7 as a replacement for CMS. It partitions the heap into regions (typically 1–32 MiB each) and performs incremental evacuation of regions based on a pause‑time goal (-XX:MaxGCPauseMillis). G1’s mixed collections blend young and old region evacuation, giving you fine‑grained control over latency.
# Example JVM flags for a low‑latency G1 setup
java -Xms4g -Xmx4g \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=50 \
-XX:InitiatingHeapOccupancyPercent=45 \
-XX:ConcGCThreads=4 \
-XX:ParallelGCThreads=8 \
-jar myapp.jar
-XX:MaxGCPauseMillis=50 tells G1 to aim for pauses under 50 ms; this is a soft goal, not a guarantee.
-XX:InitiatingHeapOccupancyPercent=45 starts the first concurrent marking cycle when heap occupancy reaches 45 %, smoothing out the transition to mixed mode.
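To get a feel for where region sizes in the 1–32 MiB range come from, the sketch below approximates G1's ergonomic choice: aim for roughly 2048 regions, use a power-of-two size, and clamp to the 1–32 MiB range. This is an approximation based on my understanding of the heuristic; the exact rounding differs between JDK versions, and -XX:G1HeapRegionSize overrides it entirely.

```java
/** Approximation of G1's default region sizing; exact rules vary by JDK version. */
final class G1RegionEstimate {
    static final long MIB = 1024L * 1024;

    static long estimateRegionBytes(long heapBytes) {
        long raw = heapBytes / 2048;                       // aim for about 2048 regions
        long pow2 = Long.highestOneBit(Math.max(raw, 1));  // round down to a power of two
        return Math.min(32 * MIB, Math.max(MIB, pow2));    // clamp to 1..32 MiB
    }

    public static void main(String[] args) {
        long heap = 4L * 1024 * MIB; // the 4 GiB heap from the flags above
        System.out.printf("estimated region size: %d MiB%n", estimateRegionBytes(heap) / MIB);
    }
}
```

Under this approximation the 4 GiB heap in the example ends up with 2 MiB regions.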
ZGC & Shenandoah: Near‑Zero Pauses
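For workloads that cannot tolerate even a 10 ms pause, ZGC (experimental in JDK 11, production-ready since JDK 15) and Shenandoah (available in OpenJDK builds since JDK 12) offer region-based concurrent collection. Both relocate objects while the application runs: ZGC uses colored pointers with load barriers, and Shenandoah uses forwarding pointers with its own barriers, so the mutator always sees the current copy of an object while the collector works.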
# ZGC with heap limit and aggressive thread usage
java -Xmx8g -Xms8g -XX:+UseZGC -XX:ConcGCThreads=6 -XX:ParallelGCThreads=12 -jar service.jar
Key takeaways:
- ZGC keeps pause times largely independent of heap size, making it suitable for >100 GiB heaps.
- Shenandoah suits latency-sensitive services on small to medium heaps (e.g., container-orchestrated microservices), provided there is spare CPU for its concurrent phases.
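Concurrent collectors in this section let the application keep running while objects move by intercepting reference loads with a barrier. The toy model below sketches that idea only: addresses are plain integers, the forwarding table is a map, and everything is single-threaded. All names are hypothetical, and real collectors implement the barrier in JIT-compiled code with very different data structures.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a self-healing load barrier; not how any real JVM stores objects. */
final class LoadBarrierSketch {
    private final Map<Integer, Integer> forwarding = new HashMap<>(); // old address -> new address
    private final Map<Integer, String> heap = new HashMap<>();        // address -> payload

    void allocate(int address, String payload) {
        heap.put(address, payload);
    }

    /** Collector side: move the object and record where it went. */
    void relocate(int from, int to) {
        heap.put(to, heap.remove(from));
        forwarding.put(from, to);
    }

    /** Mutator side: every reference load passes through this check. */
    int loadReference(int[] field) {
        Integer moved = forwarding.get(field[0]);
        if (moved != null) {
            field[0] = moved; // heal the stale reference so later loads take the fast path
        }
        return field[0];
    }

    String read(int[] field) {
        return heap.get(loadReference(field));
    }
}
```

The point of the sketch is that the application never has to stop for references to be fixed up: a stale reference is repaired the first time it is loaded.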
Architecture in the .NET CLR
The .NET runtime’s GC is built around generations and concurrent collection. Since .NET 5, the GC has been unified across Windows, Linux, and macOS, with the same configuration knobs.
Server vs. Workstation GC
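- Workstation GC (default for desktop apps) uses a single heap and, with concurrent GC enabled, a single background GC thread, optimizing for low CPU usage.
- Server GC (default for ASP.NET Core on multi-core machines) creates one heap and one dedicated GC thread per logical CPU, maximizing throughput.
You select the mode with the System.GC.Server setting in runtimeconfig.json, the ServerGarbageCollector MSBuild property, or the DOTNET_gcServer environment variable; the <gcServer> element applies to .NET Framework app.config files. The System.Runtime.GCSettings.IsServerGC property is read-only and only reports the active mode.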
.NET runtimeconfig.json snippet:
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true,
      "System.GC.Concurrent": true,
      "System.GC.RetainVM": false
    }
  }
}
System.GC.Server=true enables server mode. System.GC.Concurrent=true turns on background (concurrent) collections.
Large Object Heap (LOH) and Pinned Objects
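Objects of 85,000 bytes or more go directly to the Large Object Heap (LOH), which is not compacted by default. This can cause fragmentation. You can request LOH compaction on the next blocking full GC (the setting has existed since .NET Framework 4.5.1 and is available in .NET Core and .NET 5+):
// Request a one-time LOH compaction (requires: using System; using System.Runtime;)
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(); // this blocking gen 2 collection compacts the LOH, then the mode resets to Default
Pinned objects (e.g., fixed buffers, interop scenarios) also inhibit compaction. The CLR records pinning in its GC events (ETW on Windows, EventPipe elsewhere), which tools such as PerfView and dotnet-trace can capture.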
Patterns in Production
Real‑world services rarely rely on a single GC setting. Instead, engineers adopt patterns that combine monitoring, incremental tuning, and fallback strategies.
1. Baseline “No‑Tuning” Deployment
Start with the runtime defaults:
- HotSpot: -Xmx/-Xms set to the same value, -XX:+UseG1GC.
- .NET: Server GC enabled for ASP.NET Core, default latency mode.
Run a steady‑state load test for at least 2 × the expected production traffic duration (e.g., 2 hours) and collect:
- GC pause histograms (-Xlog:gc for JVM, dotnet-trace for .NET).
- Heap occupancy over time.
- CPU utilization.
If pauses stay under your SLA (e.g., 100 ms), you may not need further tweaks.
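Checking pauses against an SLA usually means looking at a high percentile rather than the average. A minimal sketch, assuming you have already extracted pause durations in milliseconds from your logs or traces (the sample values and class name are made up), using the nearest-rank percentile definition:

```java
import java.util.Arrays;

/** Sketch: nearest-rank percentile over GC pause samples, compared with an SLA. */
final class PauseSla {
    /** Returns the p-th percentile (0 < p <= 100) of the given pauses in milliseconds. */
    static double percentile(double[] pausesMs, double p) {
        if (pausesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = pausesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based nearest rank
        return sorted[Math.max(rank, 1) - 1];
    }

    static boolean withinSla(double[] pausesMs, double p, double slaMs) {
        return percentile(pausesMs, p) <= slaMs;
    }

    public static void main(String[] args) {
        double[] pauses = {5, 7, 9, 12, 150, 8, 6, 11, 10, 4}; // hypothetical samples
        System.out.println("p50 = " + percentile(pauses, 50) + " ms");
        System.out.println("p99 within 100 ms SLA? " + withinSla(pauses, 99, 100));
    }
}
```

In this sample the median looks healthy while the p99 breaks a 100 ms SLA, which is the kind of spike an average would hide.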
2. “Latency‑First” Tuning
When latency spikes are unacceptable:
| Runtime | Key Flags / Settings | Typical Values |
|---|---|---|
| HotSpot (G1) | -XX:MaxGCPauseMillis | 20–50 ms |
| HotSpot (ZGC) | -XX:ConcGCThreads | 4–8 |
| .NET | System.GC.Concurrent | true (already default) |
| .NET | System.GC.Server | true (for multi‑core) |
| .NET | GCSettings.LatencyMode | SustainedLowLatency during critical sections (LowLatency applies to Workstation GC only) |
Example: In a payment‑processing service, wrap critical sections in GC.TryStartNoGCRegion to temporarily suspend collections:
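// requires: using System; using System.Runtime;
bool noGc = GC.TryStartNoGCRegion(100L * 1024 * 1024); // 100 MiB allocation budget
try
{
    ProcessPayment(request); // always runs; GC is suppressed only when noGc is true
}
finally
{
    // The region ends on its own if allocations exceed the budget, so check before ending it.
    if (noGc && GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
        GC.EndNoGCRegion();
}
Remember that a no-GC region is best-effort: TryStartNoGCRegion returns false if the runtime cannot reserve the requested memory up front, and the region ends early if the code allocates more than the budget.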
3. “Throughput‑First” Scaling
For batch jobs or analytics pipelines where raw throughput matters more than pause latency:
- Increase -XX:ParallelGCThreads (JVM) or System.GC.HeapCount (.NET) to match CPU cores.
- Raise the young generation size (-Xmn or -XX:NewSize/-XX:MaxNewSize) to reduce the frequency of minor GCs.
- Remove explicit GC calls (System.gc() on the JVM, System.GC.Collect() in .NET) unless you have a proven reason.
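For throughput-oriented tuning, a useful number to track is the share of wall-clock time spent in GC. The sketch below derives it from the standard MXBeans; the class name is illustrative. One caveat to keep in mind: for concurrent collectors, the reported collection time may include work that ran alongside the application, so this can overstate actual pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: GC time as a percentage of JVM uptime. */
final class GcOverhead {
    static double overheadPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    /** Sums accumulated collection time across all collectors in this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC overhead so far: %.2f%%%n",
                overheadPercent(totalGcMillis(), uptime));
    }
}
```

Comparing this figure before and after a change to thread counts or young-generation size shows whether the change actually bought throughput.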
4. “Hybrid” Adaptive Loop
A production pattern that many cloud‑native teams adopt is a feedback loop driven by telemetry:
- Collect GC metrics every minute (dotnet-counters, jstat, Prometheus exporters).
- Analyze whether pause-time percentiles exceed thresholds.
- Adjust configuration via a sidecar or config-reloader (e.g., modify JAVA_TOOL_OPTIONS or .runtimeconfig.json and trigger a rolling restart).
- Validate with a canary deployment before full rollout.
This approach mirrors the self‑tuning behavior of modern databases and keeps GC settings aligned with workload changes (e.g., traffic spikes, new feature rollouts).
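The "analyze" and "adjust" steps of such a loop can be reduced to a small decision function. The sketch below is one hypothetical policy, not a recommendation: it tightens a G1 pause goal by 20 % when the observed p99 exceeds the SLA, relaxes it by 10 % when p99 is under half the SLA, and otherwise leaves it alone. The percentages, bounds, and names are all assumptions chosen for illustration.

```java
/** Sketch of one adjustment step in a telemetry-driven tuning loop (hypothetical policy). */
final class PauseGoalTuner {
    /**
     * Suggests the next -XX:MaxGCPauseMillis value, clamped to [minMs, maxMs].
     * The caller is expected to roll the change out via a canary before applying it widely.
     */
    static int nextGoalMs(int currentGoalMs, double observedP99Ms, double slaMs,
                          int minMs, int maxMs) {
        double next = currentGoalMs;
        if (observedP99Ms > slaMs) {
            next = currentGoalMs * 0.8;   // pauses too long: ask for a tighter goal
        } else if (observedP99Ms < slaMs / 2) {
            next = currentGoalMs * 1.1;   // plenty of headroom: trade latency for throughput
        }
        return (int) Math.max(minMs, Math.min(maxMs, Math.round(next)));
    }

    public static void main(String[] args) {
        System.out.println(nextGoalMs(50, 120, 100, 10, 200)); // p99 over SLA
        System.out.println(nextGoalMs(50, 70, 100, 10, 200));  // inside the dead band
    }
}
```

The dead band between half the SLA and the SLA keeps the loop from changing the setting (and triggering restarts) on every small fluctuation.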
Performance Monitoring & Tools
JVM Tooling
| Tool | What It Shows | Typical Command |
|---|---|---|
| jstat -gc | Heap occupancy, GC counts, cumulative GC time | jstat -gc <pid> 1000 |
| jcmd <pid> GC.heap_info | Detailed heap layout | jcmd 1234 GC.heap_info |
| VisualVM / JConsole | Live graphs, thread dumps | GUI |
| Java Flight Recorder (JFR) | High-resolution GC events | jcmd 1234 JFR.start duration=5m filename=gc.jfr |
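Enable GC logging for post-mortem analysis. On JDK 9 and later, use unified logging on its own; the JDK 8 flags -XX:+PrintGCDetails and -XX:+PrintGCDateStamps are deprecated or removed there and should not be combined with -Xlog:
-Xlog:gc*:file=gc.log:time,uptime,level,tags:filecount=5,filesize=10M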
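Once logs are on disk, pause durations can be pulled out with a small parser. The sketch below assumes unified-logging lines of the common shape "GC(n) Pause ... <number>ms"; the sample line in `main` is made up to match that shape, and the exact wording of log lines varies by collector and JDK version, so the pattern may need adjusting.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: extract the pause duration from one unified-logging GC line. */
final class GcLogPause {
    // Matches e.g. "... GC(3) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms"
    private static final Pattern PAUSE =
            Pattern.compile("GC\\(\\d+\\) Pause .* (\\d+(?:\\.\\d+)?)ms\\s*$");

    /** Returns the pause in milliseconds, or empty if the line is not a pause line. */
    static OptionalDouble pauseMillis(String line) {
        Matcher m = PAUSE.matcher(line);
        return m.find()
                ? OptionalDouble.of(Double.parseDouble(m.group(1)))
                : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        String sample = "[2024-05-01T12:00:00.000+0000][1.234s][info][gc] "
                + "GC(3) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.456ms";
        System.out.println(pauseMillis(sample));
    }
}
```

Feeding every line of gc.log through `pauseMillis` and keeping the non-empty results yields the raw samples for a pause histogram.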
.NET Tooling
| Tool | What It Shows | Typical Command |
|---|---|---|
| dotnet-counters | Real-time GC time, heap size, gen 0-2 collection counts | dotnet-counters monitor --process-id 5678 --counters System.Runtime |
| dotnet-trace | EventPipe trace with GC events | dotnet-trace collect --process-id 5678 --providers Microsoft-Windows-DotNETRuntime:0x1:5 |
| PerfView | Post-mortem GC analysis, allocation stacks | GUI |
| dotnet-gcdump | Heap snapshot for leak detection | dotnet-gcdump collect -p 5678 |
When analyzing a pause spike, look for:
- Old-generation pressure: frequent blocking gen 2 collections (GCStart events with Depth = 2).
- LOH fragmentation: the LOH size reported in GCHeapStats events keeps growing while live data stays flat; may require GCSettings.LargeObjectHeapCompactionMode.
- Background GC falling behind: blocking gen 2 collections where you expected background ones, often caused by high allocation rates overwhelming the background thread.
Key Takeaways
- Generational GC separates short‑lived objects from long‑lived ones, enabling cheap minor collections and infrequent major collections.
- JVM offers multiple collectors (G1, ZGC, Shenandoah) that can be tuned with -XX: flags; pick the one that matches your latency vs. throughput goals.
- .NET CLR relies on Server vs. Workstation modes, concurrent background GC, and explicit APIs (NoGCRegion, LOH compaction) for fine-grained control.
- Production patterns (baseline, latency-first, throughput-first, and adaptive loops) help you stay within SLA limits while scaling.
- Telemetry is non-negotiable; integrate jstat, JFR, dotnet-counters, or dotnet-trace into your observability stack to react to GC pressure before it becomes a user-visible outage.