TL;DR — Generational garbage collection splits the heap into young and old regions, enabling most objects to be reclaimed quickly. Both the HotSpot JVM and the .NET CLR expose well‑documented tuning knobs that let production teams balance pause‑time, throughput, and memory footprint.

Modern back‑end services run for months, handling millions of requests per second while staying within tight latency SLAs. Under that pressure, a pause‑heavy GC can become a silent outage. This post unpacks the internals of generational GC in the two most widely deployed managed runtimes—OpenJDK/HotSpot and Microsoft’s .NET CLR—then translates those internals into concrete patterns you can apply today.

Generational GC Fundamentals

Generational GC is built on a simple statistical observation: most objects die young. By allocating short‑lived objects in a young generation and promoting survivors to an old generation, the collector can run frequent, cheap collections on the young space while only occasionally scanning the larger, long‑lived old space.

Young Generation Mechanics

The young generation is typically divided into:

| Sub-region | Purpose | Typical size |
| --- | --- | --- |
| Eden | Primary allocation point for new objects | 60–80% of young space |
| Survivor 0 (S0) | Holds objects that survived the most recent minor GC | 10–20% |
| Survivor 1 (S1) | Alternates with S0 on each minor GC | 10–20% |

When Eden fills, a minor GC copies live objects from Eden and the occupied survivor space into the empty survivor space. Objects that have already survived a configurable number of minor collections (the tenuring threshold) are copied to the old generation instead. Because this is a copying collector, dead objects are never visited: the cost is O(N) where N is the number of live objects, not the total allocated size.
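
To make the aging and promotion rule concrete, here is a minimal toy simulation. It is a sketch only, not how HotSpot is implemented: the class TenuringSim, its fields, and the idea of giving each object a fixed "lifetime in GCs" are assumptions made for illustration, and real collectors differ in the exact off-by-one of the threshold comparison.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of survivor aging (hypothetical, for illustration only). */
final class TenuringSim {
    /** A simulated object: its age in minor GCs and how many more GCs it stays reachable. */
    static final class Obj {
        int age;
        int remainingLife;
        Obj(int remainingLife) { this.remainingLife = remainingLife; }
    }

    final int tenuringThreshold;
    final List<Obj> young = new ArrayList<>(); // stands in for Eden plus the active survivor space
    final List<Obj> old = new ArrayList<>();   // stands in for the old generation

    TenuringSim(int tenuringThreshold) { this.tenuringThreshold = tenuringThreshold; }

    /** Allocate an object that will survive the given number of minor GCs. */
    void allocate(int lifetimeInGcs) { young.add(new Obj(lifetimeInGcs)); }

    /** One minor GC: skip dead objects, age the survivors, promote those at the threshold. */
    void minorGc() {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (o.remainingLife <= 0) continue; // dead: never copied, so it costs nothing
            o.remainingLife--;
            o.age++;
            if (o.age >= tenuringThreshold) old.add(o); else survivors.add(o);
        }
        young.clear();
        young.addAll(survivors);
    }
}
```

With a threshold of 2, an object that outlives two minor GCs ends up in the old list, while objects that die before the first GC are dropped without ever being copied.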

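In HotSpot, the default maximum tenuring threshold is 15 minor GCs, and you can override it with -XX:MaxTenuringThreshold=<n>. .NET has no tenuring threshold: objects that survive a generation 0 collection are promoted to generation 1, and survivors of a generation 1 collection are promoted to generation 2. Settings such as GCHeapHardLimit and GCHeapCount control heap size and the number of heaps, not promotion.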

Old Generation & Promotion

The old generation (also called tenured or gen2) holds objects that have survived enough young‑generation cycles. Because it is larger and less frequently collected, its algorithm is more sophisticated:

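  • HotSpot offers several collectors that manage the old generation: Parallel Old, G1, ZGC, and Shenandoah, each with distinct pause-time vs. throughput trade-offs. (Parallel Scavenge is the young-generation half of the Parallel collector, and ZGC was single-generation until a generational mode arrived in JDK 21.)
  • .NET provides two flavors, Workstation and Server, and either can run with background (concurrent) GC enabled. Background GC performs most of a generation 2 collection on dedicated GC threads while application threads keep running, reducing pause times at the cost of extra CPU.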

Promotion is not free: moving an object from young to old incurs a copy and, if the old generation is already near capacity, can trigger a full GC, a stop-the-world collection of the entire heap. (G1's mixed collections are a separate, incremental mechanism and are not full GCs.) Understanding when promotion happens is key to avoiding “promotion-induced” latency spikes.

Architecture in the JVM

The HotSpot JVM’s GC architecture is modular. At a high level:

  1. Young collector (e.g., Parallel Scavenge or G1 Young) handles minor collections.
  2. Old collector (e.g., Parallel Old or G1 mixed collections) reclaims the old generation.
  3. Heuristics (tuned with flags such as -XX:InitiatingHeapOccupancyPercent and -XX:MaxGCPauseMillis) decide when to start concurrent marking and mixed collections based on heap occupancy thresholds.

G1 GC: A Production Favorite

The Garbage-First (G1) collector was introduced in Java 7 as a long-term replacement for CMS and has been the default collector since JDK 9. It partitions the heap into equal-sized regions (typically 1–32 MiB each) and evacuates them incrementally, choosing how many regions to collect so that each pause fits a pause-time goal (-XX:MaxGCPauseMillis). Mixed collections evacuate young regions together with the old regions that hold the most garbage, which is how G1 reclaims old space without resorting to a full GC.

# Example JVM flags for a low‑latency G1 setup
java -Xms4g -Xmx4g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=50 \
     -XX:InitiatingHeapOccupancyPercent=45 \
     -XX:ConcGCThreads=4 \
     -XX:ParallelGCThreads=8 \
     -jar myapp.jar
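  • -XX:MaxGCPauseMillis=50 tells G1 to aim for pauses under 50 ms; it is a soft goal, not a guarantee.
  • -XX:InitiatingHeapOccupancyPercent=45 starts a concurrent marking cycle when heap occupancy reaches 45% (this is also the default, and G1 adjusts it adaptively unless -XX:-G1UseAdaptiveIHOP is set). Mixed collections follow once marking completes.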
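
The two sizing rules above can be approximated with a little arithmetic. The sketch below assumes a fixed-size heap (-Xms equal to -Xmx), a target of roughly 2048 regions, and power-of-two region sizes clamped to the 1–32 MiB range; the class and method names are hypothetical, and the real ergonomics in a given JDK release may differ in detail.

```java
/** Back-of-the-envelope G1 sizing (an approximation, not the JVM's actual code). */
final class G1RegionSizing {
    static final long MIB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MIB;
    static final long MAX_REGION = 32 * MIB;
    static final long TARGET_REGION_COUNT = 2048; // assumed ergonomic target

    /** Approximate region size for a fixed-size heap. */
    static long regionSizeBytes(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGION_COUNT, MIN_REGION);
        long pow2 = Long.highestOneBit(raw); // round down to a power of two
        return Math.min(pow2, MAX_REGION);
    }

    /** Occupancy, in bytes, at which a static IHOP percentage starts concurrent marking. */
    static long ihopThresholdBytes(long heapBytes, int ihopPercent) {
        return heapBytes / 100 * ihopPercent;
    }
}
```

For the 4 GiB heap in the example flags, this estimate gives 2 MiB regions, and a static threshold of 45% corresponds to about 1.8 GiB of occupancy before marking starts.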

ZGC & Shenandoah: Near‑Zero Pauses

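For workloads that cannot tolerate even a 10 ms pause, ZGC (experimental in JDK 11, production-ready since JDK 15) and Shenandoah (JDK 12+ in OpenJDK builds that include it) offer region-based concurrent collection. Both relocate objects while application threads are running: barriers on reference loads detect a reference to a moved object and redirect it to the new copy (ZGC uses colored pointers with load barriers, Shenandoah uses forwarding pointers), so the mutator continues while the collector works.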

# ZGC with heap limit and aggressive thread usage
java -Xmx8g -Xms8g -XX:+UseZGC -XX:ConcGCThreads=6 -XX:ParallelGCThreads=12 -jar service.jar
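
One way to picture how application threads keep running while objects move is a barrier on every reference load that consults a forwarding table. The following is a single-threaded toy model under that assumption; ToyLoadBarrier and its integer "addresses" are hypothetical, and production collectors implement the idea with atomic operations and compiler-inserted barriers rather than hash maps.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of concurrent relocation with a load barrier (illustration only). */
final class ToyLoadBarrier {
    /** Simulated heap: address -> payload. */
    final Map<Integer, String> heap = new HashMap<>();
    /** Filled in by the "collector" when it moves an object: old address -> new address. */
    final Map<Integer, Integer> forwarding = new HashMap<>();
    /** A single reference field owned by the application. */
    int field;

    /** Collector side: copy the object and record where it went, without touching the field. */
    void relocate(int from, int to) {
        heap.put(to, heap.get(from));
        forwarding.put(from, to);
    }

    /** Application side: every reference load goes through this barrier. */
    String load() {
        Integer moved = forwarding.get(field);
        if (moved != null) {
            field = moved; // heal the stale reference so later loads skip the lookup
        }
        return heap.get(field);
    }
}
```

The application never has to stop for the move: the first load after relocation pays for one table lookup and repairs the field, and subsequent loads see the new address directly.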

Key takeaways:

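  • ZGC pause times stay roughly constant as the heap grows, making it suitable for >100 GiB heaps.
  • Shenandoah suits latency-sensitive services such as container-orchestrated microservices, but like ZGC it needs spare CPU for its concurrent threads, so test it carefully on low-core hosts.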

Architecture in the .NET CLR

The .NET runtime’s GC is built around generations and concurrent collection. In modern .NET (.NET Core and .NET 5+), the same GC and the same configuration knobs are available on Windows, Linux, and macOS.

Server vs. Workstation GC

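  • Workstation GC (default for desktop and console apps) runs collections on the thread that triggered them, plus a single background GC thread when concurrent GC is enabled, optimizing for low CPU and memory usage.
  • Server GC (enabled by default for ASP.NET Core apps on multi-core machines) creates one heap and one dedicated GC thread per logical CPU, maximizing throughput.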

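You select the mode with the System.GC.Server setting in runtimeconfig.json, the <ServerGarbageCollection> MSBuild property, or the DOTNET_gcServer environment variable; .NET Framework apps use the <gcServer> element in app.config. The System.Runtime.GCSettings.IsServerGC property is read-only and reports the mode in effect.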

A runtimeconfig.json snippet:
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true,
      "System.GC.Concurrent": true,
      "System.GC.RetainVM": false
    }
  }
}
  • System.GC.Server=true enables server mode.
  • System.GC.Concurrent=true turns on background (concurrent) collections.

Large Object Heap (LOH) and Pinned Objects

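Objects of 85,000 bytes or more (roughly 83 KiB) go directly to the Large Object Heap (LOH), which is not compacted by default. This can cause fragmentation. You can request LOH compaction on the next blocking full GC; the setting has existed since .NET Framework 4.5.1 and is available in all .NET Core and .NET 5+ releases: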

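using System;
using System.Runtime;

// Compact the LOH once, during the next blocking gen 2 GC
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(); // forces that blocking full GC now; the mode then resets to Default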

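Pinned objects (e.g., fixed buffers, interop scenarios) also inhibit compaction because the GC cannot move them. The runtime reports pinning in its GC trace events (for example, the pinned object count in GCHeapStats), which tools such as PerfView surface.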
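
Why a non-compacted heap fragments can be shown with a small model. The sketch below is a toy first-fit allocator over an array of slots, written in Java for consistency with the other examples; ToyLargeObjectHeap is hypothetical and does not reflect the CLR's actual free-list implementation.

```java
import java.util.Arrays;

/** Toy non-compacting heap that demonstrates fragmentation (illustration only). */
final class ToyLargeObjectHeap {
    private final int[] slots; // 0 = free, otherwise the id of the owning object

    ToyLargeObjectHeap(int size) { slots = new int[size]; }

    /** First-fit allocation of len contiguous slots; false if no single gap is big enough. */
    boolean allocate(int id, int len) {
        int run = 0;
        for (int i = 0; i < slots.length; i++) {
            run = (slots[i] == 0) ? run + 1 : 0;
            if (run == len) {
                Arrays.fill(slots, i - len + 1, i + 1, id);
                return true;
            }
        }
        return false;
    }

    /** Release every slot owned by the given object. */
    void free(int id) {
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == id) slots[i] = 0;
        }
    }

    /** Total number of free slots, regardless of whether they are contiguous. */
    int freeSlots() {
        int n = 0;
        for (int s : slots) if (s == 0) n++;
        return n;
    }

    /** Slide live objects to the start, which is the effect a compacting collection has. */
    void compact() {
        int write = 0;
        for (int read = 0; read < slots.length; read++) {
            if (slots[read] != 0) slots[write++] = slots[read];
        }
        Arrays.fill(slots, write, slots.length, 0);
    }
}
```

Fill an 8-slot heap with four 2-slot objects and free every other one: four slots are free, yet a 4-slot allocation fails until compact() merges the gaps.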

Patterns in Production

Real‑world services rarely rely on a single GC setting. Instead, engineers adopt patterns that combine monitoring, incremental tuning, and fallback strategies.

1. Baseline “No‑Tuning” Deployment

Start with the runtime defaults:

  • HotSpot: -Xmx/-Xms set to the same value, -XX:+UseG1GC.
  • .NET: Server GC enabled for ASP.NET Core, default latency mode.

Run a steady-state load test that lasts at least twice as long as your longest expected traffic peak (e.g., 2 hours for a 1-hour peak) and collect:

  • GC pause histograms (-Xlog:gc for JVM, dotnet-trace for .NET).
  • Heap occupancy over time.
  • CPU utilization.

If pauses stay under your SLA (e.g., 100 ms), you may not need further tweaks.
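
Checking pauses against an SLA can be automated once the log is collected. The sketch below assumes the default -Xlog:gc line layout on JDK 9+, where each pause line ends with a duration such as 12.345ms; the class GcPauseStats, its method names, and the nearest-rank percentile are choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts pause durations from unified GC log lines and checks a percentile (sketch). */
final class GcPauseStats {
    // Assumed line shape:
    // [2.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 102M->34M(256M) 12.345ms
    private static final Pattern PAUSE =
            Pattern.compile("\\bPause\\b.*?(\\d+(?:\\.\\d+)?)ms\\s*$");

    /** Returns the pause duration in milliseconds for every line that records a pause. */
    static List<Double> pausesMillis(List<String> logLines) {
        List<Double> out = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) out.add(Double.parseDouble(m.group(1)));
        }
        return out;
    }

    /** Nearest-rank percentile; p is in (0, 100]. */
    static double percentile(List<Double> samples, double p) {
        if (samples.isEmpty()) throw new IllegalArgumentException("no samples");
        double[] sorted = samples.stream().mapToDouble(Double::doubleValue).sorted().toArray();
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** True if the chosen percentile of the samples is at or below the SLA. */
    static boolean withinSla(List<Double> samples, double p, double slaMillis) {
        return percentile(samples, p) <= slaMillis;
    }
}
```

Feeding it the lines of gc.log and calling withinSla(pauses, 99, 100) answers the question in the paragraph above for a 100 ms SLA at the 99th percentile.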

2. “Latency‑First” Tuning

When latency spikes are unacceptable:

| Runtime | Key flags / settings | Typical values |
| --- | --- | --- |
| HotSpot (G1) | -XX:MaxGCPauseMillis | 20–50 ms |
| HotSpot (ZGC) | -XX:ConcGCThreads | 4–8 |
| .NET | System.GC.Concurrent | true (already default) |
| .NET | System.GC.Server | true (for multi-core) |
| .NET | GCSettings.LatencyMode | SustainedLowLatency during critical periods (LowLatency is Workstation-only) |

Example: In a payment‑processing service, wrap critical sections in GC.TryStartNoGCRegion to temporarily suspend collections:

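bool noGc = GC.TryStartNoGCRegion(100L * 1024 * 1024); // 100 MiB budget
try
{
    ProcessPayment(request); // runs even if the region could not be started
}
finally
{
    // The region ends early if the budget is exceeded, so check before ending it
    if (noGc && GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
        GC.EndNoGCRegion();
}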

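Remember that NoGCRegion is a best-effort API: if the code allocates more than the requested budget, the runtime performs a GC and ends the region, and a later GC.EndNoGCRegion call then throws InvalidOperationException.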

3. “Throughput‑First” Scaling

For batch jobs or analytics pipelines where raw throughput matters more than pause latency:

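  • Increase -XX:ParallelGCThreads (JVM) or System.GC.HeapCount (.NET Server GC) to match the CPU cores actually available to the process.
  • Raise the young generation size (-Xmn or -XX:NewSize/-XX:MaxNewSize) to reduce the frequency of minor GCs. With G1, prefer leaving the young size unset, because fixing it overrides the pause-time goal.
  • Remove explicit GC calls (System.GC.Collect() in .NET, System.gc() in Java, which -XX:+DisableExplicitGC turns into a no-op) unless you have a proven reason.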
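
The effect of a larger young generation can be estimated before changing any flag. As a first-order sketch, the time between minor GCs is roughly Eden capacity divided by allocation rate; the class and method names below are hypothetical, and the estimate ignores survivor spaces and allocation bursts.

```java
/** First-order estimate of minor GC frequency (a rough model, not a measurement). */
final class MinorGcInterval {
    /** Approximate seconds between minor GCs: Eden capacity divided by allocation rate. */
    static double secondsBetweenMinorGcs(long edenBytes, long allocBytesPerSecond) {
        if (allocBytesPerSecond <= 0) {
            throw new IllegalArgumentException("allocation rate must be positive");
        }
        return (double) edenBytes / allocBytesPerSecond;
    }

    /** Approximate number of minor GCs per hour at that allocation rate. */
    static double minorGcsPerHour(long edenBytes, long allocBytesPerSecond) {
        return 3600.0 / secondsBetweenMinorGcs(edenBytes, allocBytesPerSecond);
    }
}
```

At 256 MiB/s of allocation, a 1 GiB Eden fills about every 4 seconds; doubling Eden to 2 GiB halves the number of minor GCs, at the price of more memory and potentially longer individual collections if more objects are live at each one.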

4. “Hybrid” Adaptive Loop

A production pattern that many cloud‑native teams adopt is a feedback loop driven by telemetry:

  1. Collect GC metrics every minute (dotnet-counters, jstat, Prometheus exporters).
  2. Analyze whether pause‑time percentiles exceed thresholds.
  3. Adjust configuration via a sidecar or config‑reloader (e.g., modify JAVA_TOOL_OPTIONS or .runtimeconfig.json and trigger a rolling restart).
  4. Validate with a canary deployment before full rollout.

This approach mirrors the self‑tuning behavior of modern databases and keeps GC settings aligned with workload changes (e.g., traffic spikes, new feature rollouts).
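
The analyze and adjust steps of such a loop can be as small as one pure function. The sketch below is one possible policy, not a recommendation: the 20% tightening, 10% relaxation, and the 10–200 ms bounds are arbitrary assumptions, and PauseGoalAdvisor is a hypothetical name. It targets G1's -XX:MaxGCPauseMillis and emits a value suitable for JAVA_TOOL_OPTIONS.

```java
/** Hypothetical policy that nudges G1's pause goal based on observed p99 pauses. */
final class PauseGoalAdvisor {
    static final int MIN_GOAL_MS = 10;  // assumed lower bound
    static final int MAX_GOAL_MS = 200; // assumed upper bound

    /** Suggests the next pause goal from the current goal, the observed p99, and the SLA. */
    static int nextPauseGoalMillis(int currentGoalMs, double observedP99Ms, double slaMs) {
        int next = currentGoalMs;
        if (observedP99Ms > slaMs) {
            next = currentGoalMs * 8 / 10;   // over the SLA: tighten the goal by 20%
        } else if (observedP99Ms < slaMs * 0.5) {
            next = currentGoalMs * 11 / 10;  // lots of headroom: relax by 10% for throughput
        }
        return Math.max(MIN_GOAL_MS, Math.min(MAX_GOAL_MS, next));
    }

    /** Renders the value a config reloader would place in JAVA_TOOL_OPTIONS. */
    static String javaToolOptions(int pauseGoalMs) {
        return "-XX:+UseG1GC -XX:MaxGCPauseMillis=" + pauseGoalMs;
    }
}
```

Keeping the decision a pure function makes it easy to unit test and to run in dry-run mode during the canary step before any restart is triggered.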

Performance Monitoring & Tools

JVM Tooling

| Tool | What it shows | Typical command |
| --- | --- | --- |
| jstat -gc | Heap occupancy, GC counts, cumulative GC time | jstat -gc <pid> 1000 |
| jcmd <pid> GC.heap_info | Detailed heap layout | jcmd 1234 GC.heap_info |
| VisualVM / JConsole | Live graphs, thread dumps | GUI |
| Java Flight Recorder (JFR) | High-resolution GC events | jcmd 1234 JFR.start duration=5m filename=gc.jfr |
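
The same counters that jstat prints are also available in-process through the standard java.lang.management API, which is convenient for exporting metrics. A minimal sketch follows; the class name GcMxBeanSnapshot and the map layout are choices made for this example, and collector names vary by JVM and selected collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Reads per-collector GC counts and accumulated GC time from the running JVM. */
final class GcMxBeanSnapshot {
    /** Collector name -> {collection count, accumulated collection time in ms}. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> out = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if the collector does not report it
            out.put(gc.getName(), new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return out;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, v) ->
                System.out.printf("%s: count=%d timeMs=%d%n", name, v[0], v[1]));
    }
}
```

Taking two snapshots a minute apart and subtracting them gives the GC count and GC time for that interval, which is the raw material for overhead and pause-rate dashboards.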

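Enable GC logging for post-mortem analysis. On JDK 9 and later, use unified logging on its own; the JDK 8 flag -XX:+PrintGCDateStamps was removed and prevents the JVM from starting:

-Xlog:gc*:file=gc.log:time,uptime,level,tags:filecount=5,filesize=10M

On JDK 8, the equivalent is:

-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log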

.NET Tooling

| Tool | What it shows | Typical command |
| --- | --- | --- |
| dotnet-counters | Real-time GC heap size, gen 0-2 collection counts, % time in GC | dotnet-counters monitor --process-id 5678 --counters System.Runtime |
| dotnet-trace | EventPipe trace with GC events | dotnet-trace collect --process-id 5678 --providers Microsoft-Windows-DotNETRuntime:0x1:5 |
| PerfView | Post-mortem GC analysis, allocation stacks | GUI |
| dotnet-gcdump | Heap snapshot for leak detection | dotnet-gcdump collect -p 5678 |

When analyzing a pause spike, look for:

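  • Frequent blocking generation 2 collections (GCStart events with Depth 2 that are not background GCs): indicates old generation pressure from heavy promotion.
  • LOH growth and fragmentation (visible in the per-generation sizes of GCHeapStats events): may require GCSettings.LargeObjectHeapCompactionMode.
  • Long suspensions while a background GC is in progress (GCSuspendEEBegin to GCRestartEEEnd): often caused by high allocation rates that the background GC threads cannot keep up with.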

Key Takeaways

  • Generational GC separates short‑lived objects from long‑lived ones, enabling cheap minor collections and infrequent major collections.
  • JVM offers multiple collectors (G1, ZGC, Shenandoah) that can be tuned with -XX: flags; pick the one that matches your latency vs. throughput goals.
  • .NET CLR relies on Server vs. Workstation modes, concurrent background GC, and explicit APIs (NoGCRegion, LOH compaction) for fine‑grained control.
  • Production patterns—baseline, latency‑first, throughput‑first, and adaptive loops—help you stay within SLA limits while scaling.
  • Telemetry is non‑negotiable; integrate jstat, JFR, dotnet-counters, or dotnet-trace into your observability stack to react to GC pressure before it becomes a user‑visible outage.

Further Reading