TL;DR — Go’s runtime uses a three‑layer scheduler (G, M, P) plus work‑stealing queues and preemption (cooperative yield points, plus signal‑based asynchronous preemption since Go 1.14) to keep all logical processors busy. Adjusting GOMAXPROCS, avoiding blocking system calls, and profiling with runtime/trace are the most effective knobs for real‑world performance tuning.
Go’s concurrency model feels effortless: launch a goroutine with go f() and the language magically spreads work across cores. Under the hood, however, the Go runtime orchestrates a sophisticated dance between lightweight goroutine contexts (G), operating‑system threads (M), and logical processors (P). Understanding that dance lets you write faster code, avoid hidden bottlenecks, and make informed tuning decisions.
Go’s Scheduler Overview
The Go scheduler is a work‑stealing runtime that maps millions of goroutines onto a relatively small pool of OS threads; since Go 1.14 it can also preempt long‑running goroutines asynchronously. Its design balances three competing goals:
- Scalability – keep scheduling overhead near O(1) even with thousands of goroutines.
- Responsiveness – ensure that a blocked or long‑running goroutine doesn’t starve others.
- Predictability – give developers deterministic control via GOMAXPROCS.
The G‑M‑P Model
| Symbol | Meaning | Typical Count |
|---|---|---|
| G | Goroutine – the logical unit of work (stack, registers, trace info). | Potentially millions |
| M | Machine – an OS thread that actually executes G code. | Bounded by the runtime’s thread limit (10,000 by default; adjustable via debug.SetMaxThreads) |
| P | Processor – a logical CPU slot that holds a run‑queue of Gs. | Equal to GOMAXPROCS |
A P can be thought of as a token that grants an M permission to run goroutine code. When a goroutine is ready, it is placed on the local run‑queue of the P that currently owns the M. If that queue empties, the scheduler looks at other Ps’ queues and steals work.
Note: The three‑letter model originates from Dmitry Vyukov’s “Scalable Go Scheduler Design Doc” (2012), which introduced the P abstraction.
GOMAXPROCS and Logical Processors
GOMAXPROCS tells the runtime how many Ps to create. By default it matches the number of logical CPUs reported by runtime.NumCPU(), but you can override it:
package main
import (
"fmt"
"runtime"
)
func main() {
fmt.Println("Default GOMAXPROCS:", runtime.GOMAXPROCS(0))
// Set to 2 logical processors regardless of machine.
runtime.GOMAXPROCS(2)
fmt.Println("Adjusted GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
Because an M that blocks in a syscall detaches from its P (another M picks the P up), GOMAXPROCS mainly caps how many goroutines execute Go code at once. Raising it above the core count occasionally helps I/O‑heavy servers, but measure before committing. Lowering it can reduce contention when the workload is CPU‑bound and you want to reserve cores for other processes.
Work Distribution Mechanics
The scheduler’s job is to keep each P busy. It does this through run‑queues, work stealing, and cooperative preemption.
Run Queues and Local Scheduling
Each P owns a local run‑queue (a fixed‑size, mostly lock‑free ring buffer). When a goroutine becomes runnable—e.g., after a channel send or a timer fires—the runtime enqueues it onto the current P’s queue:
// Simplified pseudo‑code modeled on the scheduler in runtime/proc.go
func enqueueG(g *g) {
p := getCurrentP()
p.runq = append(p.runq, g) // lock‑free push
}
Machines (Ms) pull work from their attached P’s queue; a dedicated runnext slot gives the most recently readied goroutine LIFO priority, which improves cache locality for short‑lived goroutines, while the ring itself drains in FIFO order.
Work Stealing Algorithm
If an M’s P has an empty run‑queue, the scheduler attempts to steal a batch of Gs from another P’s queue. The algorithm probes victim Ps in a randomized order to avoid contention:
- Choose a random victim P.
- Atomically pop half of its run‑queue (FIFO) and push onto the thief’s local queue.
- Continue execution with the stolen Gs.
This approach guarantees that idle CPUs quickly acquire work without central coordination. The cost is bounded because stealing only occurs when a P is idle, and the batch size is tuned to amortize synchronization overhead.
Reference: The work‑stealing design mirrors Dmitry Vyukov’s “Scalable Go Scheduler Design Doc” and has been refined continuously in the runtime since its introduction in Go 1.1.
Cooperative Preemption
Before Go 1.14, scheduling was cooperative: a goroutine could only be descheduled at compiler‑inserted safe points, chiefly the stack‑growth check in function prologues. Go 1.14 added asynchronous preemption: when a goroutine runs for roughly 10 ms without yielding, the runtime sends its thread a signal (SIGURG on Unix) and forces a context switch at the next safe instruction, so even tight loops with no function calls can be preempted.
The cooperative path piggybacks on the stack‑growth check the compiler emits in most function prologues:
// pseudo‑assembly emitted in a function prologue
CMP SP, stackguard0    ; stackguard0 is poisoned to request preemption
JLS runtime.morestack  ; morestack sees the poison value and yields
To preempt a goroutine cooperatively, the runtime sets that goroutine’s stackguard0 to a sentinel so the next prologue check fails and control enters the runtime, which parks the goroutine and lets its P run another G. This avoids the complexity of full kernel‑level preemption, and the signal‑based asynchronous path covers tight loops that never reach a prologue check.
Tuning the Scheduler
Understanding the knobs lets you align Go’s runtime behavior with your workload.
Setting GOMAXPROCS Appropriately
For CPU‑bound workloads (e.g., heavy computation, crypto), keep GOMAXPROCS at its default of runtime.NumCPU(). For I/O‑bound servers, Ms blocked in syscalls already hand off their P, so gains from raising it are usually modest; treat any increase as an experiment to measure. In containers with a CPU quota, the default still reflects the host’s CPUs, so consider pinning it to the quota (go.uber.org/automaxprocs automates this):
# In a Docker container, set at startup:
export GOMAXPROCS=$(nproc) # default: all host CPUs
# Experiment: bump by 20% for an I/O‑heavy service, then measure:
export GOMAXPROCS=$(( $(nproc) + $(nproc) / 5 ))
Monitor CPU utilization with top or go tool pprof to verify that you’re not oversubscribing.
Avoiding Blocking System Calls
A blocking syscall ties up an M; the runtime compensates by handing the P to another thread, but heavy blocking still inflates the thread count and adds context‑switch overhead. Network I/O through the standard library is already non‑blocking (multiplexed by the netpoller: epoll on Linux, kqueue on BSD/macOS), but file I/O (e.g., os.ReadFile) and calls into blocking C functions do occupy their M. To keep the scheduler happy:
- Prefer the standard library’s netpoller‑backed APIs (net, net/http) over raw blocking calls.
- If you must call a blocking C function, bound how many such calls run concurrently so they cannot occupy every thread.
- Consider the golang.org/x/sys/unix package for direct poll‑style syscalls.
Profiling with runtime/trace
The built‑in tracer visualizes P, M, and G activity over time, exposing contention points and idle periods.
go test -trace trace.out
go tool trace trace.out
Open the generated HTML and look for:
- P idle time – indicates under‑utilization; maybe increase GOMAXPROCS.
- M blocked – shows goroutine‑level blocking; investigate syscalls or channel deadlocks.
- GC pauses – long garbage‑collection cycles can starve the scheduler; tune GOGC if needed.
Example: Balancing a CPU‑Bound Pipeline
package main
import (
"fmt"
"runtime"
"sync"
)
func worker(id int, jobs <-chan int, results chan<- int) {
for n := range jobs {
// Simulate CPU‑heavy work.
sum := 0
for i := 0; i < n*1000; i++ {
sum += i
}
results <- sum
}
}
func main() {
runtime.GOMAXPROCS(runtime.NumCPU()) // the default since Go 1.5; set explicitly for clarity
const numWorkers = 8
jobs := make(chan int, 100)
results := make(chan int, 100)
var wg sync.WaitGroup
wg.Add(numWorkers)
for w := 0; w < numWorkers; w++ {
go func(id int) {
defer wg.Done()
worker(id, jobs, results)
}(w)
}
// Feed jobs.
for i := 1; i <= 200; i++ {
jobs <- i
}
close(jobs)
// Wait for workers to finish then close results.
go func() {
wg.Wait()
close(results)
}()
// Collect results.
total := 0
for r := range results {
total += r
}
fmt.Println("Total:", total)
}
In this example, runnable workers are executed by Ms holding Ps. Because the work is CPU‑intensive, the scheduler keeps all Ps busy, and GOMAXPROCS determines the maximum parallelism. Adjusting numWorkers above or below GOMAXPROCS illustrates the point of diminishing returns.
Key Takeaways
- Go’s scheduler uses the G‑M‑P model: many goroutines (G) are multiplexed onto a limited set of OS threads (M) that each hold a logical processor token (P).
- Work stealing keeps all Ps fed with work, reducing idle time without a central dispatcher.
- Cooperative preemption inserted at safe points prevents long‑running goroutines from monopolizing a P while keeping the runtime lightweight.
- Tuning GOMAXPROCS, avoiding blocking syscalls, and using runtime/trace are the primary levers for performance‑critical Go services.
- Profiling at the scheduler level (P/M/G activity) often reveals hidden bottlenecks that traditional CPU profiling misses.
Further Reading
- The Go Scheduler: Inside the Runtime – Official Go article that explains the G‑M‑P model and work stealing.
- Scalable Go Scheduler Design Doc (Dmitry Vyukov, 2012) – the design document behind the G‑M‑P scheduler and its work‑stealing refinements.
- Go source code – runtime package – Browse the actual implementation of G, M, and P structures.
- Effective Go – Concurrency – Best practices for using goroutines and channels without hurting the scheduler.
- Go Performance: Profiling and Tracing – Guide to using go tool trace for visualizing scheduler activity.