TL;DR: The MESI protocol assigns one of four well-defined states (Modified, Exclusive, Shared, Invalid) to each cache line, letting cores coordinate reads and writes without ever serving stale data. By broadcasting intent on a shared bus (or via a directory), a core that wants to write first forces every other copy of the line into the Invalid state, so any later read by another core misses and fetches the most recent value.

Modern processors rely on on-chip caches to bridge the speed gap between the CPU core and main memory. When several cores share a memory region, stale data can appear if each core keeps its own private copy without coordination. The MESI protocol (short for Modified, Exclusive, Shared, Invalid) is the baseline coherence mechanism on which commercial variants such as MESIF and MOESI build; it ensures every core observes the latest value while still benefiting from fast cache hits. This article unpacks the protocol’s four states, walks through read/write scenarios, and shows concrete code examples that illustrate how stale cache states are eliminated in practice.

Background: Caches and Coherency

What is a cache line?

A cache line is the smallest unit of data that a CPU can move between memory hierarchy levels. Typical line sizes are 64 bytes on x86‑64 architectures, meaning that any load or store operation touches the whole 64‑byte block, even if the program accesses a single byte inside it. This granularity is what makes caches fast—large chunks are transferred in one bus transaction—but also what creates the potential for stale data when multiple cores hold copies of the same line.
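To make the granularity concrete, the following minimal sketch maps byte addresses to cache lines. It assumes the 64-byte line size mentioned above; the helper names `line_index` and `line_offset` are illustrative, not part of any real API.

```python
LINE_SIZE = 64  # bytes; assumed, matching the typical x86-64 value quoted above

def line_index(addr: int) -> int:
    """Index of the cache line that contains byte address `addr`."""
    return addr // LINE_SIZE

def line_offset(addr: int) -> int:
    """Position of byte address `addr` inside its cache line."""
    return addr % LINE_SIZE

# Two addresses 4 bytes apart usually fall into the same line ...
print(line_index(0x1000), line_index(0x1004))   # 64 64
# ... unless they straddle a 64-byte boundary.
print(line_index(0x103C), line_index(0x1040))   # 64 65
```

Any two variables whose addresses map to the same line index are kept coherent together, which is the root of the false-sharing effect discussed later.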

The stale state problem

Consider two cores, C0 and C1, each with a private L1 cache. If both load the same memory address, they each receive a copy of the line in the Shared state. If C0 later writes to that address without informing C1, C1’s copy becomes outdated. If C1 subsequently reads, it may return a value that no longer reflects the most recent write, breaking program correctness. The hardware must therefore enforce a rule: before a core can write, all other copies of that line must be invalidated.

The MESI States Explained

MESI extends the simpler MSI protocol by introducing the Exclusive state, which reduces unnecessary bus traffic when a line is known to be owned by a single core.

| State | Meaning | Typical transitions |
| --- | --- | --- |
| Modified (M) | The line is dirty (different from memory) and owned exclusively by the core. | M → S (snoop read, after write-back), M → I (snoop write or eviction, after write-back) |
| Exclusive (E) | The line is clean, present only in this core, and matches memory. | E → M (local write, no bus traffic), E → S (snoop read), E → I (snoop write) |
| Shared (S) | The line is clean and may exist in multiple cores. | S → M (upgrade), S → I (snoop write) |
| Invalid (I) | The line is not valid; any access triggers a miss. | I → E or S (read miss), I → M (write miss) |

The addition of E lets a core write to a line that it knows no other core has cached: the E → M transition happens silently, avoiding the broadcast that MSI would require to claim ownership of a line it had only read.
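The state machine can be written down as a small function. This is a minimal sketch, not a model of any particular CPU: the event names (`PrRd`, `PrWr`, `BusRd`, `BusRdX`) follow common textbook notation and, like the function `next_state`, are assumptions made for illustration.

```python
# Events seen by one cache for one line (names are assumed textbook notation).
LOCAL_READ, LOCAL_WRITE = "PrRd", "PrWr"      # issued by this core
SNOOP_READ, SNOOP_WRITE = "BusRd", "BusRdX"   # observed from other cores

def next_state(state, event, others_have_copy=False):
    """Return the MESI state of a line after `event`, starting from `state`."""
    if event == LOCAL_READ:
        if state == "I":                       # read miss
            return "S" if others_have_copy else "E"
        return state                           # read hit: no change
    if event == LOCAL_WRITE:
        return "M"                             # E -> M is silent; S and I need bus traffic
    if event == SNOOP_READ:
        return "I" if state == "I" else "S"    # M writes back first, E and S stay clean
    if event == SNOOP_WRITE:
        return "I"                             # another core is taking ownership
    raise ValueError(f"unknown event: {event}")

print(next_state("I", LOCAL_READ))                          # E
print(next_state("I", LOCAL_READ, others_have_copy=True))   # S
print(next_state("E", LOCAL_WRITE))                         # M
print(next_state("M", SNOOP_READ))                          # S
```

The function only tracks states; the bus transactions and write-backs that accompany each transition are described in the workflows that follow.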

How MESI Handles Reads and Writes

Read miss workflow

  1. Core issues a read request on the memory bus for address A.
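  2. Snoop logic on all other cores checks whether they hold A in any valid state (M, E, or S).
    • If another core holds A in M, it supplies the data (write-back) and downgrades to S.
    • If another core holds A in E, it downgrades to S; the line is clean, so memory can supply the data.
    • If no core has A in M, the request is served by main memory.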
  3. The requesting core receives the line:
    • If no other core held it, it enters E.
    • If at least one other core held it (now in S), it also enters S.
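The read-miss steps above can be sketched as a function over a dictionary of per-core states. The function name `read_miss` and the core labels are hypothetical; the sketch tracks only states and the data source, not the data itself.

```python
def read_miss(states, requester):
    """Apply a read miss by `requester` to {core: state}; return who supplies the data."""
    source = "memory"
    # Every other core holding a valid copy reacts to the snooped read.
    holders = [c for c, s in states.items() if c != requester and s != "I"]
    for core in holders:
        if states[core] == "M":
            source = core          # dirty owner supplies the data and writes it back
        states[core] = "S"         # M, E, and S holders all end up in S
    states[requester] = "S" if holders else "E"
    return source

states = {"C0": "I", "C1": "I"}
print(read_miss(states, "C0"), states)   # memory {'C0': 'E', 'C1': 'I'}
print(read_miss(states, "C1"), states)   # memory {'C0': 'S', 'C1': 'S'}

dirty = {"C0": "M", "C1": "I"}
print(read_miss(dirty, "C1"), dirty)     # C0 {'C0': 'S', 'C1': 'S'}
```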

Write miss workflow

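  1. Core issues a write to address B, which it holds in I (a write to a line already held in S is an upgrade, covered in the next section).
  2. Bus broadcast of a read-for-ownership request (or a directory message) tells all other cores to invalidate B.
  3. All cores with B in S or E transition to I; a core holding B in M first supplies or writes back its dirty data, then transitions to I.
  4. The requesting core receives the line and installs it directly in M, because it is about to modify it.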

Upgrade from S to M

If a core already holds a line in S and wants to write, it performs a write‑upgrade transaction:

  • It broadcasts an Upgrade signal.
  • All other cores with the line in S transition to I.
  • The requesting core switches its state from S to M without pulling the data from memory again.

This upgrade is cheaper than a full write miss because the data is already present locally.
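The difference between a write miss, an upgrade, and a silent write can be sketched in a few lines. The function `write` and the transaction labels `"ReadForOwnership"` and `"Upgrade"` are illustrative names chosen for this sketch, not identifiers from any real hardware interface.

```python
def write(states, requester):
    """Apply a write by `requester` to {core: state}; return the bus transaction used."""
    state = states[requester]
    if state in ("M", "E"):
        transaction = None                  # already exclusive: silent, no bus traffic
    elif state == "S":
        transaction = "Upgrade"             # invalidate others; data is already local
    else:
        transaction = "ReadForOwnership"    # write miss: invalidate others and fetch data
    if transaction is not None:
        for core in states:
            if core != requester:
                states[core] = "I"          # an M holder would write back before this
    states[requester] = "M"
    return transaction

shared = {"C0": "S", "C1": "S", "C2": "I"}
print(write(shared, "C0"), shared)     # Upgrade {'C0': 'M', 'C1': 'I', 'C2': 'I'}

missing = {"C0": "I", "C1": "M"}
print(write(missing, "C0"), missing)   # ReadForOwnership {'C0': 'M', 'C1': 'I'}

private = {"C0": "E", "C1": "I"}
print(write(private, "C0"), private)   # None {'C0': 'M', 'C1': 'I'}
```

Only the read-for-ownership case moves a full line of data; the upgrade sends an invalidation, and the E → M case sends nothing at all.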

Inter‑processor Communication Mechanisms

Bus snooping

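Traditional multicore CPUs use a shared coherence bus (the front-side bus of older multiprocessor systems is the classic example) where every cache controller snoops every transaction. When a core issues a read or write, the snoop logic on other cores reacts as part of the same bus transaction, enforcing the MESI state machine. The bus model is simple to understand and works well for a modest number of cores. Newer interconnects such as Intel’s QPI or AMD’s Infinity Fabric are point-to-point links rather than shared buses; they carry coherence messages for MESI variants such as MESIF and MOESI.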

Directory‑based systems

At larger scales (many-core or NUMA systems), a full broadcast bus becomes a bottleneck. Instead, a directory tracks which cores hold each line. When a core wants to write, the directory sends invalidation messages only to the cores recorded as holding a copy. The underlying state machine remains MESI-compatible, but the transport layer changes. For a concise overview, see the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3, Chapter 8 (Intel Docs).
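A minimal sketch of the directory idea, assuming a single central table that maps each line address to the set of cores holding a copy. The class `Directory` and its methods are hypothetical; real directories are distributed and track per-line state as well.

```python
class Directory:
    """Toy directory: remembers which cores hold a valid copy of each line."""

    def __init__(self):
        self.sharers = {}   # line address -> set of core ids

    def read(self, core, line):
        """Record that `core` now holds a copy of `line`."""
        self.sharers.setdefault(line, set()).add(core)

    def write(self, core, line):
        """Return the cores that must be invalidated; afterwards `core` is the only holder."""
        targets = sorted(self.sharers.get(line, set()) - {core})
        self.sharers[line] = {core}
        return targets

d = Directory()
d.read(0, 0x40)
d.read(3, 0x40)
print(d.write(0, 0x40))   # [3]
```

In an 8-core system, a broadcast would make all seven other cores check their caches for this write; the directory sends a single targeted invalidation to core 3.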

Resolving Stale Cache States in Practice

Example scenario: two cores sharing a variable

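// shared.c (build with: cc -pthread shared.c)
#include <stdio.h>
#include <pthread.h>

int counter = 0;          // resides in memory, cache line L
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    (void)arg;            // unused
    for (int i = 0; i < 1000000; ++i) {
        pthread_mutex_lock(&lock);
        counter++;       // read-modify-write of L, protected by the lock
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, worker, NULL);
    pthread_create(&t1, NULL, worker, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("%d\n", counter);   // prints 2000000
    return 0;
}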

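In this program, suppose each thread runs on a different core. The counter variable lives in a single cache line L. Coherence alone does not make counter++ safe: the increment is a load, an add, and a store, and without the mutex two cores could both load the same value so that one update is lost. The mutex serializes the increments, and its implementation relies on MESI to move both the lock word and L between cores:

  1. Lock acquisition executes an atomic read-modify-write on the lock word, which requires exclusive ownership of the lock’s cache line; any other core holding that line transitions to I.
  2. When the locking core executes counter++, a read-for-ownership invalidates any other copy of L, and the core holds L in M after the write.
  3. Unlock is a store to the lock word with release semantics. L stays in M in the writer’s cache until another core requests it; the owner then supplies the updated data, and the requester receives the line in S (or in M if it requests ownership in order to write).

Because the lock’s implementation uses atomic instructions such as compare-and-swap (CAS), which need exclusive ownership of the lock’s cache line, and because lock and unlock also order the surrounding memory accesses, no core can see a stale counter once it has acquired the lock.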

Step‑by‑step state transitions (ASCII diagram)

Event               Core0         Core1         Memory
-----               -----         -----         ------
(initial)           I             I             L = 0
Core0 reads L       E (clean)     I             L = 0
Core1 reads L       S (shared)    S (shared)    L = 0
Core0 writes L      M (dirty)     I             L = 0 (stale)
Core1 reads L       S (shared)    S (shared)    L = 1 (written back by Core0)

Notice how the Invalid state eliminates the stale copy on Core1 before Core0’s write becomes visible.
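The same sequence can be replayed in a small simulation that tracks data values as well as states. This is a sketch under simplifying assumptions (one line, an atomic bus, write-back on every snoop hit to a Modified line); the class `Line` and its methods are illustrative names.

```python
class Line:
    """One cache line shared by several cores, with per-core MESI state and data copy."""

    def __init__(self, cores, value=0):
        self.memory = value
        self.state = {c: "I" for c in cores}
        self.copy = {c: None for c in cores}

    def _write_back(self):
        # A core holding the line in M flushes its dirty data to memory.
        for core, s in self.state.items():
            if s == "M":
                self.memory = self.copy[core]

    def read(self, core):
        if self.state[core] == "I":                      # read miss
            self._write_back()
            others = [c for c in self.state if c != core and self.state[c] != "I"]
            for c in others:
                self.state[c] = "S"
            self.copy[core] = self.memory
            self.state[core] = "S" if others else "E"
        return self.copy[core]

    def write(self, core, value):
        if self.state[core] in ("I", "S"):               # needs ownership first
            self._write_back()
            for c in self.state:
                if c != core:
                    self.state[c], self.copy[c] = "I", None
        self.copy[core] = value
        self.state[core] = "M"

line = Line(["Core0", "Core1"])
line.read("Core0");      print(line.state, line.memory)  # {'Core0': 'E', 'Core1': 'I'} 0
line.read("Core1");      print(line.state, line.memory)  # {'Core0': 'S', 'Core1': 'S'} 0
line.write("Core0", 1);  print(line.state, line.memory)  # {'Core0': 'M', 'Core1': 'I'} 0
print(line.read("Core1"), line.state, line.memory)       # 1 {'Core0': 'S', 'Core1': 'S'} 1
```

Core1 can never return the old value 0 after Core0’s write, because its copy was discarded when the line went to Invalid and the next read has to fetch the written-back data.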

Code sample illustrating a race condition that coherence alone does not prevent

# race.py: deliberately unsynchronized increment
import threading

counter = 0  # shared variable (assume each thread runs on a different core)

def inc():
    global counter
    for _ in range(10_000):
        tmp = counter      # load: coherent, sees the latest stored value
        tmp += 1
        counter = tmp      # store: coherent, but may overwrite another thread's update

t1 = threading.Thread(target=inc)
t2 = threading.Thread(target=inc)
t1.start(); t2.start()
t1.join(); t2.join()
print(counter)

On a real MESI-enabled CPU, every individual load and store is coherent, but the load-add-store sequence is not atomic: both threads can load the same value, and one of the two increments is then lost, so the final value can fall below 20 000. (CPython’s global interpreter lock also serializes bytecode execution, so depending on the interpreter version this script may still print 20 000 on most runs; the access pattern is what matters.) Protecting the increment with a lock, as in the C example above, restores the expected total. If the architecture lacked a coherence protocol altogether, the situation would be worse: the two cores could each keep a private copy of counter indefinitely, and software would have to flush and invalidate cache lines explicitly to avoid the classic stale-cache bug.
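It helps to separate coherence from atomicity here. The sketch below replays two hand-written schedules against a single, always up-to-date memory cell, which stands in for perfectly coherent caches; the function `run` and the schedule format are assumptions made for illustration.

```python
def run(schedule):
    """Execute (thread, step) pairs in order against one coherent memory cell."""
    memory = 0
    tmp = {}                                # per-thread private register
    for thread, step in schedule:
        if step == "load":
            tmp[thread] = memory            # always sees the latest stored value
        elif step == "store":
            memory = tmp[thread] + 1        # add and store the incremented value
    return memory

serial = [("A", "load"), ("A", "store"), ("B", "load"), ("B", "store")]
interleaved = [("A", "load"), ("B", "load"), ("A", "store"), ("B", "store")]

print(run(serial))        # 2
print(run(interleaved))   # 1
```

Even though no load ever returns stale data, the interleaved schedule loses an increment. That gap is what locks and atomic read-modify-write instructions close, with the coherence protocol underneath supplying the exclusive ownership they need.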

Performance Implications

Latency vs. bandwidth

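MESI’s state transitions cost cycles. The figures below are rough illustrations and vary widely between microarchitectures; in particular, a miss that goes all the way to DRAM rather than to a shared last-level cache typically costs on the order of hundreds of cycles:

| Transition | Approx. latency (cycles) | Typical impact |
| --- | --- | --- |
| I → E/M | 30-50 (memory fetch) | Miss penalty |
| S → M (upgrade) | 5-10 (bus broadcast) | Small but adds contention |
| M → I (write-back) | 20-30 (write to memory) | Affects write-heavy workloads |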

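Because the E state eliminates the need for a broadcast when a line is uniquely owned, workloads dominated by private data that is read and then written (e.g., per-thread buffers) benefit significantly. Read-only data shared by many cores (e.g., lookup tables) simply stays in S and causes no invalidations.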

False sharing mitigation

False sharing occurs when two unrelated variables reside on the same cache line, causing unnecessary invalidations. MESI will still enforce coherence, but the frequent transitions (S ↔ M ↔ I) waste bandwidth and raise latency. Padding or aligning structures to 64-byte boundaries reduces false sharing, allowing each variable to occupy its own line and stay in E or M in its owner’s cache without repeated invalidations.

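// Padding to avoid false sharing (valid in C11 and C++11)
#include <stdalign.h>

struct Counter {
    alignas(64) int value;   // aligning the member raises the struct's alignment and size to 64 bytes
};

struct Counter counters[2]; // each counter lives on a separate line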

By aligning each Counter to the cache line size, each core can modify its own counter without causing invalidations on the other core’s line, letting the protocol take the cheap E → M path once and then keep the line in M.
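Whether two fields can falsely share a line is simple arithmetic on their offsets. The following sketch assumes 64-byte lines and a 64-byte-aligned base address; the helpers `lines_touched` and `falsely_shared` are hypothetical names.

```python
LINE_SIZE = 64  # bytes; assumed line size

def lines_touched(offset, size):
    """Set of cache-line indices covered by a field at `offset` spanning `size` bytes."""
    first = offset // LINE_SIZE
    last = (offset + size - 1) // LINE_SIZE
    return set(range(first, last + 1))

def falsely_shared(a, b):
    """True if two (offset, size) fields overlap on at least one cache line."""
    return bool(lines_touched(*a) & lines_touched(*b))

packed = [(0, 4), (4, 4)]    # two 4-byte ints stored back to back
padded = [(0, 4), (64, 4)]   # each int at the start of its own 64-byte slot

print(falsely_shared(*packed))   # True
print(falsely_shared(*padded))   # False
```

The padded layout trades 60 bytes of unused space per counter for the guarantee that a write by one core never invalidates the line the other core is working on.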

Key Takeaways

  • MESI adds four explicit states (Modified, Exclusive, Shared, Invalid) that let cores coordinate reads and writes without serving stale data.
  • Read misses may result in E or S depending on whether other cores hold the line; write misses always broadcast invalidations to achieve exclusive ownership.
  • Write‑upgrade (S→M) is cheaper than a full write miss because the data is already cached locally.
  • Bus snooping and directory‑based mechanisms implement the same state machine at different scales; both guarantee that no two cores can hold a dirty copy simultaneously.
  • Performance hinges on minimizing unnecessary state transitions; using the Exclusive state and avoiding false sharing keep latency low and bandwidth high.

Further Reading