TL;DR — Event sourcing together with CQRS lets you separate write and read concerns, store immutable facts, and rebuild state on demand. Applied carefully, the combination can deliver low-latency reads, near-linear horizontal scalability, and, where you need it, strong consistency across microservices.
In modern micro‑service ecosystems, the classic CRUD model often becomes a bottleneck: a single database must handle concurrent writes, complex joins, and diverse read patterns. Event sourcing flips that script by persisting facts—the events that happened—while CQRS (Command Query Responsibility Segregation) lets you build specialized read models that answer queries efficiently. This article walks you through the theory, architectural decisions, and concrete implementation tricks needed to make the pattern work at scale.
Understanding Event Sourcing
Core Concepts
Event sourcing replaces the traditional “store the current state” approach with “store every state‑changing event”. The current state is re‑computed by replaying these events in order.
| Concept | What it means | Typical storage |
|---|---|---|
| Event | An immutable fact, e.g., OrderPlaced | Append‑only log (Kafka, EventStoreDB) |
| Aggregate | Domain object that enforces invariants | Reconstructed from its event stream |
| Snapshot | Periodic state dump to avoid replaying all events | Optional, stored alongside events |
| Event Store | Write‑once, read‑many log with at‑least‑once delivery | Durable, partitioned storage |
The pattern is championed by Martin Fowler in his classic article on Event Sourcing. The key benefit is auditability: you can always reconstruct why the system is in a particular state.
Simple Python Example
Below is a minimal in‑memory event store that showcases how an aggregate can be rebuilt:
```python
# event_store.py
from dataclasses import dataclass
from typing import Any, Dict, List
import uuid

@dataclass
class Event:
    id: str
    type: str
    payload: Dict[str, Any]

class InMemoryEventStore:
    def __init__(self):
        self.streams: Dict[str, List[Event]] = {}

    def append(self, aggregate_id: str, event_type: str, payload: Dict[str, Any]) -> Event:
        ev = Event(id=str(uuid.uuid4()), type=event_type, payload=payload)
        self.streams.setdefault(aggregate_id, []).append(ev)
        return ev

    def load(self, aggregate_id: str) -> List[Event]:
        return self.streams.get(aggregate_id, [])

# Example aggregate
class BankAccount:
    def __init__(self, account_id: str, store: InMemoryEventStore):
        self.id = account_id
        self.store = store
        self.balance = 0
        self._replay()

    def _apply(self, event: Event):
        if event.type == "Deposited":
            self.balance += event.payload["amount"]
        elif event.type == "Withdrawn":
            self.balance -= event.payload["amount"]

    def _replay(self):
        for ev in self.store.load(self.id):
            self._apply(ev)

    def deposit(self, amount: int):
        ev = self.store.append(self.id, "Deposited", {"amount": amount})
        self._apply(ev)

    def withdraw(self, amount: int):
        if amount > self.balance:  # the aggregate enforces its invariant
            raise ValueError("insufficient funds")
        ev = self.store.append(self.id, "Withdrawn", {"amount": amount})
        self._apply(ev)
```
Running the snippet demonstrates that the account’s balance is always derived from its event history, not from a mutable column.
Why Immutability Matters
- Concurrency safety – Writers append to a stream without long‑lived locks; conflicting appends are detected with optimistic concurrency checks (an expected stream version) rather than by mutating shared rows.
- Temporal queries – You can reconstruct the state as of any point in time, useful for debugging or regulatory compliance.
- Event replay for migration – When business rules change, you can write a new projector that interprets old events differently.
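Temporal queries fall out of the append-only log almost for free. A minimal sketch, assuming each event carries a `recorded_at` timestamp (a field the in-memory example above omits for brevity):

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class Event:
    type: str
    payload: Dict[str, Any]
    recorded_at: int  # e.g., epoch seconds

def balance_as_of(events: List[Event], cutoff: int) -> int:
    """Reconstruct a balance as of `cutoff` by replaying only
    events recorded at or before that instant."""
    balance = 0
    for ev in events:
        if ev.recorded_at > cutoff:
            continue  # event happened after the point in time we care about
        if ev.type == "Deposited":
            balance += ev.payload["amount"]
        elif ev.type == "Withdrawn":
            balance -= ev.payload["amount"]
    return balance
```

The same replay loop that builds current state answers "what was the balance last Tuesday?" by simply ignoring later events.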
CQRS Fundamentals
Separating Commands and Queries
In CQRS, commands mutate state (they produce events), while queries read from materialized views that are optimized for specific access patterns. This separation enables:
- Independent scaling – Write side can be sharded for throughput, read side can be replicated for low‑latency queries.
- Tailored data models – A read model for a dashboard may denormalize data across aggregates, while the write model remains strictly normalized.
Microsoft’s Azure Architecture guide explains the pattern in depth: CQRS pattern.
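The split can be illustrated in a few lines. This is a deliberately naive sketch (in-memory dicts stand in for the event log and the query database; the names `handle_place_order`, `project`, and `get_order` are illustrative):

```python
from typing import Any, Dict, List

events: List[Dict[str, Any]] = []           # stand-in for the event log
read_model: Dict[str, Dict[str, Any]] = {}  # stand-in for a query database

# Write side: a command handler validates input and emits an event.
def handle_place_order(order_id: str, total: int) -> None:
    if total <= 0:
        raise ValueError("order total must be positive")
    events.append({"type": "OrderPlaced", "order_id": order_id, "total": total})

# Read side: a projector folds events into a denormalized view.
def project(event: Dict[str, Any]) -> None:
    if event["type"] == "OrderPlaced":
        read_model[event["order_id"]] = {"status": "Placed", "total": event["total"]}

# Queries touch only the read model, never the event log.
def get_order(order_id: str) -> Dict[str, Any]:
    return read_model[order_id]
```

Note that the query path has no business logic at all; invariants live entirely on the command side.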
Projection Mechanics
A projector (or read‑model updater) consumes events and updates one or more query databases. Common technologies include:
- Kafka Streams – for real‑time stream processing.
- EventStoreDB subscriptions – push events directly to a handler.
- AWS Lambda + DynamoDB – serverless projection pipelines.
```shell
# Example: start a Kafka consumer that updates a MongoDB read model
# (myorg/kafka-mongo-projection is an illustrative image name)
docker run -d \
  --name kafka-consumer \
  -e KAFKA_BROKER=broker:9092 \
  -e TOPIC=order-events \
  -e MONGODB_URI=mongodb://mongo:27017 \
  myorg/kafka-mongo-projection:latest
```
The consumer reads each OrderCreated, OrderPaid, etc., and writes a flattened document to MongoDB that the UI can query with a single index lookup.
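The flattening logic inside such a consumer is straightforward. A minimal sketch, with an in-memory dict standing in for the MongoDB collection and `project_order_event` as an assumed handler name:

```python
from typing import Any, Dict

orders_view: Dict[str, Dict[str, Any]] = {}  # stand-in for the MongoDB collection

def project_order_event(event: Dict[str, Any]) -> None:
    """Flatten order events into one queryable document per order."""
    doc = orders_view.setdefault(event["order_id"], {})
    if event["type"] == "OrderCreated":
        doc.update(status="Created", total=event["total"])
    elif event["type"] == "OrderPaid":
        doc.update(status="Paid", paid_at=event["timestamp"])
```

Each incoming event upserts into the same document, so the UI's "order details" page is a single keyed lookup.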
Eventual vs. Strong Consistency
A common misconception is that CQRS forces eventual consistency. In fact, you can achieve strong consistency by:
- Synchronous projections – the command handler waits for the projection to succeed before acknowledging the request (acceptable when the added request latency is tolerable).
- Read‑after‑write guarantees – using the same transaction log for both write and read (e.g., leveraging PostgreSQL’s logical replication).
- Idempotent handlers – ensuring that retries do not corrupt read models, which is essential when you need exactly‑once semantics.
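Idempotency is the simplest of the three to demonstrate. A minimal sketch, where an in-memory set stands in for a durable deduplication store:

```python
from typing import Any, Dict, Set

processed_ids: Set[str] = set()    # stand-in for a durable deduplication store
order_totals: Dict[str, int] = {}  # the read model being protected

def handle(event: Dict[str, Any]) -> None:
    """Apply an event at most once, even if the broker redelivers it."""
    if event["id"] in processed_ids:
        return  # duplicate delivery: safe no-op
    if event["type"] == "OrderPaid":
        key = event["order_id"]
        order_totals[key] = order_totals.get(key, 0) + event["amount"]
    processed_ids.add(event["id"])
```

In production the dedup record and the read-model update must be committed in the same transaction, otherwise a crash between the two reintroduces duplicates.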
Designing for High Performance
Partitioning Event Streams
When scaling to millions of events per second, a single monolithic log becomes a bottleneck. Partitioning strategies include:
- Aggregate‑based sharding – each aggregate type (e.g., `Customer`, `Order`) lives on a distinct partition.
- Hash‑based routing – compute a hash of the aggregate ID and map it to a Kafka partition. This preserves ordering per aggregate while spreading load.
The EventStoreDB documentation recommends the latter for most micro‑service deployments.
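Hash-based routing is a one-liner in practice. A sketch (Kafka's default partitioner actually uses murmur2 on the key bytes; MD5 here is just a stand-in for any stable hash):

```python
import hashlib

NUM_PARTITIONS = 12  # illustrative partition count

def partition_for(aggregate_id: str) -> int:
    """Map an aggregate ID to a stable partition.

    All events for one aggregate land on the same partition,
    so their relative order is preserved."""
    digest = hashlib.md5(aggregate_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS
```

The crucial property is determinism: the same aggregate ID must always map to the same partition, even across producer restarts.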
Batching and Bulk Projections
Processing each event individually incurs overhead. Batch processing reduces I/O and CPU pressure:
```python
# batch_projection.py
from pymongo import InsertOne  # bulk_write expects operation objects, not raw dicts

BATCH_SIZE = 500

def process_batch(events):
    # Convert the list of events into bulk DB operations
    operations = []
    for ev in events:
        if ev.type == "OrderCreated":
            operations.append(InsertOne({
                "_id": ev.payload["order_id"],
                "status": "Created",
                "created_at": ev.payload["timestamp"],
            }))
        # ... other event types ...
    if operations:
        orders_collection.bulk_write(operations)  # `orders_collection` is configured elsewhere

# Consumer loop (`event_stream` is the subscription created elsewhere)
batch = []
for ev in event_stream:
    batch.append(ev)
    if len(batch) >= BATCH_SIZE:
        process_batch(batch)
        batch.clear()
```
Batching yields a 2–3× throughput increase on typical SSD-backed storage, as shown in benchmarks from the Confluent blog (event‑sourcing performance).
Caching Read Models
Even with optimized read models, hot data can still cause latency spikes. Layered caching strategies:
- In‑process LRU caches for ultra‑fast lookup of “last‑known” values.
- Distributed caches (Redis, Memcached) for cross‑instance sharing.
- Cache‑aside pattern – populate the cache on miss, invalidate on relevant events.
A practical recipe from the “Building Scalable Systems” book suggests a write‑through approach where the projector updates both the database and the cache atomically.
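The cache-aside variant from the list above can be sketched in a few lines; plain dicts stand in for the read-model database and for Redis, and the event shape is illustrative:

```python
from typing import Any, Dict, Optional

db: Dict[str, Dict[str, Any]] = {}     # stand-in for the read-model database
cache: Dict[str, Dict[str, Any]] = {}  # stand-in for Redis/Memcached

def get_order(order_id: str) -> Optional[Dict[str, Any]]:
    """Cache-aside read: try the cache, fall back to the DB, then populate."""
    if order_id in cache:
        return cache[order_id]
    doc = db.get(order_id)
    if doc is not None:
        cache[order_id] = doc  # populate on miss
    return doc

def on_order_updated(event: Dict[str, Any]) -> None:
    """Projector: update the DB and invalidate the now-stale cache entry."""
    db[event["order_id"]] = {"status": event["status"]}
    cache.pop(event["order_id"], None)
```

Invalidation-on-event keeps the cache from serving data older than the read model itself; the write-through alternative updates both stores instead of evicting.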
Ensuring Consistency in Distributed Environments
Exactly‑Once Delivery Guarantees
When events travel across network boundaries, duplicate deliveries are inevitable. To guarantee consistency:
- Idempotent event handlers – tag each processed event ID in a deduplication store.
- Transactional outbox pattern – write events to an “outbox” table in the same DB transaction as the business data, then a separate process publishes them.
- Message brokers with EOS – Kafka’s transactional producer API provides exactly‑once semantics across partitions.
```java
// Java snippet using Kafka transactions
// (producer, offsets, and consumerGroupId are configured elsewhere)
producer.initTransactions();
producer.beginTransaction();
producer.send(new ProducerRecord<>("order-events", key, value));
producer.sendOffsetsToTransaction(offsets, consumerGroupId);
producer.commitTransaction();
```
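The transactional outbox from the list above can be sketched against any relational store. A minimal sketch using SQLite, assuming a separate relay process periodically calls `publish_pending` (the table and function names are illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         payload TEXT, published INTEGER DEFAULT 0);
""")

def place_order(order_id: str) -> None:
    """Write business data and the outgoing event in ONE transaction."""
    with conn:  # commits both rows, or neither
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "Created"))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (json.dumps({"type": "OrderCreated", "order_id": order_id}),))

def publish_pending(send) -> None:
    """Relay: push unpublished events to the broker, then mark them."""
    rows = conn.execute("SELECT id, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, payload in rows:
        send(json.loads(payload))  # downstream handlers must still be idempotent
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()
```

Because the event row commits atomically with the order row, there is no window in which the business data exists but the event is lost.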
Handling Schema Evolution
Event schemas evolve over time. To keep older events readable:
- Versioned event types – embed a
schema_versionfield. - Upcasters – functions that transform old events to the newest shape during replay.
- Schema registries – e.g., Confluent Schema Registry for Avro, ensuring producers and consumers agree on contracts.
The official Avro documentation recommends using backward‑compatible changes (adding optional fields) to avoid breaking existing consumers.
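An upcaster chain can be sketched as a registry keyed by event type and version. The `OrderPlaced` migration below (splitting a single `customer_name` into two fields) is an invented example:

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]

# (event_type, from_version) -> function that migrates one version forward
UPCASTERS: Dict[tuple, Callable[[Event], Event]] = {}

def upcaster(event_type: str, from_version: int):
    def register(fn):
        UPCASTERS[(event_type, from_version)] = fn
        return fn
    return register

@upcaster("OrderPlaced", 1)
def split_name(event: Event) -> Event:
    # v1 stored a single "customer_name"; v2 splits it into two fields.
    first, _, last = event["customer_name"].partition(" ")
    return {**event, "schema_version": 2, "first_name": first, "last_name": last}

def upcast(event: Event) -> Event:
    """Apply upcasters until the event reaches the newest known shape."""
    while (fn := UPCASTERS.get((event["type"], event["schema_version"]))):
        event = fn(event)
    return event
```

Because each upcaster only moves one version forward, a v1 event stored years ago is migrated step by step to whatever the current shape is, without rewriting the log.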
Distributed Transactions vs. Saga
Traditional distributed transactions (2‑PC) are heavyweight and often unsuitable for high‑throughput systems. The Saga pattern replaces them with a series of compensating actions.
- Choreography – each service listens for events and decides locally whether to act or compensate.
- Orchestration – a central coordinator (e.g., Temporal.io) drives the saga steps.
Temporal’s guide on sagas shows that a well‑designed saga can achieve strong consistency while preserving scalability.
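The orchestrated variant reduces to a small control loop: run each step, remember its compensation, and unwind in reverse on failure. A minimal sketch (real orchestrators like Temporal also persist this state so the saga survives crashes):

```python
from typing import Callable, List, Tuple

# Each saga step pairs a forward action with its compensating action.
Step = Tuple[Callable[[], None], Callable[[], None]]

def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for comp in reversed(completed):
                comp()  # undo everything that already committed
            return False
        completed.append(compensate)
    return True
```

For example, if "charge card" fails after "reserve inventory" succeeded, only the inventory reservation is rolled back.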
Common Pitfalls and Anti‑Patterns
Storing Large Binaries in the Event Log
Events should be small, immutable facts. Store heavy payloads (images, PDFs) in an external blob store and reference them by URL.
Over‑Projecting
Creating a separate read model for every query leads to maintenance churn. Start with a few generic projections and refactor as usage patterns emerge.
Neglecting Snapshots
Replaying millions of events for a single aggregate becomes impractical. Implement snapshotting after a configurable number of events (e.g., every 10 000).
Assuming Eventual Consistency Is Free
While eventual consistency can simplify scaling, it introduces complexity in UI design (e.g., showing stale data). Use read‑after‑write paths for critical user flows.
Mixing Command and Query Logic
A common anti‑pattern is to query the write model directly from the UI. This defeats the purpose of CQRS and re‑introduces coupling.
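The snapshotting pitfall above is easy to avoid with a small amount of bookkeeping. A minimal sketch, with an in-memory dict standing in for a snapshot store and a deliberately low threshold:

```python
from typing import Dict, List, Tuple

SNAPSHOT_EVERY = 1000  # illustrative threshold; tune per aggregate

# aggregate_id -> (number of events already folded in, balance at that point)
snapshots: Dict[str, Tuple[int, int]] = {}

def rebuild_balance(aggregate_id: str, events: List[dict]) -> int:
    """Start from the latest snapshot and replay only newer events."""
    start, balance = snapshots.get(aggregate_id, (0, 0))
    for ev in events[start:]:
        if ev["type"] == "Deposited":
            balance += ev["amount"]
        elif ev["type"] == "Withdrawn":
            balance -= ev["amount"]
    if len(events) - start >= SNAPSHOT_EVERY:
        snapshots[aggregate_id] = (len(events), balance)  # persist a new snapshot
    return balance
```

After the first full replay, subsequent loads touch only the tail of the stream, so rebuild cost stays bounded no matter how long the history grows.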
Key Takeaways
- Event sourcing stores immutable facts, enabling auditability, temporal queries, and safe concurrent writes.
- CQRS separates command handling from query serving, allowing independent scaling and specialized data models.
- Strong consistency is achievable with synchronous projections, idempotent handlers, and transactional outbox or saga patterns.
- Performance tricks such as stream partitioning, batch processing, and layered caching can push throughput into the millions of events per second.
- Guard against pitfalls: keep events lightweight, use snapshots, avoid over‑projecting, and never mix read/write concerns.