Implementing Conflict-Free Replicated Data Types for Eventual Consistency in Distributed Systems

TL;DR — CRDTs let you build distributed services that converge automatically without coordination. By choosing the right CRDT type, encoding state efficiently, and testing conflict‑free behavior, you can achieve strong eventual consistency with minimal latency.

Distributed systems must reconcile updates that arrive out of order, from different network partitions, or after node failures. Traditional approaches rely on heavyweight coordination (e.g., Paxos, Raft) that adds latency and reduces availability. Conflict‑Free Replicated Data Types (CRDTs) offer a complementary strategy: each replica can apply operations locally, and the mathematical properties of the data type guarantee that all replicas converge to the same state once they have seen the same set of updates. This article walks through the theory behind CRDTs, examines the most common concrete types, and shows how to implement a small CRDT library that can be dropped into a production‑grade distributed service.

What Are CRDTs?

A Conflict‑Free Replicated Data Type is a data structure whose replicas can be updated independently and merged deterministically without requiring any central arbitrator. The key guarantees are:

Commutativity – Merges can be applied in any order and yield the same result.
Idempotence – Re‑applying the same update does not change the state after the first application.
Associativity – Grouping of merge operations does not affect the final state.

Because these algebraic properties hold, the system achieves strong eventual consistency (SEC): if no new updates are made, all replicas eventually converge to a common state.

Two families of CRDTs exist:

State‑based (or Convergent) CRDTs – Replicas periodically exchange their entire state and merge using a join‑semilattice operation.
Operation‑based (or Commutative) CRDTs – Replicas broadcast individual operations that are designed to commute.

Both families are mathematically equivalent, but they differ in network bandwidth, latency, and implementation complexity.

Types of CRDTs

Family	Example Types	Typical Use‑Case
State‑based	G‑Counter, PN‑Counter, G‑Set, OR‑Set, LWW‑Register	Simple counters, sets, or registers where full state can be shipped efficiently.
Operation‑based	Add‑Wins Set, Remove‑Wins Set, Increment‑Only Counter	High‑frequency updates where sending only the operation reduces bandwidth.

For most key‑value stores, a state‑based G‑Set (grow‑only set) or PN‑Counter (positive‑negative counter) is sufficient. More complex collaborative editing tools often rely on OR‑Set (observed‑remove set) or RGA (replicated growable array) to model ordered sequences.

Core Algorithms

G‑Counter (Grow‑Only Counter)

A G‑Counter is a vector of integers, one entry per replica. To increment, a replica updates its own entry. Merge is defined as the element‑wise maximum.

class GCounter:
    def __init__(self, replica_id):
        self.id = replica_id
        self.state = {}

    def increment(self, n=1):
        self.state[self.id] = self.state.get(self.id, 0) + n

    def merge(self, other):
        for rid, val in other.state.items():
            self.state[rid] = max(self.state.get(rid, 0), val)

    def value(self):
        return sum(self.state.values())

Properties: commutative (max), associative, idempotent.

PN‑Counter (Positive‑Negative Counter)

A PN‑Counter composes two G‑Counters: one for increments (P) and one for decrements (N). The observable value is P - N.

class PNCounter:
    def __init__(self, replica_id):
        self.inc = GCounter(replica_id)
        self.dec = GCounter(replica_id)

    def increment(self, n=1):
        self.inc.increment(n)

    def decrement(self, n=1):
        self.dec.increment(n)

    def merge(self, other):
        self.inc.merge(other.inc)
        self.dec.merge(other.dec)

    def value(self):
        return self.inc.value() - self.dec.value()

G‑Set (Grow‑Only Set)

A G‑Set stores a plain Python set. Merge is the union of the two sets.

class GSet:
    def __init__(self):
        self.elements = set()

    def add(self, element):
        self.elements.add(element)

    def merge(self, other):
        self.elements |= other.elements

OR‑Set (Observed‑Remove Set)

An OR‑Set tracks each addition with a unique tag (often a tuple of replica ID and a monotonically increasing counter). Removal records the tags it has seen and discards matching additions during merge.

class ORSet:
    def __init__(self, replica_id):
        self.id = replica_id
        self.adds = {}          # element -> set of tags
        self.removes = set()    # set of tags

    def _next_tag(self):
        # Simple per‑replica counter
        self._counter = getattr(self, "_counter", 0) + 1
        return (self.id, self._counter)

    def add(self, element):
        tag = self._next_tag()
        self.adds.setdefault(element, set()).add(tag)

    def remove(self, element):
        tags = self.adds.get(element, set())
        self.removes.update(tags)

    def merge(self, other):
        # Merge adds
        for elem, tags in other.adds.items():
            self.adds.setdefault(elem, set()).update(tags)
        # Merge removes
        self.removes.update(other.removes)

    def elements(self):
        result = set()
        for elem, tags in self.adds.items():
            if not tags.issubset(self.removes):
                result.add(elem)
        return result

Key insight: By tracking causality via tags, OR‑Set resolves concurrent add/remove conflicts deterministically (add‑wins by default).

Building a CRDT Library

When turning the sketches above into a production‑ready library, consider the following cross‑cutting concerns.

Language Choice and Serialization

Language – Choose a language with strong typing and efficient serialization (e.g., Go, Rust, or Java). These languages make it easier to guarantee that merge operations are pure functions, a prerequisite for SEC.
Serialization – CRDT state must be transmitted over the wire. Protocol Buffers, Apache Avro, or CBOR are common choices because they support deterministic binary encoding. Determinism is crucial: two replicas must interpret the same byte stream as the same logical state.

// Example protobuf definition for a PN‑Counter
syntax = "proto3";

message GCounter {
  map<string, uint64> entries = 1;
}

message PNCounter {
  GCounter inc = 1;
  GCounter dec = 2;
}

Version Vectors and Causality

Even state‑based CRDTs benefit from version vectors (also called vector clocks) to detect stale merges and prune tombstones. A version vector records, for each replica, the highest sequence number seen. When merging:

def merge_with_vv(local, remote, local_vv, remote_vv):
    # Merge state first
    local.merge(remote)
    # Update version vector
    for rid, seq in remote_vv.items():
        local_vv[rid] = max(local_vv.get(rid, 0), seq)

Version vectors also enable garbage collection of tombstones in OR‑Sets: once every replica’s version vector has advanced past a removal tag, the tag can be safely discarded.

Example: A Simple Grow‑Only Counter in Python

Below is a complete, testable snippet that demonstrates a G‑Counter with protobuf serialization.

# gcounter.proto
# syntax = "proto3";
# message GCounter {
#   map<string, uint64> entries = 1;
# }

import gcounter_pb2
from copy import deepcopy

class GCounter:
    def __init__(self, replica_id):
        self.id = replica_id
        self.state = gcounter_pb2.GCounter()
        self.state.entries[self.id] = 0

    def increment(self, n=1):
        self.state.entries[self.id] += n

    def merge(self, other_bytes):
        other = gcounter_pb2.GCounter()
        other.ParseFromString(other_bytes)
        for rid, val in other.entries.items():
            self.state.entries[rid] = max(self.state.entries.get(rid, 0), val)

    def serialize(self):
        return self.state.SerializeToString()

    def value(self):
        return sum(self.state.entries.values())

# Demo
c1 = GCounter("nodeA")
c2 = GCounter("nodeB")
c1.increment(5)
c2.increment(3)

# Exchange states
c1.merge(c2.serialize())
c2.merge(c1.serialize())

assert c1.value() == 8 and c2.value() == 8

Running this script prints nothing, but the assertions confirm that both replicas converge to the same total after a single round‑trip.

Integrating CRDTs into Distributed Systems

Replication Topology

Full Mesh – Every replica gossips its state to all others. Simpler to reason about but scales poorly (O(N²) messages).
Ring / Hierarchical – Nodes forward updates along a directed path. Reduces bandwidth but introduces a bounded propagation delay.
Hybrid – Combine local full‑mesh within a data center and inter‑data‑center ring for wide‑area replication.

The topology determines how often you need to send full state (state‑based) versus operations (op‑based). In a high‑throughput chat application, operation‑based CRDTs for presence lists are preferable because each presence event is tiny.

Conflict Resolution Guarantees

Because CRDTs guarantee convergence by design, you can ignore most conflict‑resolution code that traditional eventually‑consistent stores require. However, you still need to decide semantic conflict policies for the domain:

Add‑Wins – Prefer additions (e.g., collaborative document editing where a newly typed character should survive).
Remove‑Wins – Prefer deletions (e.g., feature flags where a “disable” should dominate).
Last‑Write‑Wins (LWW) – Use timestamps to pick the most recent update (common for registers).

Pick the policy early; it dictates which concrete CRDT variant you implement.

Using CRDTs with Existing Databases

Many NoSQL stores already expose CRDT‑like primitives:

Riak KV – Offers built‑in CRDT types (counters, sets, maps) that can be accessed via its HTTP API. See the official guide for usage patterns.
Redis – Since v7.0, Redis Enterprise supports CRDTs for active‑active replication. The CRDT module provides CRDT.SET, CRDT.COUNTER, etc.
Cassandra – While not offering native CRDTs, you can model LWW registers using lightweight timestamps and combine them with application‑level merge logic.

When integrating, treat the database as a persistence layer for your CRDT objects. Serialize the CRDT state, store it under a stable key, and rely on the database’s replication to move the bytes between nodes. The merge logic stays in your application code, guaranteeing that the convergence property is not lost even if the underlying store changes.

Testing and Validation

Property‑Based Testing

CRDT correctness is naturally expressed as properties rather than fixed examples. Tools like Hypothesis (Python), QuickCheck (Haskell), or proptest (Rust) can generate random sequences of operations and verify convergence.

from hypothesis import given, strategies as st

@given(st.lists(st.tuples(st.sampled_from(['inc', 'dec']), st.integers(min_value=1, max_value=5)), min_size=1, max_size=20))
def test_pn_counter_convergence(seq):
    a = PNCounter('A')
    b = PNCounter('B')
    for op, n in seq:
        getattr(a, op)(n)
        getattr(b, op)(n)  # apply same ops in same order locally
    # Simulate out‑of‑order delivery
    a.merge(b)
    b.merge(a)
    assert a.value() == b.value()

Running the test thousands of times gives high confidence that merge satisfies commutativity, associativity, and idempotence.

Simulation Frameworks

For large clusters, Jepsen (https://jepsen.io) provides a systematic way to inject network partitions, drop messages, and then verify that the system converges. While Jepsen is often used for consensus algorithms, its CRDT model can be customized to validate your own library under realistic failure scenarios.

Key Takeaways

CRDTs give you strong eventual consistency without coordinating every write, dramatically improving latency and availability.
Choose between state‑based (full‑state exchange) and operation‑based (broadcast ops) depending on bandwidth constraints and update frequency.
Common building blocks—G‑Counter, PN‑Counter, G‑Set, OR‑Set—cover most application needs; more exotic structures (RGA, LWW‑Map) are extensions of these primitives.
A production‑grade library should include deterministic serialization, version vectors, and garbage‑collection of tombstones.
Integration patterns vary: embed CRDTs in key‑value stores (Riak, Redis), or use them as an in‑memory layer with periodic persistence.
Validate rigorously with property‑based tests and failure‑injection tools like Jepsen to ensure the merge function truly satisfies the CRDT algebra.

What Are CRDTs?#

Types of CRDTs#

Core Algorithms#

G‑Counter (Grow‑Only Counter)#

PN‑Counter (Positive‑Negative Counter)#

G‑Set (Grow‑Only Set)#

OR‑Set (Observed‑Remove Set)#

Building a CRDT Library#

Language Choice and Serialization#

Version Vectors and Causality#

Example: A Simple Grow‑Only Counter in Python#

Integrating CRDTs into Distributed Systems#

Replication Topology#

Conflict Resolution Guarantees#

Using CRDTs with Existing Databases#

Testing and Validation#

Property‑Based Testing#

Simulation Frameworks#

Key Takeaways#

Further Reading#