TL;DR — Building reliable asynchronous consensus for multi‑agent decision engines requires a clear separation between the consensus engine, the message bus, and the application state. By re‑using proven patterns like Raft, CRDTs, and Kafka‑based log replication, you can achieve low‑latency decisions while tolerating node failures and network partitions.
In modern data‑intensive products, a single monolithic decision service quickly becomes a bottleneck. Teams split the workload into independent agents—each responsible for a slice of the problem space—yet the agents must still agree on a shared view of the world. This post walks through the architecture, concrete implementation snippets, and production‑grade patterns that make asynchronous consensus both scalable and maintainable.
Motivation and Problem Space
- Latency vs. Consistency trade‑off – Synchronous protocols (two‑phase commit, Paxos) guarantee strong consistency but add round‑trip latency that hurts user‑facing response times.
- Dynamic topology – Agents join and leave the cluster based on load, auto‑scaling groups, or failure recovery. The consensus layer must adapt without a full restart.
- Heterogeneous workloads – Some agents run ML inference, others perform rule‑based routing. The consensus protocol must be agnostic to payload size and processing time.
These pressures push us toward asynchronous consensus: agents propose state changes, the system orders them eventually, and each node applies the ordered log at its own pace.
Core Concepts of Asynchronous Consensus
- Log replication – A durable, append‑only sequence of commands that all agents eventually consume. Kafka topics or a custom Raft log serve this purpose.
- Deterministic state machine – If every agent applies the same commands in the same order, their state converges regardless of timing differences.
- Conflict‑free replicated data types (CRDTs) – When perfect ordering is impossible, CRDTs let agents merge divergent updates without coordination.
Understanding these concepts lets us pick the right building blocks for a given SLA.
Architecture Overview
Below is a high‑level diagram (conceptual, not rendered here) of the three‑layer stack:
- Message Bus Layer – Kafka (or Pulsar) provides durable topics, partitioning, and consumer groups.
- Consensus Engine Layer – Implements Raft or a custom leader‑less log, exposing a simple
append(entry)API. - Decision Engine Layer – Stateless microservices that read the ordered log, execute business logic, and emit downstream events.
Consensus Engine Pattern
# consensus_engine.py
import json
from typing import Any, List
from kafka import KafkaProducer, KafkaConsumer
class AsyncConsensus:
def __init__(self, topic: str, bootstrap: List[str]):
self.producer = KafkaProducer(
bootstrap_servers=bootstrap,
value_serializer=lambda v: json.dumps(v).encode('utf-8')
)
self.consumer = KafkaConsumer(
topic,
bootstrap_servers=bootstrap,
auto_offset_reset='earliest',
enable_auto_commit=False,
value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)
self.topic = topic
def propose(self, command: dict) -> None:
"""Append a command to the replicated log."""
self.producer.send(self.topic, command)
self.producer.flush()
def iterator(self):
"""Yield commands in order for the state machine."""
for msg in self.consumer:
yield msg.value
The AsyncConsensus class hides Kafka details behind a simple propose method. In production we pair it with idempotent command IDs and exactly‑once semantics using Kafka’s transactional API (see the Kafka Transactions guide).
Message Bus Integration (Kafka)
| Feature | Why it matters for consensus |
|---|---|
| Partitions | Each partition becomes an independent log; we typically use a single partition per decision engine to preserve total order. |
| Replication factor | Guarantees durability across broker failures; set to 3 for production. |
| Consumer group offset management | Allows each agent to replay from any point, useful for new nodes joining the cluster. |
When scaling horizontally, we keep the partition count low (often 1) to avoid split‑brain ordering, and we rely on leader election at the broker level rather than at the application level.
Implementation Details
State Machine Replication with Raft
For workloads that cannot tolerate eventual consistency, we embed a lightweight Raft library (e.g., hashicorp/raft) on top of the same Kafka log. The Raft leader reads from Kafka, assembles a batch, and writes a raft entry back to Kafka, guaranteeing that all followers see the same sequence.
// raft_wrapper.go (simplified)
func (r *RaftNode) Apply(log *raft.Log) interface{} {
// Decode the command stored in the Kafka‑backed log entry
var cmd Command
json.Unmarshal(log.Data, &cmd)
// Apply to the local state machine
return r.stateMachine.Apply(cmd)
}
The Raft layer handles:
- Leader election using the built‑in timeout mechanism (default 1 s).
- Log compaction via snapshotting to an S3 bucket, reducing replay time for new nodes.
CRDTs for Conflict‑Tolerant Updates
When agents generate high‑frequency, low‑impact updates (e.g., counters, sets), we replace the strict Raft log with a G‑Counter CRDT stored in Redis. Each agent independently increments its shard, and a background aggregator merges them into the global view.
# Example: Increment a G‑Counter in Redis
redis-cli HINCRBY gcounter agent_42 1
The aggregator runs every 5 seconds, reads all shards, and writes a consolidated record back to the Kafka log for downstream consumers.
Failure Detection and Timeouts
Production systems need fast failure detection. We combine:
- Kafka consumer lag metrics – Exposed via Prometheus (
kafka_consumer_lag). A lag > 5 seconds triggers a node restart. - Heartbeat topics – Each agent publishes a small “heartbeat” message every second; missing three consecutive heartbeats marks the node as suspect.
- Circuit breaker pattern – Wrap external calls (e.g., to a downstream ML model) with the
github.com/sony/gobreakerlibrary to prevent cascading failures.
breaker := gobreaker.NewCircuitBreaker(gobreaker.Settings{
Name: "MLModel",
MaxRequests: 5,
Interval: 60 * time.Second,
Timeout: 10 * time.Second,
})
Patterns in Production
- Leader‑less log for high‑throughput pipelines – Use Kafka alone when ordering is eventual; avoid Raft to reduce latency.
- Hybrid Raft + CRDT – Critical commands (e.g., policy changes) go through Raft; bulk metric updates use CRDTs.
- Zero‑downtime upgrades – Deploy a new version of the decision engine that reads the same log but writes a new version field; older instances gracefully stop proposing new commands after observing the version bump.
- Observability first – Instrument every
proposeandapplywith trace IDs (OpenTelemetry) to correlate command flow across agents. - Back‑pressure via Kafka quotas – Limit producer throughput per agent to avoid flooding the broker during spikes.
Key Takeaways
- Asynchronous consensus separates ordering (Kafka/Raft) from application (decision engine), allowing each layer to scale independently.
- Use Raft when you need strong consistency; fall back to CRDTs for high‑volume, low‑impact state.
- Keep the consensus log single‑partition per logical engine to preserve total order without complex sharding.
- Implement health checks at the message‑bus level (consumer lag, heartbeats) to detect failures faster than OS‑level timeouts.
- Instrument aggressively; a well‑instrumented system reveals the exact point where latency or inconsistency originates.
