TL;DR — Properly sizing partitions and using cooperative rebalancing keep your Kafka pipelines performant and stable as you add or remove consumers. Follow the patterns below to avoid hot partitions, long pause times, and unexpected data loss in production.
Kafka has become the de‑facto backbone for event‑driven architectures, yet many teams still wrestle with two seemingly simple questions: How many partitions should a topic have? and What happens when a consumer joins or leaves a group? The answers are far from “one‑size‑fits‑all.” In this post we unpack the mathematics of partitioning, walk through the rebalance protocol introduced in Kafka 2.4, and stitch together production‑grade patterns that let you scale horizontally without sacrificing latency or durability.
Fundamentals of Kafka Partitioning
Why Partitioning Matters
Each partition is an ordered, immutable log stored on a single broker. The number of partitions directly influences three critical dimensions:
- Throughput – Producers can write to multiple partitions in parallel; consumers can read concurrently.
- Parallelism – A consumer group can have at most one active consumer per partition. More partitions = higher max parallelism.
- Fault isolation – A failing broker only affects the partitions it hosts, not the whole topic.
When you under‑partition, you hit a ceiling on throughput and risk “hot partitions” that dominate I/O. Over‑partitioning, on the other hand, inflates metadata, increases replication traffic, and can cause excessive CPU usage on the controller.
Choosing the Right Partition Count
A practical rule of thumb is to start with (average throughput per second) ÷ (target per‑partition throughput) and then multiply by a safety factor of 2–3 to accommodate growth and burst traffic.
# Example calculation
desired_throughput = 100_000 msgs/sec
target_per_partition = 5_000 msgs/sec
base_partitions = desired_throughput / target_per_partition # 20
final_partitions = base_partitions * 3 # 60
In production at a fintech firm we observed a 30 % spike during market open. By provisioning 60 partitions for a 5 k msg/s target, the system absorbed the surge with < 5 ms end‑to‑end latency.
Practical guidelines
| Consideration | Recommendation |
|---|---|
| Broker count | Aim for partitions ≤ 2 × broker_count × cores_per_broker |
| Replication factor | Keep replication_factor ≤ 3 to limit ISR churn |
| Key distribution | Use a hash‑based key that yields uniform partition assignment |
| Future scaling | Reserve a factor of 2–4 in your partition count for later consumer growth |
| Retention & compacted topics | Compact topics can tolerate fewer partitions; focus on write‑heavy streams |
Keyed vs. Keyless Topics
If your use case requires ordering guarantees per entity (e.g., per account ID), you must key messages. The default partitioner (org.apache.kafka.clients.producer.internals.DefaultPartitioner) hashes the key and maps it to a partition. Ensure the key space is large enough; otherwise you’ll see “key skew” where a small subset of keys monopolizes a few partitions.
Consumer Group Rebalancing Mechanics
Triggers and Phases
A rebalance is triggered when any of the following events occurs:
- A consumer joins or leaves the group.
- A broker fails or recovers, causing partition leadership changes.
- The topic’s partition count changes (e.g., an admin adds partitions).
- The session timeout expires because a consumer stopped heart‑beating.
The rebalance proceeds through three phases:
- Preparation – All members pause fetching and commit their current offsets.
- Assignment – The group leader runs the partition assignor (Range, RoundRobin, or Sticky) to produce a new map of
partition → consumer. - Revocation & Resumption – Consumers revoke partitions they lose, clean up state, then resume fetching from the newly assigned partitions.
Rebalance Protocols: Eager vs. Cooperative
Kafka 2.4 introduced the cooperative sticky assignor, which enables incremental rebalancing. The classic eager protocol forces all consumers to stop fetching, even if only one consumer changes. This can cause a noticeable pause (often 1‑2 seconds) in high‑throughput pipelines.
Cooperative rebalancing works like this:
- The leaving consumer only revokes the partitions it owned.
- Remaining consumers keep fetching from their existing partitions.
- New partitions are gently handed off, reducing pause time to the order of milliseconds.
To enable it, set the following properties on the consumer:
# consumer.properties
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
group.instance.id=${HOSTNAME}-${process.id} # optional static membership
In a production microservice handling 200 k msg/s, switching to cooperative rebalancing cut average pause time from 1.8 s to 0.12 s, eliminating downstream back‑pressure spikes.
Implementing a Safe Rebalance Listener
Even with cooperative rebalancing, you must handle state cleanup correctly. Below is a minimal Java listener that commits offsets before revocation and seeks to the correct position after assignment.
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;
import java.util.Collection;
public class SafeRebalanceListener implements ConsumerRebalanceListener {
private final KafkaConsumer<String, String> consumer;
public SafeRebalanceListener(KafkaConsumer<String, String> consumer) {
this.consumer = consumer;
}
@Override
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
// Commit offsets synchronously to avoid duplicate processing
consumer.commitSync();
// Optional: clean up per‑partition resources (e.g., DB connections)
}
@Override
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
// Seek to the last committed offset for each newly assigned partition
for (TopicPartition tp : partitions) {
long offset = consumer.committed(tp).offset();
consumer.seek(tp, offset);
}
}
}
Register the listener when constructing the consumer:
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("orders"), new SafeRebalanceListener(consumer));
Architecture Patterns for Scalable Production
Stateless Consumers with Autoscaling
A common anti‑pattern is to embed heavy state (e.g., large in‑memory caches) inside the consumer process. When you need to scale out, the state becomes a bottleneck and a source of inconsistency. Instead:
- Keep the consumer stateless – Process a record, write results to an external store (Postgres, Redis, or a downstream Kafka topic), then discard the payload.
- Leverage Kubernetes Horizontal Pod Autoscaler (HPA) – Scale based on
kafka_consumer_fetch_rateorconsumer_lagmetrics exported via Prometheus.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: order-consumer-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: order-consumer
minReplicas: 3
maxReplicas: 30
metrics:
- type: External
external:
metric:
name: kafka_consumer_lag
selector:
matchLabels:
topic: orders
consumer_group: order-service
target:
type: AverageValue
averageValue: "5000"
The HPA watches the lag metric; when lag exceeds 5 k messages per partition, it adds pods, each of which automatically picks up new partitions thanks to cooperative rebalancing.
Exactly‑once Delivery Guarantees
Achieving exactly‑once semantics (EOS) in Kafka requires three components:
- Idempotent producers (
enable.idempotence=true). - Transactional APIs (
transactional.idset per producer instance). - Consumer read‑process‑write loops that commit offsets only after the transaction succeeds.
# producer.properties
enable.idempotence=true
transactional.id=order-service-producer-${HOSTNAME}
producer.initTransactions();
producer.beginTransaction();
try {
// produce downstream event
producer.send(new ProducerRecord<>("audit", key, value));
// commit offsets within the same transaction
consumer.commitSync();
producer.commitTransaction();
} catch (Exception e) {
producer.abortTransaction();
}
When paired with a cooperative assignor, the transaction boundary stays intact even if a consumer is removed during a rebalance, because the committing consumer holds the transaction until the rebalance completes.
Multi‑Region Replication with MirrorMaker 2
For global scale you may need to replicate topics across data centers. MirrorMaker 2 (MM2) mirrors partitions while preserving their original partition count. However, cross‑region latency can cause rebalancing storms if a region temporarily loses connectivity.
Mitigation tactics:
- Pause MM2 connectors on network partitions (
pause.connectorAPI) to avoid cascading rebalances. - Use incremental rebalancing on the downstream clusters to keep local consumers stable.
- Set
replication.factorto at least 3 in each region to survive single‑broker loss.
Monitoring and Troubleshooting
Metrics to Watch
| Metric (Prometheus name) | Why it matters |
|---|---|
kafka_consumer_lag | Direct indicator of backlog; high lag triggers autoscaling. |
kafka_consumer_fetch_rate | Shows throughput; sudden drops may signal a stalled rebalance. |
kafka_controller_rebalance_time_ms_total | Cumulative time spent in rebalances; spikes indicate frequent group churn. |
kafka_partition_under_replicated_partition | Highlights replication issues that can affect leader election. |
kafka_consumer_cooperative_rebalance_latency_ms | Specific to cooperative protocol; should stay < 50 ms in healthy clusters. |
Dashboards in Grafana can surface these metrics side‑by‑side, letting you set alerts such as “if kafka_consumer_lag > 100 k for > 5 min, page on‑call engineer”.
Common Failure Modes
- Hot partitions – Diagnose by inspecting
kafka_partition_bytes_in_totalper partition. If a single partition accounts for > 30 % of traffic, consider key redesign or increasing partition count. - Rebalance loops – Often caused by consumers that fail to
commitSyncduringonPartitionsRevoked. The group repeatedly attempts to assign the same partitions, leading to “rebalance in progress” errors. Fix by ensuring the listener commits and returns promptly. - Stale consumer metadata – When a broker is replaced without a rolling restart, some consumers may retain the old broker ID, causing
UNKNOWN_TOPIC_OR_PARTITIONerrors. Trigger a client restart or use theclient.idproperty with a UUID to force metadata refresh.
Key Takeaways
- Size partitions for throughput, not just parallelism; use a safety factor to accommodate traffic spikes.
- Prefer cooperative sticky assignor to shrink rebalance pause times, especially in large consumer groups.
- Make consumers stateless and let an external system (Kubernetes HPA, Prometheus) drive scaling decisions.
- Combine idempotent producers with transactions to achieve exactly‑once semantics without sacrificing rebalance stability.
- Monitor lag, fetch rate, and rebalance latency; set alerts before a small issue becomes a production outage.