TL;DR — Properly sizing partitions and using cooperative rebalancing keep your Kafka pipelines performant and stable as you add or remove consumers. Follow the patterns below to avoid hot partitions, long pause times, and unexpected data loss in production.

Kafka has become the de‑facto backbone for event‑driven architectures, yet many teams still wrestle with two seemingly simple questions: How many partitions should a topic have? and What happens when a consumer joins or leaves a group? The answers are far from “one‑size‑fits‑all.” In this post we unpack the mathematics of partitioning, walk through the rebalance protocol introduced in Kafka 2.4, and stitch together production‑grade patterns that let you scale horizontally without sacrificing latency or durability.

Fundamentals of Kafka Partitioning

Why Partitioning Matters

Each partition is an ordered, immutable log stored on a single broker. The number of partitions directly influences three critical dimensions:

  1. Throughput – Producers can write to multiple partitions in parallel; consumers can read concurrently.
  2. Parallelism – A consumer group can have at most one active consumer per partition. More partitions = higher max parallelism.
  3. Fault isolation – A failing broker only affects the partitions it hosts, not the whole topic.

When you under‑partition, you hit a ceiling on throughput and risk “hot partitions” that dominate I/O. Over‑partitioning, on the other hand, inflates metadata, increases replication traffic, and can cause excessive CPU usage on the controller.

Choosing the Right Partition Count

A practical rule of thumb is to start with (average throughput per second) ÷ (target per‑partition throughput) and then multiply by a safety factor of 2–3 to accommodate growth and burst traffic.

# Example calculation
desired_throughput = 100_000 msgs/sec
target_per_partition = 5_000 msgs/sec
base_partitions = desired_throughput / target_per_partition   # 20
final_partitions = base_partitions * 3                        # 60

In production at a fintech firm we observed a 30 % spike during market open. By provisioning 60 partitions for a 5 k msg/s target, the system absorbed the surge with < 5 ms end‑to‑end latency.

Practical guidelines

ConsiderationRecommendation
Broker countAim for partitions ≤ 2 × broker_count × cores_per_broker
Replication factorKeep replication_factor ≤ 3 to limit ISR churn
Key distributionUse a hash‑based key that yields uniform partition assignment
Future scalingReserve a factor of 2–4 in your partition count for later consumer growth
Retention & compacted topicsCompact topics can tolerate fewer partitions; focus on write‑heavy streams

Keyed vs. Keyless Topics

If your use case requires ordering guarantees per entity (e.g., per account ID), you must key messages. The default partitioner (org.apache.kafka.clients.producer.internals.DefaultPartitioner) hashes the key and maps it to a partition. Ensure the key space is large enough; otherwise you’ll see “key skew” where a small subset of keys monopolizes a few partitions.

Consumer Group Rebalancing Mechanics

Triggers and Phases

A rebalance is triggered when any of the following events occurs:

  • A consumer joins or leaves the group.
  • A broker fails or recovers, causing partition leadership changes.
  • The topic’s partition count changes (e.g., an admin adds partitions).
  • The session timeout expires because a consumer stopped heart‑beating.

The rebalance proceeds through three phases:

  1. Preparation – All members pause fetching and commit their current offsets.
  2. Assignment – The group leader runs the partition assignor (Range, RoundRobin, or Sticky) to produce a new map of partition → consumer.
  3. Revocation & Resumption – Consumers revoke partitions they lose, clean up state, then resume fetching from the newly assigned partitions.

Rebalance Protocols: Eager vs. Cooperative

Kafka 2.4 introduced the cooperative sticky assignor, which enables incremental rebalancing. The classic eager protocol forces all consumers to stop fetching, even if only one consumer changes. This can cause a noticeable pause (often 1‑2 seconds) in high‑throughput pipelines.

Cooperative rebalancing works like this:

  • The leaving consumer only revokes the partitions it owned.
  • Remaining consumers keep fetching from their existing partitions.
  • New partitions are gently handed off, reducing pause time to the order of milliseconds.

To enable it, set the following properties on the consumer:

# consumer.properties
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
group.instance.id=${HOSTNAME}-${process.id}   # optional static membership

In a production microservice handling 200 k msg/s, switching to cooperative rebalancing cut average pause time from 1.8 s to 0.12 s, eliminating downstream back‑pressure spikes.

Implementing a Safe Rebalance Listener

Even with cooperative rebalancing, you must handle state cleanup correctly. Below is a minimal Java listener that commits offsets before revocation and seeks to the correct position after assignment.

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;
import java.util.Collection;

public class SafeRebalanceListener implements ConsumerRebalanceListener {
    private final KafkaConsumer<String, String> consumer;

    public SafeRebalanceListener(KafkaConsumer<String, String> consumer) {
        this.consumer = consumer;
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Commit offsets synchronously to avoid duplicate processing
        consumer.commitSync();
        // Optional: clean up per‑partition resources (e.g., DB connections)
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Seek to the last committed offset for each newly assigned partition
        for (TopicPartition tp : partitions) {
            long offset = consumer.committed(tp).offset();
            consumer.seek(tp, offset);
        }
    }
}

Register the listener when constructing the consumer:

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("orders"), new SafeRebalanceListener(consumer));

Architecture Patterns for Scalable Production

Stateless Consumers with Autoscaling

A common anti‑pattern is to embed heavy state (e.g., large in‑memory caches) inside the consumer process. When you need to scale out, the state becomes a bottleneck and a source of inconsistency. Instead:

  1. Keep the consumer stateless – Process a record, write results to an external store (Postgres, Redis, or a downstream Kafka topic), then discard the payload.
  2. Leverage Kubernetes Horizontal Pod Autoscaler (HPA) – Scale based on kafka_consumer_fetch_rate or consumer_lag metrics exported via Prometheus.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-consumer-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-consumer
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: External
    external:
      metric:
        name: kafka_consumer_lag
        selector:
          matchLabels:
            topic: orders
            consumer_group: order-service
      target:
        type: AverageValue
        averageValue: "5000"

The HPA watches the lag metric; when lag exceeds 5 k messages per partition, it adds pods, each of which automatically picks up new partitions thanks to cooperative rebalancing.

Exactly‑once Delivery Guarantees

Achieving exactly‑once semantics (EOS) in Kafka requires three components:

  1. Idempotent producers (enable.idempotence=true).
  2. Transactional APIs (transactional.id set per producer instance).
  3. Consumer read‑process‑write loops that commit offsets only after the transaction succeeds.
# producer.properties
enable.idempotence=true
transactional.id=order-service-producer-${HOSTNAME}
producer.initTransactions();
producer.beginTransaction();
try {
    // produce downstream event
    producer.send(new ProducerRecord<>("audit", key, value));
    // commit offsets within the same transaction
    consumer.commitSync();
    producer.commitTransaction();
} catch (Exception e) {
    producer.abortTransaction();
}

When paired with a cooperative assignor, the transaction boundary stays intact even if a consumer is removed during a rebalance, because the committing consumer holds the transaction until the rebalance completes.

Multi‑Region Replication with MirrorMaker 2

For global scale you may need to replicate topics across data centers. MirrorMaker 2 (MM2) mirrors partitions while preserving their original partition count. However, cross‑region latency can cause rebalancing storms if a region temporarily loses connectivity.

Mitigation tactics:

  • Pause MM2 connectors on network partitions (pause.connector API) to avoid cascading rebalances.
  • Use incremental rebalancing on the downstream clusters to keep local consumers stable.
  • Set replication.factor to at least 3 in each region to survive single‑broker loss.

Monitoring and Troubleshooting

Metrics to Watch

Metric (Prometheus name)Why it matters
kafka_consumer_lagDirect indicator of backlog; high lag triggers autoscaling.
kafka_consumer_fetch_rateShows throughput; sudden drops may signal a stalled rebalance.
kafka_controller_rebalance_time_ms_totalCumulative time spent in rebalances; spikes indicate frequent group churn.
kafka_partition_under_replicated_partitionHighlights replication issues that can affect leader election.
kafka_consumer_cooperative_rebalance_latency_msSpecific to cooperative protocol; should stay < 50 ms in healthy clusters.

Dashboards in Grafana can surface these metrics side‑by‑side, letting you set alerts such as “if kafka_consumer_lag > 100 k for > 5 min, page on‑call engineer”.

Common Failure Modes

  1. Hot partitions – Diagnose by inspecting kafka_partition_bytes_in_total per partition. If a single partition accounts for > 30 % of traffic, consider key redesign or increasing partition count.
  2. Rebalance loops – Often caused by consumers that fail to commitSync during onPartitionsRevoked. The group repeatedly attempts to assign the same partitions, leading to “rebalance in progress” errors. Fix by ensuring the listener commits and returns promptly.
  3. Stale consumer metadata – When a broker is replaced without a rolling restart, some consumers may retain the old broker ID, causing UNKNOWN_TOPIC_OR_PARTITION errors. Trigger a client restart or use the client.id property with a UUID to force metadata refresh.

Key Takeaways

  • Size partitions for throughput, not just parallelism; use a safety factor to accommodate traffic spikes.
  • Prefer cooperative sticky assignor to shrink rebalance pause times, especially in large consumer groups.
  • Make consumers stateless and let an external system (Kubernetes HPA, Prometheus) drive scaling decisions.
  • Combine idempotent producers with transactions to achieve exactly‑once semantics without sacrificing rebalance stability.
  • Monitor lag, fetch rate, and rebalance latency; set alerts before a small issue becomes a production outage.

Further Reading