TL;DR — Proper partition sizing, deterministic assignment, and proactive rebalance handling can double throughput while keeping latency sub‑second. Use static partition counts, enable incremental cooperative rebalancing, and tune producer batching to get the most out of a Kafka deployment.
Kafka powers the data pipelines of countless tech giants, but the same flexibility that makes it attractive also hides traps. Mis‑sized partitions cause hot spots, aggressive rebalances stall consumers, and unchecked producer settings waste network bandwidth. This post walks you through a production‑ready approach: start with a disciplined partitioning strategy, then lock down consumer‑group rebalance behavior, and finally squeeze out every last byte of producer performance.
Understanding Kafka Topic Partitioning
Why Partition Count Matters
A Kafka topic is a log split into N partitions, each ordered independently. Each partition is led by a single broker (with replicas on other brokers), and each consumer in a group reads from a subset of those partitions. The partition count therefore determines two critical dimensions:
- Parallelism – Within a consumer group, each partition is read by at most one consumer thread, so the partition count caps consumer parallelism. More partitions → more consumer threads → higher aggregate throughput.
- Data Distribution – Partitions are the unit of replication and load‑balancing. Uneven key distribution can leave some brokers overloaded while others sit idle.
Real‑world numbers
| Topic | Partitions | Avg. Throughput per partition (msg/s) | Peak Broker CPU |
|---|---|---|---|
| clickstream‑raw | 120 | 15,000 | 78 % |
| orders‑events | 48 | 8,200 | 64 % |
| inventory‑updates | 24 | 4,900 | 55 % |
In the clickstream example, the team originally used 30 partitions, hitting 95 % CPU on a single broker during traffic spikes. Raising the partition count to 120 spread the load and reduced latency from 250 ms to 78 ms.
Choosing the Right Partition Count
- Estimate peak throughput – Divide the expected peak messages per second by the target per‑partition throughput (often 5‑10 k msg/s for SSD‑backed brokers). If your target is expressed in bytes per second instead, multiply the message rate by the average message size first.
- Consider consumer parallelism – Each partition is consumed by at most one thread in a group, so plan for no more consumer threads than partitions. If you plan to run 12 consumer instances, a multiple of 12 (e.g., 48 or 96) avoids uneven assignment.
- Future‑proofing – Adding partitions later triggers a rebalance and changes which partition a given key hashes to, which breaks per‑key ordering across the change. Over‑provision by 20‑30 % if you anticipate growth.
Pro tip: Use the formula
Partitions = ceil(PeakMsgsSec / 8,000) * ConsumerInstances
as a quick sanity check.
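As a rough illustration, the quick formula and the three-step checklist can both be written out in Java. The class and method names here are hypothetical, and the 8,000 msg/s per-partition target is this post's rule of thumb rather than a measured figure for any particular cluster.

```java
final class PartitionSizing {
    private PartitionSizing() {}

    /** Integer ceiling division for positive values. */
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /** Quick sanity check: ceil(peak / perPartition) * consumerInstances. */
    static long quickCheck(long peakMsgsPerSec, long perPartitionMsgsPerSec, int consumerInstances) {
        return ceilDiv(peakMsgsPerSec, perPartitionMsgsPerSec) * consumerInstances;
    }

    /**
     * Checklist version: size for throughput, add growth headroom
     * (0.25 means 25 %), then round up to a multiple of the consumer count.
     */
    static long fromChecklist(long peakMsgsPerSec, long perPartitionMsgsPerSec,
                              int consumerInstances, double headroom) {
        long forThroughput = ceilDiv(peakMsgsPerSec, perPartitionMsgsPerSec);
        long withHeadroom = (long) Math.ceil(forThroughput * (1.0 + headroom));
        return ceilDiv(withHeadroom, consumerInstances) * consumerInstances;
    }

    public static void main(String[] args) {
        // Assumed inputs: 120,000 msg/s peak, 8,000 msg/s per partition, 12 consumers.
        System.out.println(quickCheck(120_000, 8_000, 12));          // 180
        System.out.println(fromChecklist(120_000, 8_000, 12, 0.25)); // 24
    }
}
```

For these inputs the quick formula gives 180 partitions while the checklist gives 24, so it is probably best to read the quick formula as a generous upper bound and the checklist as a lower bound.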
Key Partitioning Patterns
| Pattern | When to Use | Implementation Hint |
|---|---|---|
| Hash‑Based Key | Guarantees ordering per key (e.g., user ID) | producer.send(topic, key=userid, value=payload) |
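| Round‑Robin | No ordering needed, maximize dispersion | Set partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner and omit key (since Kafka 2.4 the default partitioner sticks to one partition per batch for keyless records rather than rotating strictly) |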
| Custom Partitioner | Complex routing (e.g., geo‑region + device type) | Extend org.apache.kafka.clients.producer.Partitioner and register via partitioner.class |
Example: Custom Partitioner in Java
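import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class RegionDevicePartitioner implements Partitioner {
    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        // Key format: "region|deviceType". Assumes a non-null String key with both parts.
        String[] parts = ((String) key).split("\\|");
        String region = parts[0];
        String device = parts[1];
        int numPartitions = cluster.partitionCountForTopic(topic);
        // floorMod stays non-negative even when hashCode() is Integer.MIN_VALUE,
        // where Math.abs(...) would still return a negative number.
        return Math.floorMod((region + device).hashCode(), numPartitions);
    }

    @Override
    public void close() {}
}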
Deploy the JAR on all producer nodes and set partitioner.class=com.example.RegionDevicePartitioner.
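One detail worth checking in any hash-modulo partitioner: Math.abs(Integer.MIN_VALUE) is still negative in Java, so an abs-then-modulo expression can produce a negative partition index for the rare key whose hash is exactly Integer.MIN_VALUE. The small, self-contained comparison below (class name hypothetical) shows the difference from Math.floorMod.

```java
final class SafePartitionIndex {
    private SafePartitionIndex() {}

    /** abs-then-modulo: negative when hash == Integer.MIN_VALUE. */
    static int absModulo(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;
    }

    /** floorMod: always in [0, numPartitions) for a positive partition count. */
    static int floorModulo(int hash, int numPartitions) {
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int worstCase = Integer.MIN_VALUE;
        System.out.println(absModulo(worstCase, 12));   // negative index
        System.out.println(floorModulo(worstCase, 12)); // valid index
    }
}
```

A producer that returns a negative partition fails the send, so the floorMod form is the safer default.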
Consumer Group Rebalancing Mechanics
When a consumer joins or leaves a group, Kafka triggers a rebalance to redistribute partitions. The default eager protocol revokes all assignments, waits for every consumer to stop, then re‑assigns. In large clusters this can cause seconds‑long stalls.
Incremental Cooperative Rebalancing (ICR)
Introduced in Kafka 2.4, ICR allows partial revocations. Only the partitions that truly need to move are reassigned, keeping the rest active. This reduces pause time dramatically.
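To make the difference concrete, here is a toy model (not Kafka's actual assignor code) that compares which partitions stop being consumed under each protocol when a third consumer joins. The ownership maps, consumer names, and class name are all made up for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RevocationDiff {
    private RevocationDiff() {}

    /** Eager protocol: every currently owned partition is revoked. */
    static Set<Integer> eagerRevocations(Map<Integer, String> before) {
        return new TreeSet<>(before.keySet());
    }

    /** Cooperative protocol: only partitions whose owner changes are revoked. */
    static Set<Integer> cooperativeRevocations(Map<Integer, String> before,
                                               Map<Integer, String> after) {
        Set<Integer> moved = new TreeSet<>();
        for (Map.Entry<Integer, String> e : before.entrySet()) {
            String newOwner = after.get(e.getKey());
            if (!e.getValue().equals(newOwner)) {
                moved.add(e.getKey());
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // partition -> owning consumer, before and after consumer "c" joins
        Map<Integer, String> before = Map.of(0, "a", 1, "a", 2, "a", 3, "b", 4, "b", 5, "b");
        Map<Integer, String> after  = Map.of(0, "a", 1, "a", 2, "c", 3, "b", 4, "b", 5, "c");
        System.out.println("eager:       " + eagerRevocations(before));
        System.out.println("cooperative: " + cooperativeRevocations(before, after));
    }
}
```

In this example the eager protocol pauses all six partitions, while the cooperative one pauses only the two that move to the new consumer.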
Enabling ICR
# consumer.properties
group.id=my-consumer-group
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
enable.auto.commit=false
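// props is a java.util.Properties holding the settings above, plus bootstrap.servers and deserializers
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("orders-events"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
    for (ConsumerRecord<String, String> record : records) {
        // process record
    }
    if (!records.isEmpty()) {
        // auto-commit is disabled, so commit once the batch has been processed
        consumer.commitSync();
    }
}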
Handling Rebalance Callbacks
Even with ICR, you must gracefully handle onPartitionsRevoked and onPartitionsAssigned. A common mistake is to do slow resource cleanup inside the revoked callback: callbacks run on the thread that calls poll(), so slow work there delays the rebalance.
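consumer.subscribe(Arrays.asList("clickstream-raw"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Flush in‑flight offsets for the revoked partitions only, and avoid other long‑running IO.
        // With the cooperative assignor, only the partitions being taken away are passed in.
        partitions.forEach(tp -> {
            if (pendingOffsets.containsKey(tp)) {
                commitOffset(tp, pendingOffsets.remove(tp)); // application helper
            }
        });
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Initialize per‑partition state here. The consumer resumes from the last committed
        // offset by default; calling seekToEnd would skip records that were never processed.
        partitions.forEach(tp -> System.out.println("Assigned " + tp));
    }
});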
Best‑practice checklist
- Commit before revocation – Minimizes duplicate processing after the partition moves to another consumer.
- Avoid slow calls – Apart from the offset commit itself, keep callbacks < 100 ms; offload heavy work to a background thread.
- Log partition changes – Helps diagnose “why did my latency spike?” after a scale‑out event.
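The checklist assumes the application tracks which offsets are safe to commit. A minimal sketch of that bookkeeping is shown below; it uses plain strings such as "clickstream-raw-0" as stand-ins for TopicPartition so it runs without the Kafka client, and the class and method names are hypothetical. The one Kafka-specific rule it encodes is that the committed offset is the offset of the next record to read, i.e. last processed + 1.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class PendingOffsets {
    // partition name -> next offset to commit (last processed offset + 1)
    private final Map<String, Long> nextOffsets = new HashMap<>();

    /** Record that the message at the given offset has been fully processed. */
    void markProcessed(String partition, long offset) {
        nextOffsets.merge(partition, offset + 1, Long::max);
    }

    /** Remove and return the offsets to commit for the partitions being revoked. */
    Map<String, Long> drain(Collection<String> revoked) {
        Map<String, Long> toCommit = new HashMap<>();
        for (String partition : revoked) {
            Long next = nextOffsets.remove(partition);
            if (next != null) {
                toCommit.put(partition, next);
            }
        }
        return toCommit;
    }

    /** Number of partitions that still have uncommitted progress. */
    int trackedPartitions() {
        return nextOffsets.size();
    }
}
```

In a real listener, the map returned by drain would be converted to TopicPartition and OffsetAndMetadata entries and passed to a single commitSync call inside onPartitionsRevoked.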
Monitoring Rebalance Health
| Metric | Prometheus query | Alert threshold |
|---|---|---|
| kafka_consumer_rebalance_total | rate(kafka_consumer_rebalance_total[5m]) * 60 | > 0.2 rebalances/min |
| kafka_consumer_rebalance_latency_seconds | histogram_quantile(0.95, rate(kafka_consumer_rebalance_latency_seconds_bucket[5m])) | > 2 s |
| consumer_lag | kafka_consumer_lag{topic="orders-events"} | > 10 k msgs |
Set up Grafana dashboards to spot “rebalance storms” caused by flaky instances or aggressive autoscaling policies.
Production Performance Optimization Patterns
Producer Batching & Compression
With linger.ms=0 (the long‑standing default), the producer does not wait for additional records before sending a batch. In high‑throughput workloads, a small linger gives batches time to fill, which reduces network round‑trips and improves compression ratios.
# producer.properties
# wait up to 5 ms for more records
linger.ms=5
# 32 KB batch target
batch.size=32768
compression.type=snappy
acks=all
retries=5
max.in.flight.requests.per.connection=5
Result: In a 200 MB/s clickstream pipeline, enabling a 5 ms linger cut outbound packets from 12,000 pps to 2,400 pps and reduced CPU utilization on the producer host by 30 %.
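A batch is sent when it fills up or when linger.ms expires, whichever comes first, so it can help to estimate which limit applies to a given workload. The sketch below is a back-of-the-envelope calculation under simplifying assumptions (uniform record size, no compression, no per-record overhead); the class name and the example rates are hypothetical.

```java
final class BatchFillEstimate {
    private BatchFillEstimate() {}

    /** Records that fit in one batch, ignoring compression and per-record overhead. */
    static long recordsPerBatch(int batchSizeBytes, int avgRecordBytes) {
        return Math.max(1, batchSizeBytes / avgRecordBytes);
    }

    /** Milliseconds needed to fill one batch for a single partition at the given rate. */
    static double fillTimeMs(int batchSizeBytes, int avgRecordBytes, double recordsPerSecPerPartition) {
        return recordsPerBatch(batchSizeBytes, avgRecordBytes) * 1000.0 / recordsPerSecPerPartition;
    }

    /** True when the batch fills before linger.ms expires, so batch.size is the limiting setting. */
    static boolean batchSizeBound(int batchSizeBytes, int avgRecordBytes,
                                  double recordsPerSecPerPartition, long lingerMs) {
        return fillTimeMs(batchSizeBytes, avgRecordBytes, recordsPerSecPerPartition) <= lingerMs;
    }

    public static void main(String[] args) {
        // Assumed workload: 512-byte records, batch.size=32768, linger.ms=5.
        System.out.println(fillTimeMs(32768, 512, 8_000));         // time to fill at 8k rec/s
        System.out.println(batchSizeBound(32768, 512, 8_000, 5));  // linger expires first
        System.out.println(batchSizeBound(32768, 512, 20_000, 5)); // batch fills first
    }
}
```

If linger expires first, raising batch.size alone changes little; if the batch fills first, a larger batch.size is the setting to try.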
Idempotent Producers & Exactly‑Once Semantics (EOS)
Setting enable.idempotence=true guarantees that producer retries do not write duplicate messages to a partition. Combine this with the transaction APIs for end‑to‑end exactly‑once.
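Properties props = new Properties();
// bootstrap.servers and serializer settings omitted for brevity
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "clickstream-tx-01");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("clickstream-raw", key, value));
    // ... more sends
    producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    // Fatal errors: this producer cannot continue, so close it instead of aborting
    producer.close();
} catch (KafkaException e) {
    // Any other error: abort so the partial transaction is never visible to read_committed consumers
    producer.abortTransaction();
}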
Trade‑off: Transactional writes add a small latency overhead (≈ 1 ms) but eliminate downstream deduplication logic, provided consumers read with isolation.level=read_committed.
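On the consuming side, records from aborted transactions are only filtered out when the consumer uses the read_committed isolation level (the default is read_uncommitted). A minimal sketch of such a consumer configuration follows; it builds a plain java.util.Properties so it runs without the Kafka client on the classpath, and the class name, broker address, and group ID are placeholders.

```java
import java.util.Properties;

final class ReadCommittedConsumerConfig {
    private ReadCommittedConsumerConfig() {}

    /** Consumer settings for reading only committed transactional records. */
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Skip records that belong to open or aborted transactions.
        props.setProperty("isolation.level", "read_committed");
        // Commit offsets explicitly after processing.
        props.setProperty("enable.auto.commit", "false");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder address and group ID.
        build("broker1:9092", "clickstream-readers").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting Properties object can be passed directly to the KafkaConsumer constructor.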
Tuning fetch.max.bytes and max.partition.fetch.bytes
Consumers that process large batches should request more data per poll. However, setting these too high can cause OOM on the consumer JVM.
# 50 MB per fetch request
fetch.max.bytes=52428800
# 10 MB per partition
max.partition.fetch.bytes=10485760
Monitor consumer_fetch_manager_bytes_consumed_total to ensure the consumer stays within its heap limits.
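Before raising these limits, a rough upper bound on buffered fetch data can be compared against the heap. The estimate below assumes the consumer may have one fetch response in flight per broker, each capped by fetch.max.bytes, and that no partition contributes more than max.partition.fetch.bytes. It ignores decompression and deserialized object overhead, so treat it as a floor for heap planning rather than a precise figure. The class name is hypothetical.

```java
final class FetchMemoryEstimate {
    private FetchMemoryEstimate() {}

    /**
     * Rough upper bound, in bytes, on raw fetched data held at once:
     * the smaller of (brokers * fetch.max.bytes) and
     * (assigned partitions * max.partition.fetch.bytes).
     */
    static long upperBoundBytes(long fetchMaxBytes, long maxPartitionFetchBytes,
                                int brokers, int assignedPartitions) {
        long perBrokerCap = brokers * fetchMaxBytes;
        long perPartitionCap = assignedPartitions * maxPartitionFetchBytes;
        return Math.min(perBrokerCap, perPartitionCap);
    }

    public static void main(String[] args) {
        // Settings from above, with an assumed 6 brokers and 4 assigned partitions.
        long bytes = upperBoundBytes(52_428_800L, 10_485_760L, 6, 4);
        System.out.println(bytes / (1024 * 1024) + " MiB");
    }
}
```

With the settings above, six brokers, and four assigned partitions, the bound works out to 40 MiB of raw data per consumer, before any object overhead.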
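Pinning Hot Partitions to Faster Brokers
If a topic has a few “hot” partitions (e.g., a premium user segment) that receive the majority of traffic, move those partitions to SSD‑backed brokers while letting cold partitions sit on cheaper HDD nodes. Note that this is partition reassignment, not Kafka's tiered storage feature (KIP‑405, early access in Kafka 3.6), which offloads older log segments to remote object storage. Placement across brokers is done with the reassignment tool: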
kafka-reassign-partitions.sh \
--bootstrap-server broker1:9092 \
--reassignment-json-file hot-partition-reassign.json \
--execute
hot-partition-reassign.json contains a mapping that pins hot partitions to SSD brokers.
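The reassignment file uses the tool's JSON format: a version field plus a list of topic, partition, and replica-list entries, with the first replica in each list being the preferred leader. The sketch below generates such a file with plain string building. The topic name, partition numbers, and broker IDs are placeholders, and the class name is hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

final class ReassignmentJson {
    private ReassignmentJson() {}

    /** One entry: place the given partition on the listed broker IDs. */
    static String entry(String topic, int partition, List<Integer> replicas) {
        String ids = replicas.stream().map(String::valueOf).collect(Collectors.joining(","));
        return "{\"topic\":\"" + topic + "\",\"partition\":" + partition
                + ",\"replicas\":[" + ids + "]}";
    }

    /** Wrap entries in the top-level document expected by kafka-reassign-partitions.sh. */
    static String document(List<String> entries) {
        return "{\"version\":1,\"partitions\":[" + String.join(",", entries) + "]}";
    }

    public static void main(String[] args) {
        // Placeholder values: partitions 3 and 7 move to assumed SSD brokers 1, 2, 3.
        System.out.println(document(List.of(
                entry("clickstream-raw", 3, List.of(1, 2, 3)),
                entry("clickstream-raw", 7, List.of(2, 3, 1)))));
    }
}
```

Varying the first broker ID across entries, as in the usage above, spreads partition leadership across the SSD brokers.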
Observability Blueprint
- End‑to‑end latency – Use kafka.producer.record.send.total.time.ms and consumer.fetch.latency.avg metrics; correlate in a Grafana panel.
- Back‑pressure detection – Track producer.metrics.buffer-exhausted-rate to spot when the internal buffer overflows.
- Hot‑partition alerts – kafka.server.replica_manager.leader_bytes_in_rate per partition; alert if any partition exceeds 80 % of broker NIC capacity.
Architecture Blueprint for Scalable Consumers
Below is a reference architecture that we’ve deployed at a mid‑size e‑commerce platform handling 1 M events/sec.
+-------------------+ +-------------------+ +-------------------+
| Producer Fleet | ---> | Kafka Cluster | ---> | Consumer Group |
| (Java, Go, Python)| | (6 brokers, 12 | | (12 instances, |
| Linger=5ms, | | partitions each) | | Cooperative |
| Snappy) | | | | assignor) |
+-------------------+ +-------------------+ +-------------------+
Key components:
- **Ingress**: Load‑balanced producers behind an Envoy proxy; each producer runs in a Kubernetes pod with `resources: {cpu: "2", memory: "4Gi"}`.
- **Kafka**: Tiered storage enabled; hot partitions (top 5 % keys) pinned to SSD nodes.
- **Consumers**: Stateless micro‑services using Spring Cloud Stream; each pod runs 4 consumer threads (one per partition) and commits offsets synchronously after each batch.
- **Observability**: Prometheus scrapes JMX exporters; alerts route to PagerDuty.
Scaling Guidelines
| Scale Trigger | Action |
|---|---|
| CPU > 80 % on broker | Add a broker, rebalance partitions using kafka-reassign-partitions.sh. |
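| Consumer lag > 30 s | Increase the consumer instance count, up to the partition count (ideally a divisor of it, since extra consumers sit idle); enable ICR to avoid full stalls. |
| Producer buffer‑exhausted spikes | Raise buffer.memory and check whether slow broker responses are backing up sends; batch.size and max.in.flight.requests.per.connection do not enlarge the buffer. |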
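Because a consumer group uses at most one consumer thread per partition, it is worth checking how a planned thread count divides the partitions before scaling out. The helper below is a small sketch (hypothetical class name) that reports the resulting spread, assuming the assignor balances partitions as evenly as possible.

```java
final class AssignmentSpread {
    private AssignmentSpread() {}

    /** Consumers that would receive no partitions at all. */
    static int idleConsumers(int partitions, int consumers) {
        return Math.max(0, consumers - partitions);
    }

    /** Largest number of partitions any single consumer would own. */
    static int maxPerConsumer(int partitions, int consumers) {
        return (partitions + consumers - 1) / consumers;
    }

    /** Smallest number of partitions any single consumer would own. */
    static int minPerConsumer(int partitions, int consumers) {
        return partitions / consumers;
    }

    public static void main(String[] args) {
        int partitions = 48; // assumed partition count
        for (int consumers : new int[] {10, 12, 60}) {
            System.out.println(consumers + " consumers: min=" + minPerConsumer(partitions, consumers)
                    + " max=" + maxPerConsumer(partitions, consumers)
                    + " idle=" + idleConsumers(partitions, consumers));
        }
    }
}
```

With 48 partitions, 12 consumers get four partitions each, 10 consumers get an uneven four or five, and 60 consumers leave 12 idle.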
Key Takeaways
- Partition count is a capacity knob – size it for peak throughput and consumer parallelism, then over‑provision by ~20 % for growth.
- Enable Cooperative Sticky Assignor to keep rebalances incremental and avoid full consumer stalls.
- Batch aggressively on the producer side (linger.ms, batch.size) and use Snappy compression for network efficiency.
- Use idempotent producers and transactions when exactly‑once semantics are required; the latency penalty is modest.
- Monitor rebalance latency, consumer lag, and broker CPU with Prometheus; set alerts before a storm impacts SLAs.
- Pin hot partitions to SSD‑backed brokers to prevent a single hot key from throttling the whole pipeline.