
Kubernetes Storage for Kafka

Running Apache Kafka on Kubernetes introduces specific storage requirements that differ from typical stateless workloads. Kubernetes storage for Kafka must deliver predictable sequential write throughput, low producer latency, and durable log-segment persistence — because Kafka’s correctness and performance model depends on append-only writes to disk completing with minimal jitter.

Key Facts: Kubernetes Storage for Kafka

  • Type: Distributed log and event stream
  • I/O driver: Sequential writes plus log compaction
  • Key metric: Producer latency and consumer lag
  • Best-fit transport: NVMe/TCP or NVMe/RoCE

Each Kafka broker operates as a stateful process that owns its log-segment data. On Kubernetes this translates to StatefulSets, where each broker pod is assigned a stable identity and one or more PersistentVolumeClaims (PVCs) that hold log directories. The storage layer must survive pod rescheduling and provide consistent throughput even during background compaction and retention cleanup.

What is Kubernetes storage for Kafka? PVCs and NVMe/TCP block volumes back Kafka broker StatefulSets to provide durable, low-latency log-segment storage.

How Kafka Uses Storage on Kubernetes

Kafka brokers run as a StatefulSet in Kubernetes. The volumeClaimTemplates field of the StatefulSet spec provisions one PVC per broker replica, binding each broker to a named persistent volume. Brokers write log segments sequentially, with background threads performing log compaction and retention cleanup that generate additional I/O.
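A minimal sketch of that wiring, using the standard StatefulSet API (the image tag, StorageClass name, and sizes are illustrative, not taken from any specific deployment):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka-headless
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: apache/kafka:3.7.0          # illustrative image tag
          volumeMounts:
            - name: kafka-logs
              mountPath: /var/lib/kafka/data # broker's log.dirs points here
  # One PVC is stamped out per broker replica; it keeps the broker's
  # data bound to its stable identity across pod rescheduling.
  volumeClaimTemplates:
    - metadata:
        name: kafka-logs
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: nvme-tcp           # hypothetical StorageClass name
        resources:
          requests:
            storage: 2Ti                     # size per the retention formula below
```

Each replica ends up with a PVC named `kafka-logs-kafka-0`, `kafka-logs-kafka-1`, and so on, which Kubernetes rebinds to the same pod identity after rescheduling.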

Key characteristics of Kafka storage I/O:

  • Append-only writes: the hot path is sequential writes to the active log segment. Latency here directly controls producer latency and replication lag.
  • Compaction I/O: log compaction reads old segments and rewrites merged output. This background I/O contends with active produce traffic unless the storage layer enforces QoS.
  • Replication traffic: Kafka’s internal replication protocol creates read traffic from follower brokers fetching the leader’s log. Under high throughput this can compete with producer writes on the same volume.

Zone placement matters: placing a broker’s PVC in a different availability zone from the pod creates network hops that show up directly in producer latency and inflate replication overhead.
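The standard way to avoid cross-zone placement is delayed binding: with `volumeBindingMode: WaitForFirstConsumer`, the PV is provisioned only after the scheduler has placed the broker pod, so the volume lands in the pod's zone. A sketch (the StorageClass name, provisioner, and zone values are placeholders):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-nvme-tcp              # hypothetical name
provisioner: csi.example.com        # placeholder CSI driver
volumeBindingMode: WaitForFirstConsumer  # provision in the zone the pod lands in
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values: ["us-east-1a", "us-east-1b", "us-east-1c"]
```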

Local NVMe vs. NVMe/TCP for Kafka

A common choice for production Kafka is directly attached NVMe drives on the broker nodes. Local NVMe gives excellent sequential write bandwidth and predictable latency but couples broker scheduling to specific nodes. If a node is drained or fails, Kubernetes cannot reschedule the broker pod elsewhere — the pod stays pending until the original node is available or manual intervention occurs.

NVMe/TCP (and NVMe/RoCE) disaggregate storage from compute. Broker pods become effectively stateless: they request a PVC, the CSI driver attaches the NVMe volume over the network fabric, and the broker writes to it as if it were local. If the broker pod is rescheduled, the PVC detaches and reattaches to the new node. For teams running Kubernetes node autoscaling or frequent rolling upgrades, disaggregated NVMe/TCP substantially reduces operational friction.

| Approach | Broker portability | Sequential write latency | Operational risk on node failure |
| --- | --- | --- | --- |
| Local NVMe (hostPath or local PV) | Node-bound; pod pending if node unavailable | Lowest (no network hop) | High; manual recovery often required |
| NVMe/TCP disaggregated | Full Kubernetes scheduling freedom | Sub-millisecond over 25 GbE+ | Low; PVC reattaches on new node |
| NVMe/RoCE disaggregated | Full scheduling freedom | RDMA latency, lowest jitter | Low; requires RDMA fabric |
| iSCSI / legacy block | Good portability | Higher latency, SCSI overhead | Medium; protocol reconnect delays |

Sizing PVCs for Kafka Brokers

PVC sizing for Kafka brokers depends on retention policy, replication factor, and throughput. A useful formula:

PVC size ≈ (throughput MB/s × retention seconds × replication factor) + compaction headroom (20–30%)
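The formula translates directly into a quick capacity check; the numbers below are illustrative, and the cluster-wide total is divided by broker count to size each broker's PVC:

```python
def kafka_storage_gib(throughput_mb_s: float,
                      retention_s: int,
                      replication_factor: int,
                      headroom: float = 0.25) -> float:
    """Total cluster log storage in GiB, including compaction headroom.

    Divide by the broker count to size each broker's PVC.
    """
    total_mb = throughput_mb_s * retention_s * replication_factor
    return total_mb * (1 + headroom) / 1024  # MB -> GiB (treating MB as MiB)

# Example: 50 MB/s produce rate, 7-day retention, replication factor 3,
# 25% compaction headroom, spread over a 6-broker cluster.
cluster_gib = kafka_storage_gib(50, 7 * 24 * 3600, 3)
per_broker_gib = cluster_gib / 6
print(round(cluster_gib), round(per_broker_gib))  # ≈ 110742 and 18457
```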

Additional guidance:

  • Thin provisioning: enables over-allocation at the StorageClass level so PVCs can be larger than physically consumed space. This is important for Kafka because log growth is bursty; teams often overprovision to avoid out-of-space events.
  • Separate log and index volumes: high-throughput clusters sometimes place Kafka’s log segments and index files on separate PVCs to isolate I/O patterns.
  • Monitor volume health: Kubernetes volume health monitoring signals early-warning conditions before a broker runs out of space or encounters I/O errors.

IOPS requirements are typically modest for the sequential write path but spike during compaction. Understanding the IOPS floor required to keep compaction from stalling is an important pre-production benchmark step.

QoS Isolation for Multi-Tenant Kafka

In shared Kubernetes clusters running multiple Kafka deployments or co-located with other workloads, background compaction I/O from one Kafka cluster can saturate the storage fabric and cause producer latency spikes in other tenants. Storage-level QoS allows platform teams to assign per-volume IOPS and throughput ceilings so that compaction bursts are absorbed without impacting producer SLOs.

Without QoS isolation, a single Kafka topic under heavy compaction can destabilize every other stateful workload sharing the storage backend.
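How per-volume limits are expressed varies by CSI driver; many drivers accept them as StorageClass parameters, roughly along these lines (the parameter keys and provisioner below are hypothetical, not any vendor's actual API — consult your driver's documentation):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-qos-tier
provisioner: csi.example.com   # placeholder driver
parameters:
  # Hypothetical QoS keys: cap IOPS and bandwidth per provisioned volume
  maxIOPS: "20000"
  maxBandwidthMBs: "500"
```

Any PVC created from this class inherits the ceilings, so a compaction burst on one tenant's volume cannot starve another tenant's produce path.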

🚀 Run production Kafka on Kubernetes with NVMe-backed block storage Simplyblock provides NVMe/TCP and NVMe/RoCE PVCs for Kafka StatefulSets with multi-tenant QoS that protects producer latency during compaction bursts. 👉 Kubernetes storage for stateful workloads

Kafka Recovery and Snapshots

When a Kafka broker pod is rescheduled after a node failure, the broker resumes from its last committed log position. Recovery time depends on how much data the follower replica needs to catch up. With disaggregated block storage:

  • PVCs persist across pod deletions; the data is not lost when a pod is deleted.
  • CSI volume snapshots can create point-in-time copies of broker log volumes for backup or rapid environment cloning.
  • Thin clones from a snapshot let teams create staging Kafka clusters that share underlying data blocks with production, minimizing storage cost for test environments.
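Using the standard `snapshot.storage.k8s.io/v1` API, a broker volume snapshot and a clone PVC created from it might look like this (the names, classes, and size are illustrative):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: kafka-logs-kafka-0-snap
spec:
  volumeSnapshotClassName: csi-snapclass           # hypothetical class
  source:
    persistentVolumeClaimName: kafka-logs-kafka-0  # PVC from volumeClaimTemplates
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-logs-staging-0       # clone for a staging cluster
spec:
  storageClassName: nvme-tcp       # hypothetical StorageClass
  dataSource:
    name: kafka-logs-kafka-0-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 2Ti                 # must be at least the source volume's size
```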

Kubernetes Storage for Kafka with Simplyblock

Simplyblock provides CSI-provisioned block volumes backed by NVMe/TCP or NVMe/RoCE, making disaggregated storage practical for production Kafka clusters. Key capabilities relevant to Kafka deployments:

  • Multi-tenant QoS: per-volume IOPS and bandwidth limits protect producer latency SLOs when compaction or retention workloads run in parallel on the same cluster.
  • Thin provisioning: Kafka teams can allocate large PVCs without pre-consuming physical capacity, then let volumes grow as log retention fills up.
  • Instant snapshots: CSI VolumeSnapshots from simplyblock are thin clones, making broker backup and staging environment creation fast and storage-efficient.
  • Broker portability: because volumes attach over the network, Kafka broker pods are fully portable — node drains, rolling upgrades, and autoscaling all work without manual volume management.

For teams migrating from HCI or vSAN-based Kafka deployments to a disaggregated Kubernetes model, simplyblock’s StorageClasses map cleanly to the QoS tiers previously managed by vSAN storage policies.


Questions and Answers

What storage does Kafka need in Kubernetes?

Kafka brokers require block storage with predictable sequential write throughput and low producer latency. Each broker runs as part of a StatefulSet and owns one or more PVCs for its log segments. The storage backend must handle both the hot write path and background compaction I/O without causing latency spikes. NVMe-backed block volumes — either local or disaggregated via NVMe/TCP — are the most common choice for production deployments.

Can Kafka run on NVMe/TCP storage?

Yes. NVMe/TCP disaggregates storage from the broker node, so Kafka pods become fully portable within the Kubernetes cluster. The CSI driver attaches the NVMe volume over standard Ethernet, the broker writes to it as if it were local, and the volume reattaches automatically if the pod is rescheduled. For clusters where sub-millisecond latency is required, NVMe/RoCE is an alternative for environments with RDMA fabrics. Both transports are supported by simplyblock.

How do I size PVCs for Kafka brokers?

Multiply your expected throughput in MB/s by your retention window in seconds, then multiply by the replication factor. Add 20–30% headroom for log compaction temporary space. Use thin provisioning if your storage backend supports it — this lets you set PVC sizes generously without pre-consuming physical capacity. Monitor actual volume utilization and trigger PVC expansion before brokers approach capacity limits.

What happens to Kafka data when a broker Pod is rescheduled?

With a CSI-backed PVC, the data persists on the storage backend independently of the pod lifecycle. When the broker pod is rescheduled — whether due to a node failure, a node drain, or an autoscaling event — Kubernetes binds the same PVC to the new pod instance. The broker resumes from its last committed log position. Recovery time depends on how much replication lag has accumulated; disaggregated NVMe/TCP storage minimizes recovery time by providing fast re-attach and consistent sequential read bandwidth for catch-up replication.