
NVMe Latency

NVMe latency is the time between when a storage request is issued and when the result is returned — measured in microseconds (µs) rather than the milliseconds typical of older storage interfaces. It is one of the defining characteristics of NVMe-based storage: a modern NVMe SSD handles a random 4K read in roughly 20–70 µs, compared to 100–200 µs for a SATA SSD and 5–10 ms for a spinning disk. That gap is not incidental — it comes from the NVMe protocol’s architecture, which was designed specifically for flash media rather than adapted from legacy disk interfaces.

Key Facts: NVMe Latency

  • Typical local NVMe: 20–70 µs random read · 30–100 µs random write (device-level)
  • NVMe/TCP (network): 300–500 µs typical · sub-200 µs achievable on optimized paths
  • Why it is low: 64K queues × 64K commands · direct PCIe path · no legacy controller
  • Primary use cases: databases, real-time analytics, AI inference, latency-sensitive Kubernetes workloads

For most application workloads the difference between 20 µs and 200 µs is not visible. It becomes critical when storage access is on the critical path of every transaction — databases, in-memory caches with persistent backing, real-time analytics engines, and AI inference pipelines. At those latencies, the storage layer either enables or constrains the application.

Figure: What is NVMe latency. I/O requests are processed through parallel NVMe queues, with measured outcomes per protocol.
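The device-level numbers above are straightforward to verify. The sketch below times random 4K reads against a raw NVMe namespace at queue depth 1; it is Linux-only, needs root, and the device path /dev/nvme0n1 is an assumption you should adjust. O_DIRECT bypasses the page cache so the timing reflects the device, not RAM:

```python
import mmap, os, random, statistics, time

DEV = "/dev/nvme0n1"   # assumed device path; adjust for your system
BLOCK = 4096           # 4K reads, matching the figures quoted above
SAMPLES = 1000

fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)  # bypass the page cache
f = os.fdopen(fd, "rb", buffering=0)
size = os.lseek(fd, 0, os.SEEK_END)
buf = mmap.mmap(-1, BLOCK)  # page-aligned buffer, required by O_DIRECT

lat_us = []
for _ in range(SAMPLES):
    # O_DIRECT also requires block-aligned offsets
    os.lseek(fd, random.randrange(size // BLOCK) * BLOCK, os.SEEK_SET)
    t0 = time.perf_counter_ns()
    f.readinto(buf)
    lat_us.append((time.perf_counter_ns() - t0) / 1000)  # ns to µs

lat_us.sort()
print(f"median: {statistics.median(lat_us):.1f} µs")
print(f"p99:    {lat_us[int(0.99 * SAMPLES)]:.1f} µs")
```

Note this measures one outstanding I/O at a time; a tool like fio is the better choice once queue depth and mixed workloads matter.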

How NVMe Achieves Low Latency

Three architectural decisions in the NVMe protocol directly reduce latency:

Parallel command queues: NVMe supports up to 65,535 I/O queues, each holding up to 65,535 commands. Legacy interfaces like SATA support a single queue of 32 commands. On a modern multi-core CPU, this means storage requests from different cores or processes can be submitted and processed in parallel without queuing behind each other; the sketch after this list shows the per-CPU queue mapping on a Linux host.

Direct PCIe attachment: NVMe devices attach directly to the CPU via PCIe, bypassing the storage controller bus that SATA and SAS use. This removes a translation layer and reduces round-trip distance. PCIe 4.0 provides roughly 8 GB/s per ×4 slot; PCIe 5.0 doubles that to about 16 GB/s.

No legacy protocol overhead: SATA and SAS adapted ATA/SCSI command sets originally designed for spinning disks. NVMe was designed from scratch for flash, with a command set optimized for the access patterns of solid-state media.
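The per-core queue mapping is visible on any Linux host: the kernel's blk-mq layer exposes one sysfs directory per hardware queue. A minimal sketch, assuming a device named nvme0n1:

```python
import os

dev = "nvme0n1"  # assumed device name; adjust for your system
mq_dir = f"/sys/block/{dev}/mq"

# Each subdirectory is one blk-mq hardware queue; cpu_list names the CPUs
# that submit I/O through it. Per-core queues are what let NVMe accept
# requests from many cores without contending on a shared lock.
for q in sorted(os.listdir(mq_dir), key=int):
    with open(os.path.join(mq_dir, q, "cpu_list")) as fh:
        print(f"queue {q}: CPUs {fh.read().strip()}")
```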

🚀 Need NVMe latency for workloads running across multiple Kubernetes nodes? simplyblock delivers sub-millisecond NVMe/TCP storage with no specialized hardware — kernel-path transport over standard Ethernet. 👉 Explore NVMe/TCP Kubernetes Storage →

NVMe Latency vs. Other Storage Protocols

| Protocol | Read latency | Write latency | Max queue depth | Bandwidth |
| --- | --- | --- | --- | --- |
| NVMe (local PCIe) | ~20–70 µs | ~30–100 µs | 64K queues × 64K cmds | ~7 GB/s (PCIe 4.0 ×4) · ~14 GB/s (PCIe 5.0 ×4) |
| NVMe/TCP (network) | ~300–500 µs | ~300–500 µs | High | Network-limited |
| NVMe/RoCE (RDMA) | ~80–150 µs | ~80–150 µs | High | Network-limited |
| iSCSI | ~500–800 µs | ~500–800 µs | Moderate | Network-limited |
| SATA SSD | ~100–200 µs | ~100–200 µs | 32 cmds | ~600 MB/s |
| HDD | 5–10 ms | 5–10 ms | 1 cmd | ~150 MB/s |

Factors That Affect NVMe Latency

Local NVMe latency is largely determined by the device. Network NVMe latency is more variable:

Network path: NVMe/TCP adds TCP/IP stack processing time. On a well-configured network with adequate bandwidth, total round-trip latency (host → network → storage → return) typically lands in the 300–500 µs range. NVMe over RoCE uses RDMA to bypass much of that overhead, achieving 80–150 µs at the cost of requiring a lossless network fabric and RDMA-capable NICs.
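A useful mental model is a latency budget: networked NVMe latency is a sum of stages. The figures below are illustrative assumptions chosen to land in the range quoted above, not measurements:

```python
# Illustrative NVMe/TCP round-trip budget (all values are assumptions).
budget_us = {
    "host TCP/NVMe stack":   120,
    "network round trip":    100,
    "target TCP/NVMe stack": 120,
    "device 4K read":         60,
}
for stage, us in budget_us.items():
    print(f"{stage:>22}: {us:4d} µs")
print(f"{'total':>22}: {sum(budget_us.values()):4d} µs")  # ~400 µs
```

RDMA transports shrink the two stack terms, which is where the 80–150 µs figure for NVMe/RoCE comes from.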

Queue depth: At low queue depths (QD=1), latency is dominated by round-trip time. At high queue depths, throughput saturates but per-operation latency rises slightly as commands queue. For database workloads, QD=8–32 typically represents the best latency-throughput balance.
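The tradeoff follows from Little's Law: sustained IOPS ≈ outstanding I/Os ÷ per-operation latency. A back-of-the-envelope sketch with illustrative latencies:

```python
def iops(queue_depth: int, latency_us: float) -> float:
    """Little's Law: concurrency = throughput x latency, rearranged."""
    return queue_depth / (latency_us * 1e-6)

print(f"QD=1,  80 µs  -> {iops(1, 80):>9,.0f} IOPS")   # latency-bound
print(f"QD=32, 100 µs -> {iops(32, 100):>9,.0f} IOPS") # near saturation
```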

Tail latency: P99 and P999 latency matter as much as median for databases. Storage systems that introduce garbage collection, replication synchronization, or reconstruction overhead create latency spikes that median metrics hide. This is why tail latency benchmarks are more diagnostic than averages.
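A few lines are enough to see why medians mislead. The sketch below uses a synthetic distribution: 99% fast operations plus rare garbage-collection-style spikes:

```python
def percentile(samples, p):
    """Nearest-rank percentile; sufficient for latency reporting."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(p / 100 * len(s)))]

samples = [50.0] * 990 + [2000.0] * 10  # µs; synthetic, 1% spikes
print("median:", percentile(samples, 50), "µs")  # 50.0, looks healthy
print("p99:   ", percentile(samples, 99), "µs")  # 2000.0, the real story
```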

Host configuration: CPU power management (C-states), PCIe slot configuration, and NUMA topology all affect NVMe latency. For latency-sensitive workloads, disabling deep C-states and ensuring the NVMe device shares a NUMA node with the application CPU reduces jitter.
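Checking NUMA placement takes one sysfs read on Linux. A sketch, assuming a controller named nvme0:

```python
dev = "nvme0"  # assumed controller name; adjust for your system

# sysfs exposes the NUMA node of the PCIe device behind the controller;
# -1 means the platform reported no NUMA affinity.
with open(f"/sys/class/nvme/{dev}/device/numa_node") as fh:
    print(f"{dev} sits on NUMA node {fh.read().strip()}")

# Compare with where your application runs (e.g. `numactl --show`) and
# pin the process to the same node to avoid cross-socket PCIe hops.
```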

NVMe Latency in Kubernetes

In Kubernetes, persistent volumes are almost always network-attached — a pod does not typically have a local NVMe device. The storage protocol used by the CSI driver determines the effective latency:

  • iSCSI-backed volumes (Longhorn, some OpenEBS configurations): 500–800 µs typical
  • NVMe/TCP-backed volumes (simplyblock, some SPDK-based systems): 300–500 µs, sub-200 µs on optimized paths
  • NVMe/RoCE: 80–150 µs, requires RDMA infrastructure in the cluster

For database workloads on Kubernetes — PostgreSQL, MySQL, Cassandra, ClickHouse — the difference between iSCSI and NVMe/TCP latency is measurable in query tail latency and transaction throughput. simplyblock’s ClickHouse benchmark shows the practical impact.

simplyblock and NVMe Latency

simplyblock’s storage platform uses NVMe/TCP as its transport layer, connecting Kubernetes pods to NVMe-backed storage pools over standard Ethernet. The kernel-path implementation avoids the user-space overhead that affects some iSCSI-based systems, keeping latency consistently in the 300–500 µs range on typical cluster networks.

simplyblock runs in either hyper-converged mode (storage on compute nodes, shorter network path) or disaggregated mode (dedicated storage nodes). In HCI deployments on the same physical host or rack, effective NVMe/TCP latency can drop below 200 µs — approaching local NVMe performance for most application workloads.

Key latency-relevant features: multi-tenant QoS prevents one tenant’s I/O from creating latency spikes for others; instant snapshots avoid the I/O pause that some snapshot mechanisms introduce; and erasure coding provides fault tolerance without the write amplification overhead of synchronous 3× replication.


Questions and Answers

How low can NVMe latency go compared to traditional storage?

Modern NVMe devices achieve random read latency around 20–70 microseconds at the device level, versus 100–200 µs for SATA SSDs and 5–10 ms for HDDs. Over NVMe/TCP, the network round-trip adds roughly 250–450 µs, putting total latency in the 300–500 µs range, still well under the 500–800 µs typical of iSCSI-based networked storage.

Why does latency matter more than IOPS for database workloads?

IOPS measures throughput — how many operations per second. Latency measures responsiveness — how long each individual operation takes. For OLTP databases, every query involves multiple storage round-trips. If each takes 500 µs instead of 200 µs, query latency increases proportionally. Tail latency (P99, P999) matters most: a single slow storage operation can hold up a transaction that is otherwise complete.
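The arithmetic is direct. Assuming, purely for illustration, ten serial storage round-trips on a query's critical path:

```python
round_trips = 10  # assumed serial I/Os per query (illustrative)
for lat_us in (200, 500):
    total_ms = round_trips * lat_us / 1000
    print(f"{lat_us} µs/op -> {total_ms:.1f} ms of storage time per query")
# 200 µs/op -> 2.0 ms; 500 µs/op -> 5.0 ms: latency scales the whole query.
```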

How does simplyblock optimize NVMe latency across the network?

simplyblock uses kernel-path NVMe/TCP transport, which avoids the user-space processing overhead of iSCSI and some NVMe-oF implementations. In hyper-converged deployments, the short network path between compute and storage nodes keeps latency below 200 µs. Per-volume QoS prevents noisy-neighbor effects that create latency jitter in multi-tenant environments.

When does NVMe latency become critical for Kubernetes storage?

NVMe latency matters when storage is on the hot path: transactional databases, real-time analytics, message queues, and AI inference serving. Batch workloads, log archiving, and object storage are generally less sensitive. If P99 storage latency exceeds 1 ms and the application is latency-sensitive, storage protocol choice is typically the first thing worth investigating.

How does NVMe over TCP compare to iSCSI in real-world latency?

Benchmarks consistently show 25–40% lower latency for NVMe/TCP versus iSCSI at equivalent queue depths and block sizes. For databases under sustained load, the difference appears most clearly in P99 and P999 tail latency. See our NVMe/TCP vs iSCSI benchmark for detailed measurements.