What is a Kubernetes Persistent Volume?

Apr 03rd, 2024 | 8 min read

Stateful Workloads and Data Persistence
Why are Kubernetes Persistent Volumes Essential?
Types of Persistent Storage in Kubernetes
How do Persistent Volumes Work?
Your Stateful Workload
Questions and Answers

A persistent volume (PV) is a slice of storage, provisioned either manually by a Kubernetes administrator or dynamically through a StorageClass, that can be attached and mounted to pods. Like everything in Kubernetes, it is a resource inside the cluster. Unlike ephemeral volumes, whose lifecycle is bound to the lifecycle of the pod, a persistent volume survives pod deletions. It is typically backed by storage provided through the Container Storage Interface (CSI).

Stateful Workloads and Data Persistence

Figure: Simplyblock's architecture, showing how the simplyblock CSI driver provides Kubernetes persistent volumes.

Originally designed to be stateless, containers were supposed to also be ephemeral and lightweight. They were designed to “boot” quickly and be small, maybe a few megabytes in size, sharing much of their host operating system.

This design quickly became a hassle, and people realized that you often have to persist data between container restarts. Storage can be ephemeral (living until the pod ceases to exist) or persistent (staying alive independently of any pod). Persistence is especially important for applications like databases or logs, as well as many other types of applications that need to hold serialized session information or similar state.

In general, the bigger the container-based landscape, the higher the chance of having stateful workloads in your deployments, especially with large Kubernetes deployments consisting of hundreds of nodes.

Using the CSI and its storage plugins, it is possible and recommended (at least by us) to separate the storage and compute infrastructure. This disaggregated architecture enables independent scalability and allows users to choose the best tool for the job.

To bind a persistent volume to a Kubernetes container, a so-called persistent volume claim (PVC) is required, which defines the container's storage requirements. That includes the StorageClass, which is basically the type of backing storage requested. This is normally a 1:1 mapping to a specific CSI implementation, like simplyblock's driver, or a combination of the CSI implementation and some characteristics, such as a performance policy. The claim also includes the requested size and the lifecycle binding between the persistent volume claim and its persistent volume.
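
As a minimal sketch, a persistent volume claim along these lines could look as follows. The claim name `app-data` and the storage class name `fast-block` are hypothetical placeholders and not taken from any specific driver; the actual class name depends on what the cluster administrator or CSI driver has set up.

```yaml
# Sketch of a PersistentVolumeClaim; all names are illustrative assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                 # hypothetical claim name
spec:
  storageClassName: fast-block   # hypothetical StorageClass; selects the backing storage
  accessModes:
    - ReadWriteOnce              # mountable read-write by a single node
  resources:
    requests:
      storage: 10Gi              # requested volume size
```

Applying such a manifest (for example with `kubectl apply -f`) only registers the request. Whether and when a volume is bound depends on the storage class and the available persistent volumes.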

Why are Kubernetes Persistent Volumes Essential?

One could ask why not just directly bind to directories or other storage backends. The whole point of Kubernetes is to abstract away the underlying operating system, or better said, the whole environment outside of Kubernetes. This enables developers and DevOps folks to make specific assumptions about the environment, bringing development and production environments closer together. Especially for developers, this is a big win since debugging is easier than ever before.

To achieve this abstraction, applications and containers should not make assumptions about the underlying storage (or any other resource). Hence, provisioning storage is part of the actual deployment process.

Managing persistent volume resources this way has quite a few benefits:

Decoupling: PVs decouple the storage provisioning from the pod or container lifecycle, allowing more flexibility and independence in managing storage resources.
Storage Persistence: PVs offer persistent storage, which survives pod terminations and rescheduling.
Dynamic Provisioning: PVs can be dynamically provisioned, enabling automatic on-demand creation and management according to the application requirements.
Portability: While not strictly true for all backing storage types, PVs enable the separation of concerns by providing access to local and remote storage options.
Resource Management: PVs, or better said, their storage class implementations, can provide better resource management by virtualizing the storage and sharing access to the same backing storage, hence reducing storage redundancy and optimizing resource allocation.

Types of Persistent Storage in Kubernetes

Kubernetes enables a wide variety of application deployments. Therefore, it also offers many storage options. While there are many more, we want to focus on the two specific ones that enable Kubernetes persistent storage, speed, and high availability (at least to some extent).

Kubernetes persistent volumes via storage plugins (CSI drivers) are your best bet for persistent storage and offer the widest variety of backing storage. Implementations can either offer access to an independently running storage cluster or appliance (disaggregated) or use the local storage of the Kubernetes worker nodes and offer it as a shared, clustered resource or just as an enhanced version of local mounts (hyperconverged). This type of storage is especially interesting for persistent volumes since many vendors provide enhanced features such as immediate snapshots, clones, thin provisioning, and more.
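
To illustrate one of these enhanced features, the following sketch requests a snapshot of an existing claim through the Kubernetes VolumeSnapshot API. It assumes that the cluster has the snapshot CRDs and snapshot controller installed and that the CSI driver supports snapshots. The names `app-data`, `app-data-snap`, and `example-snapclass` are hypothetical.

```yaml
# Sketch of a VolumeSnapshot request; all names are illustrative assumptions.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: app-data-snap                            # hypothetical snapshot name
spec:
  volumeSnapshotClassName: example-snapclass     # hypothetical VolumeSnapshotClass
  source:
    persistentVolumeClaimName: app-data          # hypothetical existing PVC to snapshot
```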

Editorial: Simplyblock provides the industry-leading cloud-native Kubernetes Enterprise storage to enable IO-intensive and latency-sensitive stateful workloads (like databases) that need increased reliability, speed, and cost-effectiveness. Additionally, simplyblock offers advanced deployment options, which include hyperconverged, disaggregated, and a combination of the two.

HostPath volumes are backed by a directory on the host machine. They stay intact when the pod or container is terminated or even deleted and can be reattached to a new pod or container as long as the new one is scheduled and created on the same Kubernetes worker node. This offers some type of durability but comes with the additional complexity of ensuring containers live on specific workers. While pinning pods to nodes (for example, with node selectors, node affinity, or taints and tolerations) may work, a globally available (in the sense of across the Kubernetes cluster) backing storage will be easier and enable better resource utilization. This type is often chosen for speed, at the cost of high availability and fault tolerance.
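
The following sketch shows what such a setup might look like: a pod that mounts a host directory and is pinned to one worker via a node selector. The node name `worker-1`, the host path `/mnt/demo-data`, and the pod name are hypothetical, and the container only sleeps so that the volume can be inspected.

```yaml
# Sketch of a pod using a hostPath volume; names and paths are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-1   # hypothetical node; keeps the pod next to its data
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data             # path inside the container
  volumes:
    - name: data
      hostPath:
        path: /mnt/demo-data           # hypothetical directory on the worker node
        type: DirectoryOrCreate        # create the directory if it does not exist
```

If the named node becomes unavailable, the pod cannot be rescheduled elsewhere without losing access to its data, which is the availability trade-off described above.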

Editorial: Simplyblock can provide high IOPS and predictable low latency without sacrificing high availability and fault tolerance.

How do Persistent Volumes Work?

Persistent volumes are implemented using a combination of Kubernetes components and Container Storage Interface (CSI)-backed storage plugins or drivers.

Internally, persistent storage is represented as a combination of persistent volumes (the actual logical storage entity) and persistent volume claims (the “assignment” of a volume to a container request for storage). Both types are built-in Kubernetes resources and can be created, updated, or deleted using API calls or declarative manifests.

Using a StorageClass, Kubernetes’ persistent volume claim (PVC) requests a specific backing implementation to provide a persistent volume. The storage class always defines which kind of storage implementation is used. However, a storage plugin may provide multiple storage classes and encode specific performance or other characteristics, such as high availability or levels of fault tolerance. The specific details depend on the storage plugin.
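
A storage class of this kind could be sketched as follows. The provisioner name `csi.example.com` and the `performanceTier` parameter are hypothetical; real provisioner names and parameter keys are defined by the respective storage plugin and should be taken from its documentation.

```yaml
# Sketch of a StorageClass; provisioner and parameters are illustrative assumptions.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-block                     # hypothetical class name referenced by PVCs
provisioner: csi.example.com           # hypothetical CSI driver name
parameters:
  performanceTier: "high"              # hypothetical, driver-specific setting
volumeBindingMode: WaitForFirstConsumer  # delay provisioning until a pod uses the claim
allowVolumeExpansion: true             # permit growing volumes later, if the driver supports it
```

A plugin that offers several performance or availability levels would typically ship, or let you define, one such class per level.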

When the persistent volume claim is bound to a matching persistent volume, Kubernetes will mount the persistent volume into the pod, enabling access to the PV from inside the container. Additionally, the PVC's access modes (such as ReadWriteOnce, ReadOnlyMany, or ReadWriteMany) restrict or permit reading or writing data into the storage. These modes can also enable features like simultaneously attaching the same persistent volume to multiple pods, provided that the backing implementation offers such functionality.
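
As a rough sketch, a pod consumes a claim by referencing it in its volume list and mounting it into a container. The pod name, the claim name `app-data`, and the mount path are hypothetical, and the claim is assumed to already exist in the same namespace.

```yaml
# Sketch of a pod mounting a PVC; names and paths are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data           # where the volume appears inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data          # hypothetical, pre-existing PVC
        readOnly: false              # set to true to mount the volume read-only
```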

Last but not least, when the PV's lifecycle is bound to the PVC (a reclaim policy of Delete), automatic resource management, like dynamic allocation and deallocation, happens when the PVC is created or deleted. With a reclaim policy of Retain, the volume and its data outlive the claim. Depending on the type of deployment, automatic cleanup can be a real benefit or completely useless. It's up to you to decide when you want automatic resource management and when not. That's the beauty of declarative configuration.
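
For dynamically provisioned volumes, this choice is usually expressed through the `reclaimPolicy` field of the storage class, which defaults to `Delete`. The sketch below keeps volumes after their claims are deleted. The class name and the provisioner `csi.example.com` are hypothetical.

```yaml
# Sketch of a StorageClass that retains volumes; names are illustrative assumptions.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-block-retain        # hypothetical class name
provisioner: csi.example.com     # hypothetical CSI driver name
reclaimPolicy: Retain            # keep the PV and its data when the PVC is deleted
```

With `Retain`, released volumes have to be cleaned up or reused manually, so this setting tends to suit data that must not disappear by accident.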

Your Stateful Workload

Persistent volumes (PVs) are critical in deploying stateful workloads into Kubernetes environments. By abstracting storage provisioning and management, PVs allow seamless integration of many storage resources and support the “use the best tool for the job” rule. That said, not all applications have the same storage requirements, so providing multiple storage classes can be required. Kubernetes makes this particularly easy with its storage plugin design and the definition of storage classes. Understanding persistent volumes is essential for Kubernetes administrators and developers looking to optimize storage management within their containerized environments.

Simplyblock offers a flexible, enterprise-ready Kubernetes storage solution that combines high performance, predictable low latency, high availability, and fault tolerance. In addition, simplyblock provides advanced features such as immediate snapshots (copy-on-write), local and remote clones, cluster replication, thin provisioning, storage overcommitment, and zero-downtime scalability. Do you want to learn more about simplyblock, or are you ready to use it?

Questions and Answers

What is a Kubernetes Persistent Volume (PV)?

A Persistent Volume (PV) is a Kubernetes resource that provides durable storage for pods. Unlike ephemeral volumes, PVs persist beyond pod restarts and are provisioned independently of the pod lifecycle, making them essential for stateful workloads.

What is the difference between a PV and a PVC?

A Persistent Volume (PV) is the actual storage resource, while a Persistent Volume Claim (PVC) is a request for storage by a pod. Kubernetes matches a PVC to an available PV based on size, access mode, and storage class.

How does dynamic provisioning work with persistent volumes?

Dynamic provisioning automatically creates PVs on demand using a defined StorageClass. CSI drivers like the one from simplyblock enable Kubernetes to provision volumes with specific features like encryption or performance profiles.

Can you run databases with Persistent Volumes in Kubernetes?

Yes. Kubernetes PVs are commonly used for running PostgreSQL, MongoDB, and other databases. When paired with fast, NVMe-backed block storage, they provide the performance and durability required for production workloads.

How do you choose the right storage for Kubernetes PVs?

Look for storage that supports CSI, dynamic provisioning, snapshots, and high IOPS. NVMe over TCP with software-defined storage like simplyblock is ideal for low-latency, scalable Kubernetes environments.
