What is a Kubernetes Persistent Volume?

Apr 03rd, 2024 | 7 min read

A persistent volume is a slice of storage, provisioned by a Kubernetes administrator, that can be attached and mounted to pods. Like everything in Kubernetes, it is a resource inside the cluster, and its lifecycle is either bound to the lifecycle of the pod or can survive pod deletions. It is backed by storage provided through the container storage interface (CSI).

Stateful Workloads and Data Persistence

Originally designed to be stateless, containers were supposed to also be ephemeral and lightweight. They were designed to “boot” quickly and be small, maybe a few megabytes in size, sharing much of their host operating system.

This design quickly became a hassle, and people realized that you often have to persist data between container restarts. Such storage can be either ephemeral (living until the pod ceases to exist) or persistent (staying alive indefinitely). This is especially important for workloads like databases or logging, but also for many other types of applications that need to hold serialized session information or similar state.
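To illustrate the ephemeral case, a pod can declare an emptyDir volume, which Kubernetes creates when the pod is assigned to a node and removes together with the pod. The following is a minimal sketch; the pod name, image, and mount path are placeholders chosen for illustration and are not part of the original article.

```yaml
# Hypothetical pod with ephemeral scratch space; all names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-example
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "echo hello > /scratch/file && sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}   # survives container restarts, deleted together with the pod
```

Data written to /scratch is gone once the pod is deleted, which is exactly the gap that persistent volumes are meant to close.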

In general, the bigger the container-based landscape, the higher the chance that you have stateful workloads in your deployments, especially with large Kubernetes deployments consisting of hundreds of nodes.

Using the CSI, and its storage plugins, it is possible and recommended (at least by us) to separate the storage and compute infrastructure. This disaggregated architecture enables independent scalability and offers the possibility to choose the best tool for the job.

To bind a (persistent) volume to a container, you create a so-called (persistent) volume claim, which contains the storage requirement definition of the container. That includes the StorageClass, basically what type of backing storage is requested (this is normally a 1:1 mapping to a specific CSI implementation, like simplyblock’s driver, or a combination of the CSI implementation and some characteristics, such as a performance policy), but also the requested size and access mode. The lifecycle binding between the claim and its persistent volume is governed by the reclaim policy, which dynamically provisioned volumes inherit from the storage class.
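A persistent volume claim along these lines could look like the following sketch. The claim name and the storage class name `fast-csi` are hypothetical; the storage class has to exist in your cluster, and the requested size is only an example.

```yaml
# Minimal PVC sketch; names and size are illustrative assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-data            # hypothetical claim name
spec:
  storageClassName: fast-csi     # hypothetical StorageClass, must exist in the cluster
  accessModes:
    - ReadWriteOnce              # read-write access from a single node
  resources:
    requests:
      storage: 10Gi              # requested capacity
```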

Why are Persistent Volumes Essential?

One could ask why not just bind to directories or other storage backends directly. The whole point of Kubernetes is to abstract away the underlying operating system, or better said, the whole environment outside of Kubernetes. This enables developers and DevOps folks to make specific assumptions about the environment and brings development and production environments closer together. Especially for developers this is a big win since debugging is easier than ever before.

To achieve this abstraction, applications and containers should not make any assumptions about the underlying storage (or any other kind of resource). Hence, provisioning storage is part of the actual deployment process.

Managing persistent volume resources this way has quite a few benefits:

  1. Decoupling: PVs decouple the storage provisioning from the pod or container lifecycle, allowing for more flexibility and independence in managing storage resources.
  2. Storage Persistence: PVs offer persistent storage which survives pod terminations and rescheduling.
  3. Dynamic Provisioning: PVs can be dynamically provisioned, enabling automatic on-demand creation and management, according to the application requirements.
  4. Portability: While not strictly true for all backing storage types, PVs enable the separation of concerns by providing access to local and remote storage options.
  5. Resource Management: PVs, or better said their storage class implementations, can provide better resource management, by virtualizing the storage and sharing access to the same backing storage, hence reducing storage redundancy and optimizing resource allocation.

Types of Persistent Storage in Kubernetes

Kubernetes enables a wide variety of application deployments. Therefore, it also offers a lot of storage type options. While there are many more, we want to focus on the two that enable persistent storage, as well as speed and high availability (at least to some extent).

Storage plugin (CSI driver) volumes are your best bet when it comes to persistent storage, and they also offer the broadest variety of backing storage. Implementations can either offer access to an independently running storage cluster or appliance (disaggregated), or use the local storage of the Kubernetes worker nodes and offer it as a shared, clustered resource or just as an enhanced version of local mounts (hyperconverged). This type of storage is especially interesting for persistent volumes since many vendors provide enhanced features such as immediate snapshots, clones, thin provisioning, and more.
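As an example of such an enhanced feature, a snapshot of an existing claim can be requested declaratively. This is a sketch under assumptions: it only works if the CSI driver supports snapshots and the snapshot CRDs and snapshot controller are installed in the cluster. The snapshot class name and the claim name are hypothetical.

```yaml
# Hypothetical snapshot request; requires a snapshot-capable CSI driver.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: database-data-snapshot
spec:
  volumeSnapshotClassName: csi-snapclass     # hypothetical VolumeSnapshotClass
  source:
    persistentVolumeClaimName: database-data # hypothetical existing PVC
```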

Editorial: Simplyblock provides its features that way, too, enabling IO-heavy and latency-sensitive stateful workloads (like databases) which have a need for increased reliability, speed, and cost-effectiveness.

HostPath volumes are backed by a directory on the host machine. They stay intact when the pod or container is terminated or even deleted, and can be reattached to a new pod or container, as long as the new one is scheduled and created on the same Kubernetes worker node. This offers some durability, but comes with the additional complexity of making sure that containers live on specific workers. While pinning pods to nodes (for example, with node selectors or with taints and tolerations) may work, a globally available (in the sense of across the Kubernetes cluster) backing storage is easier to operate and enables better resource utilization. HostPath is often used to meet speed requirements, at the cost of high availability and fault tolerance. Editorial: Simplyblock can provide both high IOPS and predictable low latency without sacrificing high availability and fault tolerance.
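The HostPath approach, including the need to pin the pod to one worker, can be sketched as follows. The node name `worker-1`, the host directory, and the image are assumptions made for illustration only.

```yaml
# Hypothetical pod using a hostPath volume, pinned to a single worker node.
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-example
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-1   # hypothetical node; data only exists there
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      hostPath:
        path: /mnt/data                # hypothetical directory on the host
        type: DirectoryOrCreate        # create the directory if it is missing
```

If `worker-1` fails, the pod cannot be rescheduled elsewhere without losing access to its data, which is the availability trade-off described above.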

How do Persistent Volumes Work?

Persistent Volumes are implemented through a combination of Kubernetes components and the container storage interface (CSI) backed storage plugins or storage drivers.

Internally, persistent storage is represented as a combination of persistent volumes (the actual logical storage entity) and persistent volume claims (the “assignment” of a volume to a container request for storage). Both are built-in Kubernetes API resources and can be created, updated, or deleted through API calls or declarative manifests, for example with kubectl.

Using a StorageClass, the persistent volume claim (PVC) requests a specific backing implementation to provide a persistent volume. The storage class always defines which kind of storage implementation is used. However, a storage plugin may provide multiple storage classes that also encode specific performance or other characteristics, such as high availability or fault tolerance levels. The specific details depend on the storage plugin.
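As a sketch of one plugin exposing multiple storage classes, the two definitions below share a provisioner but differ in their parameters. The provisioner name `csi.example.com` and the `tier` parameter are placeholders; real parameter names are driver-specific and have to be taken from the documentation of the storage plugin you use.

```yaml
# Two hypothetical storage classes backed by the same (placeholder) CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-csi
provisioner: csi.example.com             # placeholder CSI driver name
parameters:
  tier: high-performance                 # hypothetical, driver-specific parameter
volumeBindingMode: WaitForFirstConsumer  # provision once a pod using the claim is scheduled
allowVolumeExpansion: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-csi
provisioner: csi.example.com             # same placeholder driver
parameters:
  tier: standard                         # hypothetical, driver-specific parameter
volumeBindingMode: Immediate
```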

When the persistent volume claim is bound to a matching persistent volume, Kubernetes mounts the persistent volume into the pod that references the claim, enabling access to the PV from inside the container. Additionally, the PVC’s access modes (such as ReadWriteOnce, ReadOnlyMany, or ReadWriteMany) restrict or permit reading and writing data. These modes can also enable features like attaching the same persistent volume to multiple pods at the same time, provided that the backing implementation offers such functionality.
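Mounting a claim into a pod could look like the following sketch. It assumes a PVC named `database-data` already exists in the same namespace; the pod name, image, and mount path are illustrative.

```yaml
# Hypothetical pod that mounts an existing PVC.
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /var/lib/data     # where the volume appears inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: database-data       # hypothetical PVC in the same namespace
        readOnly: false                # set to true to mount the volume read-only
```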

Last but not least, when the PV’s lifecycle is bound to the PVC, automatic resource management, like dynamic allocation and deallocation, happens when the PVC is created or deleted. Depending on the type of deployment, this can be a real benefit or completely useless. It’s up to you to decide when you want automatic resource management and when you don’t. That is the beauty of declarative configuration.
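Whether a dynamically provisioned volume is removed together with its claim is controlled by the reclaim policy of the storage class, which defaults to Delete. The sketch below keeps the volume instead; the class name and provisioner are placeholders.

```yaml
# Hypothetical storage class whose volumes outlive their claims.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-csi
provisioner: csi.example.com   # placeholder CSI driver name
reclaimPolicy: Retain          # keep the PV and its data after the PVC is deleted
```

With Retain, cleaning up the released volume becomes a manual step, so this setting fits data you would rather not lose to an accidental claim deletion.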

Your Stateful Workload

Persistent volumes (PVs) play a critical role in deploying stateful workloads into Kubernetes environments. By abstracting storage provisioning and management, PVs allow seamless integration of a multitude of storage resources, supporting the “use the best tool for the job” rule. That said, not all applications have the same storage requirements, hence providing multiple storage classes can be required. Kubernetes makes this especially easy with its storage plugin design and the definition of storage classes. Either way, understanding persistent volumes is essential for Kubernetes administrators and developers looking to optimize storage management within their containerized environments.

Simplyblock offers a disaggregated Kubernetes storage solution which combines high performance, predictable low latency, and high availability, as well as fault tolerance. In addition, simplyblock provides features such as copy-on-write snapshots (meaning, immediate snapshots), local and remote clones, cluster replication, thin provisioning, storage overcommitment, and zero downtime scalability. Want to learn more about simplyblock, or are you ready to use it?
