
Velero

Velero is an open-source Kubernetes backup and disaster recovery tool that captures cluster state — API objects, namespaces, and PersistentVolume data — and stores it in object storage for later restore or migration. For platform teams running stateful applications, Velero is the standard mechanism for point-in-time backup, namespace cloning across clusters, and recovery from accidental deletion or cluster failures.

Key facts: Velero
Type: Kubernetes backup and restore tool
Storage integration: CSI VolumeSnapshot API (file-level fallback via restic/kopia)
Primary use: Cluster DR and PVC backup
Open source: Yes (Apache 2.0 license)

Velero consists of a server component (a set of controllers) deployed in the cluster and a velero CLI for triggering backups and restores and for managing backup schedules. It captures Kubernetes API objects (Deployments, Services, PVCs, ConfigMaps, Secrets) and, optionally, the actual PV data. PV data is backed up either through the CSI VolumeSnapshot API or through file-level agents (restic or kopia), depending on what the storage backend supports.
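Backups and schedules are driven from the CLI. The sketch below only assembles argument lists for two real subcommands, `velero backup create` and `velero schedule create`, so the flags can be reviewed before anything runs. The helper functions, the backup and schedule names, the `orders` namespace, the TTL, and the cron expression are all hypothetical placeholders.

```python
import shlex


def backup_create_args(name, namespaces, snapshot_volumes=True, ttl=None):
    """Build (but do not run) argv for a one-off `velero backup create`."""
    if not namespaces:
        raise ValueError("at least one namespace is required")
    args = ["velero", "backup", "create", name,
            "--include-namespaces", ",".join(namespaces)]
    # true: also snapshot PV data; false: back up API objects only.
    args.append(f"--snapshot-volumes={str(snapshot_volumes).lower()}")
    if ttl is not None:
        # Retention period as a Go-style duration, e.g. "720h0m0s".
        args += ["--ttl", ttl]
    return args


def schedule_create_args(name, cron, namespaces):
    """Build (but do not run) argv for a cron-driven `velero schedule create`."""
    if not namespaces:
        raise ValueError("at least one namespace is required")
    return ["velero", "schedule", "create", name,
            "--schedule", cron,
            "--include-namespaces", ",".join(namespaces)]


# Print shell-quoted commands for review.
print(shlex.join(backup_create_args("orders-manual", ["orders"], ttl="720h0m0s")))
print(shlex.join(schedule_create_args("orders-6h", "0 */6 * * *", ["orders"])))
```

Returning argv lists instead of shell strings avoids quoting bugs. The lists can be passed directly to `subprocess.run(..., check=True)` once reviewed.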


How Velero Backs Up Persistent Volumes

Velero supports two distinct paths for PV data:

CSI snapshot path (preferred): Velero creates a VolumeSnapshot through the Kubernetes CSI snapshot API. The CSI driver takes a storage-level snapshot of the volume, and Velero stores the snapshot metadata and the associated API objects in object storage (typically S3-compatible). The snapshot data itself stays on the storage backend unless Velero’s CSI snapshot data movement is enabled, which uploads the snapshot contents to object storage. Restoring creates a new PVC from the snapshot, and the CSI driver provisions a usable volume from it. This path is fast because the snapshot is taken at the storage layer, so no volume data has to be streamed over the network during the backup.
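Velero creates these VolumeSnapshot objects itself. The sketch below only shows the shape of the `snapshot.storage.k8s.io/v1` object it asks the CSI machinery to fulfil. The helper function and the example names (`data-db-0`, `orders`, `simplyblock-snapclass`) are hypothetical.

```python
import json


def volume_snapshot(name, namespace, pvc_name, snapshot_class):
    """Return a snapshot.storage.k8s.io/v1 VolumeSnapshot manifest as a dict."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Selects the VolumeSnapshotClass, and through it the CSI driver.
            "volumeSnapshotClassName": snapshot_class,
            # The existing PVC (same namespace) whose volume is snapshotted.
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


# kubectl accepts JSON as well as YAML, e.g. `kubectl apply -f snap.json`.
print(json.dumps(volume_snapshot("data-db-0-snap", "orders", "data-db-0",
                                 "simplyblock-snapclass"), indent=2))
```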

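File-level path (restic/kopia): When the CSI driver does not support VolumeSnapshots, Velero can back up PV data at the file level with restic or kopia. These tools run inside the Velero node agent, a DaemonSet whose pods may need privileged access to read pod volume mounts on each node. This approach is slower and more CPU- and network-intensive because every file is read and uploaded. It also copies a live filesystem rather than an atomic snapshot, so consistent backups typically require quiescing the application with backup hooks, which pauses writes during the backup window. It works with any storage backend but should be considered a fallback.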
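In opt-in mode, Velero backs up a pod volume at the file level only when the pod carries the `backup.velero.io/backup-volumes` annotation listing that volume. The sketch below builds a JSON merge patch that adds the annotation to a workload’s pod template. The helper name and the volume names `data` and `wal` are hypothetical. Also note that changing a pod template triggers a rolling restart of the workload.

```python
import json

# Velero's opt-in annotation for file-system backup of pod volumes.
FSB_INCLUDE = "backup.velero.io/backup-volumes"


def fsb_opt_in_patch(volume_names):
    """Merge patch adding the opt-in FSB annotation to a pod template."""
    if not volume_names:
        raise ValueError("at least one pod volume name is required")
    return {"spec": {"template": {"metadata": {"annotations": {
        # Comma-separated *pod volume* names (spec.volumes[].name), not PVC names.
        FSB_INCLUDE: ",".join(volume_names),
    }}}}}


# e.g. kubectl -n orders patch statefulset db --type merge -p '<printed JSON>'
print(json.dumps(fsb_opt_in_patch(["data", "wal"])))
```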

The CSI snapshot path requires the CSI external-snapshotter components in the cluster. These are the VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass CRDs, the snapshot controller, and the CSI driver’s snapshotter sidecar.
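A quick preflight check can catch missing CRDs before the first backup fails. This minimal sketch parses the output of `kubectl get crd -o name` (one `customresourcedefinition.apiextensions.k8s.io/<name>` per line) and reports which snapshot CRDs are absent. The function name is hypothetical. The check does not verify that the snapshot controller or the driver’s sidecar is actually running.

```python
REQUIRED_CRDS = {
    "volumesnapshots.snapshot.storage.k8s.io",
    "volumesnapshotcontents.snapshot.storage.k8s.io",
    "volumesnapshotclasses.snapshot.storage.k8s.io",
}


def missing_snapshot_crds(kubectl_output):
    """Return required snapshot CRDs absent from `kubectl get crd -o name` output."""
    installed = set()
    for line in kubectl_output.splitlines():
        line = line.strip()
        if line:
            # Keep only the CRD name after the "<resource-type>/" prefix.
            installed.add(line.rsplit("/", 1)[-1])
    return sorted(REQUIRED_CRDS - installed)


sample = (
    "customresourcedefinition.apiextensions.k8s.io/volumesnapshots.snapshot.storage.k8s.io\n"
    "customresourcedefinition.apiextensions.k8s.io/backups.velero.io\n"
)
print(missing_snapshot_crds(sample))
```

In practice the input would come from running `kubectl get crd -o name`, for example via `subprocess.run(..., capture_output=True, text=True)`.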

Velero vs. Storage Replication vs. Snapshots Alone

A common source of confusion is treating Velero, storage-level replication, and snapshots as interchangeable DR tools. They are not:

Mechanism | Granularity | RPO capability | RTO capability | Best for
Velero backup | Namespace + PV data | Minutes to hours (schedule-driven) | Minutes (restore from object storage) | Accidental deletion, cluster migration, DR copy
Storage replication (async) | Volume-level | Near-zero (continuous) | Fast failover to replica | Site failover, HA across zones
Storage replication (sync) | Volume-level | Zero (no data loss) | Seconds to minutes | Zero-RPO requirements, financial data
CSI snapshot alone | Volume-level point-in-time | Depends on snapshot schedule | Fast (local restore) | Rollback, data cloning

Velero complements replication and snapshots. It is the right tool for namespace-level backup, cross-cluster restore, and audit-trail copies in object storage. It is not a replacement for continuous replication when RPO must be near-zero.
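The "schedule-driven" RPO in the table can be made concrete. Data written after the capture point of the newest backup that had finished before a failure is lost. This makes the worst case roughly the schedule interval plus the backup duration. The sketch below is a simplified model with a hypothetical function name. It approximates each backup’s capture point by its start time, although per-volume snapshots are actually taken at slightly different moments during the backup.

```python
from datetime import datetime, timedelta


def data_loss_window(backups, failure_at):
    """Return how much recent data a failure at `failure_at` would lose.

    backups: list of (started_at, completed_at) datetimes. Only backups that
    completed before the failure can be restored; each one's capture point is
    approximated by its start time. Returns None if nothing is restorable.
    """
    usable = [start for start, done in backups if done <= failure_at]
    if not usable:
        return None
    return failure_at - max(usable)


# Example: backups every 6 hours, each taking 20 minutes.
day = datetime(2024, 1, 1)
backups = [(day, day + timedelta(minutes=20)),
           (day + timedelta(hours=6), day + timedelta(hours=6, minutes=20))]
# A failure at 06:10 hits while the second backup is still running.
print(data_loss_window(backups, day + timedelta(hours=6, minutes=10)))
```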


Velero and the CSI Snapshot Architecture

For teams using the CSI path, the full flow involves several components:

  1. During a backup, Velero creates a VolumeSnapshot object for each PVC it backs up.
  2. The CSI snapshot controller sees the VolumeSnapshot and creates a matching VolumeSnapshotContent object.
  3. The CSI driver’s snapshotter sidecar sees the VolumeSnapshotContent and calls the driver’s CreateSnapshot RPC. The driver creates a storage-level snapshot and returns its snapshot handle.
  4. The sidecar records the handle on the VolumeSnapshotContent. Once the snapshot is ready to use, the snapshot controller marks the VolumeSnapshot readyToUse.
  5. Velero stores the VolumeSnapshot and VolumeSnapshotContent metadata in object storage alongside the API object backup.
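Steps 2 to 4 are asynchronous, so anything that consumes a snapshot has to wait for `status.readyToUse`. This minimal polling sketch has a hypothetical name and takes a caller-supplied `get_snapshot` callable (for example, one that parses `kubectl get volumesnapshot NAME -o json`). It returns `status.boundVolumeSnapshotContentName` once the snapshot is ready. It treats any reported `status.error` as fatal, which is stricter than the snapshot controller, since the controller may retry.

```python
import time


def wait_for_snapshot_ready(get_snapshot, timeout_s=300.0, poll_s=2.0,
                            sleep=time.sleep, clock=time.monotonic):
    """Poll a VolumeSnapshot dict until status.readyToUse is true.

    Returns the bound VolumeSnapshotContent name. `sleep` and `clock` are
    injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_snapshot().get("status") or {}
        if status.get("error"):
            raise RuntimeError(status["error"].get("message", "snapshot failed"))
        if status.get("readyToUse"):
            return status["boundVolumeSnapshotContentName"]
        if clock() >= deadline:
            raise TimeoutError("VolumeSnapshot not ready before timeout")
        sleep(poll_s)
```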

On restore, Velero recreates the VolumeSnapshotContent and VolumeSnapshot from the stored metadata. It then creates a new PVC whose dataSource references that VolumeSnapshot, and the CSI driver provisions a volume from the snapshot. Restore speed depends on whether the CSI driver uses thin cloning (near-instant) or a full data copy (minutes to hours for large volumes).
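For reference, a PVC restored from a CSI snapshot names a VolumeSnapshot in `spec.dataSource`. Velero builds this object during restore, and the sketch below only illustrates its shape. The helper and the example values (names, `simplyblock` StorageClass, `100Gi`, `ReadWriteOnce`) are hypothetical.

```python
def pvc_from_snapshot(name, namespace, snapshot_name, storage_class, size):
    """Return a PersistentVolumeClaim that is provisioned from a VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            # Must be at least the snapshot's status.restoreSize.
            "resources": {"requests": {"storage": size}},
            # The VolumeSnapshot must live in the same namespace as the PVC.
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
        },
    }
```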

Restoring a StatefulSet with Velero

Restoring a StatefulSet from a Velero backup involves recreating both the API objects (the StatefulSet spec, Services, ConfigMaps) and the associated PVC data. Velero handles this in a single restore operation when the backup includes both layers. Key steps:

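  • Velero restores PVCs before workload controllers such as StatefulSets. With CSI snapshots, each PVC’s dataSource points at a restored VolumeSnapshot, so the CSI driver provisions the volume from snapshot data.
  • When the StatefulSet is created, its controller finds the existing PVCs by name and reuses them instead of provisioning new, empty volumes.
  • Pods start normally once their volumes are bound and attached. With a WaitForFirstConsumer StorageClass, provisioning from the snapshot happens only when the first pod that uses the PVC is scheduled.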
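The "by name" match follows a fixed Kubernetes convention: `<volumeClaimTemplate name>-<StatefulSet name>-<ordinal>`. If a restored PVC is missing, the StatefulSet controller provisions a fresh, empty one for that ordinal. The sketch below lists the expected names so they can be checked against what a restore produced. It assumes the default ordinal numbering that starts at 0, and the function names are hypothetical.

```python
def statefulset_pvc_names(sts_name, claim_templates, replicas):
    """PVC names the StatefulSet controller looks up, ordinals starting at 0."""
    return [f"{tmpl}-{sts_name}-{i}"
            for i in range(replicas) for tmpl in claim_templates]


def missing_pvcs(sts_name, claim_templates, replicas, restored_pvcs):
    """Expected PVCs that the restore did not recreate."""
    restored = set(restored_pvcs)
    return [n for n in statefulset_pvc_names(sts_name, claim_templates, replicas)
            if n not in restored]


# A 3-replica StatefulSet "db" with one claim template "data":
print(missing_pvcs("db", ["data"], 3, ["data-db-0", "data-db-1"]))
```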

RTO depends on volume size, snapshot restore mechanism (thin clone vs. full copy), and the application’s own startup time. For databases, a restored StatefulSet may also need recovery log replay after the PVC is mounted.
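A back-of-the-envelope model makes these RTO factors comparable. It treats a full copy as volume size divided by copy throughput and a thin clone as effectively zero copy time, then adds application startup and log replay. Every input is an assumption to be measured in your own environment. The model also ignores Velero’s metadata restore and volume attach time, and the function name is hypothetical.

```python
def estimate_rto_s(volume_gib, copy_mib_per_s, app_start_s, log_replay_s,
                   thin_clone=False):
    """Very rough RTO estimate in seconds; all inputs are assumptions."""
    if thin_clone:
        copy_s = 0.0  # approximation: a thin clone shares blocks, no bulk copy
    else:
        if copy_mib_per_s <= 0:
            raise ValueError("copy throughput must be positive")
        copy_s = volume_gib * 1024 / copy_mib_per_s
    return copy_s + app_start_s + log_replay_s


# 2 TiB volume, 500 MiB/s full copy, 60 s startup, 300 s log replay:
print(round(estimate_rto_s(2048, 500, 60, 300) / 60), "min (full copy)")
print(round(estimate_rto_s(2048, 500, 60, 300, thin_clone=True) / 60), "min (thin clone)")
```

With these example numbers, copying the data dominates: about 70 of the roughly 76 minutes, compared with about 6 minutes when the copy is avoided.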

Velero with Simplyblock

Simplyblock’s CSI driver supports the Kubernetes VolumeSnapshot API, enabling the fast CSI snapshot path for Velero backups. Relevant capabilities:

  • Instant snapshot creation: simplyblock uses copy-on-write snapshots, so CreateSnapshot completes in milliseconds regardless of volume size. This keeps the snapshot phase of a Velero backup short.
  • Thin clones for restore testing: after a Velero backup, teams can restore into a test namespace using thin clones of the snapshot. A clone shares its underlying data blocks with the source and consumes extra space only for blocks that are later written.
  • NVMe/TCP and NVMe/RoCE transport: snapshot and restore I/O runs over the same high-throughput fabric used for live workload data, keeping restore times predictable.
  • Integration with snapshot vs. clone workflows: platform teams can use simplyblock snapshots directly for fast local rollback while delegating cross-cluster backup copies to Velero.
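The copy-on-write idea behind instant snapshots and thin clones can be shown with a toy model. A snapshot freezes the current block map without copying block contents, and a clone is a new writable layer on top of that frozen snapshot. The clone’s own writes land only in its overlay. This is a generic illustration with hypothetical class names, not simplyblock’s actual on-disk design.

```python
from types import MappingProxyType


class Snapshot:
    """Immutable layer: the block map captured at snapshot time plus its parent."""

    def __init__(self, blocks, parent=None):
        # Copies only the block-index map; block contents are shared references.
        self.blocks = MappingProxyType(dict(blocks))
        self.parent = parent


class Volume:
    """Writable volume: a private overlay on top of a read-only snapshot chain."""

    def __init__(self, base=None):
        self._base = base   # Snapshot or None
        self._own = {}      # blocks written since the last snapshot/clone

    def read(self, idx):
        if idx in self._own:
            return self._own[idx]
        layer = self._base
        while layer is not None:
            if idx in layer.blocks:
                return layer.blocks[idx]
            layer = layer.parent
        return b"\0"  # never-written block reads as zero

    def write(self, idx, data):
        self._own[idx] = data  # never touches shared snapshot layers

    def snapshot(self):
        # Freeze the overlay as a new layer; the volume continues on a fresh one.
        snap = Snapshot(self._own, parent=self._base)
        self._base, self._own = snap, {}
        return snap

    def private_blocks(self):
        return len(self._own)


def clone(snap):
    """Thin clone: a new empty overlay on an existing snapshot."""
    return Volume(base=snap)
```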


Questions and Answers

How does Velero back up persistent volumes?

Velero backs up persistent volumes using one of two methods. The preferred method is the CSI VolumeSnapshot path: Velero triggers a VolumeSnapshot, the CSI driver creates a storage-level snapshot, and Velero stores the snapshot metadata in object storage alongside the Kubernetes API objects. The fallback method uses restic or kopia agents to perform file-level backups by reading PV mount paths directly, which is slower and more resource-intensive. The CSI path requires that the storage backend’s CSI driver supports the VolumeSnapshot API.

Does Velero work with CSI drivers?

Yes. Velero has first-class support for CSI VolumeSnapshots through its CSI plugin. For a PVC provisioned by a CSI driver, Velero uses the CSI snapshot path when a VolumeSnapshotClass for that driver carries the label velero.io/csi-volumesnapshot-class: "true". The cluster must have the VolumeSnapshot CRDs installed and the snapshot controller running, and CSI support must be enabled in Velero (the EnableCSI feature flag in older releases). Simplyblock’s CSI driver supports VolumeSnapshots and is fully compatible with Velero’s CSI plugin.
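Here is a minimal sketch of a VolumeSnapshotClass that Velero would select for a given driver. Note that `driver` and `deletionPolicy` are top-level fields of this kind, not nested under `spec`. The driver name `csi.example.com` is a placeholder; use the name your CSI driver actually registers. The helper name is also hypothetical.

```python
def velero_snapshot_class(name, driver, deletion_policy):
    """VolumeSnapshotClass labeled so Velero picks it for PVCs of `driver`."""
    if deletion_policy not in ("Delete", "Retain"):
        raise ValueError("deletionPolicy must be 'Delete' or 'Retain'")
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshotClass",
        "metadata": {
            "name": name,
            # Velero looks for this label when choosing a class for a driver.
            "labels": {"velero.io/csi-volumesnapshot-class": "true"},
        },
        "driver": driver,
        # Delete removes the backend snapshot when its VolumeSnapshotContent
        # is deleted; Retain keeps it.
        "deletionPolicy": deletion_policy,
    }
```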

What is the difference between Velero and storage replication?

Velero is a point-in-time backup tool that captures cluster state and PV data at scheduled intervals, storing backups in object storage. Storage replication continuously mirrors data from a source volume to a replica, either synchronously (zero data loss) or asynchronously (near-zero RPO). Velero suits scenarios like accidental deletion recovery and cluster migration; replication suits scenarios requiring fast failover and minimal data loss. The two are complementary — many production setups run both.

How do I restore a Kubernetes StatefulSet with Velero?

Run velero restore create --from-backup <backup-name>. Velero restores resources in priority order. The snapshot objects and PVCs come first, so each PVC is provisioned from its CSI snapshot, and the StatefulSet and associated API objects follow. The StatefulSet pods start once their PVCs are bound and the volumes are attached. For large volumes, restore time depends on whether the CSI driver performs a thin clone (seconds) or a full data copy. After the restore, verify application health, and for databases check replication lag or recovery log state.
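Restore drills usually go into a separate namespace so production is untouched. The sketch below assembles argv for `velero restore create` using its `--from-backup`, `--namespace-mappings` (format `src:dst`, comma-separated), and `--wait` flags. The helper function, the backup name, and the namespace names are hypothetical.

```python
def restore_create_args(backup_name, namespace_mappings=None, wait=True):
    """Build (but do not run) argv for `velero restore create`."""
    args = ["velero", "restore", "create", "--from-backup", backup_name]
    if namespace_mappings:
        # Restore each source namespace into a differently named target.
        args += ["--namespace-mappings",
                 ",".join(f"{src}:{dst}" for src, dst in namespace_mappings.items())]
    if wait:
        args.append("--wait")  # block until the restore finishes
    return args


# Restore drill into a scratch namespace; run with subprocess.run(args, check=True).
print(restore_create_args("orders-manual", {"orders": "orders-restore-test"}))
```

Afterwards, `velero restore describe <restore-name>` shows per-resource results and any warnings.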