PostgreSQL backup strategy on Kubernetes breaks in a specific and predictable way: teams configure storage snapshots, confirm that snapshots are being taken on schedule, and then discover during an incident that the restored database either fails to start or is missing data that should have been captured. The snapshots were real. The data loss was also real. The problem is the gap between storage-level consistency and database-level consistency.
This post covers what application-consistent backup actually means for PostgreSQL, why storage snapshots alone are not sufficient, how Kubernetes operators close the gap, and how to build a recovery model that holds up when it matters.
Crash-Consistent vs. Application-Consistent
A crash-consistent snapshot captures the state of storage at a single point in time. Every block written to disk before that moment is included; nothing written after. From a storage perspective, this is complete and correct.
From a PostgreSQL perspective, crash-consistent is exactly what it sounds like: the same state you would get if you pulled the power cable mid-operation. PostgreSQL is designed to survive this — it has a recovery mechanism built around its write-ahead log — but recovery must replay every WAL record written since the last checkpoint, discards transactions that were in flight but never committed, can lose acknowledged writes if synchronous_commit was disabled, and always requires a recovery pass before the database can serve queries. In the best case this is a minor nuisance. In the worst case, with long checkpoint intervals and heavy recent write activity, the replay window adds meaningful minutes to your RTO.
An application-consistent snapshot is one that PostgreSQL can use as a clean starting point. The database has flushed dirty pages to disk, written a checkpoint record to the WAL, and is in a state where recovery is minimal or unnecessary. This is the difference between “a snapshot we hope PostgreSQL can recover from” and “a snapshot we know PostgreSQL can start from.”
Why Storage Snapshots Alone Are Not Application-Consistent
The core issue is write ordering and checkpoint state. PostgreSQL manages its buffer pool in memory. At any given moment, pages modified by committed transactions may not yet be written to the data files on disk — they live in shared_buffers and will be flushed to disk at the next checkpoint or when memory pressure demands it.
If you take a storage snapshot while PostgreSQL is running normally, you capture a point in time where:
- Some modified pages are in memory, not on disk
- The WAL has records that are not yet reflected in the data files
- The checkpoint record in the control file may not reflect the current consistent state
When PostgreSQL starts from this snapshot, it has to replay WAL from the last valid checkpoint to bring the data files up to date. This works, but it is not application-consistent — it is crash recovery. And it works only if the WAL is intact: if you snapshot the data directory but not the WAL directory at the same instant, or if the needed WAL segments have already been recycled, recovery fails.
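You can see the size of this gap on any live instance. The sketch below, a minimal illustration assuming psycopg2 and a reachable instance (the connection string is hypothetical), compares the last checkpoint position with the current WAL insert position; everything between the two exists only in WAL, not in the data files.

```python
# Sketch: show how far the current WAL position is ahead of the last
# checkpoint. Assumes psycopg2 and a live instance; the DSN is a placeholder.
import psycopg2

conn = psycopg2.connect("host=localhost dbname=postgres user=postgres")
with conn.cursor() as cur:
    # Position of the last completed checkpoint (data files are consistent
    # with WAL up to here).
    cur.execute("SELECT checkpoint_lsn FROM pg_control_checkpoint()")
    checkpoint_lsn = cur.fetchone()[0]
    # Current WAL insert position (where new records are being written).
    cur.execute("SELECT pg_current_wal_lsn()")
    current_lsn = cur.fetchone()[0]
    # Bytes of WAL a crash-consistent snapshot taken right now would have
    # to replay before the database could serve queries.
    cur.execute("SELECT pg_wal_lsn_diff(%s, %s)", (current_lsn, checkpoint_lsn))
    print(f"WAL ahead of last checkpoint: {cur.fetchone()[0]} bytes")
conn.close()
```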
What Application-Consistent Means for PostgreSQL
A truly application-consistent PostgreSQL snapshot requires that a CHECKPOINT has completed before the snapshot is taken. A checkpoint flushes all dirty shared buffers to disk and writes a checkpoint record to the WAL. After a checkpoint, the data files are consistent with the WAL up to that point, and PostgreSQL can start from the snapshot with little or no WAL replay; only the writes that landed between the checkpoint and the snapshot remain to be replayed.
Older PostgreSQL versions used pg_start_backup() and pg_stop_backup() functions to bracket a backup window. pg_start_backup() forced a checkpoint and wrote a backup label file; pg_stop_backup() closed the window and wrote the stop record to WAL. PostgreSQL 15 renamed these functions to pg_backup_start() / pg_backup_stop() and removed the old exclusive backup mode, so the old names no longer exist in 15+. For teams using modern Postgres, the operators handle this automatically — you should not be calling these functions manually in production.
The practical requirement is: coordinate the storage snapshot with the database so that a checkpoint completes immediately before the snapshot is captured. Everything after that point is handled by WAL.
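In SQL terms, the coordination is a bracket around the snapshot. The sketch below shows the shape of it on PostgreSQL 15+ using psycopg2; the label and connection details are placeholders, and in production the operator issues these calls, not you.

```python
# Sketch: the application-consistent backup bracket on PostgreSQL 15+.
# The same session must stay open between start and stop.
import psycopg2

conn = psycopg2.connect("host=localhost dbname=postgres user=postgres")
conn.autocommit = True
with conn.cursor() as cur:
    # Forces a checkpoint (fast=true requests an immediate one) and
    # returns the backup start LSN.
    cur.execute("SELECT pg_backup_start(%s, true)", ("nightly-base",))
    start_lsn = cur.fetchone()[0]

    # --- trigger the storage snapshot here (e.g., a CSI VolumeSnapshot) ---

    # Closes the backup window and returns the stop LSN plus the backup
    # label contents, which must be stored alongside the snapshot.
    cur.execute("SELECT lsn, labelfile FROM pg_backup_stop(true)")
    stop_lsn, label = cur.fetchone()
conn.close()
```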
How Kubernetes Operators Handle Application-Consistent Backup
CloudNativePG, Zalando Postgres Operator, and Crunchy Data PGO each implement application-consistent backup, but with different approaches.
CloudNativePG has the most integrated model. It manages the full backup lifecycle — it calls pg_backup_start() on the primary, triggers a CSI VolumeSnapshot via the Kubernetes API, then calls pg_backup_stop() to close the backup window and write the stop record to WAL. The operator knows the relationship between the snapshot and the WAL position, so it can construct a valid base backup that point-in-time recovery can extend from. CloudNativePG also manages WAL archiving to object storage as a first-class feature, which is required for PITR.
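As a rough illustration of the snapshot step in that sequence, here is how a CSI VolumeSnapshot can be created and awaited through the Kubernetes API with the official Python client. This is a conceptual sketch, not CloudNativePG's actual implementation; the snapshot class, PVC, and namespace names are hypothetical.

```python
# Sketch: trigger a CSI VolumeSnapshot and wait for it to become usable.
# Runs between pg_backup_start() and pg_backup_stop(). Assumes the
# official `kubernetes` client and a CSI driver with snapshot support.
import time
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod
api = client.CustomObjectsApi()

snapshot = {
    "apiVersion": "snapshot.storage.k8s.io/v1",
    "kind": "VolumeSnapshot",
    "metadata": {"name": "pg-base-20240101", "namespace": "databases"},
    "spec": {
        "volumeSnapshotClassName": "csi-snapclass",  # hypothetical class
        "source": {"persistentVolumeClaimName": "pg-data-primary"},
    },
}
api.create_namespaced_custom_object(
    group="snapshot.storage.k8s.io", version="v1",
    namespace="databases", plural="volumesnapshots", body=snapshot,
)

# Poll until the snapshot is ready; copy-on-write backends finish in seconds.
while True:
    obj = api.get_namespaced_custom_object(
        group="snapshot.storage.k8s.io", version="v1",
        namespace="databases", plural="volumesnapshots",
        name="pg-base-20240101",
    )
    if obj.get("status", {}).get("readyToUse"):
        break
    time.sleep(2)
```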
Zalando’s Spilo-based operator uses WAL-G (or the older WAL-E) for backup and archiving. It follows the same conceptual model — checkpoint, base backup, WAL archive — but the coordination happens at the WAL-G layer rather than through CSI snapshots. Restores are managed through the same tooling.
Crunchy PGO uses pgBackRest natively and exposes pgBackRest’s full feature set through Kubernetes resources. It supports both filesystem-level backups and CSI snapshot-based backups, with WAL archiving running continuously to object storage.
For teams evaluating operators, the key question is whether the operator owns WAL archiving end-to-end. If WAL archiving is a separate concern that the operator does not manage, teams end up with base backups that cannot be extended to a precise recovery point.
CSI Snapshot Integration
CSI VolumeSnapshots on Kubernetes allow storage-layer snapshots to be triggered through the Kubernetes API. When the storage backend supports it — as simplyblock does — these snapshots are copy-on-write at the storage layer, which means they complete in seconds regardless of volume size. This is a significant operational advantage over backup approaches that copy data out of the cluster.
For PostgreSQL, CSI snapshots become most useful when the operator coordinates them with a checkpoint. The snapshot is fast, but it needs to be taken at the right database state. CloudNativePG’s VolumeSnapshot-based backup does exactly this: it wraps the snapshot in a pg_backup_start() / pg_backup_stop() window so the resulting snapshot is a valid base backup.
Restoring from a CSI snapshot is also faster than restoring from object storage. The data stays within the storage layer, and the volume is available almost immediately. For RTO-sensitive workloads, this matters — the difference between a 10-minute restore from a local snapshot and a 90-minute restore from object storage is significant when a production database is down.
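Turning a snapshot back into a volume is a single API call: a PVC whose dataSource references the VolumeSnapshot. A minimal sketch with hypothetical names, again assuming the official Python client:

```python
# Sketch: provision a restored volume from an existing VolumeSnapshot.
# The CSI driver materializes the new PVC from the snapshot contents.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "pg-data-restored", "namespace": "databases"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "simplyblock-csi",  # hypothetical class name
        "dataSource": {
            "apiGroup": "snapshot.storage.k8s.io",
            "kind": "VolumeSnapshot",
            "name": "pg-base-20240101",
        },
        # Must be at least the size of the snapshot's source volume.
        "resources": {"requests": {"storage": "100Gi"}},
    },
}
core.create_namespaced_persistent_volume_claim(namespace="databases", body=pvc)
```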
WAL Strategy: Why Continuous Archiving Matters Even With Frequent Snapshots
Even if you take hourly CSI snapshots, WAL archiving is still essential. Snapshots capture a consistent base state, but data written between snapshots is only recoverable if the WAL for that period is available.
WAL archiving continuously ships WAL segments to durable object storage — typically S3 or a compatible endpoint. If a snapshot was taken at 10:00 and an incident occurs at 10:47, recovery without WAL archiving can only get you back to 10:00. With WAL archiving running continuously, you can recover to 10:47 or to any earlier point within that window. This is point-in-time recovery (PITR), and it is the primary mechanism for meeting tight RPO targets.
The practical setup is: snapshots define your base backup interval and drive your fast-restore capability; WAL archiving fills the gaps between snapshots and enables PITR. The two are complementary, not competing.
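Because PITR quietly degrades the moment archiving stalls, the archiver deserves monitoring of its own. A minimal check against the standard pg_stat_archiver view, with an arbitrary example threshold:

```python
# Sketch: verify that WAL archiving is keeping up. pg_stat_archiver is a
# standard system view; the 5-minute threshold is an example, not a rule.
import psycopg2
from datetime import datetime, timedelta, timezone

conn = psycopg2.connect("host=localhost dbname=postgres user=postgres")
with conn.cursor() as cur:
    cur.execute(
        "SELECT archived_count, failed_count, last_archived_time "
        "FROM pg_stat_archiver"
    )
    archived, failed, last_time = cur.fetchone()
conn.close()

if failed:
    print(f"WARNING: {failed} archive failures recorded")
if last_time is None or datetime.now(timezone.utc) - last_time > timedelta(minutes=5):
    print("WARNING: no WAL segment archived in the last 5 minutes")
```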
For a detailed walkthrough of PITR on Kubernetes, see the point-in-time recovery for PostgreSQL on Kubernetes post.
The Restore Workflow: Base Backup Plus WAL Replay
A full PostgreSQL restore follows a consistent sequence regardless of which operator you use:
- Restore the base backup (from a CSI snapshot or object storage) to the data directory.
- Configure the recovery parameters (recovery.conf in older versions; postgresql.conf plus a recovery.signal file in PostgreSQL 12 and later) to point at the WAL archive and specify the target recovery time or LSN.
- Start PostgreSQL. It replays WAL from the archive until it reaches the target point, then promotes to read-write.
The base backup establishes the consistent starting state. WAL replay carries the database forward to the precise recovery target. Both are required; neither is optional.
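As a concrete illustration of the second step, the sketch below stages recovery parameters in a restored data directory for PostgreSQL 12+. The paths, restore_command, and target time are placeholders; in practice your operator or backup tool generates this configuration for you.

```python
# Sketch: stage PITR recovery configuration in a restored data directory.
# Paths, the restore_command, and the target time are hypothetical; use
# whatever your WAL archiver (WAL-G, pgBackRest, barman-cloud) prescribes.
from pathlib import Path

pgdata = Path("/var/lib/postgresql/data")

recovery_conf = """
# Fetch archived WAL segments during recovery (tool-specific command).
restore_command = 'wal-g wal-fetch %f %p'
# Stop replaying at this point, then promote to read-write.
recovery_target_time = '2024-01-01 10:47:00+00'
recovery_target_action = 'promote'
"""
# pg_basebackup -R appends recovery settings the same way.
with open(pgdata / "postgresql.auto.conf", "a") as f:
    f.write(recovery_conf)

# The presence of this empty file tells Postgres to start in recovery mode.
(pgdata / "recovery.signal").touch()
```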
Teams should document the specific recovery parameters for each database deployment — which object storage bucket, which credentials, which target time format — in runbooks that are accessible during incidents. Discovering that you need to look up the WAL archive configuration while the database is down adds unnecessary time to the recovery window.
Testing Consistency After Restore
A successful restore is not the same as a correct restore. Teams need application-level validation after every restore drill.
Minimum checks: PostgreSQL starts without entering an error state, the cluster is in read-write mode (not still in recovery), recent transactions are present, and application-level queries return expected results. For databases that serve APIs, run representative queries and verify that response counts and values match pre-incident expectations.
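A minimal version of these checks might look like the sketch below; the orders table and created_at column are hypothetical stand-ins for whatever your application actually writes.

```python
# Sketch: post-restore validation. Table and column names are placeholders.
import psycopg2

conn = psycopg2.connect("host=restored-db dbname=app user=app")
with conn.cursor() as cur:
    # 1. The cluster must have finished recovery and be read-write.
    cur.execute("SELECT pg_is_in_recovery()")
    assert cur.fetchone()[0] is False, "still in recovery"

    # 2. Recent transactions should be present up to the recovery target.
    cur.execute("SELECT max(created_at) FROM orders")
    print(f"latest order: {cur.fetchone()[0]}")  # compare to the target time

    # 3. A representative application query should return sane results.
    cur.execute(
        "SELECT count(*) FROM orders WHERE created_at > now() - interval '1 day'"
    )
    print(f"orders in last day: {cur.fetchone()[0]}")
conn.close()
```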
Restore drills should be scheduled for critical databases at least quarterly. The drill should test the full sequence — triggering the restore, running through the recovery configuration, validating consistency — in an isolated environment that does not affect production. Teams that skip drills discover gaps during real incidents.
RPO Considerations: Snapshot Frequency vs. WAL Archive Frequency
Snapshot frequency and WAL archive frequency drive different parts of your recovery envelope. Snapshot frequency determines how quickly you can restore to a baseline state without replaying long WAL chains. WAL archive frequency determines your maximum data loss window.
For a database with an aggressive RPO — say, 5 minutes — you need WAL archiving to run continuously, with archive_timeout set low enough (commonly 60 seconds) that even a quiet database ships a segment within the window. Snapshot frequency for this database might be hourly, because WAL archiving fills the gaps. For a database with a 4-hour RPO, less frequent archiving may be acceptable, though continuous archiving is generally recommended regardless.
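The arithmetic behind that guidance is simple; a quick sketch with illustrative numbers:

```python
# Sketch: worst-case RPO under continuous archiving. Numbers are examples.
archive_timeout_s = 60   # forced WAL segment switch on a quiet database
upload_latency_s = 10    # time to ship a segment to object storage

# A transaction committed just after the last shipped segment can sit
# unarchived for up to one full timeout plus the upload time.
worst_case_rpo_s = archive_timeout_s + upload_latency_s
print(f"worst-case data loss window: ~{worst_case_rpo_s} seconds")
```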
Snapshot frequency also affects CSI storage capacity and snapshot management overhead. Simplyblock’s copy-on-write snapshots keep the overhead low — snapshots only consume space for changed blocks — which makes frequent snapshots practical without capacity blowup.
Questions and Answers
What is the difference between crash-consistent and application-consistent PostgreSQL backup?
A crash-consistent snapshot captures storage state without coordinating with PostgreSQL first. The database can usually recover from this, but it requires WAL replay and may have lost in-flight data. An application-consistent snapshot forces a checkpoint before capture, so PostgreSQL’s data files are fully flushed and recovery is clean or minimal. Application-consistent is what you want for predictable, fast recovery.
Why does WAL archiving matter if you already have frequent CSI snapshots?
Snapshots give you fast restores to fixed points in time. WAL archiving fills the gaps between snapshots and enables point-in-time recovery to any moment within the archive window. Without WAL archiving, your recovery granularity is limited to snapshot intervals — potentially hours of data loss. With both, you get fast baseline restores plus precise recovery.
How do CloudNativePG and other operators make snapshots application-consistent?
They coordinate with PostgreSQL directly. Before triggering a CSI VolumeSnapshot, the operator calls pg_backup_start() (or its equivalent) to force a checkpoint and write a backup label. After the snapshot completes, it calls pg_backup_stop() to write the stop record to WAL. The result is a snapshot that is a valid PostgreSQL base backup, not just a storage-layer point-in-time capture.
How often should restore drills be run for production PostgreSQL databases?
At minimum quarterly for critical databases, and ideally monthly for high-value systems. Each drill should test the full restore sequence — restoring the base backup, replaying WAL to a target point, and validating data correctness with application-level checks — in an isolated environment. Drills that only verify backup creation without testing recovery do not validate that you can actually recover.