WAL Archiving and Point-in-Time Recovery
Understand WAL as a change journal, enable archive-mode, run pg_basebackup, then perform a real Point-in-Time Recovery — restoring a database to the moment before a destructive DELETE.
Every production Postgres needs more than a nightly dump. This course builds the full safety net from scratch: we turn on WAL archiving so every committed change lands in a durable archive directory, snapshot the live cluster with pg_basebackup, then stage a real disaster — an accidental DELETE that wipes the table — and use Point-in-Time Recovery to rewind the database to the exact second before it happened. Along the way we read LSNs, understand what archive_command does on every segment switch, and learn why a base backup alone is not enough without a complete WAL chain. The recovery mechanics — restore_command, recovery_target_time, recovery.signal, and promote — are exercised on real data, not slides.
What you'll build
- Explain what WAL is and why it enables crash recovery and archiving
- Read pg_current_wal_lsn() and track LSN movement across commits
- Configure wal_level, archive_mode, and archive_command in postgresql.conf
- Verify that WAL segments land in the archive directory after pg_switch_wal()
- Create a replication role and run pg_basebackup with streaming WAL
- Understand the directory layout a base backup produces
- Perform a complete PITR: recovery.signal, restore_command, recovery_target_time, promote
- Distinguish recovery_target_time, recovery_target_name, recovery_target_xid, and recovery_target_lsn
Contents
- A primary with a mounted config and an archive
- The server configuration we own
- Access rules for the stand
- Start the server
- The table that proves recovery works
- Connect with psql
- Create the table
- Fifty thousand orders, deterministically
- Load the initial data
- Three questions about the WAL
- The WAL before archiving is enabled
- Every commit moves the LSN
- The LSN after five more commits
- Three lines that enable the archive
- Restart to apply the new GUC values
- Archiving is now live
- Force the first segment to archive
- The first segment is in the archive
- Open the replication protocol
- Create the replication role
- The backup script
- Take the base backup
- What the base backup contains
- The LSN before the additional writes
- The writes we want to recover to
- Add the second batch and note the time
- Fifty-five thousand rows, all present
- The destructive operation
- Archive the segment that contains the DELETE
- The archive holds the complete history
- The recovery script
- What recovery.signal does
- Perform the recovery
- The data is back
- The archiver's ledger
- Four ways to pin the recovery target
- Reading the WAL directly
- Timeline 2 on the recovered instance
- What we built
- A two-query health check
- Try it yourself: recover to the original 50 000