
Scenario · Storage & Backup

PITR gap detected

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L3 · 10–15 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-05/09-pitr-gap-detected


Type: incident simulation · Topic: Storage & Backup · Level: L3 · Duration: 10–15 min

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: a base backup existed and WAL archiving reported success, but one WAL
segment was missing from the archive, leaving a gap in the recovery chain.
Recovery replays WAL segments in order, so a point-in-time restore to any target
after the gap stops at the missing segment. The "backup" was therefore not usable
for PITR to a recent time.

How it was found: listing the archive showed the WAL segment names were not
contiguous (one missing); pg_stat_archiver confirmed archiving had been working.
The missing segment was still retained in pg_wal.
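
The contiguity check can be automated. What follows is a minimal Python sketch, not part of the scenario tooling: `find_wal_gaps` is a hypothetical helper, it assumes the default 16 MiB `wal_segment_size`, and it checks each timeline separately. It decodes every 24-hex-digit name into its timeline, log, and segment fields and reports the names missing between the lowest and highest archived segment.

```python
import re

# WAL segment files are named with 24 uppercase hex digits:
# 8 for the timeline, 8 for the log number, 8 for the segment within that log.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def find_wal_gaps(names, wal_segment_size=16 * 1024 * 1024):
    """Return segment names missing between the first and last archived
    segment of each timeline (hypothetical helper, assumes 16 MiB segments)."""
    segs_per_log = 0x100000000 // wal_segment_size  # 256 for 16 MiB segments
    by_timeline = {}
    for name in names:
        if not WAL_NAME.match(name):
            continue  # skip .history, .backup, .partial and unrelated files
        tli = int(name[0:8], 16)
        log = int(name[8:16], 16)
        seg = int(name[16:24], 16)
        by_timeline.setdefault(tli, set()).add(log * segs_per_log + seg)

    missing = []
    for tli in sorted(by_timeline):
        present = by_timeline[tli]
        for segno in range(min(present), max(present) + 1):
            if segno not in present:
                missing.append(
                    "%08X%08X%08X"
                    % (tli, segno // segs_per_log, segno % segs_per_log)
                )
    return missing


archive = [
    "000000010000000000000003",
    "000000010000000000000004",
    "000000010000000000000006",
]
print(find_wal_gaps(archive))  # → ['000000010000000000000005']
```

A segment missing from the very end of the archive cannot be detected this way, because there is no later name to compare against.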

The fix: re-archive the missing WAL segment from the copy still retained in pg_wal
(`pgpg action <session> repair-archive-gap`); the archive became contiguous and
PITR could replay again.

Lesson: a backup is not valid until restore/PITR validation proves it. Monitor
the *continuity* of the archive (gaps are silent), keep enough WAL retained to
re-archive, and never delete archive files. Running a CHECKPOINT or creating an
index does nothing to repair a recovery-chain gap.

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. A base backup exists and archiving is 'on', but PITR validation is failing. Don't trust the backup; validate the chain. Inspect the archive: `SELECT f FROM pg_ls_dir('/tmp/pgpg_archive') f WHERE f ~ '^[0-9A-F]{24}$' ORDER BY f;`. The WAL segment names are not contiguous: one is missing.
2. Cross-check the archiver: `SELECT * FROM pg_stat_archiver;`. Archiving reported success, yet a segment is absent from the destination, so recovery can't replay across the gap to a recent target. The missing segment is still retained in pg_wal.
3. Re-archive the missing WAL from what's still in pg_wal: `pgpg action <session> repair-archive-gap`. The archive becomes contiguous and PITR can replay again. Running a CHECKPOINT or creating an index does nothing for a recovery-chain gap. Never delete archive files.
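
The repair step can be pictured as a guarded copy from pg_wal into the archive. The Python sketch below is illustrative only; it is not how `repair-archive-gap` is implemented, which the scenario does not show. `rearchive_missing` and its arguments are hypothetical names, and the sketch leaves out fsync and file-permission handling.

```python
import os
import shutil


def rearchive_missing(pg_wal_dir, archive_dir, missing):
    """Copy missing segments from pg_wal into the archive.

    Hypothetical helper: existing archive files are never overwritten or
    deleted. Returns (copied, unavailable) lists of segment names.
    """
    copied, unavailable = [], []
    for name in missing:
        src = os.path.join(pg_wal_dir, name)
        dst = os.path.join(archive_dir, name)
        if os.path.exists(dst):
            continue  # already archived; leave the existing file untouched
        if not os.path.isfile(src):
            unavailable.append(name)  # no longer in pg_wal; cannot repair from here
            continue
        tmp = dst + ".tmp"
        shutil.copyfile(src, tmp)  # copy under a temporary name first
        os.replace(tmp, dst)       # rename, so a partial copy never looks like a segment
        copied.append(name)
    return copied, unavailable
```

Any name returned in `unavailable` has already been recycled from pg_wal, which is why the postmortem stresses retaining enough WAL to re-archive.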