
Scenario · Compound Incidents

Backup restore under pressure

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L4 · 15–20 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-11/08-backup-restore-under-pressure

Type: incident simulation · Topic: Compound Incidents · Level: L4 · Duration: 15–20 min

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause (compound): a recovery under pressure that wasn't actually validated.
Two independent gaps: there was no complete multi-database backup (billing_db wasn't
covered), and the restore_target still held stale data (its db_identity read 'stale',
not the app_db snapshot it was supposed to contain). "Running a restore" isn't done
until the right data is in the right target and every critical database is covered.

How it was found: pg_ls_dir('/tmp/pgpg_backup') showed an incomplete backup (billing_db
was not covered); db_identity in restore_target read 'stale'; comparing the db_identity
markers across app_db / billing_db / restore_target revealed both gaps.
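The comparison above can be sketched in a few lines of Python. This is a minimal sketch, not the scenario's own checker: it assumes you have already collected the entries returned by pg_ls_dir('/tmp/pgpg_backup') and the db_identity value read in each database. The function names are hypothetical, and so is the file-naming rule in covered_databases() (a backup file's name is assumed to start with the database it covers, e.g. app_db.dump); the scenario does not document how its backup files are named.

```python
# Sketch (not the scenario's own checker): turn the raw evidence into the two findings.
# Hypothetical assumption: a backup file's name starts with the database it covers,
# e.g. "app_db.dump" or "billing_db.sql.gz".

CRITICAL_DATABASES = {"app_db", "billing_db"}
EXPECTED_TARGET_IDENTITY = "app_db"  # restore_target should hold app_db's snapshot


def covered_databases(backup_listing: list[str]) -> set[str]:
    """Database names covered by the entries returned by pg_ls_dir()."""
    # Everything before the first dot is taken as the database name (assumed naming).
    return {name.split(".", 1)[0] for name in backup_listing}


def diagnose(backup_listing: list[str], identities: dict[str, str]) -> list[str]:
    """Return every gap found; an empty list means neither check failed."""
    problems = []
    # Gap 1: a critical database with no file in the backup directory.
    for db in sorted(CRITICAL_DATABASES - covered_databases(backup_listing)):
        problems.append(f"backup does not cover {db}")
    # Gap 2: restore_target carries a leftover marker (e.g. 'stale') instead of app_db's.
    target = identities.get("restore_target")
    if target != EXPECTED_TARGET_IDENTITY:
        problems.append(
            f"restore_target identity is {target!r}, expected {EXPECTED_TARGET_IDENTITY!r}"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical observations matching the incident: only app_db backed up, target stale.
    listing = ["app_db.dump"]
    identities = {"restore_target": "stale"}
    print(diagnose(listing, identities))
    # ['backup does not cover billing_db', "restore_target identity is 'stale', expected 'app_db'"]
```

Both checks run every time, on purpose: finding one gap is not evidence that the other is absent.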

The fix (both, in this order, via the safe action layer):
  pgpg action take-complete-multidb-backup   # cover app_db AND billing_db
  pgpg action restore-app-to-target          # put app_db's snapshot in restore_target
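Because the restore only counts once it has been re-validated, the two actions can be wrapped so they run in the listed order and the result is checked afterwards. A minimal sketch, assuming the pgpg CLI is on PATH; apply_fix(), run_cli() and the validate callback are hypothetical names, and validate stands for any caller-supplied function that re-reads the backup listing and the db_identity markers and returns the problems it still finds.

```python
# Sketch: run the two safe actions in order, then refuse to call the restore done
# until a caller-supplied (hypothetical) validate() reports no remaining problems.
import subprocess
from typing import Callable, Sequence

FIX_ACTIONS = [
    ["pgpg", "action", "take-complete-multidb-backup"],  # cover app_db AND billing_db
    ["pgpg", "action", "restore-app-to-target"],         # app_db's snapshot -> restore_target
]


def run_cli(argv: Sequence[str]) -> None:
    """Run one command; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(list(argv), check=True)


def apply_fix(validate: Callable[[], list[str]],
              run: Callable[[Sequence[str]], None] = run_cli) -> None:
    """Apply the fix in order, then re-validate instead of trusting the exit codes."""
    for argv in FIX_ACTIONS:
        # run() is expected to raise on failure, so the restore never starts
        # after a failed backup.
        run(argv)
    problems = validate()
    if problems:
        raise RuntimeError("restore not validated: " + "; ".join(problems))
```

In a test, run can be replaced by a stub that records the commands instead of executing them.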

Lesson: a restore is not successful until the restored target and all critical
databases are validated. Don't trust that the source looks fine, don't validate only
one database, and don't reach for an index (it fixes neither gap). Instead, verify
backup coverage and the contents of the restored target.

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. You're recovering under pressure. Don't just 'run a restore' — verify two things: the backup covers every critical database, and the restore landed the RIGHT data in the restore target. Check current_database(), db_identity in each database, and pg_ls_dir('/tmp/pgpg_backup').
2. Two gaps: there's no complete multi-db backup yet (app_db + billing_db must both be covered), and restore_target holds stale data (db_identity says 'stale', not 'app_db'). Validating the source database alone proves nothing.
3. Take a complete backup and restore the right snapshot: `pgpg action take-complete-multidb-backup` then `pgpg action restore-app-to-target`. Don't validate only the source or only app_db, and don't add an index.