Scenario · HA & Failover
Backup before rejoin
A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.
L3 · 10–15 min · runs locally in Docker
Launch
Start this scenario
Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.
ride postgres start stage-08/09-backup-before-rejoin
Type: incident simulation · Topic: HA & Failover · Level: L3 · Duration: 10–15 min
POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: a risky operation (rebuilding/rejoining the old primary) was about to
run with no fresh recovery point for the promoted primary. If the rebuild went
wrong, there'd be nothing to recover the current source of truth from.
How it was found: the backup destination held no dump of the critical databases;
the promoted primary was the only good copy.
The mitigation: take (and verify) a complete backup covering every critical
database before touching the old primary.
Lesson: before any dangerous HA/rejoin/rebuild step, secure a recovery point of the
current promoted primary — covering ALL critical databases. "Fix the cluster" comes
after "make sure we can recover." Adding an index is irrelevant to this incident.
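The "take (and verify)" step can be made concrete with a small coverage check. The sketch below is illustrative only and is not part of the scenario tooling: it assumes each critical database is dumped to a file named `<database>.sql` in the backup directory (the naming that the hints suggest), and the function names `missing_backups` and `safe_to_rejoin` are hypothetical. A non-empty file is a weak signal; a real verification would also restore the dump somewhere disposable.

```python
from pathlib import Path


def missing_backups(critical_dbs, backup_dir):
    """Return the critical databases that have no non-empty <name>.sql dump.

    Assumption: one plain SQL dump per database, named <database>.sql.
    """
    backup_dir = Path(backup_dir)
    missing = []
    for db in critical_dbs:
        dump = backup_dir / f"{db}.sql"
        # A missing or zero-byte file does not count as a recovery point.
        if not dump.is_file() or dump.stat().st_size == 0:
            missing.append(db)
    return missing


def safe_to_rejoin(critical_dbs, backup_dir):
    """True only when every critical database has a dump on disk."""
    return not missing_backups(critical_dbs, backup_dir)
```

For example, `missing_backups(["app_db", "billing_db"], "/tmp/pgpg_backup")` would list both databases while the destination is still empty, which is the state the postmortem describes.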
INVESTIGATION HINTS (the staged path to diagnose and fix)
1. Before rebuilding/rejoining the old primary, make sure you can recover. First list the databases that must be in the safety backup: `SELECT datname FROM pg_database WHERE datname NOT IN ('postgres','template0','template1','incident');`. Then check what is in the destination: `SELECT * FROM pg_ls_dir('/tmp/pgpg_backup');`. It returns nothing yet.
2. The promoted primary is the source of truth. Take a complete backup of the critical databases BEFORE any risky rejoin/rebuild.
3. Take the pre-rejoin backup: `pgpg action <session> take-pre-rejoin-backup` (dumps every critical database). Afterwards, app_db.sql and billing_db.sql exist in the backup destination. Don't rejoin before backing up, and don't add an index.
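The scenario does not show what `take-pre-rejoin-backup` runs internally. As a rough sketch of what "dump every critical database" could look like by hand, the code below filters the database list with the same exclusions as the query in hint 1 and builds one `pg_dump` command per database. The helper names, the `<database>.sql` file naming, and the use of plain-format dumps are assumptions made for illustration; connection settings are assumed to come from the usual `PG*` environment variables.

```python
import subprocess
from pathlib import Path

# Same exclusions as the pg_database query in hint 1.
EXCLUDED = {"postgres", "template0", "template1", "incident"}


def critical_databases(all_dbs):
    """Keep only the databases that must be in the safety backup."""
    return [db for db in all_dbs if db not in EXCLUDED]


def dump_command(db, backup_dir):
    """Build a pg_dump call writing a plain SQL dump to <backup_dir>/<db>.sql."""
    target = Path(backup_dir) / f"{db}.sql"
    return ["pg_dump", "--format=plain", f"--file={target}", f"--dbname={db}"]


def take_backup(all_dbs, backup_dir, run=subprocess.run):
    """Dump every critical database; stop at the first failing dump.

    `run` is injectable so the function can be exercised without a server.
    """
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    for db in critical_databases(all_dbs):
        run(dump_command(db, backup_dir), check=True)
```

Passing `check=True` makes a failed dump raise instead of silently leaving a gap, which matters here because a partial backup would not cover all critical databases.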