Scenario · HA & Failover
Manual replica promotion
A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.
L3 · 10–15 min · runs locally in Docker
Launch
Start this scenario
Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.
ride postgres start stage-08/02-manual-replica-promotion
Postmortem & investigation hints (spoilers)
Manual replica promotion
Type: incident simulation · Topic: HA & Failover · Level: L3 · Duration: 10–15 min
Launch: ride postgres start stage-08/02-manual-replica-promotion

POSTMORTEM (root cause · how it was found · the fix · lesson)

Root cause: the primary was lost for good, so the streaming standby had to be promoted to a new writable primary. A failover isn't done when the command runs; it's done when the promoted node is writable AND the critical databases are verified intact.

How it was found: pg_is_in_recovery() on the standby returned true, confirming it was still a standby and therefore a promotion candidate (it had caught up before the crash). After promotion it returned false, and the failover markers in app_db and billing_db confirmed the critical data survived.

The mitigation: promote the standby (`pgpg action promote-replica`), confirm it's out of recovery, and verify the markers in the critical (non-default) databases.

Lesson: promotion changes which node is writable. Verify pg_is_in_recovery() = false on the new primary and validate the databases your app actually uses, not just the default one. Adding an index is irrelevant here.

INVESTIGATION HINTS (the staged path to diagnose and fix)

1. The primary is gone and won't come back, so you need to fail over. First confirm the standby is a good candidate: on the replica, SELECT pg_is_in_recovery(); returns true (it is a standby). The standby had caught up before the crash.
2. Promote the standby: `pgpg action <session> promote-replica`. Then SELECT pg_is_in_recovery(); returns false: the node is now a writable primary.
3. After promotion, validate the CRITICAL databases, not just the default one: \connect billing_db, then SELECT * FROM failover_markers; and repeat for app_db. Don't add an index, and don't check only the default database.
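The post-promotion checks above can be sketched as a small Python helper. This is only an illustration, not part of the scenario tooling. The `run_query(dbname, sql)` callable is a hypothetical stand-in for however you execute SQL (psql, a driver, etc.). Using `postgres` as the default database is an assumption, and so is treating an empty `failover_markers` table as a failure. The database names and the two queries come from the hints.

```python
from typing import Callable, List, Sequence

# Hypothetical hook: run_query(dbname, sql) returns a list of row tuples.
RunQuery = Callable[[str, str], List[tuple]]

# Critical databases named in the scenario.
CRITICAL_DATABASES = ("app_db", "billing_db")


def verify_failover(
    run_query: RunQuery,
    critical_dbs: Sequence[str] = CRITICAL_DATABASES,
    default_db: str = "postgres",  # assumption: name of the default database
) -> List[str]:
    """Return a list of problems; an empty list means the failover looks complete."""
    problems: List[str] = []

    # Step 1: the promoted node must be out of recovery, i.e. writable.
    rows = run_query(default_db, "SELECT pg_is_in_recovery();")
    if rows[0][0]:
        problems.append("node is still in recovery (not promoted)")
        return problems  # data checks are only meaningful after promotion

    # Step 2: every critical database must still hold its failover markers.
    # Assumption: an empty table counts as "markers missing".
    for db in critical_dbs:
        markers = run_query(db, "SELECT * FROM failover_markers;")
        if not markers:
            problems.append(f"{db}: failover_markers is empty")

    return problems
```

The helper checks the recovery state first and only then looks at the data, in the same order as the hints. It queries each critical database explicitly, so a passing check on the default database alone can never mark the failover as done.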