Scenario · HA & Failover
Old primary returns
A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.
L4 · 10–15 min · runs locally in Docker
Launch
Start this scenario
Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.
ride postgres start stage-08/05-old-primary-returns
Type: incident simulation · Topic: HA & Failover · Level: L4 · Duration: 10–15 min

POSTMORTEM (root cause · how it was found · the fix · lesson)

Root cause: after a failover (standby promoted), the old primary came back online. A returned old primary is stale and dangerous: if anything writes to it, its data diverges from the promoted primary, undoing the failover.

How it was found: the old primary was reachable and writable again; the promoted node was the real current primary, confirmed by the failover markers in the critical databases.

The mitigation: keep the old primary fenced (stopped / not a write target) and validate the critical databases (app_db, billing_db) on the promoted primary.

Lesson: a returned old primary must be fenced until it's safely rejoined as a standby, and never used for writes. Validate critical data on the new primary, not the old one. Writing to the old primary, or adding an index, is wrong.

INVESTIGATION HINTS (the staged path to diagnose and fix)

1. A failover already happened (the standby was promoted) and now the OLD primary is back up. Don't trust it. Identify the real, current primary and confirm the old one isn't a write target. Check pg_is_in_recovery() and the failover markers on each node.

2. The returned old primary is stale and must not accept writes (it would diverge from the promoted primary). Fence it: `pgpg action <session> fence-old-primary`.

3. After fencing, validate the critical databases ON THE PROMOTED primary (not the old one): \connect billing_db then SELECT * FROM failover_markers; Don't write to the old primary and don't add an index.
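The checks in hints 1 and 3 can be sketched as a short psql session. This is an illustrative sketch, not the scenario's official solution. It uses only read-only statements and standard PostgreSQL functions. The timeline comparison is an extra check the hints don't mention, and the presence of a failover_markers table in app_db is an assumption: the hints only show it for billing_db.

```sql
-- Run this first part on EACH node (old primary and promoted node), then compare.

-- false = the node considers itself a primary; true = it is a standby in recovery.
-- A returned old primary typically reports false too, so this alone is not enough.
SELECT pg_is_in_recovery() AS in_recovery;

-- Promotion switches the promoted node to a new, higher timeline, so it should
-- report a larger timeline_id than the stale old primary. The value comes from
-- the last checkpoint, so it may lag briefly right after promotion.
SELECT timeline_id FROM pg_control_checkpoint();

-- 'off' means this session would accept writes on this node.
SHOW transaction_read_only;

-- Run this second part on the PROMOTED primary only, after the old one is fenced.
-- \connect is a psql meta-command, not SQL. All statements here are read-only.
\connect billing_db
SELECT * FROM failover_markers;

-- Assumption: app_db carries the same marker table as billing_db.
\connect app_db
SELECT * FROM failover_markers;
```

If both nodes report in_recovery = false, the node with the higher timeline_id and the expected failover markers is the one to treat as the current primary. The other should stay fenced.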