Scenario · HA & Failover

Old primary returns

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L4 · 10–15 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-08/05-old-primary-returns

Part of these paths

HA & Failover On-Call SRE On-Call Path


Type: incident simulation · Topic: HA & Failover · Level: L4 · Duration: 10–15 min

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: after a failover (the standby was promoted), the old primary came back online.
A returned old primary is stale and dangerous: if anything writes to it, its data
diverges from the promoted primary, leaving two primaries with conflicting data and undoing the failover.
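To see why writes to the returned node are the real danger, here is a toy model. It is an assumption for illustration, not how PostgreSQL stores data: each node's history is a plain list of committed transaction labels, and divergence starts wherever the two lists stop sharing a prefix. The names `divergence`, `common`, `promoted` and `old_primary` are hypothetical.

```python
"""Toy model of split-brain divergence after a failover.

Assumption: a node's history is a list of committed transaction labels.
Real PostgreSQL tracks this with WAL positions and timelines; this sketch
only illustrates why writes to a returned old primary cannot be merged back.
"""
from typing import List, Tuple


def divergence(old: List[str], new: List[str]) -> Tuple[int, List[str], List[str]]:
    """Return the length of the shared prefix plus each node's extra commits."""
    shared = 0
    for a, b in zip(old, new):
        if a != b:
            break
        shared += 1
    return shared, old[shared:], new[shared:]


# History both nodes agreed on when the standby was promoted.
common = ["tx1", "tx2", "tx3"]

# The promoted primary keeps serving the application.
promoted = common + ["tx4-new", "tx5-new"]

# The returned old primary wrongly accepts a write of its own.
old_primary = common + ["tx4-old"]

print(divergence(old_primary, promoted))
print(divergence(common, promoted))
```

With no writes on the old primary, its history is still a prefix of the promoted one (`divergence(common, promoted)` returns `(3, [], ['tx4-new', 'tx5-new'])`), so nothing conflicts. A single stray write gives both sides a non-empty tail, and neither copy can simply be replayed on top of the other.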

How it was found: the old primary was reachable and writable again; the promoted
node was the real current primary, confirmed by the failover markers in the
critical databases.
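The identification step can be sketched as a small decision function. Everything here is hypothetical: `NodeState` and `classify_nodes` are illustrative names, the fields stand for values you would collect by hand with psql, and it assumes only the promoted node carries the failover marker. The point it encodes is that `pg_is_in_recovery()` returns false on any node running as a primary, so once the old primary is back it no longer separates the two nodes; the marker does.

```python
"""Sketch: decide which node is the real primary after an old primary returns.

Assumptions (hypothetical, not the scenario's tooling): NodeState holds what
you observed on each node, and only the promoted node has the failover marker.
"""
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NodeState:
    name: str
    reachable: bool
    in_recovery: bool          # result of SELECT pg_is_in_recovery()
    has_failover_marker: bool  # promotion marker found in failover_markers


def classify_nodes(nodes: List[NodeState]) -> Tuple[str, List[str]]:
    """Return (current primary, reachable writable nodes that must be fenced)."""
    # A reachable node that is not in recovery runs as a primary and accepts writes.
    writable = [n for n in nodes if n.reachable and not n.in_recovery]
    # pg_is_in_recovery() is false on every such node, so it cannot pick the
    # real primary; the failover marker (assumed to exist only there) does.
    marked = [n for n in writable if n.has_failover_marker]
    if len(marked) != 1:
        raise ValueError("cannot identify the promoted primary from these observations")
    primary = marked[0].name
    to_fence = [n.name for n in writable if n.name != primary]
    return primary, to_fence


nodes = [
    NodeState("old-primary", reachable=True, in_recovery=False, has_failover_marker=False),
    NodeState("promoted", reachable=True, in_recovery=False, has_failover_marker=True),
]
print(classify_nodes(nodes))
```

Requiring the marker even when only one node is writable is a deliberate choice in this sketch: if the promoted node is unreachable and the old primary is the only writable node left, it refuses to name the old primary as current rather than pointing writes back at stale data.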

The fix: keep the old primary fenced (stopped, or otherwise not a write target), then
validate the critical databases (app_db, billing_db) on the promoted primary.

Lesson: a returned old primary must be fenced until it's safely rejoined as a
standby, and never used for writes. Validate critical data on the new primary, not the
old one. Writing to the old primary is wrong, and adding an index is a trap: it does
nothing about a stale primary.

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. A failover already happened (the standby was promoted), and now the OLD primary is back up. Don't trust it. Identify the real, current primary and confirm the old one isn't a write target. Check pg_is_in_recovery() and the failover markers on each node: a returned old primary can also report pg_is_in_recovery() = false, so the markers are what tell the two nodes apart.
2. The returned old primary is stale and must not accept writes (it would diverge from the promoted primary). Fence it: `pgpg action <session> fence-old-primary`.
3. After fencing, validate the critical databases (app_db, billing_db) ON THE PROMOTED primary, not the old one, e.g. `\connect billing_db` then `SELECT * FROM failover_markers;`. Don't write to the old primary and don't add an index.
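The three hints form an ordered checklist, and the traps they name can be restated as checks over a planned sequence of actions. This is a hypothetical sketch, not the scenario's scorer: the action verbs (`fence`, `validate`, `write`, `create_index`), the plan format and `check_plan` are all made up for illustration.

```python
"""Sketch: check a planned response against the staged hints.

Assumption: the action vocabulary and plan format are hypothetical; this
only restates the hints as checks and is not the scenario's scoring logic.
"""
from typing import List, Tuple

Action = Tuple[str, str]  # (verb, target node)


def check_plan(plan: List[Action], old: str, promoted: str) -> List[str]:
    """Return the problems found; an empty list means the plan follows the hints."""
    problems: List[str] = []
    fenced = False
    for verb, target in plan:
        if verb == "fence":
            if target == old:
                fenced = True
            else:
                problems.append(f"fenced {target}, which is not the old primary")
        elif verb == "validate":
            # Hint 3: validate on the promoted primary, and only after fencing.
            if target != promoted:
                problems.append(f"validated on {target}, not the promoted primary")
            elif not fenced:
                problems.append("validated before fencing the old primary")
        elif verb == "write" and target == old:
            problems.append("wrote to the old primary")
        elif verb == "create_index":
            problems.append("added an index, which does not address the incident")
    # Hint 2: the old primary must end up fenced.
    if not fenced:
        problems.append("old primary was never fenced")
    return problems


good = [("fence", "old-primary"), ("validate", "promoted")]
bad = [("validate", "old-primary"), ("create_index", "promoted")]
print(check_plan(good, "old-primary", "promoted"))
print(check_plan(bad, "old-primary", "promoted"))
```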
