Scenario · Replication & WAL
Broken replication credentials
A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.
L3 · 10–15 min · runs locally in Docker
Launch
Start this scenario
Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.
ride postgres start stage-04/05-broken-replication-credentials
Postmortem & investigation hints (spoilers)
Type: incident simulation

POSTMORTEM (root cause · how it was found · the fix · lesson)

Root cause: the role the standby uses to stream (`replicator`) was locked out (NOLOGIN) and the live connection dropped, so the walreceiver's reconnect attempts failed authentication. The primary kept serving traffic, but no standby was attached: replication was down and the replica fell behind and went stale.

How it was found: pg_stat_replication on the primary was empty; on the replica, pg_stat_wal_receiver showed no healthy receiver and the logs showed the role could not log in.

The fix: restore the role's LOGIN privilege (ALTER ROLE replicator LOGIN). The walreceiver reconnected and streaming resumed.

Lesson: "primary healthy but no standby" is a connectivity/auth problem. Check pg_stat_replication and pg_stat_wal_receiver, then the role, pg_hba.conf and primary_conninfo that streaming uses. Don't drop replication slots, add indexes, or rebuild the whole replica for an auth fix.

INVESTIGATION HINTS (the staged path to diagnose and fix)

1. The replica stopped streaming. On the PRIMARY: SELECT * FROM pg_stat_replication; It's empty, so no standby is connected. The primary is healthy; the replica just can't attach.
2. On the REPLICA: SELECT * FROM pg_stat_wal_receiver; There is no active receiver, and the logs show the replication role can't log in. This is an auth/credentials problem, not slots or queries.
3. Restore the replication role's ability to log in on the PRIMARY: ALTER ROLE replicator LOGIN; The standby's walreceiver reconnects on its own. Don't drop slots, add indexes, or rebuild the replica.
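The checks in the hints above can be sketched as a short psql session. This is a minimal sketch, not the scenario's reference solution: it assumes PostgreSQL 10 or later (for the `*_lsn` column names), a session with enough privilege to read the statistics views, and the role name `replicator` from the postmortem. The catalog lookup on `pg_roles` and the replay-delay query are additions for illustration; the hints themselves only name the two statistics views and the ALTER ROLE.

```sql
-- On the PRIMARY: is any standby attached?
-- Zero rows means nothing is streaming from this server.
SELECT application_name, client_addr, state, sent_lsn, replay_lsn
FROM pg_stat_replication;

-- On the PRIMARY: can the streaming role still log in and replicate?
-- rolcanlogin = false corresponds to the NOLOGIN lockout described above.
SELECT rolname, rolcanlogin, rolreplication
FROM pg_roles
WHERE rolname = 'replicator';

-- On the REPLICA: is a walreceiver running?
-- Zero rows means no receiver process is currently active.
SELECT status, latest_end_lsn, latest_end_time, slot_name
FROM pg_stat_wal_receiver;

-- On the REPLICA: confirm it is still a standby and estimate staleness.
-- The delay also grows when the primary is simply idle, so treat it as a hint.
SELECT pg_is_in_recovery() AS in_recovery,
       now() - pg_last_xact_replay_timestamp() AS replay_delay;

-- On the PRIMARY: the fix from the postmortem.
ALTER ROLE replicator LOGIN;

-- On the PRIMARY: verify the standby reattached (expect state = 'streaming').
SELECT application_name, state
FROM pg_stat_replication;
```

Checking `pg_roles` before changing anything ties the empty `pg_stat_replication` result to the role itself, rather than to pg_hba.conf or primary_conninfo, which present the same symptom.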