Scenario · Replication & WAL

Synchronous replication latency

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L4 · 10–15 min · runs locally in Docker

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-04/07-synchronous-replication-latency

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: synchronous replication was configured with synchronous_commit =
remote_apply and the standby as a required synchronous standby, so every commit
on the primary had to wait until the standby had applied (replayed) it. The
replica's WAL replay had been paused, so it could never acknowledge, and
commits on the primary blocked indefinitely (wait_event = SyncRep).
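To make the coupling concrete, here is a toy Python model. It assumes a single standby and treats LSNs as plain integers; the Standby class and the remote_apply_commit() helper are hypothetical illustrations, not a PostgreSQL API. Under remote_apply the commit is acknowledged only once the standby's replay position reaches the commit's LSN, so pausing replay stalls the commit even though WAL keeps arriving at the standby.

```python
from dataclasses import dataclass


@dataclass
class Standby:
    """Hypothetical toy standby: tracks received vs replayed WAL positions."""
    received_lsn: int = 0
    replayed_lsn: int = 0
    replay_paused: bool = False

    def receive(self, lsn: int) -> None:
        # WAL keeps streaming to the standby even while replay is paused.
        self.received_lsn = max(self.received_lsn, lsn)

    def replay(self) -> None:
        # Apply everything received so far, unless replay is paused.
        if not self.replay_paused:
            self.replayed_lsn = self.received_lsn


def remote_apply_commit(standby: Standby, commit_lsn: int, max_ticks: int = 5) -> bool:
    """Return True once the standby has replayed up to commit_lsn.

    The max_ticks bound exists only so the toy terminates; a real backend
    waiting on SyncRep has no such limit and simply keeps waiting.
    """
    standby.receive(commit_lsn)
    for _ in range(max_ticks):
        standby.replay()
        if standby.replayed_lsn >= commit_lsn:
            return True  # acknowledged: COMMIT returns to the client
    return False  # still waiting (wait_event = SyncRep in the real system)


if __name__ == "__main__":
    replica = Standby(replay_paused=True)
    print(remote_apply_commit(replica, 100))  # False: replay is paused
    replica.replay_paused = False  # the toy equivalent of pg_wal_replay_resume()
    print(remote_apply_commit(replica, 100))  # True
```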

How it was found: pg_stat_activity on the primary showed commits waiting on
SyncRep; pg_stat_replication listed the standby as sync, but its replay position
was not advancing; and pg_is_wal_replay_paused() on the replica returned true.
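These checks can be scripted. The sketch below assumes you already hold two DB-API 2.0 cursors (for example from psycopg), one connected to the primary and one to the replica. The diagnose() helper and its messages are hypothetical; the SQL is the same as in the investigation hints.

```python
# SQL from the investigation hints. Run the first two on the PRIMARY,
# the last one on the REPLICA.
SYNCREP_WAITERS = (
    "SELECT pid, wait_event_type, wait_event, query "
    "FROM pg_stat_activity WHERE wait_event = 'SyncRep'"
)
SYNC_STANDBYS = (
    "SELECT application_name, sync_state, replay_lag "
    "FROM pg_stat_replication"
)
REPLAY_PAUSED = "SELECT pg_is_wal_replay_paused()"


def diagnose(primary_cur, replica_cur) -> str:
    """Classify a write hang from DB-API 2.0 cursors (hypothetical helper)."""
    primary_cur.execute(SYNCREP_WAITERS)
    waiters = primary_cur.fetchall()
    if not waiters:
        return "no SyncRep waiters: the hang is not synchronous replication"

    primary_cur.execute(SYNC_STANDBYS)
    # sync_state is one of async / potential / sync / quorum.
    sync_rows = [row for row in primary_cur.fetchall() if row[1] in ("sync", "quorum")]
    if not sync_rows:
        return f"{len(waiters)} backend(s) on SyncRep: no synchronous standby connected"

    replica_cur.execute(REPLAY_PAUSED)
    (paused,) = replica_cur.fetchone()
    if paused:
        return f"{len(waiters)} backend(s) on SyncRep: replica WAL replay is paused"
    return f"{len(waiters)} backend(s) on SyncRep: standby connected but slow to acknowledge"
```

The order mirrors the investigation: confirm the wait is SyncRep before looking at replication state, and only then ask the replica about replay.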

The fix: resume WAL replay on the replica (SELECT pg_wal_replay_resume();) so it
applies the pending WAL and acknowledges; the blocked commits then completed.
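A sketch of applying the fix and confirming that it worked, under the same assumptions (DB-API 2.0 cursors; the function name, timeout, and poll interval are hypothetical choices). Note that pg_wal_replay_resume() is restricted to superusers by default unless EXECUTE on it has been granted.

```python
import time

SYNCREP_COUNT = "SELECT count(*) FROM pg_stat_activity WHERE wait_event = 'SyncRep'"


def resume_replay_and_wait(primary_cur, replica_cur, timeout_s=30.0, poll_s=0.5) -> bool:
    """Resume replay on the replica, then poll the primary until no backend
    is waiting on SyncRep. Returns False if waiters remain after timeout_s.

    Use an autocommit connection for primary_cur: PostgreSQL collects
    pg_stat_activity at first access in a transaction and shows that same
    snapshot until the transaction ends, so polling inside one transaction
    would never see the waiters drain.
    """
    replica_cur.execute("SELECT pg_is_wal_replay_paused()")
    (paused,) = replica_cur.fetchone()
    if paused:
        replica_cur.execute("SELECT pg_wal_replay_resume()")

    deadline = time.monotonic() + timeout_s
    while True:
        primary_cur.execute(SYNCREP_COUNT)
        (waiting,) = primary_cur.fetchone()
        if waiting == 0:
            return True  # the standby acknowledged; blocked commits completed
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

If replay is already running, the function skips the resume call and only polls, so it is safe to rerun.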

Lesson: synchronous replication couples commit latency to standby health. When
commits hang on SyncRep, check the standby first. Blanking
synchronous_standby_names unblocks writes but silently drops the durability
guarantee, so do it only as a deliberate, understood tradeoff, never as a
reflex. Killing writers or adding indexes does nothing.
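The tradeoff in the lesson follows from which standby position a commit waits for. This simplified Python sketch maps a pair of settings to the pg_stat_replication column a commit on the primary waits on. It is a hypothetical helper that ignores the FIRST/ANY priority and quorum rules of synchronous_standby_names.

```python
from typing import Optional

# Standby progress (a pg_stat_replication column) that a COMMIT on the
# primary waits for, per synchronous_commit level, when sync standbys exist.
_AWAITED = {
    "off": None,           # no wait at all, not even for the local WAL flush
    "local": None,         # local WAL flush only
    "remote_write": "write_lsn",
    "on": "flush_lsn",
    "remote_apply": "replay_lsn",
}


def standby_lsn_awaited(synchronous_standby_names: str, synchronous_commit: str) -> Optional[str]:
    """Hypothetical helper: which standby position a commit waits for (None = no standby wait)."""
    if not synchronous_standby_names.strip():
        # No synchronous standbys: commits never enter the SyncRep wait,
        # which is why blanking the setting unblocks writes.
        return None
    return _AWAITED[synchronous_commit]
```

While replay is paused, write_lsn and flush_lsn normally keep advancing and only replay_lsn stalls, so in this model only remote_apply leaves commits stuck. An empty synchronous_standby_names returns None: nothing waits for the standby at all, which is exactly the durability guarantee being given up.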

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. Writes on the primary hang. It's not a lock on a row — commits are waiting on synchronous replication. On the PRIMARY: SELECT pid, wait_event_type, wait_event, query FROM pg_stat_activity WHERE wait_event = 'SyncRep'; and SELECT application_name, sync_state, replay_lag FROM pg_stat_replication;
2. synchronous_commit is remote_apply and the standby is required, but the REPLICA has stopped applying WAL — so it can never acknowledge, and every commit blocks. Check the replica: SELECT pg_is_wal_replay_paused();
3. Resume replay on the REPLICA (SELECT pg_wal_replay_resume();). The standby applies and acknowledges, and the blocked commits complete. Don't just blank out synchronous_standby_names (that trades away durability) and don't kill the writers or add indexes.
