Scenario · Replication & WAL

Read replica stale reads

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L2 · 10–15 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-04/03-read-replica-stale-reads

Type: incident simulation · Topic: Replication & WAL · Level: L2 · Duration: 10–15 min

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: the application wrote to the primary and immediately read from the
replica, but WAL replay on the replica was lagging, so the read returned an
older snapshot. The data was never wrong — the replica just hadn't applied the
latest WAL yet. This is read-after-write consistency under replication lag, not
a transaction-isolation or query problem.

How it was found: pg_current_wal_lsn() on the primary was well ahead of
pg_last_wal_replay_lsn() on the replica, and a row count on the replica trailed
the primary's count while replay was stalled.
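
To make "well ahead" concrete: an LSN prints as two hexadecimal numbers separated by a slash (the high and low 32 bits of a 64-bit WAL position), so the gap between two LSNs is a byte count. The Python sketch below is illustrative only and not part of the scenario; the helper names and the sample LSN values are made up. On a live server, pg_wal_lsn_diff() performs the same subtraction.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/3000060' to a 64-bit byte position."""
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(primary_lsn: str, replica_replay_lsn: str) -> int:
    """Bytes of WAL the replica has not replayed yet (0 when caught up)."""
    return max(0, parse_lsn(primary_lsn) - parse_lsn(replica_replay_lsn))


if __name__ == "__main__":
    # Hypothetical values, as if copied from psql on each server.
    primary = "0/5000148"  # SELECT pg_current_wal_lsn();      (on the primary)
    replica = "0/3000060"  # SELECT pg_last_wal_replay_lsn();  (on the replica)
    print(f"replay lag: {replay_lag_bytes(primary, replica)} bytes")
```

A lag that stays flat or grows while the primary keeps writing suggests replay is stalled rather than merely slow.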

The fix: replay was resumed on the replica with pg_wal_replay_resume() so it
could catch up; the read then returned current data.

Lesson: route read-after-write to the primary (or wait for the replica's
replay_lsn to reach the write's LSN) for consistency-sensitive reads. Don't
write to a read-only replica and don't treat stale reads as an isolation or
indexing issue.
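
One way to apply the lesson, sketched in Python under stated assumptions: after a write commits, the application captures an LSN on the primary (for example with SELECT pg_current_wal_lsn();), then polls the replica's pg_last_wal_replay_lsn() and reads from the replica only once it has reached that LSN, falling back to the primary on timeout. The function names, the timeout values, and the injected fetch_replay_lsn callable are hypothetical; a real application would back that callable with a query against the replica.

```python
import time
from typing import Callable, Optional


def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/3000060' to a 64-bit byte position."""
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def choose_read_target(
    write_lsn: str,
    fetch_replay_lsn: Callable[[], Optional[str]],
    timeout_s: float = 0.5,
    poll_interval_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return 'replica' once it has replayed write_lsn, else 'primary'.

    fetch_replay_lsn is assumed to return the replica's
    pg_last_wal_replay_lsn() as text, or None if the server reports NULL.
    """
    target = parse_lsn(write_lsn)
    deadline = time.monotonic() + timeout_s
    while True:
        replay_lsn = fetch_replay_lsn()
        if replay_lsn is not None and parse_lsn(replay_lsn) >= target:
            return "replica"
        if time.monotonic() >= deadline:
            # Replica is still behind: serve this read from the primary.
            return "primary"
        sleep(poll_interval_s)


if __name__ == "__main__":
    # Simulated replica that catches up on the third poll (made-up LSNs).
    observed = iter(["0/3000060", "0/4000000", "0/5000148"])
    print(choose_read_target("0/5000148", lambda: next(observed), timeout_s=5.0))
```

The timeout keeps a stalled replica, as in this incident, from blocking reads: the request goes to the primary instead of waiting indefinitely.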

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. The app reads from the replica right after writing to the primary and sees old data. That's read-after-write inconsistency from replication lag — not a transaction-isolation bug. Compare the two ends.
2. On the PRIMARY: SELECT pg_current_wal_lsn(); and count the churn rows. On the REPLICA: SELECT pg_is_in_recovery(); SELECT pg_last_wal_replay_lsn(); and count the same rows. The replica's replay LSN is stuck behind the primary's current LSN, so its reads are stale.
3. Let the replica catch up: SELECT pg_wal_replay_resume(); on the replica. Don't try to UPDATE the replica (it's read-only) and don't add indexes — the data is correct, just not yet replayed.
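
The replica-side checks in the hints above can be strung together as a short routine. This is a hedged Python sketch, not the scenario's grader or an official tool: the function name is hypothetical, the cursor is assumed to follow the Python DB-API (execute / fetchone), and pg_is_wal_replay_paused() is an extra check the hints do not mention. The stand-in cursor only exists so the example runs without a server.

```python
def resume_replay_if_paused(replica_cur) -> str:
    """Run the replica-side checks and resume WAL replay only if it is paused."""
    replica_cur.execute("SELECT pg_is_in_recovery()")
    if not replica_cur.fetchone()[0]:
        # pg_wal_replay_resume() is only valid on a server in recovery.
        return "not a replica"

    replica_cur.execute("SELECT pg_is_wal_replay_paused()")
    if not replica_cur.fetchone()[0]:
        return "replay not paused"

    replica_cur.execute("SELECT pg_wal_replay_resume()")
    return "replay resumed"


class StandInCursor:
    """Minimal fake cursor with canned answers, for demonstration only."""

    def __init__(self, answers):
        self.answers = answers
        self.executed = []
        self._last = None

    def execute(self, sql):
        self.executed.append(sql)
        self._last = self.answers.get(sql)

    def fetchone(self):
        return (self._last,)


if __name__ == "__main__":
    cur = StandInCursor({
        "SELECT pg_is_in_recovery()": True,
        "SELECT pg_is_wal_replay_paused()": True,
    })
    print(resume_replay_if_paused(cur))
```

If the routine reports that replay is not paused while the replay LSN still trails, the lag likely has another cause, and resuming replay would not help.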
