Scenario · Locks & Transactions
Serializable retry storm (SQLSTATE 40001)
A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.
L4 · 10–15 min · runs locally in Docker
Launch
Start this scenario
Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.
ride postgres start stage-02/08-serializable-retry-stormPart of these paths
Show the postmortem & investigation hints spoilers
Serializable retry storm (SQLSTATE 40001) Type: incident simulation · Topic: Locks & Transactions · Level: L4 · Duration: 10–15 min Launch: ride postgres start stage-02/08-serializable-retry-storm POSTMORTEM (root cause · how it was found · the fix · lesson) Root cause: the database default isolation was set to SERIALIZABLE, and a hot path ran conflicting transactions on the same rows. Under SERIALIZABLE, PostgreSQL aborts conflicting transactions with `could not serialize access` (SQLSTATE 40001). Without retry handling, those aborts became an error storm. It is not a deadlock and not a missing index. How it was found: the errors carried SQLSTATE 40001 (serialization failure, not 40P01 deadlock); SHOW default_transaction_isolation revealed SERIALIZABLE. The mitigation: serialize the contended path deterministically — take explicit row locks in a consistent order so transactions queue instead of racing: BEGIN; SELECT * FROM accounts WHERE id IN (1,2) ORDER BY id FOR UPDATE; ... writes ... COMMIT; Lesson: SERIALIZABLE shifts the burden to the application — every transaction must be retried on 40001. Either implement retry-with-backoff, or remove the contention (explicit locking / a less strict isolation where correct). Don't treat 40001 as a deadlock or an indexing problem. INVESTIGATION HINTS (the staged path to diagnose and fix) 1. The errors say 'could not serialize access ... (SQLSTATE 40001)'. That's a serialization failure, not a deadlock and not a missing index. Check the isolation level: SHOW default_transaction_isolation. 2. The database is forcing SERIALIZABLE, so conflicting transactions on the same rows abort with 40001. The production fix is retry-with-backoff; the immediate mitigation is to serialize the hot path deterministically. 3. Take explicit row locks in a consistent order so the transactions queue instead of racing: SELECT ... WHERE id IN (...) ORDER BY id FOR UPDATE, then do the writes and COMMIT.