← All scenarios

Scenario · Locks & Transactions

Serializable retry storm (SQLSTATE 40001)

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L4 · 10–15 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-02/08-serializable-retry-storm

Part of these paths

Show the postmortem & investigation hints spoilers
Serializable retry storm (SQLSTATE 40001)
Type: incident simulation · Topic: Locks & Transactions · Level: L4 · Duration: 10–15 min
Launch: ride postgres start stage-02/08-serializable-retry-storm

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: the database default isolation was set to SERIALIZABLE, and a hot
path ran conflicting transactions on the same rows. Under SERIALIZABLE,
PostgreSQL aborts conflicting transactions with `could not serialize access`
(SQLSTATE 40001). Without retry handling, those aborts became an error storm.
It is not a deadlock and not a missing index.

How it was found: the errors carried SQLSTATE 40001 (serialization failure, not
40P01 deadlock); SHOW default_transaction_isolation revealed SERIALIZABLE.

The mitigation: serialize the contended path deterministically — take explicit
row locks in a consistent order so transactions queue instead of racing:
  BEGIN;
  SELECT * FROM accounts WHERE id IN (1,2) ORDER BY id FOR UPDATE;
  ... writes ...
  COMMIT;

Lesson: SERIALIZABLE shifts the burden to the application — every transaction
must be retried on 40001. Either implement retry-with-backoff, or remove the
contention (explicit locking / a less strict isolation where correct). Don't
treat 40001 as a deadlock or an indexing problem.

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. The errors say 'could not serialize access ... (SQLSTATE 40001)'. That's a serialization failure, not a deadlock and not a missing index. Check the isolation level: SHOW default_transaction_isolation.
2. The database is forcing SERIALIZABLE, so conflicting transactions on the same rows abort with 40001. The production fix is retry-with-backoff; the immediate mitigation is to serialize the hot path deterministically.
3. Take explicit row locks in a consistent order so the transactions queue instead of racing: SELECT ... WHERE id IN (...) ORDER BY id FOR UPDATE, then do the writes and COMMIT.