Scenario · Locks & Transactions

Hot row contention on a counter

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L3 · 10–15 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-02/09-hot-row-contention

Type: incident simulation · Topic: Locks & Transactions · Level: L3 · Duration: 10–15 min
Launch: ride postgres start stage-02/09-hot-row-contention

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: many clients updated the same row (a counter/balance) at once. Each
UPDATE takes a row-level lock and holds it until its transaction ends, so the
updates serialize: throughput collapses and latency climbs even though there's a
perfectly good index. The problem is the data model and write pattern, not the
query plan.
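
The serialization can be reproduced by hand in two psql sessions. This is a minimal sketch that assumes a hypothetical table `counters(id, value)`; the scenario's real table and column names may differ.

```sql
-- Hypothetical table for illustration; not the scenario's actual schema.
CREATE TABLE counters (id int PRIMARY KEY, value bigint NOT NULL);
INSERT INTO counters VALUES (1, 0);

-- Session A: takes the row lock and keeps it until COMMIT or ROLLBACK.
BEGIN;
UPDATE counters SET value = value + 1 WHERE id = 1;

-- Session B: same row, so this statement blocks until session A finishes.
UPDATE counters SET value = value + 1 WHERE id = 1;

-- Session A: ending the transaction releases the lock and unblocks session B.
COMMIT;
```

With many sessions playing the role of session B, each one waits for every writer ahead of it, which is the queue the incident shows.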

How it was found: pg_stat_activity showed many sessions running the same UPDATE
on the same row, all with wait_event_type = 'Lock'; pg_blocking_pids() pointed at
the session currently holding the row lock.
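
The check described above might look like the following queries. They use only the standard `pg_stat_activity` view and the `pg_blocking_pids()` function; the 60-character truncation of the query text is an arbitrary choice for readability.

```sql
-- Sessions waiting on a lock, with the PIDs that block them.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       state,
       now() - query_start AS waiting_for,
       left(query, 60)     AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY query_start;

-- How many waiters share each query text.
SELECT left(query, 60) AS query, count(*) AS waiters
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
GROUP BY 1
ORDER BY waiters DESC;
```

A long list of waiters with the same query text and overlapping `blocked_by` arrays is the signature of one hot row rather than a slow plan.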

The immediate mitigation: shed the runaway concurrency by terminating the pile-up
of workers hammering the row so the queue drains.
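
One way to shed the queued sessions is `pg_terminate_backend()`. This is a sketch, not the scenario's reference fix: the `ILIKE` pattern and the `counters` table name are hypothetical placeholders, and the filter should be checked with a plain SELECT before anything is terminated.

```sql
-- Terminate the backends queued on the hot row.
-- The ILIKE pattern is a hypothetical placeholder; match it to the real UPDATE.
SELECT pid, pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
  AND query ILIKE 'UPDATE counters%'
  AND pid <> pg_backend_pid();   -- never terminate the session running this query
```

`pg_cancel_backend(pid)` is a gentler alternative that cancels only the current statement and keeps the connection open. Either way, the queue rebuilds if the workers simply reconnect and retry, so the load has to be reduced at its source as well.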

Lesson: a single hot row can't be fixed with an index or ANALYZE. The real fixes
are architectural: batch the updates, shard the counter into N rows and sum them,
write to an append-only log and aggregate it later, put the writes on a queue, or
use optimistic concurrency. Treat the incident by reducing concurrency; treat the
cause by redesigning the write path.
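
As an illustration of the "shard the counter into N rows" option, here is a minimal sketch. The table name `counter_shards`, its columns, and the choice of 16 shards are all assumptions made for the example, not part of the scenario.

```sql
-- Hypothetical schema: one logical counter spread over 16 rows.
CREATE TABLE counter_shards (
    counter_id int    NOT NULL,
    shard      int    NOT NULL,
    value      bigint NOT NULL DEFAULT 0,
    PRIMARY KEY (counter_id, shard)
);

-- Pre-create shards 0..15 for counter 1.
INSERT INTO counter_shards (counter_id, shard)
SELECT 1, s FROM generate_series(0, 15) AS s;

-- Write path: each increment picks one random shard, so concurrent writers
-- usually lock different rows. The scalar subquery makes random() run once
-- per statement instead of once per candidate row.
UPDATE counter_shards
SET value = value + 1
WHERE counter_id = 1
  AND shard = (SELECT floor(random() * 16)::int);

-- Read path: the counter's value is the sum over its shards.
SELECT sum(value) AS total
FROM counter_shards
WHERE counter_id = 1;
```

The trade-off is that reads now touch N rows instead of one. Writers spread across the shards still collide occasionally, but each collision queues only the sessions that picked the same shard.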

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. Many sessions are running the *same* UPDATE on the *same* row and waiting on Lock. The index is fine — the bottleneck is contention on one hot row.
2. Confirm it with pg_stat_activity / pg_blocking_pids: lots of waiters, one row, one query pattern. This is a data-model problem, not a plan problem.
3. Immediate mitigation: shed the runaway concurrency (terminate the pile-up of workers hammering the row). Long term: batch, shard the counter, or queue the writes — see the postmortem.