← All scenarios

Scenario · Migrations & Releases

ALTER TABLE locks production

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L3 · 10–15 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-09/01-alter-table-locks-production

Part of these paths

Show the postmortem & investigation hints spoilers
ALTER TABLE locks production
Type: incident simulation · Topic: Migrations & Releases · Level: L3 · Duration: 10–15 min
Launch: ride postgres start stage-09/01-alter-table-locks-production

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: a release ran an unsafe `ALTER TABLE orders ...` migration against the
production database (app_db). The migration took an ACCESS EXCLUSIVE lock and held
its transaction open, so every production INSERT/UPDATE — which needs ROW EXCLUSIVE
— blocked behind it. The look-alike analytics_db was never involved.

How it was found: pg_stat_activity (filtered to datname = 'app_db') showed a
`migration_runner` session idle in transaction holding the lock, and an `app_writer`
session with wait_event_type = 'Lock'. pg_blocking_pids() tied the waiter to the
migration's pid.

The fix: terminate the migration session (application_name LIKE 'migration_runner%').
Releasing its ACCESS EXCLUSIVE lock lets the queued writes drain immediately.

Lesson: a migration is a write outage waiting to happen if it holds a strong lock on
a hot table. Break heavy DDL into safe phases, set a short `lock_timeout` /
`statement_timeout` on migrations, and schedule the risky ones. When it's already
stuck, end the *migration* (the blocker) — not the app writers — and never reach for
an index: this is a locking/release-discipline incident, not a query problem.

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. Production writes to app_db.orders started hanging during a release. Look at pg_stat_activity (datname = 'app_db'): a migration session is holding a lock and a writer is waiting on it.
2. The migration grabbed an ACCESS EXCLUSIVE lock (an unsafe ALTER TABLE) and kept its transaction open. Every INSERT/UPDATE needs ROW EXCLUSIVE, which conflicts — so writes queue behind the migration. pg_blocking_pids() ties the waiter to the migration's pid.
3. End the migration session (application_name LIKE 'migration_runner%'), not the app writers. Once its lock is released, the queued writes drain. This is not an index problem.