Scenario · Storage & Backup

pg_wal disk pressure

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L2 · 10–15 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-05/02-pg-wal-disk-pressure

Type: incident simulation · Topic: Storage & Backup · Level: L2 · Duration: 10–15 min

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: a write-heavy workload generated WAL faster than checkpoints could
recycle it, so pg_wal grew and put the disk under pressure. The database kept
working; the real danger was the operator's temptation to delete WAL files by
hand, which can corrupt the cluster and break replication and point-in-time
recovery (PITR).

How it was found: pg_ls_waldir() showed pg_wal large and growing; pg_stat_wal
and a steadily advancing pg_current_wal_lsn() confirmed heavy WAL generation;
pg_stat_activity identified the writer session.
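
The checks above can be sketched as a short psql session. This is only a
sketch under assumptions: it is run through psql (the \gset line is a psql
meta-command, not SQL), on a primary, by a superuser or a member of pg_monitor
(needed for pg_ls_waldir()), on PostgreSQL 14 or later (needed for
pg_stat_wal). The 10-second window and the variable name start_lsn are
arbitrary choices for illustration.

```sql
-- How many segments are in pg_wal, and how much space do they take?
SELECT count(*) AS segments,
       pg_size_pretty(sum(size)) AS total_size
FROM pg_ls_waldir();

-- Sample the current WAL position, wait, then measure how far it advanced.
-- start_lsn is a psql variable filled in by \gset (hypothetical name).
SELECT pg_current_wal_lsn() AS start_lsn \gset
SELECT pg_sleep(10);
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), :'start_lsn')
       ) AS wal_written_in_10s;

-- Cumulative WAL counters since the last statistics reset.
SELECT wal_records,
       wal_fpi,
       pg_size_pretty(wal_bytes) AS wal_bytes,
       stats_reset
FROM pg_stat_wal;
```

A large wal_written_in_10s value alongside a growing segment count points at an
active producer rather than a retention problem.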

The mitigation: the runaway writer was stopped; WAL generation then ceased and
segments recycled normally.

Lesson: pg_wal pressure is a workload (or retention) problem. Find and stop the
producer, size max_wal_size (a soft limit) for the workload, and check for stale
replication slots and failed archiving. Never `rm` files in pg_wal; let
PostgreSQL recycle them.
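
The retention side of the lesson can be checked with a few catalog queries.
This is a hedged sketch, not part of the scenario's scoring: it assumes a
primary server (pg_current_wal_lsn() raises an error during recovery) and a
role allowed to read these views. The column alias retained_wal is
illustrative.

```sql
-- Soft limit on WAL kept between checkpoints; pg_wal can exceed it under load.
SHOW max_wal_size;

-- Replication slots pin WAL: an inactive slot with an old restart_lsn
-- prevents segments from being recycled or removed.
SELECT slot_name,
       slot_type,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
ORDER BY restart_lsn;

-- Archiver health: a rising failed_count means segments cannot be archived,
-- so they are kept in pg_wal.
SELECT archived_count,
       last_archived_wal,
       last_archived_time,
       failed_count,
       last_failed_wal,
       last_failed_time
FROM pg_stat_archiver;
```

If the slot query returns no rows and failed_count is not increasing, retention
is unlikely to be the cause and the workload is the place to look.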

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. pg_wal is growing and the disk is under pressure, but the database is healthy. Size it: SELECT count(*), pg_size_pretty(sum(size)) FROM pg_ls_waldir(); and watch pg_current_wal_lsn() advance. A write-heavy workload is generating WAL faster than it can be recycled.
2. Find the writer: SELECT pid, application_name, state FROM pg_stat_activity WHERE application_name LIKE 'wal_generator%'; then inspect the WAL counters: SELECT * FROM pg_stat_wal;
3. Stop the runaway writer: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE application_name LIKE 'wal_generator%'; WAL generation then stops and segments recycle. NEVER delete files from pg_wal by hand; a CHECKPOINT alone won't help while the writer is still running.
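
Step 3 can be followed by a quick verification pass, sketched below. It makes
some assumptions: the 'wal_generator%' pattern is taken from the hints, the
pid <> pg_backend_pid() guard is an extra precaution added here, and CHECKPOINT
requires a superuser (or the pg_checkpoint role on PostgreSQL 15 and later).
Whether the scenario's scoring expects these extra steps is not stated in the
source.

```sql
-- Terminate only the runaway writer sessions, never the current session.
SELECT pid, pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE application_name LIKE 'wal_generator%'
  AND pid <> pg_backend_pid();

-- Confirm no writer sessions remain.
SELECT count(*) AS remaining_writers
FROM pg_stat_activity
WHERE application_name LIKE 'wal_generator%';

-- With the writer gone, a checkpoint lets PostgreSQL recycle or remove
-- segments it no longer needs.
CHECKPOINT;

-- Re-measure pg_wal; the size should stop growing.
SELECT count(*) AS segments,
       pg_size_pretty(sum(size)) AS total_size
FROM pg_ls_waldir();
```

The directory may not shrink right away, because recycled segments are kept for
reuse; the signal to look for is that the total no longer climbs between runs.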