Scenario · Storage & Backup
pg_wal disk pressure
A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.
L2 · 10–15 min · runs locally in Docker
Launch
Start this scenario
Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.
ride postgres start stage-05/02-pg-wal-disk-pressure
Postmortem & investigation hints (spoilers)
pg_wal disk pressure
Type: incident simulation · Topic: Storage & Backup · Level: L2 · Duration: 10–15 min
Launch: ride postgres start stage-05/02-pg-wal-disk-pressure

POSTMORTEM (root cause · how it was found · the fix · lesson)

Root cause: a write-heavy workload generated WAL faster than checkpoints could recycle it, so pg_wal grew and put the disk under pressure. The database kept working; the danger was the operator's temptation to delete WAL files by hand (which corrupts the cluster and breaks replication/PITR).

How it was found: pg_ls_waldir() showed pg_wal large and growing; pg_stat_wal and a moving pg_current_wal_lsn() confirmed heavy WAL generation; pg_stat_activity showed the writer.

The mitigation: stop the runaway writer; WAL generation stopped and segments recycled normally.

Lesson: pg_wal pressure is a workload (or retention) problem: find and stop the producer, size max_wal_size, and check for stale slots/failed archiving. Never `rm` files in pg_wal; let PostgreSQL recycle them.

INVESTIGATION HINTS (the staged path to diagnose and fix)

1. pg_wal is growing and disk is under pressure, but the database is healthy. Size it:
   SELECT count(*), pg_size_pretty(sum(size)) FROM pg_ls_waldir();
   and watch pg_current_wal_lsn() move. A write-heavy workload is generating WAL faster than it can be recycled.

2. Find the writer:
   SELECT pid, application_name, state FROM pg_stat_activity WHERE application_name LIKE 'wal_generator%';
   and look at:
   SELECT * FROM pg_stat_wal;

3. Stop the runaway writer:
   SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE application_name LIKE 'wal_generator%';
   WAL generation then stops and segments recycle. NEVER delete files from pg_wal by hand, and a CHECKPOINT alone won't help while the writer runs.
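
The lesson also names retention causes (stale slots, failed archiving) and max_wal_size, which the hints do not query directly. A minimal sketch of those follow-up checks is below. It is not part of the scenario's scoring. It assumes a psql session on a primary server running PostgreSQL 13 or later. The 10-second sampling window and the psql variable name lsn_before are arbitrary illustrative choices.

```sql
-- 1. Estimate how much WAL is produced in a short window.
--    \gset is a psql meta-command: it stores the result column in a psql variable.
SELECT pg_current_wal_lsn() AS lsn_before \gset
SELECT pg_sleep(10);  -- illustrative window; pick any interval
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), :'lsn_before')
       ) AS wal_generated_in_window;

-- 2. Look for replication slots that hold WAL back.
--    An inactive slot with an old restart_lsn prevents segments from being recycled.
SELECT slot_name,
       active,
       restart_lsn,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
ORDER BY restart_lsn;

-- 3. Check whether WAL archiving is failing.
--    A growing failed_count means segments wait in pg_wal until archive_command succeeds.
SELECT archived_count,
       last_archived_wal,
       failed_count,
       last_failed_wal,
       last_failed_time
FROM pg_stat_archiver;

-- 4. Review the settings that bound WAL retention.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_wal_size', 'min_wal_size', 'wal_keep_size', 'archive_mode');
```

If the slot and archiver queries come back clean and the first query shows a large delta, the pressure is coming from the workload, which matches this scenario's root cause.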