Scenario · Compound Incidents
Storage pressure and autovacuum failure
A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.
L4 · 15–20 min · runs locally in Docker
Launch
Start this scenario
Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.
ride postgres start stage-11/10-storage-pressure-and-autovacuum-failurePart of these paths
Show the postmortem & investigation hints spoilers
Storage pressure and autovacuum failure Type: incident simulation · Topic: Compound Incidents · Level: L4 · Duration: 15–20 min Launch: ride postgres start stage-11/10-storage-pressure-and-autovacuum-failure POSTMORTEM (root cause · how it was found · the fix · lesson) Root cause (compound): storage pressure on analytics_db with an underlying cleanup failure. Autovacuum was disabled on `events` and update/delete churn left a large dead-tuple load, so the table kept growing and space was never reclaimed. The storage symptom and the vacuum failure are the same problem — and reaching for VACUUM FULL (a heavy exclusive-lock rewrite) or an index is the wrong move. app_db was a healthy decoy. How it was found: pg_stat_user_tables showed high n_dead_tup with no recent autovacuum; pg_class.reloptions showed autovacuum_enabled = false on events. The fix (correct database, correct tool): \connect analytics_db ALTER TABLE events RESET (autovacuum_enabled); VACUUM ANALYZE events; Lesson: storage incidents often hide a cleanup problem. Don't just delete files or run VACUUM once and move on — re-enable autovacuum so it keeps reclaiming, and use a plain VACUUM (not VACUUM FULL) during a live incident. Fix the right database; an index doesn't reclaim bloat. INVESTIGATION HINTS (the staged path to diagnose and fix) 1. analytics_db.events is consuming space and degrading. Don't just delete files or VACUUM once — look at table health: SELECT relname, n_live_tup, n_dead_tup, last_autovacuum FROM pg_stat_user_tables; and SELECT relname, reloptions FROM pg_class WHERE relname = 'events'; 2. The bloat IS the storage pressure: autovacuum is disabled on events and churn left a huge dead-tuple load, so cleanup never reclaims space. app_db is a healthy decoy. An index won't help, and VACUUM FULL is the wrong reflex (heavy lock). 3. Re-enable autovacuum and run a plain VACUUM: \connect analytics_db; ALTER TABLE events RESET (autovacuum_enabled); VACUUM ANALYZE events; Don't VACUUM FULL, don't only add an index, and fix the right database (analytics_db, not app_db).