Scenario · Vacuum & Bloat
Unsafe VACUUM FULL capstone
A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.
L3 · 10–15 min · runs locally in Docker
Launch
Start this scenario
Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.
ride postgres start stage-07/10-unsafe-vacuum-full-capstonePart of these paths
Show the postmortem & investigation hints spoilers
Unsafe VACUUM FULL capstone Type: incident simulation · Topic: Vacuum & Bloat · Level: L3 · Duration: 10–15 min Launch: ride postgres start stage-07/10-unsafe-vacuum-full-capstone POSTMORTEM (root cause · how it was found · the fix · lesson) Root cause: tenant_b.orders was bloated with dead tuples while live app traffic was running against it. The tempting reflex — VACUUM FULL — would have rewritten the table under an ACCESS EXCLUSIVE lock, blocking every reader and writer and turning a slow table into an outage. How it was found: pg_stat_user_tables showed the bloat; pg_stat_activity showed active app sessions on the same table. The mitigation: a plain VACUUM ANALYZE reclaimed the dead space for reuse without blocking traffic; the deeper fix (reducing churn, scheduling pg_repack) belongs in a maintenance window. Lesson: VACUUM FULL is a planned-maintenance tool, not a live-incident reflex. A plain VACUUM frees space for reuse without heavy locks; VACUUM FULL (and friends like CLUSTER) take ACCESS EXCLUSIVE — schedule them, or use pg_repack / REINDEX CONCURRENTLY. An index or the wrong database doesn't help. INVESTIGATION HINTS (the staged path to diagnose and fix) 1. tenant_b.orders is bloated and there's live app traffic on it. Check both: \connect tenant_b then SELECT relname, n_live_tup, n_dead_tup, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_stat_user_tables WHERE relname='orders'; and SELECT pid, state, application_name, query FROM pg_stat_activity WHERE datname='tenant_b'; 2. The reflex is VACUUM FULL — but it takes an ACCESS EXCLUSIVE lock and would block all that live traffic. During an incident, reclaim with a plain VACUUM and address the churn; leave VACUUM FULL / pg_repack for a maintenance window. 3. Safe response in tenant_b: VACUUM ANALYZE orders; Do NOT run VACUUM FULL during live traffic, don't add an index, and don't vacuum the wrong database.