← All scenarios

Scenario · Vacuum & Bloat

Unsafe VACUUM FULL capstone

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L3 · 10–15 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-07/10-unsafe-vacuum-full-capstone

Part of these paths

Show the postmortem & investigation hints spoilers
Unsafe VACUUM FULL capstone
Type: incident simulation · Topic: Vacuum & Bloat · Level: L3 · Duration: 10–15 min
Launch: ride postgres start stage-07/10-unsafe-vacuum-full-capstone

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: tenant_b.orders was bloated with dead tuples while live app traffic was
running against it. The tempting reflex — VACUUM FULL — would have rewritten the
table under an ACCESS EXCLUSIVE lock, blocking every reader and writer and turning
a slow table into an outage.

How it was found: pg_stat_user_tables showed the bloat; pg_stat_activity showed
active app sessions on the same table.

The mitigation: a plain VACUUM ANALYZE reclaimed the dead space for reuse without
blocking traffic; the deeper fix (reducing churn, scheduling pg_repack) belongs in
a maintenance window.

Lesson: VACUUM FULL is a planned-maintenance tool, not a live-incident reflex. A
plain VACUUM frees space for reuse without heavy locks; VACUUM FULL (and friends
like CLUSTER) take ACCESS EXCLUSIVE — schedule them, or use pg_repack /
REINDEX CONCURRENTLY. An index or the wrong database doesn't help.

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. tenant_b.orders is bloated and there's live app traffic on it. Check both: \connect tenant_b then SELECT relname, n_live_tup, n_dead_tup, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_stat_user_tables WHERE relname='orders'; and SELECT pid, state, application_name, query FROM pg_stat_activity WHERE datname='tenant_b';
2. The reflex is VACUUM FULL — but it takes an ACCESS EXCLUSIVE lock and would block all that live traffic. During an incident, reclaim with a plain VACUUM and address the churn; leave VACUUM FULL / pg_repack for a maintenance window.
3. Safe response in tenant_b: VACUUM ANALYZE orders; Do NOT run VACUUM FULL during live traffic, don't add an index, and don't vacuum the wrong database.