Scenario · Vacuum & Bloat
Long transaction prevents cleanup
A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.
L3 · 10–15 min · runs locally in Docker
Launch
Start this scenario
Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.
ride postgres start stage-07/04-long-transaction-prevents-cleanupPart of these paths
Show the postmortem & investigation hints spoilers
Long transaction prevents cleanup Type: incident simulation · Topic: Vacuum & Bloat · Level: L3 · Duration: 10–15 min Launch: ride postgres start stage-07/04-long-transaction-prevents-cleanup POSTMORTEM (root cause · how it was found · the fix · lesson) Root cause: an old REPEATABLE READ transaction (old_snapshot) on tenant_b held a snapshot, pinning the xmin horizon. VACUUM ran but couldn't remove the dead tuples, because that old snapshot might still see the old row versions — so dead tuples stayed high even though VACUUM 'worked'. app_db was healthy. How it was found: pg_stat_activity showed a long-lived backend with an old backend_xmin holding a transaction open; pg_stat_user_tables showed dead tuples not dropping after VACUUM. The mitigation: terminate the old transaction, then VACUUM ANALYZE — now the dead tuples were removable. Lesson: "VACUUM isn't working" is often "an old transaction is blocking cleanup." Find the oldest backend_xmin / long-running transaction and clear it, then vacuum. Killing a random backend, vacuuming the wrong database, or adding an index won't help. INVESTIGATION HINTS (the staged path to diagnose and fix) 1. VACUUM ran on tenant_b but dead tuples didn't drop. The blocker is an old transaction holding a snapshot. \connect tenant_b then SELECT pid, state, backend_xmin, now()-xact_start AS age, query FROM pg_stat_activity WHERE backend_xmin IS NOT NULL ORDER BY xact_start; one session (old_snapshot) is idle-in-transaction with an old xmin. 2. An old REPEATABLE READ transaction pins the xmin horizon, so VACUUM can't remove dead tuples that snapshot might still see. The fix isn't 'more VACUUM' — it's clearing the old transaction. 3. Terminate the old snapshot holder, then vacuum: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE application_name='old_snapshot'; then \connect tenant_b and VACUUM ANALYZE orders; Don't vacuum the wrong database, don't kill a random backend, don't add an index.