Scenario · Vacuum & Bloat
Vacuum cost-delay misconfiguration
A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.
L3 · 10–15 min · runs locally in Docker
Launch
Start this scenario
Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.
ride postgres start stage-07/06-vacuum-cost-delay-misconfiguration
Postmortem & investigation hints (spoilers)
Type: incident simulation · Topic: Vacuum & Bloat · Level: L3 · Duration: 10–15 min

POSTMORTEM (root cause · how it was found · the fix · lesson)

Root cause: analytics_db.events had per-table autovacuum cost settings (a high autovacuum_vacuum_cost_delay and a tiny autovacuum_vacuum_cost_limit) that throttled autovacuum to a crawl. Autovacuum was enabled and "running", but so slowly that it never kept up with churn, so dead tuples accumulated.

How it was found: the table's reloptions showed the throttling cost settings, and pg_stat_user_tables showed dead tuples growing even though autovacuum was on.

The fix: reset the bad per-table cost settings so autovacuum runs at a normal rate, then run a manual VACUUM ANALYZE to clear the backlog right away (a manual VACUUM ignores the autovacuum cost settings).

Lesson: "vacuum is on" does not mean "vacuum is keeping up." Distinguish missing or disabled vacuum from vacuum throttled by cost settings. Fix the configuration, then vacuum. Disabling autovacuum, adding an index, or vacuuming the wrong database are all wrong answers here.

INVESTIGATION HINTS (the staged path to diagnose and fix)

1. Autovacuum is enabled on analytics_db.events, but dead tuples still pile up. Check the table's options:
   \connect analytics_db
   SELECT relname, reloptions FROM pg_class WHERE relname='events';
   The table has autovacuum_vacuum_cost_delay and autovacuum_vacuum_cost_limit set to values that throttle autovacuum to a crawl.

2. This isn't "autovacuum disabled"; it's autovacuum configured to run too slowly. The per-table cost settings mean it never keeps up. A manual VACUUM ignores those settings, so it is your immediate cleanup lever.

3. Reset the bad cost settings and vacuum:
   \connect analytics_db
   ALTER TABLE events RESET (autovacuum_vacuum_cost_delay, autovacuum_vacuum_cost_limit);
   VACUUM ANALYZE events;
   Don't disable autovacuum, don't add an index, and don't vacuum the wrong database.
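The diagnose-and-fix path from the hints can be sketched as a single psql session. This is a minimal sketch, not the scenario's reference solution. It assumes you are connected as a role allowed to alter and vacuum events, and that psql is in its default autocommit mode (VACUUM cannot run inside a transaction block). The relkind filter, the extra statistics columns, and the final verification queries are additions for illustration; the source only names reloptions and the dead-tuple count.

```sql
-- Connect to the database that owns the table (psql meta-command).
\connect analytics_db

-- 1. Inspect per-table storage options. Throttling shows up as
--    autovacuum_vacuum_cost_delay / autovacuum_vacuum_cost_limit entries.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'events'
  AND relkind = 'r';          -- ordinary tables only (illustrative filter)

-- 2. Confirm the symptom: dead tuples accumulating while autovacuum is on.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'events';

-- 3. Remove the per-table overrides so the table falls back to the
--    server-wide autovacuum cost settings.
ALTER TABLE events
    RESET (autovacuum_vacuum_cost_delay, autovacuum_vacuum_cost_limit);

-- 4. Clear the backlog now; a manual VACUUM is not subject to the
--    autovacuum cost settings.
VACUUM ANALYZE events;

-- 5. Verify: the cost overrides should be gone from reloptions and
--    n_dead_tup should have dropped (statistics may lag briefly).
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'events'
  AND relkind = 'r';

SELECT relname, n_dead_tup, last_vacuum
FROM pg_stat_user_tables
WHERE relname = 'events';
```

Running the checks in steps 1 and 2 before changing anything keeps the evidence for the diagnosis separate from the fix, and step 5 confirms that both halves of the fix (configuration reset, then cleanup) took effect.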