← All scenarios

Scenario · Compound Incidents

Vacuum starvation and slow queries

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L4 · 15–20 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-11/05-vacuum-starvation-and-slow-queries

Part of these paths

Show the postmortem & investigation hints spoilers
Vacuum starvation and slow queries
Type: incident simulation · Topic: Compound Incidents · Level: L4 · Duration: 15–20 min
Launch: ride postgres start stage-11/05-vacuum-starvation-and-slow-queries

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause (compound): queries over `tenant_b.orders` were slow — but not for lack of
an index. Autovacuum had been disabled on the table and update/delete churn left a
large dead-tuple/bloat load, so even simple scans got slow. Reaching for an index (or
VACUUM FULL) is the wrong move; the table's health is the problem. `tenant_a` was a
healthy decoy.

How it was found: EXPLAIN showed the cost was in scanning a bloated heap;
pg_stat_user_tables showed high n_dead_tup with no recent autovacuum; pg_class.
reloptions showed autovacuum_enabled = false on tenant_b.orders.

The fix (correct database, correct tool):
  \connect tenant_b
  ALTER TABLE orders RESET (autovacuum_enabled);
  VACUUM ANALYZE orders;

Lesson: not every slow query is an indexing problem — sometimes the table degraded.
Check table health (dead tuples, autovacuum) alongside the plan. Fix the right
database, re-enable autovacuum and run a plain VACUUM ANALYZE; don't paper over bloat
with an index, and don't VACUUM FULL during a live incident.

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. Queries over tenant_b.orders are slow. Don't assume it's a missing index — check table health too: EXPLAIN the query AND SELECT relname, n_live_tup, n_dead_tup, last_autovacuum FROM pg_stat_user_tables. tenant_a is healthy (a decoy).
2. tenant_b.orders is bloated with dead tuples and autovacuum is disabled on it (check pg_class.reloptions). The slowness is table health, not indexing — an index won't fix bloat, and VACUUM FULL is the wrong reflex during an incident.
3. On tenant_b: ALTER TABLE orders RESET (autovacuum_enabled); VACUUM ANALYZE orders; Fix the RIGHT database (tenant_b, not tenant_a), don't just add an index, and don't VACUUM FULL.