← All scenarios

Scenario · Query Performance

work_mem spill: the sort goes to disk

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L3 · 10–15 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-01/05-work-mem-sort-spill

Part of these paths

Show the postmortem & investigation hints spoilers
work_mem spill: the sort goes to disk
Type: incident simulation · Topic: Query Performance · Level: L3 · Duration: 10–15 min
Launch: ride postgres start stage-01/05-work-mem-sort-spill

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: the query sorts a large per-tenant slice by created_at, but the sort
didn't fit in work_mem, so PostgreSQL spilled it to disk as an external merge.
Disk-based sorts are slow and generate temp files — the latency had nothing to
do with a missing lookup index; it was the Sort step.

How it was found: EXPLAIN (ANALYZE) showed 'Sort Method: external merge  Disk:
NNNkB' under the Sort node. That's the signature of a work_mem spill.

The fix: CREATE INDEX idx_orders_tenant_created ON orders (tenant_id, created_at DESC);
The index returns the tenant's rows already ordered by created_at, so the plan
drops the Sort node entirely — no spill, regardless of work_mem. (A scoped
`SET work_mem` for the session/query is a valid stopgap; changing work_mem
globally to a large value is dangerous because every connection can multiply it.)

Lesson: read the Sort Method in EXPLAIN ANALYZE. 'external merge Disk' means a
spill — fix it with an order-providing index or a careful, scoped work_mem bump,
not a global work_mem increase.

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. This isn't a missing-index lookup problem. Run EXPLAIN (ANALYZE, BUFFERS) and look at the Sort node — 'Sort Method: external merge  Disk' means the sort spilled to disk because it didn't fit in work_mem.
2. The query sorts many rows for one tenant. You can make the sort disappear with an index that already provides the order, or give the sort more memory — but changing work_mem globally is risky.
3. Create an index that matches the filter + ORDER BY: (tenant_id, created_at DESC). Then the plan reads rows already sorted and there's no Sort node at all.