work_mem Spill: the Sort Goes to Disk
A production incident, replayed by hand: a per-tenant listing got slow while CPU stayed calm and temp files piled up. Bring up the same database, catch the query with pg_stat_statements, read the Sort node in the plan, and make the spill disappear with an order-providing index.
A tenant's order history endpoint turned sluggish, the disks got busy, and the database started writing temp files — yet CPU stayed almost idle and no index seemed to be missing. This book replays the incident on a database you bring up yourself: the schema, the data and the deliberately small work_mem are all in plain files, nothing is hidden. You reproduce the slow query with a stopwatch, find it in pg_stat_statements, and read the one line in EXPLAIN that explains everything: Sort Method: external merge Disk. Then you prove the diagnosis with a session-scoped work_mem bump, learn why the global version of that fix is a trap, and remove the sort entirely with an index that already delivers the order. The diagnosis path is the real one you'd use on call.
What you'll build
- Reproduce a sort-spill incident on a local PostgreSQL stand
- Find the offending query with pg_stat_statements
- Read a Sort node: external merge on disk vs in-memory sort
- Prove a work_mem diagnosis with a session-scoped SET, safely
- Remove the Sort node entirely with an order-providing index
Contents
- The stand
- Bring it up
- The schema
- Apply the schema
- The data
- Load it
- Know your config, then a clean slate
- Reproduce the pain
- Let the traffic flow
- Ask the database what hurts
- Read the Sort node
- Prove it with memory
- Design the fix
- Apply it
- Watch the Sort node disappear
- Close the ticket