← All bricks

PostgreSQL · Intermediate

work_mem Spill: the Sort Goes to Disk

A production incident, replayed by hand: a per-tenant listing got slow while CPU stayed calm and temp files piled up. Bring up the same database, catch the query with pg_stat_statements, read the Sort node in the plan, and make the spill disappear with an order-providing index.

A tenant's order history endpoint turned sluggish, the disks got busy, and the database started writing temp files — yet CPU stayed almost idle and no index seemed to be missing. This book replays the incident on a database you bring up yourself: the schema, the data and the deliberately small work_mem are all in plain files, nothing is hidden. You reproduce the slow query with a stopwatch, find it in pg_stat_statements, and read the one line in EXPLAIN that explains everything: Sort Method: external merge Disk. Then you prove the diagnosis with a session-scoped work_mem bump, learn why the global version of that fix is a trap, and remove the sort entirely with an index that already delivers the order. The diagnosis path is the real one you'd use on call.

What you'll build

  • Reproduce a sort-spill incident on a local PostgreSQL stand
  • Find the offending query with pg_stat_statements
  • Read a Sort node: external merge on disk vs in-memory sort
  • Prove a work_mem diagnosis with a session-scoped SET, safely
  • Remove the Sort node entirely with an order-providing index

Contents

  1. The stand
  2. Bring it up
  3. The schema
  4. Apply the schema
  5. The data
  6. Load it
  7. Know your config, then a clean slate
  8. Reproduce the pain
  9. Let the traffic flow
  10. Ask the database what hurts
  11. Read the Sort node
  12. Prove it with memory
  13. Design the fix
  14. Apply it
  15. Watch the Sort node disappear
  16. Close the ticket