Scenario · Storage & Backup

Disk full from temp files

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L2 · 10–15 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-05/01-disk-full-from-temp-files

Postmortem & investigation hints (spoilers)
Type: incident simulation · Topic: Storage & Backup · Level: L2 · Duration: 10–15 min

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: a workload ran huge sorts that needed far more memory than work_mem
allows, so they spilled to temporary files, churning the temp area and
threatening to fill the disk. temp_file_limit cancelled each attempt once it hit
the cap, but the workload kept retrying, so the cumulative temp_files/temp_bytes
counters climbed continuously. This is storage pressure, not a query-tuning task.

How it was found: pg_stat_database.temp_files/temp_bytes for the database kept
rising; pg_stat_activity showed one app (temp_spill) repeatedly running the
sorts that spilled.
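
The two observations above can be reproduced with queries along these lines. This is a sketch rather than the scenario's reference solution: the column selection and the 60-character truncation are illustrative choices, and it assumes your role is allowed to see other sessions' query text in pg_stat_activity.

```sql
-- Cumulative temp-file counters for the current database.
-- Run this a few times: rising values mean sorts or hashes are still spilling.
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname = current_database();

-- Who is connected and what they are running, grouped visually by application.
SELECT pid,
       application_name,
       state,
       query_start,
       left(query, 60) AS query_head
FROM pg_stat_activity
WHERE datname = current_database()
  AND pid <> pg_backend_pid()          -- skip this session
ORDER BY application_name, query_start;
```

If one application_name dominates the second result and its query_start keeps resetting, that is consistent with a retry loop.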

The mitigation: stop the runaway temp-spilling workload; temp-file creation then
stopped.

Lesson: trace temp-file pressure to the workload producing it and stop or fix
that (right work_mem per query, an index, or batching). Don't raise work_mem
globally: the limit applies per sort or hash operation in every backend, so
total memory use multiplies and disk pressure is traded for memory pressure.
Never delete temp files by hand.
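
As one illustration of "right work_mem per query", the setting can be raised for a single transaction instead of globally. A minimal sketch: the table big_events, the column created_at, and the 256MB value are hypothetical placeholders, not part of the scenario.

```sql
-- Raise work_mem for this transaction only.
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL work_mem = '256MB';                    -- hypothetical size
SELECT * FROM big_events ORDER BY created_at;    -- hypothetical large sort
COMMIT;

-- Back to the session or server default.
SHOW work_mem;
```

Scoping the change this way keeps the larger memory budget limited to the one statement known to need it, rather than to every sort in every backend.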

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. Queries are failing or degrading because of temporary-file pressure, not because of a plan problem. Look at temp usage with SELECT datname, temp_files, temp_bytes FROM pg_stat_database WHERE datname = current_database(); and run it repeatedly: the counters keep climbing.
2. Check the guardrails: SHOW temp_file_limit; SHOW work_mem; and find the culprit in pg_stat_activity — one app (temp_spill) is in a loop spilling huge sorts to disk.
3. Stop the runaway temp-spilling workload: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE application_name LIKE 'temp_spill%'; temp pressure then stops growing. Don't crank work_mem globally (every backend could then use that much per sort or hash operation), and don't reach for new indexes as the mitigation.
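
After step 3, one way to confirm that temp pressure has stopped growing is to snapshot the counters, wait, and compare. The sketch below assumes psql with autocommit on (the default), so each statement reads fresh statistics; the 10-second wait and the variable names are arbitrary choices.

```sql
-- Store the current counters in psql variables (\gset runs the query).
SELECT temp_files AS files_before,
       temp_bytes AS bytes_before
FROM pg_stat_database
WHERE datname = current_database() \gset

SELECT pg_sleep(10);   -- arbitrary observation window

-- Difference since the snapshot; zeros mean no new temp files were recorded.
SELECT temp_files - :files_before                  AS new_temp_files,
       pg_size_pretty(temp_bytes - :bytes_before)  AS new_temp_bytes
FROM pg_stat_database
WHERE datname = current_database();
```

Statistics can lag slightly behind activity, so a nonzero result right after terminating the sessions is worth re-checking once before concluding the workload has restarted.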