Scenario · Storage & Backup
Archive directory backlog
A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.
L3 · 10–15 min · runs locally in Docker
Launch
Start this scenario
Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.
ride postgres start stage-05/03-archive-directory-backlog
Type: incident simulation · Topic: Storage & Backup · Level: L3 · Duration: 10–15 min

POSTMORTEM (root cause · how it was found · the fix · lesson)

Root cause: archiving succeeded, but a write-heavy workload produced WAL segments faster than the archive destination was being offloaded, so the archive area grew without bound. This is a slow-burn risk to backups/PITR and disk, even though queries were fine. (Here the archive dir is a size-capped tmpfs so the sandbox can never fill the host disk.)

How it was found: pg_stat_archiver.archived_count climbed continuously with no failures; the WAL turnover in pg_ls_waldir() and the sessions in pg_stat_activity pointed at one workload driving it.

The mitigation: stop the archive-filling workload. Once it was stopped, archive growth stopped.

Lesson: monitor archive-destination size and growth, not just archiver success. The durable fix is retention/offload of archived WAL and rate-limiting bulk writes. Never "fix" it by turning archive_mode off, because that silently destroys your PITR chain. Adding an index or forcing a checkpoint is irrelevant to this problem.

INVESTIGATION HINTS (the staged path to diagnose and fix)

1. Archiving is working, but the archive destination keeps filling. This is a backup/PITR storage risk, not a query outage. Check the archiver:
   SELECT archived_count, last_archived_wal, last_failed_wal FROM pg_stat_archiver;
   archived_count climbs steadily, and pg_ls_waldir() shows constant WAL turnover.

2. A write-heavy workload (archive_writer) is forcing segment after segment to be archived into a bounded archive area that nobody is offloading. This is destination pressure, not a streaming-replication problem.

3. Stop the workload filling the archive:
   SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE application_name LIKE 'archive_writer%';
   The archive then stops growing. The real fix is retention/offload of old archives. Do NOT turn archiving off (you would lose PITR).
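
The lesson above (watch the archive destination's size and growth, not only archiver success) can be sketched as a small monitoring helper. This is a minimal illustration, not part of the scenario: the function names are hypothetical, the archive directory path is something you would supply yourself (the scenario does not state it), and the capacity figure in the demo is made up. It assumes archived WAL segments are plain files directly inside one directory.

```python
import os
import tempfile
from typing import List, Optional, Tuple


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files directly under `path`."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


def growth_rate(samples: List[Tuple[float, int]]) -> float:
    """Average bytes/second between the first and last (timestamp, size) sample."""
    if len(samples) < 2:
        return 0.0
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0
    return (s1 - s0) / (t1 - t0)


def seconds_until_full(size: int, capacity: int, rate: float) -> Optional[float]:
    """Estimated seconds until `capacity` is reached; None if not growing."""
    if rate <= 0:
        return None
    return max(capacity - size, 0) / rate


if __name__ == "__main__":
    # Demo against a throwaway directory standing in for the archive area.
    with tempfile.TemporaryDirectory() as archive_dir:
        samples = [(0.0, dir_size_bytes(archive_dir))]
        # Simulate one archived segment landing (1 KiB here, not a real 16 MiB WAL file).
        with open(os.path.join(archive_dir, "segment_0001"), "wb") as f:
            f.write(b"\0" * 1024)
        samples.append((10.0, dir_size_bytes(archive_dir)))  # pretend 10 s elapsed

        rate = growth_rate(samples)
        eta = seconds_until_full(samples[-1][1], capacity=10 * 1024, rate=rate)
        print(f"growth: {rate:.1f} B/s, estimated seconds until full: {eta}")
```

In practice you would sample the real archive directory on a timer and alert when the estimated time until full drops below whatever threshold suits your offload schedule, alongside the pg_stat_archiver check from hint 1.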