Scenario · Connections & Pooling
Connection storm after deploy
A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.
L2 · 10–15 min · runs locally in Docker
Launch
Start this scenario
Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.
ride postgres start stage-03/03-connection-storm-after-deploy
Postmortem & investigation hints (spoilers)
Connection storm after deploy
Type: incident simulation · Topic: Connections & Pooling · Level: L2 · Duration: 10–15 min
Launch: ride postgres start stage-03/03-connection-storm-after-deploy

POSTMORTEM (root cause · how it was found · the fix · lesson)

Root cause: a deploy restarted the app fleet and every worker reconnected at the same moment, a thundering herd. The synchronized burst of connections pushed the database to max_connections, so new connections failed and latency spiked. Unlike a slow leak, this is bursty: it is triggered by the restart.

How it was found: pg_stat_activity showed a sudden mass of connections from one application (deploy_worker), near max_connections, right after the deploy.

The mitigation: shed the excess deploy_worker connections to recover.

Lesson: a reconnect storm is a client-behavior problem. Fix it with a connection pool, jittered reconnects with exponential backoff, a smaller app pool size, and readiness gating on restart. Do not reach for indexes, and do not blindly raise max_connections.

INVESTIGATION HINTS (the staged path to diagnose and fix)

1. This spiked right after a deploy: a wave of workers all reconnected at once. Group pg_stat_activity by application_name. A burst of one app (deploy_worker) is eating the slots.
2. Compare the count to SHOW max_connections. A thundering-herd reconnect after a restart looks like 'too many clients', but it is bursty, not a steady leak.
3. Shed the excess workers to recover now: call pg_terminate_backend(pid) on the backends with that application_name where state = 'idle'. The real fix is pooling plus jittered reconnects with backoff and a smaller app pool size.
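The "jittered reconnects with exponential backoff" named in the lesson can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scenario's own fix: the function names (backoff_delay, connect_with_backoff), the base and cap values, the attempt limit, and the connect callable are all hypothetical, and a real client would pass whatever exception type its driver raises on a refused connection.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Return a randomized delay in seconds for a 0-based retry attempt.

    Assumed defaults: base=0.5 s and cap=30 s are illustrative values only.
    """
    # Exponential ceiling: base * 2^attempt, clamped so it never exceeds cap.
    ceiling = min(cap, base * (2 ** attempt))
    # "Full jitter": draw uniformly from [0, ceiling] so that workers
    # restarted at the same instant do not retry at the same instant.
    return rng.uniform(0.0, ceiling)


def connect_with_backoff(connect, max_attempts=6, retry_on=(ConnectionError,),
                         sleep=time.sleep, rng=random):
    """Call connect() until it succeeds or max_attempts is exhausted.

    `connect` is a hypothetical zero-argument callable that returns a
    connection or raises one of the exception types in `retry_on`.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait a jittered, exponentially growing interval before retrying.
            sleep(backoff_delay(attempt, rng=rng))
```

Because each worker draws its own random delay, a fleet restarted together spreads its reconnects over a widening window instead of hitting max_connections in one burst. This complements a connection pool and a smaller app pool size; it does not replace them.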