← All scenarios

Scenario · Connections & Pooling

Idle connection leak

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L2 · 10–15 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-03/02-idle-connection-leak

Part of these paths

Show the postmortem & investigation hints spoilers
Idle connection leak
Type: incident simulation · Topic: Connections & Pooling · Level: L2 · Duration: 10–15 min
Launch: ride postgres start stage-03/02-idle-connection-leak

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: the application kept opening connections and never closing them.
The connections sat in state = 'idle' — not running queries, not in a
transaction, holding no locks — but each one consumed a connection slot, slowly
exhausting the pool.

How it was found: pg_stat_activity grouped by state/application_name showed many
'idle' (not 'idle in transaction') sessions for one app, parked since an old
state_change/backend_start. No locks, low CPU — so not a blocking transaction.

The mitigation: terminate the leaked idle connections.

Lesson: distinguish a connection *leak* (state = 'idle', slots wasted) from an
'idle in transaction' blocker (holds locks/snapshot). The real fix is closing
connections properly and using a pool with a maximum connection lifetime — not
indexes or ANALYZE.

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. Connection slots are filling up, but there are no locks and CPU is low. Group pg_stat_activity by application_name/state — look for many sessions stuck in state = 'idle' (NOT 'idle in transaction').
2. Idle (not idle-in-transaction) means the app opened connections and never closed them — a leak. Check backend_start / state_change to see they've been parked a long time.
3. Terminate the leaked idle connections for that app: pg_terminate_backend(pid) where application_name = 'leaky_api' AND state = 'idle'. The real fix is closing connections / using a pool with a max lifetime.