← All scenarios

Scenario · Connections & Pooling

Runaway query holds a pool connection

A sandboxed PostgreSQL incident — investigate with your own tools, submit a fix, and get deterministic Detect / Fix / Trap scoring.

L3 · 10–15 min · runs locally in Docker

Launch

Start this scenario

Boot it in a real PostgreSQL sandbox and investigate with psql, EXPLAIN and pg_stat_statements.

ride postgres start stage-03/07-runaway-query-holds-pool-connection

Part of these paths

Show the postmortem & investigation hints spoilers
Runaway query holds a pool connection
Type: incident simulation · Topic: Connections & Pooling · Level: L3 · Duration: 10–15 min
Launch: ride postgres start stage-03/07-runaway-query-holds-pool-connection

POSTMORTEM (root cause · how it was found · the fix · lesson)
Root cause: a few long-running queries stayed ACTIVE for minutes, each pinning
one of PgBouncer's server connections. With the pool's slots all occupied, every
other client queued (cl_waiting > 0) and the app appeared to hang — even though
PostgreSQL itself had plenty of connection slots free. This is a pooling
incident, not a lock and not a missing index.

How it was found: pg_stat_activity showed Postgres far from max_connections but
a handful of queries active far longer than any healthy request; PgBouncer's
SHOW POOLS showed all server connections busy with clients waiting.

The mitigation: terminate the runaway queries holding the pool's server
connections; the queued client is promoted and cl_waiting drops to 0.

Lesson: a runaway query on a pooled path starves everyone behind it. Cap it with
statement_timeout (and idle_in_transaction_session_timeout), size the pool for
the workload, and keep pooled transactions short — don't reach for indexes or
for raising Postgres max_connections.

INVESTIGATION HINTS (the staged path to diagnose and fix)
1. The app hangs, but PostgreSQL is NOT at max_connections — there's headroom in pg_stat_activity. The pool one layer up (PgBouncer) is starved. Look for long-running ACTIVE queries holding the pool's server connections.
2. On the direct Postgres connection: SELECT pid, application_name, state, now() - query_start AS age, query FROM pg_stat_activity WHERE state = 'active' ORDER BY query_start; A couple of queries have been active far too long. On the PgBouncer admin console, SHOW POOLS shows cl_waiting > 0 with every server slot busy.
3. Terminate the long-running queries that are pinning the pool's server connections (pg_terminate_backend for the runaway_query backends) — not the waiters. SHOW POOLS cl_waiting then drops to 0. The durable fix is statement_timeout and not running unbounded queries on a pooled path.