PostgreSQL

Investigating with database tools

You investigate a rillence postgres incident exactly like you would in production. Each scenario boots a real PostgreSQL in Docker and prints a connection string. Open a second terminal and connect with whatever PostgreSQL client you prefer — psql, an IDE, DataGrip, DBeaver, or your own scripts — then start digging. For example, with psql:

psql postgres://postgres:postgres@127.0.0.1:55432/app_db

Nothing is mocked. The catalog, the planner, the statistics, and the workload are all real, so every query you would run against a live database works here too. Query freely — you cannot break the scoring by exploring.

Starting points

Find the slowest statements:

SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Understand a specific plan, with real timing and I/O:

EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

Inspect live sessions and untangle locks:

SELECT pid, state, wait_event_type, query
FROM pg_stat_activity
WHERE state <> 'idle';

SELECT pid, pg_blocking_pids(pid) AS blocked_by
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;

Spot bloat and stale statistics:

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Use your own tools

Because it is a normal PostgreSQL instance, your everyday tooling works unchanged — psql, an IDE, DataGrip, DBeaver, or whatever client you prefer. Connect, set up your usual \timing, \x, and saved queries, and treat the scenario like any other incident on call.

When you have found and applied a fix, run score in the REPL to check your work.