PostgreSQL · Intermediate

Stale Statistics After a Data-Distribution Change

A production incident, replayed by hand: after a bulk import, the planner's row estimates were off several-fold, and the reflex fix, adding an index, would not have helped. Bring up the same database, catch the misestimate with EXPLAIN, confirm the statistics are stale, and fix it with one ANALYZE.

Query plans on the orders table went bad with no deploy and no schema change. A large historical import had landed earlier, and somebody had already tried adding an index, which changed nothing. This book replays the incident on a database you bring up yourself: the schema, the data, and the broken state are all in plain SQL files, so nothing is hidden. You reproduce the traffic, find the hot query with pg_stat_statements, and read the real clue in EXPLAIN (ANALYZE): the planner's row estimate is off from the actual row count several-fold. pg_stat_user_tables confirms why: millions of rows have changed since the last ANALYZE, and autovacuum is off, so nothing has refreshed the statistics. The fix is one ANALYZE, and you prove it by watching the estimates line up with the actual row counts. The diagnosis path is the same one you would use on call.

What you'll build

Reproduce a planner misestimate on a local PostgreSQL stand
Find the hot query with pg_stat_statements
Read EXPLAIN (ANALYZE, BUFFERS): estimated rows vs actual rows
Check statistics freshness with pg_stat_user_tables
Fix stale statistics with ANALYZE — and resist the index reflex
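
As a rough preview of the diagnosis path listed above, the four database-side steps might look like the sketch below. It is not the book's own material: the `status` column and the `'shipped'` value are hypothetical stand-ins for whatever filter the hot query really uses, and the pg_stat_statements column names assume PostgreSQL 13 or later with the extension already installed.

```sql
-- 1. Find the hot query.
--    Assumption: pg_stat_statements is installed and the server is
--    PostgreSQL 13+ (older versions expose total_time / mean_time instead).
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;

-- 2. Compare the planner's estimate with reality.
--    In each plan node, "rows=" in the cost section is the estimate and
--    "rows=" in the "actual" section is what really came back.
--    Assumption: the status column and its value are hypothetical.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'shipped';

-- 3. Check how fresh the statistics are.
--    A large n_mod_since_analyze together with an old or NULL
--    last_analyze / last_autoanalyze points at stale statistics.
SELECT relname, n_live_tup, n_mod_since_analyze,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- 4. Refresh the statistics, then rerun step 2 to compare the estimates.
ANALYZE orders;
```

Rerunning the EXPLAIN from step 2 after step 4 is what shows whether the estimated and actual row counts have converged; no index is created at any point.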

Contents

The stand
Bring it up
The schema
Apply the schema
The data, in two eras
Load it
A clean slate for statistics
Watch the endpoint work
Ask the database what hurts
Ask the data itself
Ask the planner — and catch it lying
How old is the planner's worldview
Design the fix
Apply it
Estimates meet reality
Close the ticket
