B-tree Indexes and Selectivity
Build intuition for B-tree indexes by watching the planner choose between Seq Scan and Index Scan on a real million-row stand — equality lookups, range scans, low-cardinality traps, composite key order, and ORDER BY without a sort step.
Most people add an index whenever a column appears in a WHERE clause. This course replaces that reflex with a model: selectivity. We stand up a real PostgreSQL with customers and a million orders, then work through every situation where an index helps — and the ones where it quietly does nothing. We start with a plain equality lookup: no index, then with one, and we measure the difference. We move to range scans and watch the planner choose Bitmap Heap Scan for a narrow window but fall back to Seq Scan when the window covers most of the table. We build an index on the low-cardinality `status` column and confirm it is ignored for common values, then flip to a rare value and watch the same index spring to life. We construct a composite index and learn why equality columns must come before range columns, then break the rule on purpose and compare the costs. The leftmost-prefix rule comes next: a composite index on (A, B) does not help a query that filters only on B. We finish with ORDER BY … LIMIT — the most common production pattern — and show how a DESC index lets the planner take the first N leaves of the tree without sorting anything.
What you'll build
- Explain when a B-tree index scan beats a sequential scan and when it does not
- Define selectivity and use it to predict whether the planner will use an index
- Identify the low-cardinality trap and know which access patterns break it
- Build a composite index with the correct equality-before-range column order
- Apply the leftmost-prefix rule to predict which queries a composite index covers
- Create a DESC index to eliminate the Sort node from an ORDER BY LIMIT plan
Contents
- A database big enough to matter
- Wait for it to be ready
- One command to bring it up
- Start the server
- A short way into the database
- The customers table
- The orders table
- Create the tables
- Seed 200 000 customers, repeatably
- Seed one million orders
- Load the data
- Fresh statistics, single-threaded plans
- Confirm what we are sitting on
- A lookup with no index
- The whole table, row by row
- Build a B-tree on customer_id
- The same query, with a path through the index
- Index Scan and Bitmap Heap Scan
- What the startup cost tells you
- A range query on a near-unique column
- Seq Scan, again — no shortcut available
- Index the timestamp column
- The index lands on the right side of the range
- The same index, a different boundary
- Seq Scan — the planner ignores its own index
- The most common status value
- Seq Scan for a filter that touches most of the table
- Build an index on the low-cardinality column
- The index is ignored for the common value
- The same index, a rare value
- The same index, now useful
- A two-column filter — a typical production query
- One index used, the other applied as a filter
- Build the composite with equality first
- Both conditions applied inside the index
- The same query, wrong column order in the index
- Build the composite with range first
- Force the planner to use the reversed composite
- The reversed index reads 3 000 pages for 4 rows
- Restore the correct composite
- A query that skips the first column of the composite
- The composite is useless for the second column alone
- The latest 20 orders — a very common production pattern
- Sort the entire table to find the top 20
- An index that stores rows newest-first
- The same query after adding the DESC index
- Index Scan — no Sort node
- A plan-reading tool to keep
- Four questions that cover every plan in this course