← All bricks

PostgreSQL · Intermediate

B-tree Indexes and Selectivity

Build intuition for B-tree indexes by watching the planner choose between Seq Scan and Index Scan on a real million-row stand — equality lookups, range scans, low-cardinality traps, composite key order, and ORDER BY without a sort step.

Most people add an index whenever a column appears in a WHERE clause. This course replaces that reflex with a model: selectivity. We stand up a real PostgreSQL with customers and a million orders, then work through every situation where an index helps — and the ones where it quietly does nothing. We start with a plain equality lookup: no index, then with one, and we measure the difference. We move to range scans and watch the planner choose Bitmap Heap Scan for a narrow window but fall back to Seq Scan when the window covers most of the table. We build an index on the low-cardinality `status` column and confirm it is ignored for common values, then flip to a rare value and watch the same index spring to life. We construct a composite index and learn why equality columns must come before range columns, then break the rule on purpose and compare the costs. The leftmost-prefix rule comes next: a composite index on (A, B) does not help a query that filters only on B. We finish with ORDER BY … LIMIT — the most common production pattern — and show how a DESC index lets the planner take the first N leaves of the tree without sorting anything.

What you'll build

  • Explain when a B-tree index scan beats a sequential scan and when it does not
  • Define selectivity and use it to predict whether the planner will use an index
  • Identify the low-cardinality trap and know which access patterns break it
  • Build a composite index with the correct equality-before-range column order
  • Apply the leftmost-prefix rule to predict which queries a composite index covers
  • Create a DESC index to eliminate the Sort node from an ORDER BY LIMIT plan

Contents

  1. A database big enough to matter
  2. Wait for it to be ready
  3. One command to bring it up
  4. Start the server
  5. A short way into the database
  6. The customers table
  7. The orders table
  8. Create the tables
  9. Seed 200 000 customers, repeatably
  10. Seed one million orders
  11. Load the data
  12. Fresh statistics, single-threaded plans
  13. Confirm what we are sitting on
  14. A lookup with no index
  15. The whole table, row by row
  16. Build a B-tree on customer_id
  17. The same query, with a path through the index
  18. Index Scan and Bitmap Heap Scan
  19. What the startup cost tells you
  20. A range query on a near-unique column
  21. Seq Scan, again — no shortcut available
  22. Index the timestamp column
  23. The index lands on the right side of the range
  24. The same index, a different boundary
  25. Seq Scan — the planner ignores its own index
  26. The most common status value
  27. Seq Scan for a filter that touches most of the table
  28. Build an index on the low-cardinality column
  29. The index is ignored for the common value
  30. The same index, a rare value
  31. The same index, now useful
  32. A two-column filter — a typical production query
  33. One index used, the other applied as a filter
  34. Build the composite with equality first
  35. Both conditions applied inside the index
  36. The same query, wrong column order in the index
  37. Build the composite with range first
  38. Force the planner to use the reversed composite
  39. The reversed index reads 3 000 pages for 4 rows
  40. Restore the correct composite
  41. A query that skips the first column of the composite
  42. The composite is useless for the second column alone
  43. The latest 20 orders — a very common production pattern
  44. Sort the entire table to find the top 20
  45. An index that stores rows newest-first
  46. The same query after adding the DESC index
  47. Index Scan — no Sort node
  48. A plan-reading tool to keep
  49. Four questions that cover every plan in this course