b-trees-database-indexes.md
B-Trees: Why Every Database Index Looks the Same
The data structure behind PostgreSQL, SQLite, and MySQL indexes — balanced trees, page alignment, and why O(log n) isn't enough.
- Databases
- Algorithms
- Backend
Hash maps give O(1) lookup — why don't databases use them for indexes? Because disk I/O dominates, and B-trees minimize reads by matching storage page size.
B-tree basics
A B-tree is a self-balancing tree where:
- Every node holds many keys (not just one like BST)
- All leaves sit at the same depth
- Internal nodes route searches; leaves hold data or pointers
[ 30 | 60 ]
/ | \
[10|20] [40|50] [70|80|90]Each node sized to fit one disk page (typically 4–16 KB). One read = one node = thousands of keys.
Why not binary trees?
Binary trees are deep. A million rows → ~20 levels → ~20 random disk seeks. B-trees with 500 keys/node → ~3 levels → 3 seeks.
| Structure | Nodes for 1M rows | Typical disk reads |
|---|---|---|
| Binary tree | ~20 | ~20 |
| B-tree (500 fanout) | ~3 | ~3 |
Clustered vs secondary indexes
- Clustered — leaf nodes store row data (InnoDB PK)
- Secondary — leaf nodes store PK pointers → extra hop
That's why SELECT * through a secondary index costs more than covering index queries.
Write amplification
Inserts split nodes when full. Deletes merge or redistribute. B-trees trade write overhead for read predictability — the right bet for OLTP workloads.
When indexes hurt
- Low-cardinality columns (boolean gender flags)
- Write-heavy tables with many indexes
- Functions on indexed columns (
WHERE YEAR(created_at))
[!TIP]
EXPLAIN ANALYZEbefore adding indexes. Measure seeks, not assumptions.
Takeaway
B-trees aren't trendy — they're engineered for spinning disks and SSD page reads. Understanding fanout and page alignment explains more production query slowness than Big-O notation alone.
Related
Continue reading
More notes on similar topics.
How distributed systems agree on a log when nodes fail — terms, heartbeats, and why Paxos stayed in papers while Raft got textbooks.
- Distributed Systems
- Algorithms
How Ken Perlin's gradient noise creates infinite terrain, clouds, and fire — and why Simplex improved it thirty years later.
- Graphics
- Algorithms
A living reference for writing posts — thumbnails, code, tables, alerts, footnotes, images, audio, video, and YouTube embeds.
- Markdown
- Blog