Skip to main content

How Database Indexing Works

What is Indexing?​

πŸ“šWithout Index β€” Full Table Scan
P1
β†’
P2
β†’
P3
β†’
...
β†’
P6
β†’
P7
Read every page sequentially until you find what you need β€” like flipping through an entire book page by page.
7 pages checkedO(n)
πŸ“–With Index β€” Direct Lookup
INDEX
β†’
P7
Look up the term in the index at the back of the book, then jump directly to the right page.
1 page checkedO(log n)
🎯
What is an Index?
A separate data structure that stores a sorted copy of selected columns alongside pointers to the actual rows.
⚑
Why use Indexes?
Without an index, the database must scan every row. With one, it traverses a tree β€” reducing reads from billions to ~5.
βš™οΈ
How does it work?
A B-tree keeps data sorted. Each level narrows the search range by following the correct branch β€” like a decision tree.

Visualization​

Watch both approaches side by side for the same query β€” see how many rows the full table scan checks vs how few nodes the B-tree visits:

0 / 13
Without Index β€” Full Table Scan
SELECT * FROM users WHERE email = 'alice@mail.com'
#NameEmail
1Emmaemma@mail.com
2Liamliam@mail.com
3Sophiasophia@mail.com
4Noahnoah@mail.com
5Oliviaolivia@mail.com
6Jamesjames@mail.com
7Alicealice@mail.com
8Bobbob@mail.com
9Carolcarol@mail.com
10Davedave@mail.com
With B-tree Index β€” Tree Traversal
SELECT * FROM users WHERE email = 'alice@mail.com'
[ dave | sophia ]
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
[ bob ]
[ james ]
[ noah ]
β”Œβ”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”
alice, bob
carol, dave
emma, james
Waiting for table scan to progress...

Types of Indexes​

🌳
B-tree
Default in most databases. Sorted structure supports range queries.
WHERE, ORDER BY, JOIN
#️⃣
Hash
O(1) exact match lookup using a hash table.
Equality only (=)
πŸ“Š
Bitmap
Bit arrays for low-cardinality columns.
Data warehousing
πŸ”—
Composite
Index on multiple columns β€” column order matters.
Multi-column WHERE
πŸ“¦
Covering
Includes all queried columns β€” avoids table lookups entirely.
Index-only scans
πŸ”‘
Unique
Enforces uniqueness on column values.
Primary keys

Index Scan Strategies​

The query optimizer chooses between these strategies based on cost estimates:

Index SeekFastest
B-tree traversal to find specific rows directly.
Index ScanFast
Read the entire index in order. Faster than table scan when index covers the query.
Full Table ScanSlowest
Read every row sequentially. Sometimes faster than index when selecting most rows.

Key Takeaways​

🐒
Without Index
Every row is checked sequentially β€” O(n) β€” painfully slow for large tables.
πŸš€
With B-tree Index
The tree narrows the search at each level β€” only a few nodes visited β€” O(log n).
βš–οΈ
Trade-off
Indexes speed up reads but slow down writes β€” every INSERT/UPDATE/DELETE must update the index too.
πŸ’Ύ
Storage Cost
Indexes consume additional disk space β€” typically 10-20% of table size.
🌲
B-tree Depth
For 1 billion rows with branching factor 100 β†’ only ~4-5 levels needed.

When to Index vs When NOT to​

βœ… When to Index
βœ“πŸ”Columns frequently used in WHERE clauses
βœ“πŸ”—Columns used in JOIN conditions
βœ“πŸ“ŠColumns used in ORDER BY or GROUP BY
βœ“πŸ“ˆHigh cardinality columns (many distinct values)
❌ When NOT to Index
βœ—πŸ“­Tables with few rows (scan is fast enough)
βœ—πŸ”„Low cardinality (e.g. boolean β€” index won't help)
βœ—βœοΈHeavy write workloads (each write updates every index)
βœ—πŸ”Frequently updated columns (maintenance cost is high)

How B-tree Indexes Work (Deep Dive)​

The most common index type is the B-tree (Balanced Tree). It keeps data sorted and allows searches, insertions, and deletions in O(log n) time:

  1. The root node contains keys that divide the data into ranges
  2. Each child node narrows the range further
  3. Leaf nodes contain the actual values and row pointers
  4. The tree stays balanced β€” all leaf nodes are at the same depth

For example, searching 1 billion rows with a branching factor of 100 requires only ~4-5 levels β€” that's 4-5 disk reads instead of 1 billion.

Learn More​

Read the full theory in Database Design > Indexing.