How Database Replication Works
What is Replication?
Replication is the process of copying and maintaining database data across multiple servers (nodes). Instead of storing data on a single machine, replication distributes copies to several nodes so the system can survive failures and serve more users.
Why do we need it?
| Problem | How replication helps |
|---|---|
| Hardware failure | If one node crashes, another has a copy — the system stays up |
| Read scalability | Distribute read queries across replicas instead of overloading one server |
| Latency | Place replicas geographically close to users — faster response times |
| Maintenance | Take one node down for upgrades without downtime |
Key terminology
- Primary / Leader / Master — the node that accepts write operations
- Replica / Follower / Slave — a node that receives copies of data from the primary
- Write-ahead log (WAL) — the internal log the primary ships to replicas so they can replay changes
- Replication lag — the time delay between a write on the primary and when the replica applies it
- Failover — the process of promoting a replica to primary when the original primary fails
- Write concern: how many nodes must confirm a write before the client gets an ACK (e.g., `w: 1` = primary only, `w: majority` = more than half of the voting nodes, primary included)
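To make the write-concern arithmetic concrete, here is a minimal sketch of how many acknowledgements a given setting implies. The function name `required_acks` and its parameters are hypothetical and not taken from any database driver; the sketch assumes "majority" means more than half of the voting nodes.

```python
from typing import Union


def required_acks(w: Union[int, str], voting_members: int) -> int:
    """Return how many nodes must confirm a write before the client is ACKed.

    Hypothetical helper for illustration only.
    """
    if w == "majority":
        # More than half of the voting nodes, counting the primary.
        return voting_members // 2 + 1
    if isinstance(w, int) and 1 <= w <= voting_members:
        return w
    raise ValueError(f"unsupported write concern: {w!r}")
```

With three voting nodes, `required_acks("majority", 3)` gives 2, so the primary plus one replica must confirm the write.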
How replication works (simplified)
1. Client sends WRITE to Primary
2. Primary writes to its local storage + WAL
3. Primary ships the WAL entry to Replicas
4. Replicas replay the WAL entry → data is now copied
5. Replicas send ACK back (timing depends on sync/async mode)
6. Client receives confirmation
The critical question is: when does the client get the ACK? That's what determines whether replication is synchronous or asynchronous.
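The steps above can be sketched as a toy in-memory model. This is an illustration under simplifying assumptions, not how any real database implements replication: the class names `Primary` and `Replica`, the `synchronous` flag, and the `ship_pending` method are all invented for the example, and the "network" is a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of WAL entries already replayed

    def replay(self, wal: list) -> None:
        # Step 4: replay every WAL entry this replica has not applied yet.
        while self.applied < len(wal):
            key, value = wal[self.applied]
            self.data[key] = value
            self.applied += 1


@dataclass
class Primary:
    replicas: list
    synchronous: bool
    data: dict = field(default_factory=dict)
    wal: list = field(default_factory=list)

    def write(self, key, value) -> str:
        # Step 2: write to local storage and append to the WAL.
        self.data[key] = value
        self.wal.append((key, value))
        if self.synchronous:
            # Steps 3-5: ship to every replica and wait before ACKing.
            self.ship_pending()
        # Step 6: in async mode this ACK is returned before replicas catch up.
        return "ACK"

    def ship_pending(self) -> None:
        # Step 3: ship outstanding WAL entries to all replicas.
        for replica in self.replicas:
            replica.replay(self.wal)
```

In synchronous mode the replica already holds the value when `write` returns. In asynchronous mode the replica stays empty until `ship_pending` runs, which is the replication lag window.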
Visualization
[Interactive visualization: a single write flowing through each replication topology, with modes comparing write latency, consistency guarantees, and failure behavior. Default mode shown: the primary handles all writes and waits for all replicas to confirm before ACKing the client, giving strong consistency but higher write latency.]
Replication Strategies Compared
| Aspect | Single-Leader (Sync) | Single-Leader (Async) | Multi-Leader |
|---|---|---|---|
| Write path | Client → Primary → wait for all replicas → ACK | Client → Primary → ACK immediately | Client → any Leader → replicate to other Leaders |
| Read path | Any replica (always up-to-date) | Any replica (may be stale) | Any replica in any region |
| Consistency | Strong — all nodes agree before ACK | Eventual — replicas catch up later | Eventual — conflict resolution needed |
| Write latency | High (waits for all replicas) | Low (ACK immediately) | Low (local leader ACK) |
| Read scalability | Good — replicas serve reads | Good — replicas serve reads | Excellent — reads spread across regions |
| Failure tolerance | If the primary fails, a replica takes over with no acknowledged writes lost | Failover can lose recent writes that had not yet reached the promoted replica | Other leaders continue accepting writes |
| Use case | Financial systems, user accounts | Content delivery, analytics dashboards | Geo-distributed apps, offline-first |
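The write latency row can be approximated with a back-of-the-envelope model. The sketch below assumes the primary contacts all replicas in parallel and that latency is the local commit time plus the relevant replica round trip; the function name, parameters, and mode strings are hypothetical. The "semi-sync" mode refers to waiting for only the fastest replica.

```python
def write_latency_ms(local_ms: float, replica_rtts_ms: list, mode: str) -> float:
    """Estimate client-visible write latency for one write (toy model)."""
    if mode == "async":
        # ACK as soon as the primary has committed locally.
        return local_ms
    if mode == "semi-sync":
        # Wait for the fastest replica only.
        return local_ms + min(replica_rtts_ms)
    if mode == "sync":
        # Wait for every replica, so the slowest one sets the pace.
        return local_ms + max(replica_rtts_ms)
    raise ValueError(f"unknown mode: {mode!r}")
```

For a 2 ms local commit with replicas at 10 ms and 80 ms round trip, this model gives 82 ms for sync, 12 ms for semi-sync, and 2 ms for async.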
When Replication Lag Matters
In asynchronous replication, there's a window where replicas are behind the primary. This causes:
- Stale reads: a query routed to a lagging replica returns data older than what the primary holds
- Read-after-write inconsistency: a user posts a comment, refreshes, and doesn't see it because the read hit a replica that has not applied the write yet
- Data loss on failover: if the primary crashes before a write reaches any replica, that write is lost even though the client already received an ACK
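The failover case can be illustrated with a small calculation. This sketch assumes each replica is described only by how many WAL entries it has applied and that the most up-to-date replica is promoted; the function name and the data shapes are invented for the example.

```python
def writes_lost_on_failover(primary_wal: list, applied_counts: dict):
    """Pick the most up-to-date replica and list the writes it never received.

    applied_counts maps a replica name to the number of WAL entries applied.
    """
    promoted = max(applied_counts, key=applied_counts.get)
    lost = primary_wal[applied_counts[promoted]:]
    return promoted, lost
```

If the primary had logged four writes and the best replica had applied three, promoting that replica drops the fourth write, even though the client may already have been told it succeeded.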
Common mitigations:
- Read-your-writes consistency — route the writing user's subsequent reads to the primary
- Monotonic reads — ensure a user always reads from the same replica (session stickiness)
- Semi-synchronous replication — wait for at least one replica to confirm, then ACK
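The first two mitigations are routing decisions, which a short sketch can show. This is one possible approach under stated assumptions: the `ReadRouter` class, the `pin_seconds` window, and the injectable `clock` are hypothetical, and a fixed time window after each write is only a rough stand-in for knowing when replicas have caught up.

```python
import hashlib
import time


class ReadRouter:
    """Route reads so users see their own writes and keep a stable replica."""

    def __init__(self, replicas, pin_seconds=5.0, clock=time.monotonic):
        self.replicas = list(replicas)
        self.pin_seconds = pin_seconds  # assumed upper bound on replication lag
        self.clock = clock
        self.last_write = {}  # user_id -> time of most recent write

    def record_write(self, user_id: str) -> None:
        self.last_write[user_id] = self.clock()

    def route_read(self, user_id: str) -> str:
        wrote_at = self.last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.pin_seconds:
            # Read-your-writes: recent writers read from the primary.
            return "primary"
        # Monotonic reads: a stable hash pins each user to one replica.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.replicas)
        return self.replicas[index]
```

A user who has just written is sent to the primary until the window expires, then returns to the same replica as before. The sticky mapping changes if the replica list changes, so a production router would need to handle replica removal separately.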
Single-Leader vs Multi-Leader
Single-Leader
The simplest model. One node accepts all writes; replicas only serve reads.
- Easy to reason about — no write conflicts
- Primary is a bottleneck for write-heavy workloads
- Failover requires promoting a replica to primary (typically automated in managed database services)
Multi-Leader
Multiple nodes accept writes independently. Changes are synchronized between leaders.
- Writes scale horizontally — each region has its own leader
- Requires conflict resolution when two leaders accept concurrent writes to the same row
- Common conflict strategies: last-write-wins (LWW), application-level merge, CRDTs
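As an illustration of the simplest of these strategies, here is a minimal last-write-wins sketch. The `Versioned` type and the tie-break on `leader_id` are assumptions made for the example; real systems differ in how they generate timestamps and break ties.

```python
from typing import NamedTuple


class Versioned(NamedTuple):
    value: str
    timestamp: float  # assumed to come from each leader's clock
    leader_id: str    # used only to break timestamp ties deterministically


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the version with the later timestamp; ties go to the larger leader_id."""
    return max(a, b, key=lambda v: (v.timestamp, v.leader_id))
```

Because the comparison is deterministic and independent of argument order, every leader converges on the same value. The trade-off is that the losing write is silently discarded, and clock skew between leaders can make an older write win.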
Replication in Practice
- PostgreSQL: Streaming replication (async by default, sync available). Uses WAL (Write-Ahead Log) shipping.
- MySQL: Binary log replication with async, semi-sync, and group replication modes.
- MongoDB: Replica sets with automatic failover. Supports adjustable write concern (`w: 1`, `w: majority`).
- Redis: Asynchronous replication. Sentinel for automatic failover.