Skip to main content

How Database Replication Works

What is Replication?

Replication is the process of copying and maintaining database data across multiple servers (nodes). Instead of storing data on a single machine, replication distributes copies to several nodes so the system can survive failures and serve more users.

Why do we need it?

ProblemHow replication helps
Hardware failureIf one node crashes, another has a copy — the system stays up
Read scalabilityDistribute read queries across replicas instead of overloading one server
LatencyPlace replicas geographically close to users — faster response times
MaintenanceTake one node down for upgrades without downtime

Key terminology

  • Primary / Leader / Master — the node that accepts write operations
  • Replica / Follower / Slave — a node that receives copies of data from the primary
  • Write-ahead log (WAL) — the internal log the primary ships to replicas so they can replay changes
  • Replication lag — the time delay between a write on the primary and when the replica applies it
  • Failover — the process of promoting a replica to primary when the original primary fails
  • Write concern — how many nodes must confirm a write before the client gets an ACK (e.g., w: 1 = primary only, w: majority = most replicas)

How replication works (simplified)

1. Client sends WRITE to Primary
2. Primary writes to its local storage + WAL
3. Primary ships the WAL entry to Replicas
4. Replicas replay the WAL entry → data is now copied
5. Replicas send ACK back (timing depends on sync/async mode)
6. Client receives confirmation

The critical question is: when does the client get the ACK? That's what determines whether replication is synchronous or asynchronous.

Visualization

See how a single write flows through different replication topologies. Toggle between modes to compare write latency, consistency guarantees, and failure behavior:

Primary handles all writes, waits for all replicas to confirm before ACKing the client. Strong consistency but higher write latency.

0 / 0
💻Client
PrimaryWrites
Replica AReads
Replica BReads

Replication Strategies Compared

AspectSingle-Leader (Sync)Single-Leader (Async)Multi-Leader
Write pathClient → Primary → wait for all replicas → ACKClient → Primary → ACK immediatelyClient → any Leader → replicate to other Leaders
Read pathAny replica (always up-to-date)Any replica (may be stale)Any replica in any region
ConsistencyStrong — all nodes agree before ACKEventual — replicas catch up laterEventual — conflict resolution needed
Write latencyHigh (waits for all replicas)Low (ACK immediately)Low (local leader ACK)
Read scalabilityGood — replicas serve readsGood — replicas serve readsExcellent — reads spread across regions
Failure toleranceIf primary fails, a replica takes overReplicas may lose recent writesOther leaders continue accepting writes
Use caseFinancial systems, user accountsContent delivery, analytics dashboardsGeo-distributed apps, offline-first

When Replication Lag Matters

In asynchronous replication, there's a window where replicas are behind the primary. This causes:

  • Stale reads — a user writes a comment, refreshes, and doesn't see it because they hit a stale replica
  • Read-after-write inconsistency — the user who wrote the data expects to read it back immediately
  • Data loss on failover — if the primary crashes before replicating, committed writes are lost

Common mitigations:

  1. Read-your-writes consistency — route the writing user's subsequent reads to the primary
  2. Monotonic reads — ensure a user always reads from the same replica (session stickiness)
  3. Semi-synchronous replication — wait for at least one replica to confirm, then ACK

Single-Leader vs Multi-Leader

Single-Leader

The simplest model. One node accepts all writes; replicas only serve reads.

  • Easy to reason about — no write conflicts
  • Primary is a bottleneck for write-heavy workloads
  • Failover requires promoting a replica to primary (automatic in managed databases)

Multi-Leader

Multiple nodes accept writes independently. Changes are synchronized between leaders.

  • Writes scale horizontally — each region has its own leader
  • Requires conflict resolution when two leaders accept concurrent writes to the same row
  • Common conflict strategies: last-write-wins (LWW), application-level merge, CRDTs

Replication in Practice

  • PostgreSQL: Streaming replication (async by default, sync available). Uses WAL (Write-Ahead Log) shipping.
  • MySQL: Binary log replication with async, semi-sync, and group replication modes.
  • MongoDB: Replica sets with automatic failover. Supports adjustable write concern (w: 1, w: majority).
  • Redis: Asynchronous replication. Sentinel for automatic failover.

Learn More

Read the full theory in Database > Replication.