# Caching Strategies

## Why Caching Matters
In distributed systems, latency is dominated by network hops and disk I/O. Caching eliminates redundant work by serving data from a faster layer closer to the caller.
| Without cache | With cache |
|---|---|
| Every read hits the database | Hot data served from memory |
| Network round-trips add up | Fewer hops = lower latency |
| Database becomes the bottleneck | DB freed up for complex queries |
| Scaling means more DB replicas | Scaling out cache is cheaper |
## Where to Place Caches

Caching happens at every layer between the user and the database. The closer to the client, the faster — but the less fresh the data:

```
Client (Browser / Mobile App)
        ↓
[1] CDN — static assets, edge-cached API responses
        ↓ miss
[2] Reverse Proxy — Nginx / Varnish full-page cache
        ↓ miss
[3] API Gateway — response caching per route
        ↓ miss
[4] Application Layer — in-process (IMemoryCache) or external (Redis / Memcached)
        ↓ miss
[5] Database — buffer pool, query cache
```

(For layer [5], see [Database Caching](/docs/database/caching).)
Cache at the highest layer that can serve the data with acceptable staleness. As rough orders of magnitude: a CDN edge response takes 10–50ms (measured from the client), a Redis lookup 1–5ms (measured from the app server), and a DB query 5–100ms or more. Each layer down adds latency but gains freshness.
## The Five Cache Strategies

Caching strategies split into two categories — how you read and how you write. They can be mixed and matched.

### Read Strategies

#### Cache-Aside (Lazy Loading)

The application manages the cache explicitly. On a read, check the cache first; on a miss, query the source and populate the cache.

```
Read:  App → Cache? → [hit]  → return
                    → [miss] → DB → store in cache → return
Write: App → DB → invalidate cache
```
- Pros: Simple, full control, works with any cache store
- Cons: Cache logic in application code, cold starts are slow
- Best for: General-purpose, mixed read/write workloads
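As a minimal sketch, the cache-aside flow looks like this in Python (plain dicts stand in for the real cache and database; `get_user` and `update_user` are illustrative names):

```python
cache = {}                        # stand-in for Redis/Memcached
db = {"user:1": {"name": "Ada"}}  # stand-in for the database

def get_user(key):
    """Read path: check the cache first; on a miss, load and populate."""
    if key in cache:
        return cache[key]         # hit
    value = db[key]               # miss: query the source of truth
    cache[key] = value            # populate for subsequent reads
    return value

def update_user(key, value):
    """Write path: update the DB, then invalidate the cached copy."""
    db[key] = value
    cache.pop(key, None)          # next read repopulates with fresh data
```

Note that all cache logic lives in the application, which is exactly the "full control, but cache code everywhere" trade-off listed above.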
#### Read-Through

The cache sits between the application and the data source. On a miss, the cache provider automatically loads from the database. The application only talks to the cache.

```
Read:  App → Cache → [hit]  → return
                   → [miss] → Cache loads from DB → store → return
Write: App → DB → invalidate cache
```
- Pros: Cleaner application code (no cache logic), consistent cache behavior
- Cons: Tightly coupled to cache provider, less control over load logic
- Best for: Read-heavy workloads where you want clean separation of concerns
- Examples: Spring Cache (`@Cacheable`), Hibernate second-level cache
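A read-through provider can be sketched as a cache that owns its own loader, so application code never touches the database directly (Python sketch; `ReadThroughCache` is a hypothetical class, not a library API):

```python
class ReadThroughCache:
    """The cache owns the loader: on a miss it loads from the source
    itself, so the application only ever calls `get`."""
    def __init__(self, loader):
        self._store = {}
        self._loader = loader   # e.g. a DB query function

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._loader(key)  # cache loads on miss
        return self._store[key]

db = {"sku:42": 19.99}          # stand-in database
prices = ReadThroughCache(loader=db.__getitem__)
```

Compare with cache-aside: the load-on-miss logic moved out of the application and into the cache component.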
### Write Strategies

#### Write-Through

Every write goes to both the cache and the database synchronously before the client receives an ACK.

```
Write: App → Cache → DB → ACK
Read:  App → Cache → [hit, guaranteed fresh] → return
```
- Pros: Strong consistency — cache and DB always in sync
- Cons: Write latency is high (two synchronous writes)
- Best for: Read-heavy workloads where consistency matters
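A write-through sketch in Python (dicts stand in for the cache and database):

```python
cache, db = {}, {}

def write_through(key, value):
    """Both stores are updated before the client receives the ACK."""
    cache[key] = value          # write 1: cache
    db[key] = value             # write 2: database, synchronously
    return "ACK"

def read(key):
    """Reads are guaranteed fresh: the cache never lags the DB."""
    return cache[key]
```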
#### Write-Around

Writes go directly to the database, bypassing the cache entirely. The cache is only populated on reads.

```
Write: App → DB → ACK (cache untouched)
Read:  App → Cache → [miss] → DB → store → return
```
- Pros: Cache doesn't get polluted with write-only data, no write amplification
- Cons: Recently written data isn't in cache until first read
- Best for: Write-heavy workloads where data isn't re-read immediately (logs, analytics)
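Write-around in the same sketch style (the write skips the cache entirely; the first read populates it):

```python
cache, db = {}, {}

def write_around(key, value):
    """Write path: straight to the DB; the cache is untouched."""
    db[key] = value

def read(key):
    """Read path: classic populate-on-miss."""
    if key not in cache:
        cache[key] = db[key]    # populated only on first read
    return cache[key]
```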
#### Write-Back (Write-Behind)

Writes go to the cache only; the client is ACKed immediately. The database is updated asynchronously in batches.

```
Write:      App → Cache → ACK (fast!)
Background: Cache → batch flush → DB
```
- Pros: Very fast writes, batch flush reduces DB load
- Cons: Data loss risk if cache crashes before flush, eventual consistency
- Best for: Write-heavy workloads that can tolerate data loss (telemetry, counters)
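A write-back sketch with an explicit dirty set and a batch flusher (in production the flush runs on a timer or queue; here it is called manually):

```python
cache, db, dirty = {}, {}, set()

def write_back(key, value):
    """ACK immediately; the database is updated later by the flusher."""
    cache[key] = value
    dirty.add(key)              # remember what still needs persisting
    return "ACK"

def flush():
    """Background batch flush: one pass persists every dirty key.
    Anything written since the last flush is lost if the cache crashes."""
    for key in dirty:
        db[key] = cache[key]
    dirty.clear()
```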
## Combining Strategies
Read and write strategies are orthogonal — pick one from each:
| Read Strategy | Write Strategy | Effect |
|---|---|---|
| Cache-Aside | Write-Around | Simple, cache populated only on reads |
| Cache-Aside | Write-Through | Consistent, app manages cache |
| Read-Through | Write-Through | Clean app code + strong consistency |
| Read-Through | Write-Behind | Clean app code + fast writes (eventual consistency) |
| Cache-Aside | Write-Behind | Fast writes, app controls reads |
## Distributed Cache Architecture

### From Single Node to Cluster

A single Redis/Memcached instance is a single point of failure. Production systems use sharded clusters:

```
Client → Cache Proxy (Twemproxy / HAProxy / Redis Cluster)
           ├── Shard 1 (keys hash 0–33%)
           ├── Shard 2 (keys hash 34–66%)
           └── Shard 3 (keys hash 67–100%)
```
Sharding strategy: Consistent hashing minimizes key redistribution when nodes are added or removed. Each key is mapped to a point on a hash ring.
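A minimal consistent-hash ring in Python (virtual nodes smooth out the key distribution; `HashRing` is illustrative, not a library API):

```python
import bisect
import hashlib

def _h(s):
    """Stable 128-bit hash: maps a string to a point on the ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    """Each node contributes `vnodes` points on the ring; a key belongs
    to the first node point at or after its own hash (wrapping around)."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((_h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self._points = [p for p, _ in self.ring]

    def node_for(self, key):
        i = bisect.bisect(self._points, _h(key)) % len(self.ring)
        return self.ring[i][1]
```

The payoff: removing a node only remaps the keys that lived on it; every other key keeps its shard.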
### Replication for High Availability
Each shard can have replicas for failover:
- Primary-replica: Writes go to primary; reads can be served from replicas
- Sentinel (Redis): Automatic failover when primary goes down
- Redis Cluster: Built-in sharding + replication (16384 hash slots)
- Trade-off: More replicas = better read throughput + availability, but higher write replication cost
### Cache Warm-up
Cold caches cause a burst of misses on deployment. Strategies to pre-populate:
| Strategy | How it works |
|---|---|
| Pre-warm on deploy | Load known hot keys from DB into cache before routing traffic |
| Gradual rollout | Route a small percentage of traffic to new instances, increase gradually |
| Shadow traffic | Replay production reads against new cache to warm it passively |
## CDN Caching

### What gets cached at the CDN?
| Content type | Cacheability | Typical TTL |
|---|---|---|
| Static assets (JS, CSS, images) | High | Days to months (versioned URLs) |
| API responses (public data) | Medium | Seconds to minutes |
| API responses (user-specific) | Low | Usually not cached at CDN |
| HTML pages | Varies | Depends on dynamic content |
### Cache-Control Headers

```
Cache-Control: public, max-age=3600, s-maxage=86400
```
| Directive | Meaning |
|---|---|
| `public` | Any cache (CDN, browser) can store the response |
| `private` | Only the browser can store the response (no CDN caching) |
| `max-age=3600` | Browser cache TTL — 1 hour |
| `s-maxage=86400` | CDN/shared cache TTL — 1 day (overrides max-age for CDNs) |
| `no-cache` | Must revalidate with the server before using the cached copy |
| `no-store` | Never cache this response |
### CDN Invalidation Strategies

CDNs don't know when your data changes. How to handle stale content:

- URL versioning: `/app.v1.2.3.js` — change the URL when content changes. Old URLs can stay cached indefinitely.
- Surrogate keys: CDN associates keys with responses; purge by key when data changes.
- Short TTL + stale-while-revalidate: Serve stale content immediately while fetching fresh content in the background.
- API purge: Explicitly call the CDN's purge API on deploy (Cloudflare, Fastly, etc.).

```
# Stale-while-revalidate example
Cache-Control: public, max-age=60, stale-while-revalidate=300
# → Serve stale for up to 5 minutes while revalidating in background
```
## Application-Level Patterns

### Stale-While-Revalidate

Serve stale data immediately while fetching fresh data in the background. The next request gets fresh data.

```
Request → cache hit (but TTL expired)
        → return stale data to client immediately
        → background fetch from DB → update cache
Next request → cache hit (fresh data) → return
```
- Users see a fast response (stale but acceptable)
- Data converges to fresh within one request cycle
- Supported by CDN headers (`stale-while-revalidate`), SWR (React), `StaleWhileRevalidate` (.NET)
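The pattern can be sketched in Python. For determinism the refresh here happens inline rather than in a background task, but the caller-visible behavior matches the diagram above: an expired hit returns the stale value, and the next request sees fresh data.

```python
import time

CACHE = {}  # key -> (value, expires_at)

def get_swr(key, load, ttl=60.0):
    """Stale-while-revalidate: an expired hit is served immediately,
    and the cache is refreshed so the next caller sees fresh data.
    (A real implementation runs the refresh in a background task.)"""
    now = time.monotonic()
    entry = CACHE.get(key)
    if entry is not None:
        value, expires_at = entry
        if now < expires_at:
            return value                     # fresh hit
        CACHE[key] = (load(key), now + ttl)  # revalidate
        return value                         # serve stale to this caller
    value = load(key)                        # cold miss: must load now
    CACHE[key] = (value, now + ttl)
    return value
```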
### Refresh-Ahead (Pre-expiration)

Before the TTL expires, proactively refresh hot entries in the background. Users never see stale data for hot keys.

```csharp
// Refresh at ~80% of TTL, with jitter to avoid a thundering herd
var ttl = TimeSpan.FromMinutes(30);
var refreshAt = TimeSpan.FromMinutes(24) + TimeSpan.FromSeconds(Random.Shared.Next(-30, 30));
```
### Distributed Lock for Stampede Prevention

When a hot key expires, only one request should rebuild it — the rest wait or serve stale data:

```csharp
// Sketch using StackExchange.Redis: "redis" is an IDatabase,
// "LoadFromDbAsync" stands in for the real database query, and
// "ttl" is the entry's configured lifetime (defined elsewhere).
async Task<string> GetWithLockAsync(string key, CancellationToken ct = default)
{
    string? cached = await redis.StringGetAsync(key);
    if (cached is not null) return cached;

    var lockKey = $"lock:{key}";
    // SET NX with a short TTL — only one instance acquires the lock
    bool acquired = await redis.StringSetAsync(
        lockKey, Environment.MachineName,
        TimeSpan.FromSeconds(10), When.NotExists);
    if (acquired)
    {
        try
        {
            var data = await LoadFromDbAsync(key, ct);
            await redis.StringSetAsync(key, data, ttl);
            return data;
        }
        finally
        {
            await redis.KeyDeleteAsync(lockKey);
        }
    }

    // Lock not acquired — another instance is rebuilding. Retry after a brief delay.
    await Task.Delay(100, ct);
    return await GetWithLockAsync(key, ct);
}
```
## Common Pitfalls

### Cache Stampede (Thundering Herd)
A popular key expires and thousands of concurrent requests all miss and hammer the database.
Mitigation: Distributed lock (above), probabilistic early expiration, or per-request coalescing.
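Probabilistic early expiration (often called the "XFetch" rule) can be sketched as follows: each request independently decides to recompute shortly before expiry, so refreshes spread out instead of all landing at the expiry instant (Python sketch; parameter names are illustrative):

```python
import math
import random

def should_refresh_early(recompute_cost, ttl_remaining, beta=1.0):
    """Return True if this request should recompute the value now,
    before the entry actually expires. `recompute_cost` is roughly how
    long (seconds) a recompute takes; higher `beta` refreshes earlier."""
    if ttl_remaining <= 0:
        return True                      # already expired
    u = random.random() or 1e-12         # guard against log(0)
    # -log(u) is exponentially distributed, so refresh times spread out
    return -recompute_cost * beta * math.log(u) >= ttl_remaining
```

Far from expiry this almost never fires; as `ttl_remaining` shrinks toward the recompute cost, the probability rises sharply, so a single lucky request rebuilds the entry before the herd arrives.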
### Hot Keys
A single key receives disproportionate traffic. One cache shard becomes the bottleneck.
Mitigation: Replicate hot keys under several suffixed copies (e.g. `user:1#a`, `user:1#b`, `user:1#c`) and read from a random copy, so the copies land on different shards. Redis Cluster can also spread hot-key reads across read replicas.
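Hot-key splitting can be sketched in a few lines (the suffix scheme and names are illustrative):

```python
import random

REPLICAS = 3  # number of suffixed copies per hot key

def write_hot(cache, key, value):
    """Write the value under several suffixed copies so the copies
    hash to different shards."""
    for i in range(REPLICAS):
        cache[f"{key}#{i}"] = value

def read_hot(cache, key):
    """Read one copy at random, spreading load across the copies."""
    return cache[f"{key}#{random.randrange(REPLICAS)}"]
```

The trade-off is write amplification: every update must touch all copies (or accept briefly inconsistent replicas).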
### Large Values
Storing large JSON documents or blobs in cache wastes memory and increases serialization cost.
Mitigation: Cache only the fields you need, compress values, or use a separate object store for large data.
### Cascading Failures
If the cache goes down, all traffic hits the database at once. The DB overloads and the entire system degrades.
Mitigation: Circuit breaker on cache reads — if cache is unavailable, return degraded responses or serve from a local fallback cache rather than overwhelming the DB.
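A minimal circuit breaker around cache reads might look like this (thresholds and names are illustrative; libraries such as Polly or resilience4j provide production-grade versions):

```python
import time

class CacheBreaker:
    """After `threshold` consecutive cache failures the circuit opens:
    calls skip the cache and serve a fallback until `reset_after` elapses."""
    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold, self.reset_after = threshold, reset_after
        self.failures, self.opened_at = 0, None

    def call(self, cache_get, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()                    # open: degrade fast
            self.opened_at, self.failures = None, 0  # half-open: try again
        try:
            result = cache_get()
            self.failures = 0                        # success resets the count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()    # trip the breaker
            return fallback()
```

The fallback should be a degraded response or a local cache, never a direct DB query, or the breaker just relocates the overload.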
## Strategy Selection Guide
| Scenario | Read Strategy | Write Strategy |
|---|---|---|
| Product catalog (read-heavy, changes rarely) | Cache-Aside or Read-Through | Write-Around |
| User profiles (read-heavy, occasional updates) | Read-Through | Write-Through |
| Social media feed (write-heavy, eventual consistency OK) | Cache-Aside | Write-Back |
| Shopping cart (strong consistency needed) | Read-Through | Write-Through |
| Analytics / telemetry (write-heavy, rarely re-read) | Cache-Aside | Write-Around |
| Session data (fast writes, TTL-based expiry) | Cache-Aside | Write-Back |
| Static assets | CDN (not application cache) | N/A |
## Related Topics
- Database Caching — buffer pools, query cache, eviction policies in depth
- Redis — the most popular distributed cache engine
- Scalability — how caching fits into the bigger scaling picture
- Load Balancing — distributing traffic across cache nodes