
Caching Strategies

Why Caching Matters

In distributed systems, latency is dominated by network hops and disk I/O. Caching eliminates redundant work by serving data from a faster layer closer to the caller.

| Without cache | With cache |
|---|---|
| Every read hits the database | Hot data served from memory |
| Network round-trips add up | Fewer hops = lower latency |
| Database becomes the bottleneck | DB freed up for complex queries |
| Scaling means more DB replicas | Scaling out cache is cheaper |

Where to Place Caches

Caching happens at every layer between the user and the database. The closer to the client, the faster — but the less fresh the data:

Client (Browser / Mobile App)

[1] CDN — static assets, edge-cached API responses
↓ miss
[2] Reverse Proxy — Nginx / CloudFront full-page cache
↓ miss
[3] API Gateway — response caching per route
↓ miss
[4] Application Layer — in-process (IMemoryCache) or external (Redis / Memcached)
↓ miss
[5] Database — buffer pool, query cache (see [Database Caching](/docs/database/caching))
Choosing the right layer

Cache at the highest layer that can serve the data with acceptable staleness. A CDN response is 10–50ms; a Redis lookup is 1–5ms; a DB query is 5–100ms+. Each layer down adds latency but gains freshness.

The Five Cache Strategies

Caching strategies split into two categories — how you read and how you write. They can be mixed and matched.

Read Strategies

Cache-Aside (Lazy Loading)

The application manages the cache explicitly. On a read, check cache first; on a miss, query the source and populate cache.

Read:  App → Cache? → [hit]  → return
                    → [miss] → DB → store in cache → return
Write: App → DB → invalidate cache
  • Pros: Simple, full control, works with any cache store
  • Cons: Cache logic in application code, cold starts are slow
  • Best for: General-purpose, mixed read/write workloads
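The flow above can be sketched in a few lines. A `Dictionary` stands in for the cache store, and `LoadFromDb`/`SaveToDb` are hypothetical data-access helpers, not a specific library's API:

```csharp
using System;
using System.Collections.Generic;

var cache = new Dictionary<string, string>();          // stand-in for Redis / IMemoryCache
string LoadFromDb(string key) => $"value-for-{key}";   // stand-in for the real query
void SaveToDb(string key, string value) { /* UPDATE ... */ }

// Read path: check cache first; on a miss, query the source and populate.
string Get(string key)
{
    if (cache.TryGetValue(key, out var hit)) return hit;   // cache hit
    var value = LoadFromDb(key);                           // cache miss: go to the DB
    cache[key] = value;                                    // populate for the next reader
    return value;
}

// Write path: update the source of truth, then invalidate (not update) the cache.
void Put(string key, string value)
{
    SaveToDb(key, value);
    cache.Remove(key);   // next read repopulates from the DB
}

Console.WriteLine(Get("user:1"));   // → value-for-user:1 (miss, loaded and cached)
Console.WriteLine(Get("user:1"));   // → value-for-user:1 (hit)
```

Invalidating rather than updating on writes avoids racing a concurrent read that could re-cache a stale value.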

Read-Through

The cache sits between the application and the data source. On a miss, the cache provider automatically loads from the database. The application only talks to the cache.

Read:  App → Cache → [hit]  → return
                   → [miss] → Cache loads from DB → store → return
Write: App → DB → invalidate cache
  • Pros: Cleaner application code (no cache logic), consistent cache behavior
  • Cons: Tightly coupled to cache provider, less control over load logic
  • Best for: Read-heavy workloads where you want clean separation of concerns
  • Examples: Spring Cache (@Cacheable), Hibernate second-level cache
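A minimal read-through wrapper can be sketched as a cache that owns its loader, so application code never contains population logic. The names here are illustrative, not any provider's actual API:

```csharp
using System;
using System.Collections.Generic;

// Application code has no cache logic; it just calls Get:
var users = new ReadThroughCache<int, string>(id => $"user-{id}-from-db");  // loader injected once
Console.WriteLine(users.Get(42));   // → user-42-from-db

// Read-through sketch: on a miss, the cache loads from the source itself,
// so callers never talk to the database directly.
class ReadThroughCache<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, TValue> _store = new();
    private readonly Func<TKey, TValue> _loader;

    public ReadThroughCache(Func<TKey, TValue> loader) => _loader = loader;

    public TValue Get(TKey key)
    {
        if (_store.TryGetValue(key, out var hit)) return hit;   // hit: loader never called
        var value = _loader(key);                               // miss: cache loads from the source
        _store[key] = value;
        return value;
    }

    public void Invalidate(TKey key) => _store.Remove(key);
}
```

This is the same shape `@Cacheable` gives you in Spring: the loader is declared once, and every call site gets caching for free.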

Write Strategies

Write-Through

Every write goes to both the cache and the database synchronously before the client receives an ACK.

Write: App → Cache → DB → ACK
Read: App → Cache → [hit, guaranteed fresh] → return
  • Pros: Strong consistency — cache and DB always in sync
  • Cons: Write latency is high (two synchronous writes)
  • Best for: Read-heavy workloads where consistency matters
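A write-through sketch, with two in-memory dictionaries standing in for the cache and the database:

```csharp
using System;
using System.Collections.Generic;

var cache = new Dictionary<string, string>();
var db    = new Dictionary<string, string>();   // stand-in for the real database

// Write-through: the caller is only ACKed once BOTH stores have the value.
void Write(string key, string value)
{
    cache[key] = value;   // 1. write the cache
    db[key]    = value;   // 2. write the database, synchronously
                          // 3. only now does Write return (the "ACK")
}

Write("user:1", "Ada");
Console.WriteLine(cache["user:1"] == db["user:1"]);  // → True (always in sync)
```

The cost is visible in step 2: every write pays for both stores before returning, which is why this pairs best with read-heavy workloads.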

Write-Around

Writes go directly to the database, bypassing the cache entirely. The cache is only populated on reads.

Write: App → DB → ACK (cache untouched)
Read: App → Cache → [miss] → DB → store → return
  • Pros: Cache doesn't get polluted with write-only data, no write amplification
  • Cons: Recently written data isn't in cache until first read
  • Best for: Write-heavy workloads where data isn't re-read immediately (logs, analytics)

Write-Back (Write-Behind)

Writes go to the cache only; the client is ACKed immediately. The database is updated asynchronously in batches.

Write: App → Cache → ACK (fast!)
Background: Cache → batch flush → DB
  • Pros: Very fast writes, batch flush reduces DB load
  • Cons: Data loss risk if cache crashes before flush, eventual consistency
  • Best for: Write-heavy workloads that can tolerate data loss (telemetry, counters)
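A write-back sketch, using an in-memory queue as a stand-in for the flush pipeline (in a real system the flusher runs on a timer or background worker):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

var cache   = new ConcurrentDictionary<string, string>();
var pending = new ConcurrentQueue<KeyValuePair<string, string>>();  // dirty entries awaiting flush
var db      = new Dictionary<string, string>();

// Write-back: the caller is ACKed after the cache write; the DB catches up later.
void Write(string key, string value)
{
    cache[key] = value;
    pending.Enqueue(new(key, value));   // ACK happens here; the DB has not been written yet
}

// Background flusher: drains the queue in one batched pass.
void FlushBatch()
{
    while (pending.TryDequeue(out var entry))
        db[entry.Key] = entry.Value;    // in practice, one batched round-trip
}

Write("counter:page1", "1001");
Console.WriteLine(db.Count);  // → 0 (DB is behind the cache)
FlushBatch();
Console.WriteLine(db.Count);  // → 1 (converged)
```

The gap between the two `db.Count` values is exactly the data-loss window: anything still in `pending` when the cache dies never reaches the database.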

Combining Strategies

Read and write strategies are orthogonal — pick one from each:

| Read Strategy | Write Strategy | Effect |
|---|---|---|
| Cache-Aside | Write-Around | Simple, cache populated only on reads |
| Cache-Aside | Write-Through | Consistent, app manages cache |
| Read-Through | Write-Through | Clean app code + strong consistency |
| Read-Through | Write-Behind | Clean app code + fast writes (eventual consistency) |
| Cache-Aside | Write-Behind | Fast writes, app controls reads |

Distributed Cache Architecture

From Single Node to Cluster

A single Redis/Memcached instance is a single point of failure. Production systems use sharded clusters:

Client → Cache Proxy (Twemproxy / HAProxy / Redis Cluster)
├── Shard 1 (keys hash 0–33%)
├── Shard 2 (keys hash 34–66%)
└── Shard 3 (keys hash 67–100%)

Sharding strategy: Consistent hashing minimizes key redistribution when nodes are added or removed. Each key is mapped to a point on a hash ring.
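The hash-ring idea can be sketched in a few lines. This uses a toy hash function for readability; production rings use a stronger hash (xxHash, murmur3) and tune the virtual-node count:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ring = new HashRing();
ring.AddNode("shard-1"); ring.AddNode("shard-2"); ring.AddNode("shard-3");
Console.WriteLine(ring.NodeFor("user:42"));   // same key always maps to the same shard

// Minimal consistent-hash ring: each node is placed at many points ("virtual
// nodes") on a ring of uint positions; a key belongs to the first node
// clockwise from the key's own hash. Adding a node moves only ~1/N of keys.
class HashRing
{
    private readonly SortedDictionary<uint, string> _ring = new();

    // Toy hash for illustration only.
    private static uint Hash(string s) =>
        s.Aggregate(17u, (h, c) => h * 31 + c);

    public void AddNode(string node, int virtualNodes = 100)
    {
        for (var i = 0; i < virtualNodes; i++)
            _ring[Hash($"{node}#{i}")] = node;   // spread the node around the ring
    }

    public string NodeFor(string key)
    {
        var h = Hash(key);
        foreach (var (position, node) in _ring)  // SortedDictionary iterates in key order
            if (position >= h) return node;
        return _ring.First().Value;              // wrap around past the top of the ring
    }
}
```

Virtual nodes are what make the distribution even: with one point per node, a three-node ring can easily end up with one shard owning most of the keyspace.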

Replication for High Availability

Each shard can have replicas for failover:

  • Primary-replica: Writes go to primary; reads can be served from replicas
  • Sentinel (Redis): Automatic failover when primary goes down
  • Redis Cluster: Built-in sharding + replication (16384 hash slots)
  • Trade-off: More replicas = better read throughput + availability, but higher write replication cost

Cache Warm-up

Cold caches cause a burst of misses on deployment. Strategies to pre-populate:

| Strategy | How it works |
|---|---|
| Pre-warm on deploy | Load known hot keys from DB into cache before routing traffic |
| Gradual rollout | Route a small percentage of traffic to new instances, increase gradually |
| Shadow traffic | Replay production reads against new cache to warm it passively |
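A pre-warm step is usually just a loop run before the instance reports healthy. `GetHotKeysAsync` and `LoadFromDbAsync` below are hypothetical stand-ins (e.g. a top-N query over access logs and the real data layer):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var cache = new Dictionary<string, string>();

Task<IEnumerable<string>> GetHotKeysAsync() =>
    Task.FromResult<IEnumerable<string>>(new[] { "user:1", "product:99" }); // e.g. top-N from logs

Task<string> LoadFromDbAsync(string key) => Task.FromResult($"value-for-{key}");

// Run before the health check goes green, so the first real request hits a warm cache.
async Task WarmUpAsync()
{
    foreach (var key in await GetHotKeysAsync())
        cache[key] = await LoadFromDbAsync(key);
}

await WarmUpAsync();
Console.WriteLine($"warmed {cache.Count} keys");   // → warmed 2 keys
```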

CDN Caching

What gets cached at the CDN?

| Content type | Cacheability | Typical TTL |
|---|---|---|
| Static assets (JS, CSS, images) | High | Days to months (versioned URLs) |
| API responses (public data) | Medium | Seconds to minutes |
| API responses (user-specific) | Low | Usually not cached at CDN |
| HTML pages | Varies | Depends on dynamic content |

Cache-Control Headers

Cache-Control: public, max-age=3600, s-maxage=86400
| Directive | Meaning |
|---|---|
| public | Any cache (CDN, browser) can store the response |
| private | Only the browser can store (no CDN caching) |
| max-age=3600 | Browser cache TTL — 1 hour |
| s-maxage=86400 | CDN/shared cache TTL — 1 day (overrides max-age for CDNs) |
| no-cache | Must revalidate with server before using cached copy |
| no-store | Never cache this response |

CDN Invalidation Strategies

CDNs don't know when your data changes. How to handle stale content:

  1. URL versioning: /app.v1.2.3.js — change the URL when content changes. Old URLs cached indefinitely.
  2. Surrogate keys: CDN associates keys with responses; purge by key when data changes.
  3. Short TTL + stale-while-revalidate: Serve stale content immediately while fetching fresh in background.
  4. API purge: Explicitly call the CDN's purge API on deploy (Cloudflare, Fastly, etc.).
# Stale-while-revalidate example
Cache-Control: public, max-age=60, stale-while-revalidate=300
# → Serve stale for up to 5 minutes while revalidating in background

Application-Level Patterns

Stale-While-Revalidate

Serve stale data immediately while fetching fresh data in the background. The next request gets fresh data.

Request      → cache hit (but TTL expired)
             → return stale data to client immediately
             → background fetch from DB → update cache
Next request → cache hit (fresh data) → return
  • Users see a fast response (stale but acceptable)
  • Data converges to fresh within one request cycle
  • Supported by CDN headers (stale-while-revalidate), SWR (React), StaleWhileRevalidate (.NET)
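An application-level sketch of the pattern, with a tuple-valued dictionary as the cache and `LoadFromDbAsync` as a stand-in loader (a production version would also guard the dictionary against concurrent access):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var cache = new Dictionary<string, (string Value, DateTimeOffset ExpiresAt)>();
var ttl = TimeSpan.FromSeconds(60);

Task<string> LoadFromDbAsync(string key) => Task.FromResult($"fresh-{key}");  // stand-in

async Task<string> GetAsync(string key)
{
    if (cache.TryGetValue(key, out var entry))
    {
        if (entry.ExpiresAt > DateTimeOffset.UtcNow)
            return entry.Value;                       // fresh hit

        _ = Task.Run(async () =>                      // stale: refresh in the background...
        {
            var fresh = await LoadFromDbAsync(key);
            cache[key] = (fresh, DateTimeOffset.UtcNow + ttl);
        });
        return entry.Value;                           // ...but answer with the stale value now
    }

    var value = await LoadFromDbAsync(key);           // true miss: the caller must wait
    cache[key] = (value, DateTimeOffset.UtcNow + ttl);
    return value;
}
```

Only the very first request (a true miss) ever pays the full load latency; every later caller gets an immediate answer.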

Refresh-Ahead (Pre-expiration)

Before TTL expires, proactively refresh hot entries in the background. Users never see stale data for hot keys.

// Refresh at ~80% of TTL, with jitter so instances don't all refresh at once
var ttl = TimeSpan.FromMinutes(30);
var refreshAt = ttl * 0.8 + TimeSpan.FromSeconds(Random.Shared.Next(-30, 30));

Distributed Lock for Stampede Prevention

When a hot key expires, only one request should rebuild it — the rest wait or serve stale data:

// Assumes StackExchange.Redis (an IDatabase `redis`), a data layer (`db`),
// and a TTL for cached values (`ttl`).
async Task<string> GetWithLockAsync(string key, CancellationToken ct = default)
{
    var cached = await redis.StringGetAsync(key);
    if (cached.HasValue) return cached.ToString();

    var lockKey = $"lock:{key}";
    // SET NX: at most one instance acquires the short-lived rebuild lock
    var acquired = await redis.StringSetAsync(
        lockKey, Environment.MachineName,
        expiry: TimeSpan.FromSeconds(10), when: When.NotExists);

    if (acquired)
    {
        try
        {
            var data = await db.QueryAsync(key, ct);            // rebuild from the source
            await redis.StringSetAsync(key, data, expiry: ttl);
            return data;
        }
        finally
        {
            await redis.KeyDeleteAsync(lockKey);                // release the lock
        }
    }

    // Lock not acquired: another instance is rebuilding. Retry after a brief
    // delay (bound the retries in production).
    await Task.Delay(100, ct);
    return await GetWithLockAsync(key, ct);
}

Common Pitfalls

Cache Stampede (Thundering Herd)

A popular key expires and thousands of concurrent requests all miss and hammer the database.

Mitigation: Distributed lock (above), probabilistic early expiration, or per-request coalescing.
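Probabilistic early expiration can be sketched in one function. This follows the "XFetch" idea: each reader independently decides to rebuild slightly before the real TTL, so the expiry moment never hits every caller at once. `delta` approximates how long a rebuild takes; `beta` around 1.0 is typical (higher means earlier refreshes):

```csharp
using System;

bool ShouldRecompute(DateTimeOffset expiresAt, TimeSpan delta, double beta = 1.0)
{
    // log(u) for u in (0,1] is <= 0, so this shifts the "effective now" forward
    // by a random amount proportional to the rebuild cost.
    var jitterSeconds = delta.TotalSeconds * beta * Math.Log(1 - Random.Shared.NextDouble());
    return DateTimeOffset.UtcNow - TimeSpan.FromSeconds(jitterSeconds) >= expiresAt;
}

// On a cache hit, callers check ShouldRecompute(entry.ExpiresAt, rebuildCost)
// and rebuild in the background while still serving the current value.
Console.WriteLine(ShouldRecompute(DateTimeOffset.UtcNow.AddHours(1), TimeSpan.FromMilliseconds(1)));  // → False
```

Because each caller rolls its own random jitter, refreshes for a hot key spread out over a window instead of stacking up at the instant of expiry.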

Hot Keys

A single key receives disproportionate traffic. One cache shard becomes the bottleneck.

Mitigation: Replicate hot keys under suffixed copies (user:1:a, user:1:b, user:1:c) so they land on different shards, and read from a randomly chosen copy. In Redis Cluster, read replicas serve a similar role for read-heavy keys.

Large Values

Storing large JSON documents or blobs in cache wastes memory and increases serialization cost.

Mitigation: Cache only the fields you need, compress values, or use a separate object store for large data.

Cascading Failures

If the cache goes down, all traffic hits the database at once. The DB overloads and the entire system degrades.

Mitigation: Circuit breaker on cache reads — if cache is unavailable, return degraded responses or serve from a local fallback cache rather than overwhelming the DB.
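The fail-soft half of that mitigation can be sketched as follows; all names are illustrative, and a real deployment would wrap the cache call in a proper circuit breaker (e.g. Polly) that counts failures and stops calling the cache while it is open:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var localFallback = new Dictionary<string, string>();   // tiny in-process last-known-good cache
const string DegradedProfile = "{}";

Task<string> ReadDistributedCacheAsync(string key) =>
    throw new TimeoutException("cache cluster unreachable");  // simulate an outage

async Task<string> GetProfileAsync(string userId)
{
    try
    {
        var value = await ReadDistributedCacheAsync($"profile:{userId}");
        localFallback[userId] = value;                  // remember last-known-good on success
        return value;
    }
    catch (Exception)   // cache down: do NOT let every caller fall through to the DB
    {
        return localFallback.TryGetValue(userId, out var stale)
            ? stale                                     // possibly stale, but cheap
            : DegradedProfile;                          // degrade rather than hammer the DB
    }
}

Console.WriteLine(await GetProfileAsync("u1"));   // → {}
```

The key property is that a cache outage degrades response quality instead of multiplying database load.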

Strategy Selection Guide

| Scenario | Read Strategy | Write Strategy |
|---|---|---|
| Product catalog (read-heavy, changes rarely) | Cache-Aside or Read-Through | Write-Around |
| User profiles (read-heavy, occasional updates) | Read-Through | Write-Through |
| Social media feed (write-heavy, eventual consistency OK) | Cache-Aside | Write-Back |
| Shopping cart (strong consistency needed) | Read-Through | Write-Through |
| Analytics / telemetry (write-heavy, rarely re-read) | Cache-Aside | Write-Around |
| Session data (fast writes, TTL-based expiry) | Cache-Aside | Write-Back |
| Static assets | CDN (not application cache) | N/A |
Related

  • Database Caching — buffer pools, query cache, eviction policies in depth
  • Redis — the most popular distributed cache engine
  • Scalability — how caching fits into the bigger scaling picture
  • Load Balancing — distributing traffic across cache nodes