
Caching Strategies

Why Caching Matters

In distributed systems, latency is dominated by network hops and disk I/O. Caching eliminates redundant work by serving data from a faster layer closer to the caller.

| Without cache | With cache |
|---|---|
| Every read hits the database | Hot data served from memory |
| Network round-trips add up | Fewer hops = lower latency |
| Database becomes the bottleneck | DB freed up for complex queries |
| Scaling means more DB replicas | Scaling out cache is cheaper |

Where to Place Caches

Caching happens at every layer between the user and the database. The closer to the client, the faster — but the less fresh the data:

Client (Browser / Mobile App)

[1] CDN — static assets, edge-cached API responses
↓ miss
[2] Reverse Proxy — Nginx / CloudFront full-page cache
↓ miss
[3] API Gateway — response caching per route
↓ miss
[4] Application Layer — in-process (IMemoryCache) or external (Redis / Memcached)
↓ miss
[5] Database — buffer pool, query cache (see [Database Caching](/docs/database/caching))
Choosing the right layer

Cache at the highest layer that can serve the data with acceptable staleness. A CDN response is 10–50ms; a Redis lookup is 1–5ms; a DB query is 5–100ms+. Each layer down adds latency but gains freshness.

The Five Cache Strategies

Caching strategies split into two categories — how you read and how you write. They can be mixed and matched.

Read Strategies

Cache-Aside (Lazy Loading)

The application manages the cache explicitly. On a read, check cache first; on a miss, query the source and populate cache.

Read:  App → Cache? → [hit]  → return
                    → [miss] → DB → store in cache → return
Write: App → DB → invalidate cache
  • Pros: Simple, full control, works with any cache store
  • Cons: Cache logic in application code, cold starts are slow
  • Best for: General-purpose, mixed read/write workloads
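The flow above can be sketched in a few lines. A `Dictionary` stands in for the cache store, and `LoadFromDb`/`SaveToDb` are hypothetical data-access helpers, not a specific library's API:

```csharp
using System;
using System.Collections.Generic;

var cache = new Dictionary<string, string>();          // stand-in for Redis / IMemoryCache
string LoadFromDb(string key) => $"value-for-{key}";   // stand-in for the real query
void SaveToDb(string key, string value) { /* UPDATE ... */ }

// Read path: check cache first; on a miss, query the source and populate.
string Get(string key)
{
    if (cache.TryGetValue(key, out var hit)) return hit;   // cache hit
    var value = LoadFromDb(key);                           // cache miss: go to the DB
    cache[key] = value;                                    // populate for the next reader
    return value;
}

// Write path: update the source of truth, then invalidate (not update) the cache.
void Put(string key, string value)
{
    SaveToDb(key, value);
    cache.Remove(key);   // next read repopulates from the DB
}

Console.WriteLine(Get("user:1"));   // → value-for-user:1 (miss, loaded and cached)
Console.WriteLine(Get("user:1"));   // → value-for-user:1 (hit)
```

Invalidating rather than updating on writes avoids racing a concurrent read that could re-cache a stale value.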

Read-Through

The cache sits between the application and the data source. On a miss, the cache provider automatically loads from the database. The application only talks to the cache.

Read:  App → Cache → [hit]  → return
                   → [miss] → Cache loads from DB → store → return
Write: App → DB → invalidate cache
  • Pros: Cleaner application code (no cache logic), consistent cache behavior
  • Cons: Tightly coupled to cache provider, less control over load logic
  • Best for: Read-heavy workloads where you want clean separation of concerns
  • Examples: Spring Cache (@Cacheable), Hibernate second-level cache
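A minimal read-through wrapper can be sketched as a cache that owns its loader, so application code never contains population logic. The names here are illustrative, not any provider's actual API:

```csharp
using System;
using System.Collections.Generic;

// Application code has no cache logic; it just calls Get:
var users = new ReadThroughCache<int, string>(id => $"user-{id}-from-db");  // loader injected once
Console.WriteLine(users.Get(42));   // → user-42-from-db

// Read-through sketch: on a miss, the cache loads from the source itself,
// so callers never talk to the database directly.
class ReadThroughCache<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, TValue> _store = new();
    private readonly Func<TKey, TValue> _loader;

    public ReadThroughCache(Func<TKey, TValue> loader) => _loader = loader;

    public TValue Get(TKey key)
    {
        if (_store.TryGetValue(key, out var hit)) return hit;   // hit: loader never called
        var value = _loader(key);                               // miss: cache loads from the source
        _store[key] = value;
        return value;
    }

    public void Invalidate(TKey key) => _store.Remove(key);
}
```

This is the same shape `@Cacheable` gives you in Spring: the loader is declared once, and every call site gets caching for free.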

Write Strategies

Write-Through

Every write goes to both the cache and the database synchronously before the client receives an ACK.

Write: App → Cache → DB → ACK
Read: App → Cache → [hit, guaranteed fresh] → return
  • Pros: Strong consistency — cache and DB always in sync
  • Cons: Write latency is high (two synchronous writes)
  • Best for: Read-heavy workloads where consistency matters
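A write-through sketch, with two in-memory dictionaries standing in for the cache and the database:

```csharp
using System;
using System.Collections.Generic;

var cache = new Dictionary<string, string>();
var db    = new Dictionary<string, string>();   // stand-in for the real database

// Write-through: the caller is only ACKed once BOTH stores have the value.
void Write(string key, string value)
{
    cache[key] = value;   // 1. write the cache
    db[key]    = value;   // 2. write the database, synchronously
                          // 3. only now does Write return (the "ACK")
}

Write("user:1", "Ada");
Console.WriteLine(cache["user:1"] == db["user:1"]);  // → True (always in sync)
```

The cost is visible in step 2: every write pays for both stores before returning, which is why this pairs best with read-heavy workloads.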

Write-Around

Writes go directly to the database, bypassing the cache entirely. The cache is only populated on reads.

Write: App → DB → ACK (cache untouched)
Read: App → Cache → [miss] → DB → store → return
  • Pros: Cache doesn't get polluted with write-only data, no write amplification
  • Cons: Recently written data isn't in cache until first read
  • Best for: Write-heavy workloads where data isn't re-read immediately (logs, analytics)

Write-Back (Write-Behind)

Writes go to the cache only; the client is ACKed immediately. The database is updated asynchronously in batches.

Write: App → Cache → ACK (fast!)
Background: Cache → batch flush → DB
  • Pros: Very fast writes, batch flush reduces DB load
  • Cons: Data loss risk if cache crashes before flush, eventual consistency
  • Best for: Write-heavy workloads that can tolerate data loss (telemetry, counters)
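A write-back sketch, using an in-memory queue as a stand-in for the flush pipeline (in a real system the flusher runs on a timer or background worker):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

var cache   = new ConcurrentDictionary<string, string>();
var pending = new ConcurrentQueue<KeyValuePair<string, string>>();  // dirty entries awaiting flush
var db      = new Dictionary<string, string>();

// Write-back: the caller is ACKed after the cache write; the DB catches up later.
void Write(string key, string value)
{
    cache[key] = value;
    pending.Enqueue(new(key, value));   // ACK happens here; the DB has not been written yet
}

// Background flusher: drains the queue in one batched pass.
void FlushBatch()
{
    while (pending.TryDequeue(out var entry))
        db[entry.Key] = entry.Value;    // in practice, one batched round-trip
}

Write("counter:page1", "1001");
Console.WriteLine(db.Count);  // → 0 (DB is behind the cache)
FlushBatch();
Console.WriteLine(db.Count);  // → 1 (converged)
```

The gap between the two `db.Count` values is exactly the data-loss window: anything still in `pending` when the cache dies never reaches the database.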

Combining Strategies

Read and write strategies are orthogonal — pick one from each:

| Read Strategy | Write Strategy | Effect |
|---|---|---|
| Cache-Aside | Write-Around | Simple, cache populated only on reads |
| Cache-Aside | Write-Through | Consistent, app manages cache |
| Read-Through | Write-Through | Clean app code + strong consistency |
| Read-Through | Write-Behind | Clean app code + fast writes (eventual consistency) |
| Cache-Aside | Write-Behind | Fast writes, app controls reads |

Distributed Cache Architecture

From Single Node to Cluster

A single Redis/Memcached instance is a single point of failure. Production systems use sharded clusters:

Client → Cache Proxy (Twemproxy / HAProxy / Redis Cluster)
├── Shard 1 (keys hash 0–33%)
├── Shard 2 (keys hash 34–66%)
└── Shard 3 (keys hash 67–100%)

Sharding strategy: Consistent hashing minimizes key redistribution when nodes are added or removed. Each key is mapped to a point on a hash ring.
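The hash-ring idea can be sketched in a few lines. This uses a toy hash function for readability; production rings use a stronger hash (xxHash, murmur3) and tune the virtual-node count:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ring = new HashRing();
ring.AddNode("shard-1"); ring.AddNode("shard-2"); ring.AddNode("shard-3");
Console.WriteLine(ring.NodeFor("user:42"));   // same key always maps to the same shard

// Minimal consistent-hash ring: each node is placed at many points ("virtual
// nodes") on a ring of uint positions; a key belongs to the first node
// clockwise from the key's own hash. Adding a node moves only ~1/N of keys.
class HashRing
{
    private readonly SortedDictionary<uint, string> _ring = new();

    // Toy hash for illustration only.
    private static uint Hash(string s) =>
        s.Aggregate(17u, (h, c) => h * 31 + c);

    public void AddNode(string node, int virtualNodes = 100)
    {
        for (var i = 0; i < virtualNodes; i++)
            _ring[Hash($"{node}#{i}")] = node;   // spread the node around the ring
    }

    public string NodeFor(string key)
    {
        var h = Hash(key);
        foreach (var (position, node) in _ring)  // SortedDictionary iterates in key order
            if (position >= h) return node;
        return _ring.First().Value;              // wrap around past the top of the ring
    }
}
```

Virtual nodes are what make the distribution even: with one point per node, a three-node ring can easily end up with one shard owning most of the keyspace.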

Replication for High Availability

Each shard can have replicas for failover:

  • Primary-replica: Writes go to primary; reads can be served from replicas
  • Sentinel (Redis): Automatic failover when primary goes down
  • Redis Cluster: Built-in sharding + replication (16384 hash slots)
  • Trade-off: More replicas = better read throughput + availability, but higher write replication cost

Cache Warm-up

Cold caches cause a burst of misses on deployment. Strategies to pre-populate:

| Strategy | How it works |
|---|---|
| Pre-warm on deploy | Load known hot keys from DB into cache before routing traffic |
| Gradual rollout | Route a small percentage of traffic to new instances, increase gradually |
| Shadow traffic | Replay production reads against new cache to warm it passively |
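A pre-warm step is usually just a loop run before the instance reports healthy. `GetHotKeysAsync` and `LoadFromDbAsync` below are hypothetical stand-ins (e.g. a top-N query over access logs and the real data layer):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var cache = new Dictionary<string, string>();

Task<IEnumerable<string>> GetHotKeysAsync() =>
    Task.FromResult<IEnumerable<string>>(new[] { "user:1", "product:99" }); // e.g. top-N from logs

Task<string> LoadFromDbAsync(string key) => Task.FromResult($"value-for-{key}");

// Run before the health check goes green, so the first real request hits a warm cache.
async Task WarmUpAsync()
{
    foreach (var key in await GetHotKeysAsync())
        cache[key] = await LoadFromDbAsync(key);
}

await WarmUpAsync();
Console.WriteLine($"warmed {cache.Count} keys");   // → warmed 2 keys
```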

CDN Caching

What gets cached at the CDN?

| Content type | Cacheability | Typical TTL |
|---|---|---|
| Static assets (JS, CSS, images) | High | Days to months (versioned URLs) |
| API responses (public data) | Medium | Seconds to minutes |
| API responses (user-specific) | Low | Usually not cached at CDN |
| HTML pages | Varies | Depends on dynamic content |

Cache-Control Headers

Cache-Control: public, max-age=3600, s-maxage=86400
| Directive | Meaning |
|---|---|
| public | Any cache (CDN, browser) can store the response |
| private | Only the browser can store (no CDN caching) |
| max-age=3600 | Browser cache TTL — 1 hour |
| s-maxage=86400 | CDN/shared cache TTL — 1 day (overrides max-age for CDNs) |
| no-cache | Must revalidate with server before using cached copy |
| no-store | Never cache this response |

CDN Invalidation Strategies

CDNs don't know when your data changes. How to handle stale content:

  1. URL versioning: /app.v1.2.3.js — change the URL when content changes. Old URLs cached indefinitely.
  2. Surrogate keys: CDN associates keys with responses; purge by key when data changes.
  3. Short TTL + stale-while-revalidate: Serve stale content immediately while fetching fresh in background.
  4. API purge: Explicitly call the CDN's purge API on deploy (Cloudflare, Fastly, etc.).
# Stale-while-revalidate example
Cache-Control: public, max-age=60, stale-while-revalidate=300
# → Serve stale for up to 5 minutes while revalidating in background

Application-Level Patterns

Stale-While-Revalidate

Serve stale data immediately while fetching fresh data in the background. The next request gets fresh data.

Request      → cache hit (but TTL expired)
             → return stale data to client immediately
             → background fetch from DB → update cache
Next request → cache hit (fresh data) → return
  • Users see a fast response (stale but acceptable)
  • Data converges to fresh within one request cycle
  • Supported by CDN headers (stale-while-revalidate), SWR (React), StaleWhileRevalidate (.NET)
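An application-level sketch of the pattern, with a tuple-valued dictionary as the cache and `LoadFromDbAsync` as a stand-in loader (a production version would also guard the dictionary against concurrent access):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var cache = new Dictionary<string, (string Value, DateTimeOffset ExpiresAt)>();
var ttl = TimeSpan.FromSeconds(60);

Task<string> LoadFromDbAsync(string key) => Task.FromResult($"fresh-{key}");  // stand-in

async Task<string> GetAsync(string key)
{
    if (cache.TryGetValue(key, out var entry))
    {
        if (entry.ExpiresAt > DateTimeOffset.UtcNow)
            return entry.Value;                       // fresh hit

        _ = Task.Run(async () =>                      // stale: refresh in the background...
        {
            var fresh = await LoadFromDbAsync(key);
            cache[key] = (fresh, DateTimeOffset.UtcNow + ttl);
        });
        return entry.Value;                           // ...but answer with the stale value now
    }

    var value = await LoadFromDbAsync(key);           // true miss: the caller must wait
    cache[key] = (value, DateTimeOffset.UtcNow + ttl);
    return value;
}
```

Only the very first request (a true miss) ever pays the full load latency; every later caller gets an immediate answer.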

Refresh-Ahead (Pre-expiration)

Before TTL expires, proactively refresh hot entries in the background. Users never see stale data for hot keys.

// Refresh at ~80% of TTL, with jitter so instances don't all refresh at once
var ttl = TimeSpan.FromMinutes(30);
var refreshAt = ttl * 0.8 + TimeSpan.FromSeconds(Random.Shared.Next(-30, 30));

Distributed Lock for Stampede Prevention

When a hot key expires, only one request should rebuild it — the rest wait or serve stale data:

// Assumes StackExchange.Redis (an IDatabase `redis`), a data layer (`db`),
// and a TTL for cached values (`ttl`).
async Task<string> GetWithLockAsync(string key, CancellationToken ct = default)
{
    var cached = await redis.StringGetAsync(key);
    if (cached.HasValue) return cached.ToString();

    var lockKey = $"lock:{key}";
    // SET NX: at most one instance acquires the short-lived rebuild lock
    var acquired = await redis.StringSetAsync(
        lockKey, Environment.MachineName,
        expiry: TimeSpan.FromSeconds(10), when: When.NotExists);

    if (acquired)
    {
        try
        {
            var data = await db.QueryAsync(key, ct);            // rebuild from the source
            await redis.StringSetAsync(key, data, expiry: ttl);
            return data;
        }
        finally
        {
            await redis.KeyDeleteAsync(lockKey);                // release the lock
        }
    }

    // Lock not acquired: another instance is rebuilding. Retry after a brief
    // delay (bound the retries in production).
    await Task.Delay(100, ct);
    return await GetWithLockAsync(key, ct);
}

Common Pitfalls

Cache Stampede (Thundering Herd)

A popular key expires and thousands of concurrent requests all miss and hammer the database.

Mitigation: Distributed lock (above), probabilistic early expiration, or per-request coalescing.
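Probabilistic early expiration can be sketched in one function. This follows the "XFetch" idea: each reader independently decides to rebuild slightly before the real TTL, so the expiry moment never hits every caller at once. `delta` approximates how long a rebuild takes; `beta` around 1.0 is typical (higher means earlier refreshes):

```csharp
using System;

bool ShouldRecompute(DateTimeOffset expiresAt, TimeSpan delta, double beta = 1.0)
{
    // log(u) for u in (0,1] is <= 0, so this shifts the "effective now" forward
    // by a random amount proportional to the rebuild cost.
    var jitterSeconds = delta.TotalSeconds * beta * Math.Log(1 - Random.Shared.NextDouble());
    return DateTimeOffset.UtcNow - TimeSpan.FromSeconds(jitterSeconds) >= expiresAt;
}

// On a cache hit, callers check ShouldRecompute(entry.ExpiresAt, rebuildCost)
// and rebuild in the background while still serving the current value.
Console.WriteLine(ShouldRecompute(DateTimeOffset.UtcNow.AddHours(1), TimeSpan.FromMilliseconds(1)));  // → False
```

Because each caller rolls its own random jitter, refreshes for a hot key spread out over a window instead of stacking up at the instant of expiry.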

Hot Keys

A single key receives disproportionate traffic. One cache shard becomes the bottleneck.

Mitigation: Replicate hot keys under suffixed copies (user:1:a, user:1:b, user:1:c) so they land on different shards, and read from a randomly chosen copy. In Redis Cluster, read replicas serve a similar role for read-heavy keys.

Large Values

Storing large JSON documents or blobs in cache wastes memory and increases serialization cost.

Mitigation: Cache only the fields you need, compress values, or use a separate object store for large data.

Cascading Failures

If the cache goes down, all traffic hits the database at once. The DB overloads and the entire system degrades.

Mitigation: Circuit breaker on cache reads — if cache is unavailable, return degraded responses or serve from a local fallback cache rather than overwhelming the DB.
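The fail-soft half of that mitigation can be sketched as follows; all names are illustrative, and a real deployment would wrap the cache call in a proper circuit breaker (e.g. Polly) that counts failures and stops calling the cache while it is open:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var localFallback = new Dictionary<string, string>();   // tiny in-process last-known-good cache
const string DegradedProfile = "{}";

Task<string> ReadDistributedCacheAsync(string key) =>
    throw new TimeoutException("cache cluster unreachable");  // simulate an outage

async Task<string> GetProfileAsync(string userId)
{
    try
    {
        var value = await ReadDistributedCacheAsync($"profile:{userId}");
        localFallback[userId] = value;                  // remember last-known-good on success
        return value;
    }
    catch (Exception)   // cache down: do NOT let every caller fall through to the DB
    {
        return localFallback.TryGetValue(userId, out var stale)
            ? stale                                     // possibly stale, but cheap
            : DegradedProfile;                          // degrade rather than hammer the DB
    }
}

Console.WriteLine(await GetProfileAsync("u1"));   // → {}
```

The key property is that a cache outage degrades response quality instead of multiplying database load.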

Strategy Selection Guide

| Scenario | Read Strategy | Write Strategy |
|---|---|---|
| Product catalog (read-heavy, changes rarely) | Cache-Aside or Read-Through | Write-Around |
| User profiles (read-heavy, occasional updates) | Read-Through | Write-Through |
| Social media feed (write-heavy, eventual consistency OK) | Cache-Aside | Write-Back |
| Shopping cart (strong consistency needed) | Read-Through | Write-Through |
| Analytics / telemetry (write-heavy, rarely re-read) | Cache-Aside | Write-Around |
| Session data (fast writes, TTL-based expiry) | Cache-Aside | Write-Back |
| Static assets | CDN (not application cache) | N/A |
Related

  • Database Caching — buffer pools, query cache, eviction policies in depth
  • Redis — the most popular distributed cache engine
  • Scalability — how caching fits into the bigger scaling picture
  • Load Balancing — distributing traffic across cache nodes