Caching is the single highest-leverage performance tool available — and also one of the most common sources of production bugs. The decision isn’t just “should we cache?” — it’s where, how, and what the consistency implications are.
Cache Placement: Where Does the Cache Live? #
Each layer has different latency, scope, and invalidation complexity.
Client-Side Cache #
Browser cache, mobile app cache. Controlled by HTTP Cache-Control headers. The cheapest possible cache — zero server load. Appropriate for truly static content (JS bundles, images, CSS). Not appropriate for user-specific or frequently changing data without careful ETag/Last-Modified handling.
CDN Cache #
Globally distributed edge nodes (Cloudflare, CloudFront, Fastly). Serves static assets and cacheable responses from a location close to the user. CDN caching can absorb enormous traffic spikes — a viral article getting 10M requests hits the CDN, not your origin.
Key decision: What can you put on the CDN? Anything that’s the same for all users (or can be personalized at the edge via cookies/JWT) and doesn’t change too frequently. Product pages, landing pages, API responses with Cache-Control: public, max-age=300.
API Gateway / Reverse Proxy Cache #
NGINX or API Gateway caches responses. Useful when a large percentage of requests ask for the same thing (public API endpoints, rate-limited reads). Shared across all backend instances.
Application-Level Cache #
Your service’s in-memory cache or a shared Redis instance. This is where most teams focus — it’s flexible and gives the most control.
Local (in-process) cache: Java ConcurrentHashMap, Caffeine, Guava Cache. Sub-microsecond reads, but not shared across service instances. If you have 10 pods, each has its own copy — inefficient for large datasets. Also invalidation is tricky — you need to handle cache coherence across instances.
Distributed cache (Redis, Memcached): Shared across all service instances. A cache miss or invalidation from any instance affects all. Higher latency than local cache (~1ms vs sub-microsecond) but consistent view across instances.
Multi-level caching: Local L1 + Redis L2. Cache popular items in-process, fall back to Redis, fall back to DB. Complex to invalidate correctly — usually only worth it for extremely hot data.
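The two-level lookup is easier to see in code. A minimal sketch, using plain dicts to stand in for the in-process L1 and for Redis as L2; the `TwoLevelCache` name, TTL values, and loader are illustrative:

```python
import time

class TwoLevelCache:
    """L1: small in-process dict with a short TTL; L2: dict standing in for Redis."""

    def __init__(self, loader, l1_ttl=5.0, l2_ttl=60.0):
        self.loader = loader           # falls back to the DB on a double miss
        self.l1, self.l2 = {}, {}      # key -> (value, expires_at)
        self.l1_ttl, self.l2_ttl = l1_ttl, l2_ttl

    def _fresh(self, store, key):
        entry = store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]
        store.pop(key, None)           # expired: drop it
        return None

    def get(self, key):
        value = self._fresh(self.l1, key)
        if value is not None:
            return value               # L1 hit: sub-microsecond path
        value = self._fresh(self.l2, key)
        if value is None:
            value = self.loader(key)   # double miss: hit the DB
            self.l2[key] = (value, time.monotonic() + self.l2_ttl)
        self.l1[key] = (value, time.monotonic() + self.l1_ttl)
        return value

cache = TwoLevelCache(loader=lambda k: f"row-for-{k}")
cache.get("user:1")   # misses both levels, loads from the "DB"
cache.get("user:1")   # served from L1
```

Note the invalidation problem the text mentions: deleting a key from L2 does nothing to the copies sitting in each pod's L1 until their short TTL expires, which is why L1 TTLs are kept very short.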
Database Query Cache #
Postgres has never had a query result cache. MySQL had one, but it was removed in 8.0 — invalidation caused more correctness and contention problems than the cache was worth. Most “DB caching” happens in the DB’s buffer pool — keep frequently accessed data in memory via proper sizing (shared_buffers in Postgres, innodb_buffer_pool_size in MySQL).
Redis vs Memcached #
This is mostly settled: use Redis unless you have a specific reason not to.
Memcached is marginally faster at pure LRU string cache operations at extreme scale, and it’s truly multi-threaded (useful for multi-core cache machines). But:
- Redis supports strings, hashes, lists, sets, sorted sets, streams, HyperLogLog, geo-indexes
- Redis has persistence options (RDB + AOF) — cache survives restarts with warm data
- Redis Cluster for horizontal scaling
- Redis has Lua scripting for atomic multi-step operations
- Redis 6+ is multi-threaded for network I/O
When Memcached still makes sense: You’re in a pure LRU string-cache scenario at extreme scale and have existing Memcached expertise and tooling. Almost no new systems should choose Memcached today.
Cache Patterns: Cache-Aside, Write-Through, Write-Behind #
Cache-Aside (Lazy Loading) #
The most common pattern. Application code manages the cache explicitly.
READ:
1. Check cache → hit? return.
2. Miss → query DB → store in cache → return.
WRITE:
1. Write to DB.
2. Invalidate (or update) cache entry.
Advantages: Only caches data that’s actually read. Resilient to cache failures (fall through to DB). Easy to implement.
Disadvantages: Cache miss causes noticeable latency (cache fill under load). Initial cold start hits DB hard. Race condition between reads and writes: a read can miss, fetch the old value from the DB, and store it in the cache after a concurrent write has already invalidated the entry, leaving stale data until the next invalidation or TTL expiry.
When to use: Read-heavy workloads where occasional cache misses are acceptable. Most caching scenarios.
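The read and write paths above can be sketched with one dict standing in for Redis and another for the database; the key and value names are illustrative:

```python
db = {"user:1": {"name": "Ada"}}       # stand-in for the database
cache = {}                             # stand-in for Redis

def read(key):
    if key in cache:                   # 1. check cache: hit? return
        return cache[key]
    value = db[key]                    # 2. miss: query DB
    cache[key] = value                 #    store in cache
    return value

def write(key, value):
    db[key] = value                    # 1. write to DB first
    cache.pop(key, None)               # 2. invalidate the cache entry

read("user:1")                         # miss, fills the cache
write("user:1", {"name": "Grace"})     # invalidates the entry
read("user:1")                         # miss again, refills with fresh data
```

Invalidate-on-write (rather than update-on-write) is the safer default: updating the cache in the write path reintroduces the ordering races discussed under invalidation below.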
Write-Through #
Every write goes to both cache and DB simultaneously. Reads are always warm (if the data was ever written).
WRITE:
1. Write to DB and cache atomically.
READ:
1. Always hit cache (for recently written data).
Advantages: Cache always has fresh data for recently written records. No cache miss on first read.
Disadvantages: Write latency includes cache write. Caches data that may never be read (infrequently accessed writes still fill the cache). Cache storage must be large enough to hold write-through data.
When to use: Systems where write latency is acceptable and read-after-write consistency matters (user profile updates, settings changes).
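A minimal write-through sketch, again with dicts standing in for the DB and cache. True atomicity across two stores isn’t achievable in practice; a common approximation (assumed here) is DB first, then cache:

```python
db, cache = {}, {}                     # stand-ins for the database and Redis

def write_through(key, value):
    # "Atomic" only from the caller's perspective: DB first, then cache.
    # A production version would invalidate the cache entry if either
    # step fails, to avoid the two stores diverging.
    db[key] = value
    cache[key] = value

def read(key):
    if key in cache:
        return cache[key]              # warm for anything ever written
    value = db[key]                    # only cold for data written elsewhere
    cache[key] = value
    return value

write_through("settings:1", {"theme": "dark"})
read("settings:1")                     # no first-read miss
```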
Write-Behind (Write-Back) #
Writes go to cache first, DB is updated asynchronously.
WRITE:
1. Write to cache → return success to caller.
2. Async: flush to DB (batched or periodic).
READ:
1. Read from cache.
Advantages: Write latency is minimized (cache write is fast). Can batch writes to DB for efficiency.
Disadvantages: Risk of data loss if cache fails before flush. Complex failure handling. Reads might see data not yet in DB. Strong consistency guarantees are hard.
When to use: High write throughput scenarios where some data loss is acceptable (analytics counters, activity tracking, view counts). Almost never for financial or critical transactional data.
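A write-behind sketch with an in-process queue as the flush buffer. This is deliberately simplified: the `pending.join()` at the end exists only so the demo can observe the flush, and the data-loss window is marked in the comments:

```python
import threading, queue

db = {}                                # stand-in for the database
cache = {}                             # stand-in for Redis
pending = queue.Queue()                # write-behind buffer

def write(key, value):
    cache[key] = value                 # 1. fast cache write, return immediately
    pending.put((key, value))          # 2. queue the DB flush

def flusher():
    while True:
        batch = [pending.get()]        # block until there is work
        while not pending.empty():     # then drain: batch writes together
            batch.append(pending.get())
        for key, value in batch:
            db[key] = value            # if the process dies before this line,
        for _ in batch:                #   the queued writes are lost
            pending.task_done()

threading.Thread(target=flusher, daemon=True).start()
write("views:42", 1001)
pending.join()                         # wait for the async flush (demo only)
```

The batching loop is where the throughput win comes from: many cache-speed writes collapse into one DB round trip per drain.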
Cache Invalidation: The Hard Problem #
“There are only two hard things in computer science: cache invalidation and naming things.” The reason it’s hard: distributed systems don’t provide atomicity across a database write and a cache invalidation.
Pattern 1: TTL-based expiry
Every cache entry has a time-to-live. After expiry, the next read misses and refills from DB. Simple, safe, but means serving stale data up to TTL seconds.
Right call: Most data is OK to be stale by a few seconds or minutes. Use TTL as your default strategy and reserve event-based invalidation for data where staleness is genuinely harmful.
Pattern 2: Event-driven invalidation
On write, publish an event (via Kafka, Redis pub/sub, database trigger) that invalidates the cache entry. Near-real-time freshness.
Risk: Race conditions. If the invalidation event arrives while a concurrent reader is mid-fill (read → miss → DB read → cache write), the ordering matters: invalidation arriving before the cache write leaves a stale entry that nothing will remove, while invalidation arriving just after the write deletes a freshly filled entry. Under high concurrency the latter can cause cache thrashing.
Pattern 3: Cache-aside with versioned keys
Instead of invalidating, change the cache key (include a version or timestamp). Old entries naturally expire via TTL. Eliminates invalidation races at the cost of more cache memory.
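A sketch of the versioned-key idea. In a real deployment the version counter itself would live in a shared store like Redis (e.g. bumped with an atomic increment); a local dict stands in for it here, and TTL-based expiry of old versions is elided:

```python
cache = {}                             # stand-in for Redis (TTL elided)
versions = {"user:1": 1}               # per-key version; would live in Redis

def cache_key(key):
    return f"{key}:v{versions[key]}"   # version baked into the cache key

def read(key, load):
    k = cache_key(key)
    if k not in cache:
        cache[k] = load()              # fill under the current version
    return cache[k]

def write(key, value, save):
    save(value)                        # write to the DB
    versions[key] += 1                 # bump the version: no delete needed

db = {"user:1": "Ada"}
def save(v): db["user:1"] = v

read("user:1", lambda: db["user:1"])   # fills user:1:v1
write("user:1", "Grace", save)         # readers now look at user:1:v2
read("user:1", lambda: db["user:1"])   # fresh fill; v1 lingers until TTL
```

A late cache fill under the old version is harmless: no reader will ever look up `user:1:v1` again, which is exactly how this pattern sidesteps the invalidation races above.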
Pattern 4: Read-through with write invalidation
Systematic invalidation tied to the write path. Works when writes are serialized through a single service that owns both the data and its cache.
When Caching Makes Things Worse #
- Low hit rate: If your hit rate is < 80–90%, the overhead of cache lookups + misses may exceed the DB savings. Profile before assuming caching helps.
- Wrong granularity: Caching entire user objects when you only need the name field. Cache bloat → more evictions → lower hit rate.
- Cache stampede: All TTLs expire simultaneously at scale. Every request misses and floods the DB. Solution: randomize TTL (+/- 10–20% of base TTL), or use probabilistic early expiration (refresh when a small fraction of requests notice TTL is close to expiry).
- Memory pressure causes evictions: Cache is too small, eviction policy kicks in for hot data. Monitor eviction rate — it should be near zero for important data.
- Caching mutable data without invalidation: The bug where a user changes their email but the cache serves the old email for 24 hours.
- Caching at the wrong layer: Adding application cache when the DB query is just missing an index. Fix the root cause.
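Two of the stampede mitigations above, randomized TTL and probabilistic early expiration, can be sketched as follows. The early-refresh formula is the XFetch approach; the parameter values are illustrative:

```python
import math, random, time

BASE_TTL = 300.0                       # seconds

def jittered_ttl(base=BASE_TTL, jitter=0.15):
    # Spread expiries by +/-15% so entries filled together
    # don't all expire at the same instant.
    return base * random.uniform(1 - jitter, 1 + jitter)

def should_refresh_early(expires_at, delta=10.0, beta=1.0):
    # XFetch-style probabilistic early expiration: the closer an entry is
    # to expiry, the more likely a request volunteers to refresh it.
    # `delta` approximates the recompute cost in seconds; beta > 1 refreshes
    # more eagerly. log(random()) is negative, so the subtraction pushes
    # "now" forward by a random amount before comparing against expiry.
    return time.monotonic() - delta * beta * math.log(random.random()) >= expires_at
```

A caller that sees `should_refresh_early(...)` return True refreshes the entry itself while everyone else keeps serving the still-valid cached value, so the refill load is one request instead of a herd.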
The 95% Hit Rate Question #
“Your cache hit rate is 95% but latency is still bad — what do you investigate?”
A 95% hit rate sounds good, but at 1000 req/s that’s still 50 misses/second. If each miss takes 200ms (slow DB query), those 50 misses are dominating your p95/p99 latency even though your average looks fine. Look at:
- Latency distribution, not just averages. p99 tells the story, not p50.
- Are misses on specific keys? (Hot miss pattern — new content, cache eviction of specific keys)
- DB query performance on cache misses. Fix slow queries even if they’re infrequent.
- Thundering herd on misses. Multiple requests simultaneously miss the same key, all hit the DB.
- Network latency to Redis. If Redis is in a different AZ, add that to your analysis.
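A standard fix for the thundering-herd bullet is single-flight: collapse concurrent misses for the same key into one DB call. A sketch using a per-key lock; the names and the `db_calls` counter (included to show the collapsing) are illustrative:

```python
import threading

cache = {}                             # stand-in for Redis
locks = {}                             # one lock object per key
locks_guard = threading.Lock()         # protects the locks dict itself
db_calls = 0

def load_from_db(key):
    global db_calls
    db_calls += 1                      # count DB hits to show collapsing
    return f"row-for-{key}"

def get(key):
    if key in cache:
        return cache[key]
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:                         # only one thread loads per key
        if key in cache:               # another thread may have filled it
            return cache[key]          # while we waited for the lock
        cache[key] = load_from_db(key)
        return cache[key]

threads = [threading.Thread(target=get, args=("user:1",)) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
# all ten threads get the same row, but the DB was queried once
```

The double-check inside the lock is the essential part: threads that lost the race find the entry already filled and never touch the DB.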