
Caching Strategies: Placement, Patterns, and Pitfalls

Caching is one of the highest-leverage performance tools available, and also one of the most common sources of production bugs. The decision isn’t just “should we cache?”: it’s where to cache, how to keep entries fresh, and what the consistency implications are.


Cache Placement: Where Does the Cache Live?

Each layer has different latency, scope, and invalidation complexity.

Client-Side Cache

Browser cache, mobile app cache. Controlled by HTTP Cache-Control headers. The cheapest possible cache — zero server load. Appropriate for truly static content (JS bundles, images, CSS). Not appropriate for user-specific or frequently changing data without careful ETag/Last-Modified handling.

CDN Cache

Globally distributed edge nodes (Cloudflare, CloudFront, Fastly). Serves static assets and cacheable responses from a location close to the user. CDN caching can absorb enormous traffic spikes — a viral article getting 10M requests hits the CDN, not your origin.

Key decision: What can you put on the CDN? Anything that’s the same for all users (or can be personalized at the edge via cookies/JWT) and doesn’t change too frequently. Product pages, landing pages, API responses with Cache-Control: public, max-age=300.
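As a minimal sketch of the header side of this, here is a hypothetical response builder (the function name and dict shape are illustrative, not any real framework’s API) that marks a product page as CDN-cacheable:

```python
def product_page_response(product_id: str) -> dict:
    """Hypothetical handler: mark a product page as cacheable by shared caches."""
    body = f"<h1>Product {product_id}</h1>"  # stand-in for real rendering
    return {
        "status": 200,
        "headers": {
            # public: shared caches (CDNs) may store it.
            # max-age: browser TTL; s-maxage: CDN TTL (takes precedence at shared caches).
            "Cache-Control": "public, max-age=60, s-maxage=300",
            # Vary tells the CDN which request headers produce distinct responses.
            "Vary": "Accept-Encoding",
        },
        "body": body,
    }
```

Splitting `max-age` from `s-maxage` lets you keep browser copies short-lived while the CDN absorbs the bulk of traffic for five minutes.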

API Gateway / Reverse Proxy Cache

NGINX or API Gateway caches responses. Useful when a large percentage of requests ask for the same thing (public API endpoints, rate-limited reads). Shared across all backend instances.

Application-Level Cache

Your service’s in-memory cache or a shared Redis instance. This is where most teams focus — it’s flexible and gives the most control.

Local (in-process) cache: Java ConcurrentHashMap, Caffeine, Guava Cache. Sub-microsecond reads, but not shared across service instances. If you have 10 pods, each has its own copy — inefficient for large datasets. Also invalidation is tricky — you need to handle cache coherence across instances.

Distributed cache (Redis, Memcached): Shared across all service instances. A cache miss or invalidation from any instance affects all. Higher latency than local cache (~1 ms vs. sub-microsecond) but a consistent view across instances.

Multi-level caching: Local L1 + Redis L2. Cache popular items in-process, fall back to Redis, fall back to DB. Complex to invalidate correctly — usually only worth it for extremely hot data.
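A minimal sketch of the L1 + L2 read path, assuming a plain dict stands in for Redis (a real deployment would use a Redis client, and the `load_from_db` callback is a placeholder):

```python
import time

class TwoLevelCache:
    """L1 (in-process dict, short TTL) over L2 (Redis stand-in) over a DB loader."""

    def __init__(self, load_from_db, l1_ttl: float = 1.0):
        self.l1 = {}                # key -> (value, expires_at); tiny, very short TTL
        self.l2 = {}                # shared cache stand-in (would be Redis)
        self.load_from_db = load_from_db
        self.l1_ttl = l1_ttl

    def get(self, key):
        entry = self.l1.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]                        # L1 hit: no network round-trip
        if key in self.l2:
            value = self.l2[key]                   # L2 hit: repopulate L1
        else:
            value = self.load_from_db(key)         # miss everywhere: hit the DB
            self.l2[key] = value
        self.l1[key] = (value, time.monotonic() + self.l1_ttl)
        return value
```

Keeping the L1 TTL very short (a second or so) bounds how long different pods can disagree, which is usually an acceptable trade for hot keys.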

Database Query Cache

Postgres has no query result cache (it never shipped one; the correctness and invalidation problems outweigh the benefit). MySQL had a query cache but removed it in 8.0 for the same reasons, plus lock contention at scale. Most “DB caching” actually happens in the DB’s buffer pool: keep frequently accessed data in memory via proper sizing.


Redis vs Memcached

This is mostly settled: use Redis unless you have a specific reason not to.

Memcached is marginally faster at pure LRU string cache operations at extreme scale, and it’s truly multi-threaded (useful for multi-core cache machines). But:

  • Redis supports strings, hashes, lists, sets, sorted sets, streams, HyperLogLog, geo-indexes
  • Redis has persistence options (RDB + AOF) — cache survives restarts with warm data
  • Redis Cluster for horizontal scaling
  • Redis has Lua scripting for atomic multi-step operations
  • Redis 6+ is multi-threaded for network I/O

When Memcached still makes sense: You’re in a pure LRU string-cache scenario at extreme scale and have existing Memcached expertise and tooling. Almost no new systems should choose Memcached today.


Cache Patterns: Cache-Aside, Write-Through, Write-Behind

Cache-Aside (Lazy Loading)

The most common pattern. Application code manages the cache explicitly.

READ:
  1. Check cache → hit? return.
  2. Miss → query DB → store in cache → return.

WRITE:
  1. Write to DB.
  2. Invalidate (or update) cache entry.

Advantages: Only caches data that’s actually read. Resilient to cache failures (fall through to DB). Easy to implement.

Disadvantages: A cache miss causes noticeable latency (cache fill under load). A cold start hits the DB hard. Race condition: a read can miss and fetch from the DB, a concurrent write then updates the DB and invalidates the cache, and the read finally stores its now-stale value, which lingers until the next invalidation or TTL expiry.

When to use: Read-heavy workloads where occasional cache misses are acceptable. Most caching scenarios.
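The READ/WRITE steps above can be sketched in a few lines; the `db_get`/`db_put` callbacks and in-memory dict are stand-ins for a real database client and Redis:

```python
class CacheAside:
    """Cache-aside sketch: application code manages the cache explicitly."""

    def __init__(self, db_get, db_put):
        self.cache = {}
        self.db_get, self.db_put = db_get, db_put

    def read(self, key):
        if key in self.cache:            # 1. cache hit? return
            return self.cache[key]
        value = self.db_get(key)         # 2. miss: query DB
        self.cache[key] = value          #    fill cache on the way out
        return value

    def write(self, key, value):
        self.db_put(key, value)          # 1. write to DB first
        self.cache.pop(key, None)        # 2. invalidate (safer than update)
```

Invalidating rather than updating on write avoids one class of race: an update can clobber a newer value written by a concurrent request, while a delete just forces the next read to refill.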

Write-Through

Every write goes to both the cache and the DB synchronously. Reads of recently written data are always warm.

WRITE:
  1. Write to the cache and the DB together; the write is acknowledged only after both succeed (there is no true cross-system atomicity).

READ:
  1. Always hit cache (for recently written data).

Advantages: Cache always has fresh data for recently written records. No cache miss on first read.

Disadvantages: Write latency includes cache write. Caches data that may never be read (infrequently accessed writes still fill the cache). Cache storage must be large enough to hold write-through data.

When to use: Systems where write latency is acceptable and read-after-write consistency matters (user profile updates, settings changes).
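A minimal write-through sketch, again with a dict standing in for the database:

```python
class WriteThroughCache:
    """Write-through sketch: writes hit DB and cache before returning to the caller."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def write(self, key, value):
        self.db[key] = value        # synchronous DB write...
        self.cache[key] = value     # ...then cache update; caller waits for both

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db[key]        # only keys written before the cache existed miss
        self.cache[key] = value
        return value
```

Note the cost is visible in `write`: both stores happen on the caller’s critical path, which is exactly why this pattern trades write latency for read-after-write freshness.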

Write-Behind (Write-Back)

Writes go to cache first, DB is updated asynchronously.

WRITE:
  1. Write to cache → return success to caller.
  2. Async: flush to DB (batched or periodic).

READ:
  1. Read from cache.

Advantages: Write latency is minimized (cache write is fast). Can batch writes to DB for efficiency.

Disadvantages: Risk of data loss if cache fails before flush. Complex failure handling. Reads might see data not yet in DB. Strong consistency guarantees are hard.

When to use: High write throughput scenarios where some data loss is acceptable (analytics counters, activity tracking, view counts). Almost never for financial or critical transactional data.
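The write-behind flow can be sketched as a pending buffer plus a batched flush. Here `flush()` is called manually; a real implementation would flush on a timer or size threshold and handle flush failures:

```python
class WriteBehindCache:
    """Write-behind sketch: ack on cache write, batch DB writes asynchronously."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.pending = {}            # key -> latest value awaiting flush

    def write(self, key, value):
        self.cache[key] = value      # fast path: caller sees success here
        self.pending[key] = value    # coalesces repeated writes to the same key

    def flush(self):
        batch, self.pending = self.pending, {}
        self.db.update(batch)        # one batched DB write instead of many

    def read(self, key):
        return self.cache.get(key, self.db.get(key))
```

The coalescing in `pending` is where the throughput win comes from (ten increments to a view counter become one DB write), and also where the data-loss risk lives: everything in `pending` is gone if the process dies before `flush()`.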


Cache Invalidation: The Hard Problem

“There are only two hard things in computer science: cache invalidation and naming things” (attributed to Phil Karlton). The reason it’s hard: distributed systems don’t provide atomicity across a database write and a cache invalidation.

Pattern 1: TTL-based expiry Every cache entry has a time-to-live. After expiry, the next read misses and refills from DB. Simple, safe, but means serving stale data up to TTL seconds.

Right call: Most data is OK to be stale by a few seconds or minutes. Use TTL as your default strategy and reserve event-based invalidation for data where staleness is genuinely harmful.

Pattern 2: Event-driven invalidation On write, publish an event (via Kafka, Redis pub/sub, database trigger) that invalidates the cache entry. Near-real-time freshness.

Risk: Race condition — read → cache miss → DB read → publish event → cache write → invalidation arrives → entry deleted. The refilled entry is immediately invalidated. Under high concurrency this can cause cache thrashing.

Pattern 3: Cache-aside with versioned keys Instead of invalidating, change the cache key (include a version or timestamp). Old entries naturally expire via TTL. Eliminates invalidation races at the cost of more cache memory.
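A sketch of the versioned-key idea; the key format and the in-memory version map are illustrative (in practice the version often lives in the cache itself or is derived from an `updated_at` timestamp):

```python
def versioned_key(entity: str, entity_id: str, version: int) -> str:
    # e.g. "user:42:v7" — bump the version on write instead of deleting the key
    return f"{entity}:{entity_id}:v{version}"

# Hypothetical version store; real systems keep this in the cache or the row itself.
versions = {"42": 7}

def on_user_update(user_id: str) -> None:
    versions[user_id] += 1       # readers now build a different key; no delete race

def current_user_key(user_id: str) -> str:
    return versioned_key("user", user_id, versions[user_id])
```

After a write, readers simply look up a key that has never existed, miss, and refill; the v7 entry is never touched and ages out via its TTL.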

Pattern 4: Read-through with write invalidation Systematic invalidation tied to the write path. Works when writes are serialized through a single service that owns both the data and its cache.


When Caching Makes Things Worse

  • Low hit rate: If your hit rate is < 80–90%, the overhead of cache lookups + misses may exceed the DB savings. Profile before assuming caching helps.
  • Wrong granularity: Caching entire user objects when you only need the name field. Cache bloat → more evictions → lower hit rate.
  • Cache stampede: All TTLs expire simultaneously at scale. Every request misses and floods the DB. Solution: randomize TTL (+/- 10–20% of base TTL), or use probabilistic early expiration (refresh when a small fraction of requests notice TTL is close to expiry).
  • Memory pressure causes evictions: Cache is too small, eviction policy kicks in for hot data. Monitor eviction rate — it should be near zero for important data.
  • Caching mutable data without invalidation: The bug where a user changes their email but the cache serves the old email for 24 hours.
  • Caching at the wrong layer: Adding application cache when the DB query is just missing an index. Fix the root cause.

The 95% Hit Rate Question

“Your cache hit rate is 95% but latency is still bad — what do you investigate?”

A 95% hit rate sounds good, but at 1000 req/s that’s still 50 misses/second. If each miss takes 200ms (slow DB query), those 50 misses are dominating your p95/p99 latency even though your average looks fine. Look at:

  1. Latency distribution, not just averages. p99 tells the story, not p50.
  2. Are misses on specific keys? (Hot miss pattern — new content, cache eviction of specific keys)
  3. DB query performance on cache misses. Fix slow queries even if they’re infrequent.
  4. Thundering herd on misses. Multiple requests simultaneously miss the same key, all hit the DB.
  5. Network latency to Redis. If Redis is in a different AZ, add that to your analysis.
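For point 4, the usual fix is single-flight (also called request coalescing): only one caller per key runs the loader while the rest wait for its result. A minimal thread-based sketch, with `load` standing in for the slow DB query:

```python
import threading

class SingleFlight:
    """Collapse concurrent misses for the same key into one loader call."""

    def __init__(self, load):
        self.load = load
        self.cache = {}
        self.locks = {}                       # key -> lock for in-flight loads
        self.guard = threading.Lock()         # protects the locks dict

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        with self.guard:                      # one lock object per in-flight key
            lock = self.locks.setdefault(key, threading.Lock())
        with lock:
            if key not in self.cache:         # losers of the race re-check and skip
                self.cache[key] = self.load(key)
        with self.guard:
            self.locks.pop(key, None)         # idempotent cleanup
        return self.cache[key]
```

In a distributed setup the per-key lock would live in Redis (e.g. an atomic set-if-absent with a short TTL) rather than in-process, but the shape of the logic is the same.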