The Failure You Don’t See Coming
The incident starts … innocently.
A hot key expires. The profile of a celebrity with 10M followers gets evicted from the cache. Suddenly, thousands of concurrent requests miss the cache and fall through to the database. The DB spikes. Latency explodes. Retries kick in. All of a sudden you are not handling a thousand requests, you are handling tens of thousands of them.
In a matter of minutes, the connection pool is exhausted and starts throwing
"failed to acquire connection within X" errors. Timeouts start cascading upstream.
What started as a cache expiry is now a full-blown system outage!
Nothing broke.
Nothing changed.
The cache did exactly what you programmed it to do, and that is what makes caching deceptively difficult to pull off. Caching was never just about storing data; it is about controlling inconsistency under traffic.
Mental Model of Caching
I think most discussions about caching are fragmented across questions like which TTL to use,
whether to pick Redis or Memcached, and whether to just cache everything.
That framing breaks very early in production because it focuses on tools
rather than on the behavior of the data.
A more useful model is to think about caching along two orthogonal axes:
Axis X: Write Path (aka where does truth flow)
Your write path determines how data moves between your cache and database.
On one end of the spectrum you have write-through, where every write updates both the DB and the cache in the request path. On the other end, you have write-behind, where writes are accepted into the cache and then persisted asynchronously to the DB.
Most real systems don't sit cleanly at either extreme. There is usually a mixture of strategies:
- Write to the DB and invalidate the cache
- Write-through for some entities, write-behind for others
- Bypass the cache entirely for critical paths that demand consistency
While both extremes work on a sunny day, on a rainy day they diverge:
- Write-through risks dual-write inconsistency
- Write-behind risks data loss and reordering
You are not choosing the better method. You are only choosing your preferred failure mode.
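To make the trade-off concrete, here is a minimal sketch of the three write paths side by side. It is illustrative only: `db`, `cache` and `write_queue` are in-memory stand-ins for the real stores, and the 60-second TTL is an arbitrary assumption.

# Write path sketch: three strategies side by side (in-memory stand-ins)
import time

db = {}            # stand-in for the database (source of truth)
cache = {}         # stand-in for the cache: key -> (value, expires_at)
write_queue = []   # pending write-behind flushes

def write_through(key, value, ttl=60):
    # Both stores are updated in the request path.
    # A failure between the two calls is the dual-write inconsistency risk.
    db[key] = value
    cache[key] = (value, time.time() + ttl)

def write_and_invalidate(key, value):
    # The DB stays the source of truth; the cached copy is simply dropped.
    # If the delete fails, readers keep the old value until the TTL expires.
    db[key] = value
    cache.pop(key, None)

def write_behind(key, value, ttl=60):
    # The write is acknowledged from the cache and persisted later.
    # A crash before the queue is flushed is the data-loss risk.
    cache[key] = (value, time.time() + ttl)
    write_queue.append((key, value))

Each function is "correct" on the happy path; the comments mark exactly where the failure modes from the list above live.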
Axis Y: Read Path (aka when data stops being valid)
Your read path determines how long a cached value is treated as correct. At one end you rely purely on TTL expiry; at the other, you explicitly invalidate or refresh entries the moment the underlying data changes. Most systems land somewhere in between, and that choice decides how stale a read can get.
The Interaction Matters
Most cache bugs don't come from picking the wrong point on one axis. They come from unexamined interactions between the two.
For example:
- Write-through + TTL: You assume fresh data, but TTL silently introduces staleness
- Write-behind + TTL: You now have two independent delays: persistence lag + expiry lag
- Explicit invalidation + high write volume: Your invalidation system becomes the bottleneck
- TTL-only + hot keys: Expiration becomes a coordinated attack on your DB
The system behaves perfectly in isolation, but fails under composition.
Questions to ask here would be:
- Where can I tolerate inconsistency?
- How long can inconsistent data live?
- What happens if invalidation fails?
- What happens if writes only partially succeed?
Once you start seeing it this way, things like stampede prevention, TTL jitter and
singleflight become implementation details.
Pros and Cons of the Cache Write Strategies
- Write-through: reads stay warm and updates are visible immediately, but every write pays for two stores, and a failure between them leaves the cache and DB inconsistent.
- Write-behind: writes are fast and can be batched, but unflushed writes can be lost or reordered if the cache dies first.
- Write to DB + invalidate: simple and keeps the DB as the only source of truth, but a failed or delayed invalidation leaves stale entries until the TTL rescues you.
Time-To-Live and Invalidation
At first glance, TTL looks like the simplest solution to cache invalidation. Set an expiry and let time handle the correctness.
# TTL Lifecycle
cache.set(ttl: 2s) -> ValidWindow(2s) -> expire -> cache.setAgain()
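To make that lifecycle concrete, here is a minimal in-memory TTL cache. The structure is an illustrative assumption, not any particular library's API.

# Minimal TTL cache (illustrative, in-memory)
import time

class TTLCache:
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._store[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None                  # never cached
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]         # expired: the next reader must recompute
            return None
        return value                     # still inside the validity window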
In practice, however, TTL is not a strategy but a fallback mechanism for uncertainty. You use TTL when you don't know exactly when data changes, or when it is too expensive to track every change.
What TTL Actually Does
TTL does not guarantee freshness. It guarantees bounded staleness. When you set a TTL of 60 seconds, the promise is as follows:
# wrong
This data is fresh for 60 seconds
# right
We are willing to serve data that may be up to 60 seconds stale
Where TTL Breaks Down
- Synchronized Expiry
If many keys share the same TTL, they tend to expire together. This creates bursty load patterns:
  - cache entries expire
  - requests spike to the DB
  - latency increases
  - retries amplify the spike
You effectively “schedule” a micro-outage for yourself.
- Hot Keys
A single high-traffic key expiring can overwhelm your system. TTL does nothing to protect against:
  - stampedes
  - thundering herds
  - retry amplification
Again, without additional control TTL becomes the trigger for cascading failure.
- Silent Staleness
TTL-based systems have no awareness of when the data changes. If the upstream
data changes immediately after a cache set:
- you serve stale data for the full TTL duration
- no signal exists to correct it early
This is acceptable for some domains, like a product catalog, but unacceptable for pricing changes and permission-altering systems.
Cache Stampedes
# Stampede scenario
Many clients -> Cache miss -> DB (spike)
Cache systems optimize for the steady state but do nothing for transition states like:
- Valid -> expired
- Cached -> recomputed
Every request independently decides to recompute on a cache miss. Your cache effectively becomes a load multiplier feeding a retry storm.
Mitigation Strategies
- Request Coalescing aka Singleflight

One request performs the recomputation and the others wait for its result. This prevents duplicate work and reduces DB load to a single request. The tradeoff is that concurrent requests block, so latency increases a bit for the waiting callers.
However, this is the most effective first layer of defense (a combined sketch of these mitigations follows after this list).
- Distributed Locking: Consider the following cache miss pattern
Instance-A: cache miss -> DB
Instance-B: cache miss -> DB
Instance-C: cache miss -> DB
Each instance believes it is responsible for recomputing the value. With distributed locking the recomputation is coordinated across the cluster.
This addresses the main limitation of singleflight, which only coordinates requests arriving at a single instance. If you run 10 pods of the same process, you can still end up with 10 concurrent recomputations.
Similarly, with a distributed cache spread across multiple nodes, the recomputed value needs to be propagated to all nodes.
- Stale-While-Revalidate: Instead of blocking while the cache is recomputed, you serve the stale data and begin a refresh in the background. As a result, requests never block and latency stays stable under load.
The trade-off is that users see stale data, so the system needs to tolerate this temporary inconsistency. Ideally, you also hint to the client to retry in 2-3 seconds, as that's roughly your predicted time to finish the recomputation.
If your system is high-read and low-criticality, this is the most resilient default.
- TTL Jitter: Stampedes are often synchronized: if many keys share the same TTL, they create a coordinated load spike. Adding randomness to the TTL spreads that load out.
final_ttl = fixed_ttl(60s) + randomness(0..10s)
This smooths backend load and reduces coordinated expiry. It is a low-effort mitigation strategy to prevent predictable bursts.
- Negative Caching: Not all stampedes arise from existing data; some come from missing data.
If a key does not exist, then every request is a cache miss and hits the DB.
Caching a CACHE_NOT_FOUND sentinel is an acceptable way to avoid this. It works well for user lookups or feature flags. The trade-off is that it requires careful invalidation, and you risk caching a temporarily absent state for too long.
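Putting several of these mitigations together, here is a rough read-through helper that combines singleflight, stale-while-revalidate, TTL jitter and negative caching. It is an in-process sketch only: the soft/hard TTL split, the jitter range and the NOT_FOUND sentinel are illustrative assumptions, and cross-instance coordination (the distributed lock above) is deliberately left out.

# Stampede-resistant read-through sketch (in-process only)
import random
import threading
import time

NOT_FOUND = object()   # sentinel for negative caching

_store = {}            # key -> (value, fresh_until, evict_at)
_inflight = {}         # key -> Event guarding a single recomputation
_lock = threading.Lock()

def _set(key, value, soft_ttl=60, hard_ttl=300):
    # TTL jitter: spread expiries so hot keys don't all expire together
    jitter = random.uniform(0, soft_ttl * 0.1)
    now = time.time()
    _store[key] = (value, now + soft_ttl + jitter, now + hard_ttl)

def get(key, loader):
    now = time.time()
    entry = _store.get(key)
    if entry is not None:
        value, fresh_until, evict_at = entry
        if now < fresh_until:                       # fresh hit
            return None if value is NOT_FOUND else value
        if now < evict_at:
            # stale-while-revalidate: serve stale, refresh in the background
            threading.Thread(target=_refresh, args=(key, loader), daemon=True).start()
            return None if value is NOT_FOUND else value
    # miss (or fully evicted): recompute, but only once per key
    return _refresh(key, loader)

def _refresh(key, loader):
    with _lock:
        waiter = _inflight.get(key)
        if waiter is None:
            _inflight[key] = threading.Event()      # we are the leader
    if waiter is not None:
        waiter.wait()                               # singleflight: wait for the leader
        value, _, _ = _store.get(key, (NOT_FOUND, 0, 0))
        return None if value is NOT_FOUND else value
    try:
        value = loader(key)                         # e.g. a DB read; may return None
        _set(key, NOT_FOUND if value is None else value)   # negative caching on misses
        return value
    finally:
        with _lock:
            _inflight.pop(key).set()                # wake every waiter

In a multi-instance deployment you would still layer the distributed lock from the earlier bullet on top of this, and you would probably give negative entries a much shorter TTL than real values.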
Some Examples in Production
Example 1: User Profile Service
- Use a read-through cache with explicit invalidation + TTL
- Profiles are read-heavy; updates are infrequent but correctness matters.
- A slight staleness window after an update is acceptable for all viewers except the user who owns the profile. You can set a cookie flag so that this user's requests go straight to the DB (a sketch follows below).
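Here is a minimal sketch of that read path. The dict-based `db` and `cache`, the 300-second TTL and the "profile_dirty" cookie name are all hypothetical choices for illustration.

# Example 1 sketch: read-through profiles with explicit invalidation and an owner bypass
import time

db = {}      # profile_id -> profile (source-of-truth stand-in)
cache = {}   # profile_id -> (profile, expires_at)

def get_profile(profile_id, viewer_id, cookies):
    # The owner who just edited carries a hypothetical "profile_dirty" cookie,
    # so their own reads skip the cache and see the write immediately.
    if viewer_id == profile_id and cookies.get("profile_dirty"):
        return db.get(profile_id)

    entry = cache.get(profile_id)
    if entry and time.time() < entry[1]:
        return entry[0]                              # slight staleness is fine for other viewers

    profile = db.get(profile_id)                     # read-through on miss
    cache[profile_id] = (profile, time.time() + 300) # TTL as a safety net
    return profile

def update_profile(profile_id, profile):
    db[profile_id] = profile
    cache.pop(profile_id, None)                      # explicit invalidation on write
    # The HTTP response would also set the "profile_dirty" cookie for the owner.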
Example 2: Product Catalog in eCommerce
- Use stale-while-revalidate with long TTLs
- Again, the read volume is high and the data changes infrequently
- The difference from the previous example is that there we had to invalidate the cache quickly so that updates showed up fast: the entry is evicted immediately on write.
- Here, we prioritize serving first and slowly warm up correctness in the background: entries are overwritten rather than evicted.
Example 3: High-Frequency Counters
- Use a write-behind cache with batched writes to the DB
- Given the massive write volume for counters, the exact real-time value is often not required.
- Eventual consistency is acceptable. All you need is flush logic that triggers periodically and is idempotent to avoid duplicates.
- Duplicate updates are a genuinely scary situation for counters: x becoming 4x instead of 2x (a sketch follows below).
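A minimal sketch of that write-behind counter, assuming an in-process buffer and a hypothetical `db_apply_batch(flush_id, batch)` callback; the flush interval is an illustrative choice.

# Example 3 sketch: write-behind counters with periodic batch flush
import threading
import time
import uuid

_pending = {}              # counter key -> accumulated delta since the last flush
_lock = threading.Lock()

def increment(key, delta=1):
    # The hot path only touches memory; the DB sees batches, not every hit.
    with _lock:
        _pending[key] = _pending.get(key, 0) + delta

def flush(db_apply_batch):
    # Atomically take the buffered deltas so new increments go into a fresh batch.
    with _lock:
        batch, flush_id = dict(_pending), str(uuid.uuid4())
        _pending.clear()
    if not batch:
        return
    # db_apply_batch must be idempotent on flush_id: if the flush is retried,
    # the same batch must not be applied twice (x becoming 4x instead of 2x).
    db_apply_batch(flush_id, batch)

def run_flusher(db_apply_batch, interval=5.0):
    while True:
        time.sleep(interval)
        flush(db_apply_batch)

If `db_apply_batch` fails after the buffer is cleared, the batch is lost, which is exactly the write-behind data-loss risk called out earlier; a real system would re-merge the batch into the buffer or persist it to a durable queue.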
Example 4: Auth / Permission Systems
- Use a short TTL combined with an explicit invalidation strategy
- Incorrect data in this scenario creates a security nightmare.
- Systems like this have high cache churn and push extra load to the backend, so you need mitigations like request coalescing and jitter (discussed above) to absorb it.
Common Pitfalls to Avoid
- Cache is not a source of truth. The DB is the source of truth.
- Don’t use TTL blindly as it leads to subtle bugs.
- Most systems are not uniform. A small subset of keys dominate traffic. Design for hot keys explicitly.
- Every cache interaction is a distributed system problem and they do happen in prod:
  - Cache write fails
  - DB write succeeds
  - Network partition
- Don’t use write-behind caching for critical data. This is how you lose data without noticing.
Conclusion
Before you introduce caches into your system, always make sure to answer the following questions:
- What is the source of truth?
- How much staleness can the system tolerate?
- What happens on partial failure?
- What happens under load?
Caching ought not to be treated just as a performance optimization. It’s a trade-off between speed, consistency and failure handling. Most cache-related outages are not caused by the cache itself. They are a byproduct of assumptions about freshness, traffic patterns and failure modes that do not hold in production.
Treated as a first-class citizen of your system, caching absorbs load. Treated as a shortcut, it amplifies failure!