If I had a dollar for every time that I came across a bug with an implementation of caching in a software system… I would probably have enough to pay for an annual corporate subscription of Redis Enterprise.
Caching, it seems, is one of those things that you can get almost right, but never quite right. That's for a good reason. After all - caching (or cache invalidation rather) is one of those two foundational problems considered to be the toughest in computer science. The other one being the naming of variables, of course.
Jokingly or not - caching is indeed difficult to get right - especially in large scale distributed applications. As a result, teams often go through a process of iteration and experimentation to tweak their caching strategy and implementation - until hopefully, at some point, they get it to some reasonable and semi-optimal state.
In this article, I want to demystify and clarify some of these aspects of caching that are often missed or misunderstood.
Hopefully, after reading this, you will have a clearer understanding of what caching is, the main approaches to caching, things to watch for, and the application of various caching techniques in real-world use cases.
So without further ado….
What is Caching?
Caching, in a nutshell, is the act of storing data in a temporary medium from which it is cheaper, faster, or otherwise more efficient to retrieve than from its original storage (the system of record).
In other words, imagine the following use case.
An order management system needs to retrieve product information from an inventory system. Let's say that the inventory system is not very performant. Every time it gets a request, it has to go to a centralized database to get information about products. That database is slow and cannot support many parallel requests.
In order to improve the performance and alleviate the strain on that inventory database, we introduce a caching layer where we will now store the same product information. Except that now, instead of going to that inventory system with the clunky database, we first go to the cache, and if the data is in the cache, we get it from there.
What we've done here is introduce a temporary storage medium (the cache) to improve performance and to optimize the resource usage of the original inventory database.
What Constitutes a "Cache"?
One point where folks start getting confused is regarding the technical nature of the cache.
Most of us in the software development world have very specific associations when we hear the term "cache". We often associate that word with distributed cache products such as Redis or Memcached or EHCache. At other times, we think of the browser cache, database caching, OS caching, or even hardware caching.
That's exactly the point. The concept of a cache is not limited to a particular product or sphere within the field of computer science. In its broadest meaning, "caching" is really any type of temporary medium where we duplicate data from some system of record. We do so because it is in one way or another advantageous to store that data in the temporary medium.
Typical reasons are lower cost, better performance, or improved scalability of the cache compared to the original storage.
If we look at the previous example of the order management and inventory system, the caching layer could in theory be a number of things:
Distributed caching product (Redis, for example)
Another microservice with its own database
In-memory storage within the actual inventory management system
All of the above would fulfill the criteria of being a cache even though each implementation would be different from one option to the other.
Put simply - all of those things above can be a cache. Caching, as a concept, can be and is in fact implemented on all levels of the computer systems stack and across many digital domains.
Some Terminology
Before we continue, it's important to understand the different terms surrounding the topic of caching.
System of Record: The permanent storage where data is stored. Most likely a database. Also called the source-of-truth system.
Cache Miss: When the application queries the cache but that particular record does not exist in the cache.
Cache Hit: When the record does exist in the cache and is returned as such.
Cache Pollution: When the cache is filled with values that are not used or queried.
Cache Eviction: The process of removing entries from the cache to free up its memory.
Data Freshness: How in-sync the records in the cache are with the underlying system of record.
Cache Expiration: Time-based removal of cache records, either as part of the eviction process or as part of cache invalidation, as we will discuss below.
Now that we are all fully versed in the caching lingo, let's dive into some of the places and layers where a cache may be implemented.
Where is Caching Implemented?
As we already touched on, caching is used across the entire technology landscape - on all levels and within many different technology stacks.
On a hardware level, caching is used as part of CPU architecture, for example, in the form of a Level 1-3 (L1/L2/L3) cache.
At the OS kernel level, there is a form of disk cache known as the page cache, among other forms.
With Web-based systems, there are of course browser caches and CDNs (Content Delivery Networks). These cache commonly used static resources (images, stylesheets, etc.) on the client and CDN side respectively. The idea is to reduce bandwidth and to serve these assets to the user quickly, efficiently, and at lower cost.
Different types of applications and middleware have their own caches as well. For example, databases use caching to save frequently used queries as well as frequently returned result sets.
Also, there are of course many robust software caching products such as Redis, EHCache, Memcached, Hazelcast, Infinispan and others that allow for scalable distributed caching within a distributed application.
Now, one point to emphasize is that the concept of a "distributed" cache can be contrasted with that of a "local" or "localized" cache. A distributed cache is spread out across multiple devices over a network, whereas a local cache exists on one device only.
The best way to understand the difference between the two concepts is to imagine an application being deployed in a cluster of servers. In other words, there are multiple instances of the application running at the same time, which is something that is familiar to anyone working on large scale applications.
If we were to introduce a distributed cache to such a system, any of the application instances would have access to that cache and an ability to modify its records.
With a local cache on the other hand, each instance would have its own cache - most likely within the memory of that particular instance. Different instances would not have access to another instance's cache. They would only be able to access their own cache.
There are pros and cons with both of these approaches.
On the one hand, if you have multiple instances accessing a cache - you may need to address sync issues, race conditions, data corruption, and other challenges that come with a distributed application. On the other hand, having a shared cache is a powerful concept because it allows the application to handle use cases that would not have been possible with a local, albeit simpler, cache.
For example, you might have your application deployed in a cloud environment across multiple availability zones. Each availability zone might have a cluster of VM instances running your application. Each of these clusters will most likely have its own distributed cache. The reason is that one of the premises of a distributed cache is being able to access it quickly and efficiently, which means having network proximity (not necessarily physical, but virtual as well) between the cache and the instances it serves.
At the same time there are some challenges that are common to both distributed and local caching.
The main challenge is of course the constant balancing act between maintaining data freshness, performing optimal cache invalidation and eviction, and fitting the way you manage your cache to your particular use case.
This very important aspect of managing a cache - caching patterns - is what we will tackle next.
As with most decisions in software engineering, each approach has its own trade-offs - or in other words, pros and cons. We will address the pros and cons of each approach below.
Caching is a key concept to understand for software engineers and software architects alike. However, it is far from being the only such concept.
In my guide - Unlocking The Career of Software Architect - I describe the other concepts, technologies, and skills that Senior+ software engineers and software/solutions architects need to master.
Check it out here
Patterns in Local and Distributed Caching Systems
There are five main caching patterns, and they all have to do with the way that the cache is read from, written to, and synchronized with the underlying system of record.
Cache-Aside
The Cache-Aside caching strategy is probably the most popular and the one most software engineers are familiar with. This caching approach puts the control over cache writes and reads entirely on the application. Here, the application controls both when to read from the database or cache as well as when to write to them.
Here's a breakdown of how that works by way of an example.
Imagine that your application receives a user's login request and, consequently, fetches the user's mailing address.
1. The application first checks whether the user's address exists in the cache.
2. If there is no address entry for that user, the application retrieves the data from the database.
3. If the information does exist in the cache, however, it is returned immediately, saving us a trip to the database.
4. After fetching the data from the database, the application also writes it to the cache.
In step 2, the situation where there is no entry in the cache for that particular item is frequently referred to as a "cache miss".
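To make the flow concrete, here is a minimal cache-aside sketch in Python. It is an illustration only: a plain dict stands in for the cache (it could just as well be Redis), and fetch_address_from_db is a hypothetical placeholder for the real database query.

```python
# Minimal cache-aside sketch. The dict stands in for a real cache
# (e.g. Redis), and fetch_address_from_db is a hypothetical placeholder
# for the actual (slow) database lookup.
cache = {}

def fetch_address_from_db(user_id):
    # Placeholder for the real database query.
    return {"user_id": user_id, "address": "123 Main St"}

def get_user_address(user_id):
    address = cache.get(user_id)
    if address is not None:                    # cache hit
        return address
    address = fetch_address_from_db(user_id)   # cache miss: go to the database
    cache[user_id] = address                   # populate the cache for the next read
    return address
```

The key point is that the application, not the cache, decides when to read from the database and when to populate the cache.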
Pros
Simple to implement
Control remains entirely with the application
Uses the least memory (at least in theory) as cached items are only fetched when needed (lazy loading)
Cons
Higher latency on cache misses because the data must be fetched from the slower store. Too many cache misses, and performance may suffer.
Application logic becomes more complex (even though the overall idea is simple to implement)
When to Use
When you want to have full control over how the cache is populated.
When you do not have a caching product that can manage database reads/writes.
When access patterns to the cache are irregular
Write-Through Caching
Write-Through caching ensures consistency between the cache and the underlying persistent data store. In other words, when a write happens, it is propagated to both the cache and the database in the same transaction.
Here is an example to illustrate:
1. A financial application receives a request to update a user account with a new balance.
2. The user account balance exists in both the database and the cache.
3. Both the database and the cache are updated with the new value within the same transaction.
4. Another request comes along, this time to read the user's balance. We look in the cache first and use that value. Since the cache has the most up-to-date value, there is no concern that it might be out of sync with the underlying database.
Note that step 3 can be handled by the application logic. Frequently, however, the caching product itself takes on that responsibility. For example, if you are using a product such as EHCache or Infinispan, the application will update the cache, which can be configured in turn to update the database.
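As a rough illustration, here is what application-level write-through might look like in Python. The dict stands in for a cache client and update_balance_in_db is a hypothetical placeholder for the database write; with a real caching product, the database half of this flow would typically be handled by a configured cache writer.

```python
# Rough write-through sketch at the application level. cache is a plain
# dict standing in for a cache client, and update_balance_in_db is a
# hypothetical placeholder for the real transactional database write.
cache = {}

def update_balance_in_db(account_id, balance):
    pass  # placeholder for the real database update

def update_balance(account_id, new_balance):
    # Write to the system of record and the cache together. In a real
    # system both writes would be wrapped in a transaction (or delegated
    # to a caching product configured with a cache writer).
    update_balance_in_db(account_id, new_balance)
    cache[account_id] = new_balance

def get_balance(account_id):
    # Reads can trust the cache because every write goes through it.
    return cache.get(account_id)
```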
Pros
Ensures consistency between the cache and the underlying datastore
Cons
Transactional complexity, as we now need some form of two-phase commit logic to ensure that both the cache and the database are updated (if this is not handled by the caching product)
Operational complexity: if one of the two writes fails, we need to handle the failure gracefully and preserve the user experience.
Writes become slower since we now need to update two places (cache and datastore) as opposed to just one (datastore)
When to Use
Write-through caching is ideal for applications that require strong data consistency and cannot afford to serve stale data. It's commonly used in environments where data must be accurate and up-to-date immediately after it's written.
Write-Around Caching
This strategy populates the underlying store but not the cache itself. In other words, the write bypasses the cache and writes to the underlying store only. There is some overlap between this technique and Cache-Aside.
The difference is that with Cache-Aside, the focus is on the reads and lazy loading - only populating the data into the cache when it is first read from the datastore. Whereas with Write-Around caching, the focus is on write performance instead. This technique is often used to avoid cache pollution when data is being written often but is infrequently read.
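Here is a small Python sketch of the write-around idea, using a logging-style example. The dict stands in for the cache, and write_log_to_db / read_log_from_db are hypothetical placeholders for the datastore calls.

```python
# Write-around sketch: writes bypass the cache and go straight to the
# store; reads fall back to the store and populate the cache lazily.
cache = {}

def write_log_to_db(log_id, entry):
    pass  # placeholder for the real database write

def read_log_from_db(log_id):
    return {"log_id": log_id}  # placeholder for the real database read

def write_log(log_id, entry):
    write_log_to_db(log_id, entry)     # the cache is intentionally not touched

def read_log(log_id):
    entry = cache.get(log_id)
    if entry is None:                  # only reads populate the cache
        entry = read_log_from_db(log_id)
        cache[log_id] = entry
    return entry
```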
Pros
Reduces cache pollution as cache isn't populated on every write
Cons
Read performance suffers for records that are read often and would therefore benefit from being loaded into the cache proactively, to avoid a trip to the database on the first read.
When to Use
This is often used when write volumes are large but read volumes are significantly lower.
Write-Back (Write-Behind) Caching
Write operations first populate the cache and then are written to the datastore. The key here is that writing to the datastore happens asynchronously - therefore doing away with the need for a two-phase transaction commit.
The Write-Behind caching strategy is usually handled by the caching product. If the caching product has this mechanism, the application will write to the cache, and the caching product will then be responsible for sending the changes down to the database. If this is not supported by the cache product, the application itself will trigger an asynchronous update to the database.
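For illustration only, here is a simplified write-behind sketch in Python where the application writes to an in-memory cache and a background thread flushes changes to the database asynchronously. write_to_db is a hypothetical placeholder, and a real caching product would handle the queueing, batching, and retry logic for you.

```python
# Write-behind sketch: the write returns as soon as the cache is updated;
# a background worker drains a queue and pushes changes to the database.
import queue
import threading

cache = {}
pending_writes = queue.Queue()

def write_to_db(key, value):
    pass  # placeholder for the real database write

def writer_loop():
    while True:
        key, value = pending_writes.get()
        try:
            write_to_db(key, value)
        except Exception:
            # A real implementation needs retry / dead-letter handling here,
            # otherwise the database silently falls out of sync.
            pending_writes.put((key, value))

threading.Thread(target=writer_loop, daemon=True).start()

def put(key, value):
    cache[key] = value                 # fast path: cache only
    pending_writes.put((key, value))   # the database is updated asynchronously
```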
Pros
Writes happen faster because the system only has to write to the cache within the initial transaction. The database is updated at some later point.
If the flow is handled by the caching product, then the application logic becomes less complex.
Cons
There is a potential for inconsistency as the database and cache will become out-of-sync until the database receives the new changes.
There is a risk that the asynchronous write to the database fails. If this happens, more complex mechanisms need to be in place to ensure that the database eventually receives the most up-to-date data.
When to Use
Write-behind caching can be used when write performance is critical and it is acceptable for the data in the database to be temporarily out of sync with the cache. It is suitable for applications with high write volumes but less stringent consistency requirements. One example is CDNs (content delivery networks), which update cached content quickly and then sync it to the system of record.
Read-Through
Read-through caching is similar to the cache-aside pattern in the sense that in both, the cache is where we look first for the record, and on a cache miss we look in the database. However, while the cache-aside pattern puts the responsibility of querying both the cache and the database on the application, with read-through that responsibility falls on the caching product (if it supports such a mechanism).
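Below is a minimal Python sketch of the read-through idea: the cache object owns a loader function, so the calling code only ever talks to the cache. In practice this loader is usually configured on the caching product itself; the ReadThroughCache class and load_user_profile function here are illustrative assumptions.

```python
# Read-through sketch: the cache owns the loader, so callers never talk
# to the database directly.
class ReadThroughCache:
    def __init__(self, loader):
        self._data = {}
        self._loader = loader            # reads from the system of record

    def get(self, key):
        if key not in self._data:        # cache miss: the cache fetches and stores
            self._data[key] = self._loader(key)
        return self._data[key]

def load_user_profile(user_id):
    # Hypothetical placeholder for the real database read.
    return {"user_id": user_id, "name": "Ada"}

profiles = ReadThroughCache(load_user_profile)
profile = profiles.get(42)               # the caller only ever touches the cache
```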
Pros
Simplicity - all data access logic is encapsulated in the caching product
Cons
Potential latency when reading the data from the database on a cache miss. Requires complex invalidation mechanisms for data updates.
When to Use
Read-through caching is used when you want to simplify the code that accesses data. Also, when you want to ensure that the cache always contains the most recent data from the data store. It's useful for applications that read data more frequently than write data. The key point here though is that your caching product should have the ability through configuration or natively to perform these reads from the underlying system of record.
Summary of Caching Strategies
The table below summarizes what we have talked about in terms of the five caching patterns.
| Strategy | Description | Real-World Example | Responsibility for DB Operations |
| --- | --- | --- | --- |
| Cache-Aside | Data is loaded into the cache only on demand, when an application requests it and doesn't find it in the cache. | An e-commerce website caching product details on demand. | Application |
| Write-Through | Every write operation is simultaneously written to both the cache and the underlying data store to maintain consistency. | Banking systems keeping account balances consistent across transactions. | Caching Product or Application |
| Write-Behind (Write-Back) | Writes are first recorded in the cache and later written to the data store asynchronously. | CDNs updating content in the cache first and syncing it to storage systems later. | Caching Product or Application |
| Write-Around | Write operations bypass the cache and directly update the data store, avoiding caching data that is not immediately needed. | Logging operations where logs are written directly to storage without caching. | Application |
| Read-Through | The cache acts as the primary interface for reads. If data is missing in the cache, it is fetched from the system of record and cached. | A user profile service fetching and caching user data on cache misses. | Caching Product or Application |
Cache Invalidation
Now that we understand the different ways to populate a cache, we need to also understand how to maintain it in sync with the underlying system of record.
The thing about a data cache is that it is almost always at least slightly out of sync with the underlying datastore (system of record). In other words, it becomes stale. In order to keep the cache as synchronized as possible with the system of record, we need to implement some sort of cache invalidation strategy.
When it comes to cache invalidation, the two main approaches are time-based and event-based. The time-based approach can be controlled via the time-to-live (TTL) settings available in most caching products. The event-based approach requires the application, or some other component, to send the updated records (or invalidation notifications) to the cache.
In other words, we need to ensure data "freshness" within the cache.
Cache invalidation causes fresh records to be retrieved from the system of record into the cache. It is therefore important to understand how cache invalidation relates to the caching strategies we have discussed above.
Caching strategies have to do with how data is loaded and retrieved from the cache. Cache invalidation, on the other hand, has to do more with data consistency and freshness between the system of record and the cache.
As such, there is some overlap between these two concepts, and with some caching strategies invalidation will be simpler than with others. For example, with the write-through approach the cache is updated on every write, so that is not something you would have to implement separately. However, deletes may not be reflected and so may require application logic that deals with them explicitly.
Ways to Invalidate a Cache
There are two main ways to invalidate a cache entry:
Event-Driven
With an event-driven approach, your application notifies the cache every time there is a change in the underlying system of record. Every time a record changes, you trigger a notification to the cache - whether synchronously or asynchronously.
This can be done in the application, where your code is responsible for keeping the cache up to date. Or, with some caching products, there may be pub/sub functionality where the caching product can subscribe to these types of notifications. In that case, there is less work for the application to do. However, something still needs to produce the notification events.
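As a sketch of what event-driven invalidation can look like, here is an example using Redis pub/sub via the redis-py client; the channel name, key format, and on_product_updated function are made up for illustration.

```python
# Event-driven invalidation sketch using Redis pub/sub (redis-py client).
import redis

r = redis.Redis(host="localhost", port=6379)

def on_product_updated(product_id):
    # Writer side: after updating the system of record, publish an
    # invalidation event so interested caches can drop the stale entry.
    r.publish("invalidate:product", str(product_id))

def listen_for_invalidations(local_cache):
    # Cache side: a listener drops the cached entry when notified.
    pubsub = r.pubsub()
    pubsub.subscribe("invalidate:product")
    for message in pubsub.listen():
        if message["type"] == "message":
            local_cache.pop(message["data"].decode(), None)
```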
Time-Based
With a time-based approach, all cache records have a TTL (time to live) associated with them. After the TTL expires for a record, that cache record is deleted. This is typically handled by the caching product.
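For example, with Redis and the redis-py client, a TTL can be attached to an entry at write time. The key name and the 60-second TTL below are illustrative only.

```python
# Time-based invalidation sketch: the entry is written with a TTL and the
# caching product (Redis here) deletes it once the TTL expires.
import redis

r = redis.Redis(host="localhost", port=6379)

r.set("product:42:price", "19.99", ex=60)   # entry expires after 60 seconds
price = r.get("product:42:price")           # returns None once the TTL has passed
```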
Cache Eviction Strategies
Cache eviction is similar to cache invalidation in the sense that in both cases, we are removing old cache records. However, there is a difference in that cache eviction is required when the cache becomes full and is unable to accommodate any more records.
Remember, the purpose of a cache is to store a subset of the most frequently accessed records. It is not to duplicate the entire source of truth system. Therefore, the size of your cache would typically be orders of magnitude smaller than the size of the data stored in your database / source of truth / system of record.
As such, we need a mechanism by which we can "evict" or in other words - delete - records.
At the same time, we need to ensure that we start with those records that are the least likely to be needed by the application - otherwise the entire point of having a cache is defeated.
To ensure that we are evicting records in a way that is optimal, there are a number of eviction strategies we can leverage:
Least Recently Used (LRU)
With this approach, we evict records that haven't been used in a while.
When to Use: Effective in scenarios where the likelihood that data will be accessed soon decreases with time since the record was last accessed. This is a good fit for general-purpose caching where recency of access is a strong indicator of future access.
When Not to Use: Not ideal for workloads where data access patterns do not correlate with recency.
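A minimal LRU cache can be sketched in Python with an OrderedDict, similar in spirit to what functools.lru_cache provides for memoizing function calls. The capacity handling below is a simplified illustration.

```python
# Minimal LRU cache sketch: the least recently used entry is evicted
# once capacity is reached.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry
```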
First In First Out (FIFO)
Evicting those records that were saved in the cache before other records.
When to Use: Useful for caches where the age of the data matters more than the frequency or recency of access. Fit for caching data with a predictable lifespan.
When Not to Use: Suboptimal for workloads where older data might still be frequently accessed.
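A FIFO variant of the same sketch only needs insertion order; note that, unlike LRU, reads do not change which entry gets evicted next.

```python
# FIFO eviction sketch: the entry inserted first is evicted first,
# regardless of how often or how recently it was read.
from collections import OrderedDict

class FIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()           # preserves insertion order

    def get(self, key):
        return self._data.get(key)           # reads do not affect eviction order

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            self._data.popitem(last=False)   # evict the oldest entry
        self._data[key] = value
```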
Least Frequently Used (LFU)
Evicting those records that are not frequently used/accessed.
When to Use: Best for situations where data that is accessed frequently over a long period should be retained. Suitable for applications with stable access patterns.
When Not to Use: Less effective in environments where access patterns can shift significantly. Records that were accessed frequently in the past can linger and pollute the cache even when they are no longer needed.
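Here is a simplified LFU sketch that keeps a per-key hit counter and evicts the entry with the lowest count. Real implementations use more efficient structures, and this naive version also exhibits the lingering-entry weakness mentioned above.

```python
# LFU eviction sketch: evict the entry with the lowest hit count.
class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}
        self._hits = {}

    def get(self, key):
        if key not in self._data:
            return None
        self._hits[key] += 1                 # count the access
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            coldest = min(self._hits, key=self._hits.get)   # least frequently used
            del self._data[coldest]
            del self._hits[coldest]
        self._data[key] = value
        self._hits[key] = self._hits.get(key, 0)            # new keys start at zero
```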
Time To Live (TTL)
Evicting based on a predetermined time-to-live period.
When to Use: Ideal for data that becomes obsolete or stale after a certain period.
When Not to Use: Not suitable for data whose validity does not naturally expire over time and needs to remain in the cache indefinitely based on other factors.
Random Replacement
Evicting records randomly.
When to Use: Can be used in situations where the cost of sophisticated tracking mechanisms outweighs their benefit, or where access patterns are unpredictable and so other eviction strategies are not a good fit.
When Not to Use: Generally less efficient than other strategies in most practical scenarios where there are more or less predictable access patterns.
Summary
We've talked about the importance of caching in distributed applications and the criticality of choosing the right caching strategy. There are a number of popular strategies:
Cache-aside
Write-through
Read-through
Write-behind
Write-around
We also talked about cache invalidation using a time-based or an event driven approach.
We noted why cache eviction is important and what strategies are available to do that. These are:
LRU
FIFO
LFU
TTL
Random
Lastly, a cache can be either local or distributed. The former is confined to a single machine/application instance. The latter spans over multiple machines, and is typically (though not necessarily) confined to a cluster of instances.
Hopefully, this piece has provided you with some clarity on what caching is, why it is important, and all of the different terminology and nuances that are crucial to understand when working with a caching technology.
Caching: The Future
As with other technologies, there is a tremendous innovation happening with caching products in the market. Some notable highlights are:
Integration with Edge Computing
As edge computing continues to grow, caching strategies are becoming more decentralized, moving data closer to where it is needed at the edge of the network. This proximity reduces latency, bandwidth usage, and the cost of serving data, which is crucial for real-time applications such as IoT and mobile apps.
Example: Autonomous vehicles use edge computing to process real-time data locally. Caching key data like maps and traffic conditions at edge nodes helps in rapid decision-making without the latency of querying central servers.
AI and Machine Learning-Driven Caching
AI and machine learning can enhance caching mechanisms by predicting data usage patterns and preemptively caching data based on anticipated needs. This kind of proactive approach can significantly improve efficiency - especially in dynamic environments where data access patterns frequently change.
Example: Amazon uses machine learning to predict user behavior and pre-cache products that users are likely to buy during peak times such as Black Friday. This enhances user experience by decreasing load times.
In-Memory Data Grids (IMDG)
IMDGs are quickly evolving as a powerful solution for caching that provides low-latency complex data access across distributed systems. IMDGs not only cache data but also provide a range of data processing capabilities, real-time analytics, and decision-making directly within the cache layer.
Example: High-frequency trading platforms utilize IMDGs to cache market data and trade orders in-memory. This allows for quick access and processing, which is crucial for making sub-second trading decisions.
Hey, I’m Yakov, and I run CloudWay Digital Inc, a software architecture consulting agency, and Developer.Coach, where I help software engineers and architects level up in their careers.
In addition to my free articles here and on Medium, I've also written a number of guides to help software engineering professionals level up in their careers. Check these out here.