One of the great benefits of Event Sourcing is that you don't lose any business data. Each business operation ends with a new event appended to the event store.
The business object is represented by a sequence of events called a stream. When we want to execute business logic, we read all the events from a specific stream and recreate the current state by applying them one by one, in order of appearance. Based on the current state, we verify and execute the business logic.
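To make this concrete, here's a minimal sketch in TypeScript of rebuilding state from a stream. The bank account types and the `evolve` function are illustrative, not tied to any particular library:

```typescript
// Hypothetical events for a bank account stream.
type BankAccountEvent =
  | { type: "AccountOpened"; accountId: string }
  | { type: "MoneyDeposited"; amount: number }
  | { type: "MoneyWithdrawn"; amount: number };

interface BankAccount {
  accountId: string;
  balance: number;
}

// Apply a single event to the current state, returning the new state.
function evolve(state: BankAccount, event: BankAccountEvent): BankAccount {
  switch (event.type) {
    case "AccountOpened":
      return { accountId: event.accountId, balance: 0 };
    case "MoneyDeposited":
      return { ...state, balance: state.balance + event.amount };
    case "MoneyWithdrawn":
      return { ...state, balance: state.balance - event.amount };
  }
}

// Recreate the current state by replaying all events in order of appearance.
function getCurrentState(events: BankAccountEvent[]): BankAccount {
  return events.reduce(evolve, { accountId: "", balance: 0 });
}
```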
But isn't loading more than one event a performance issue? Frankly, it's not. Downloading even a dozen or several dozen small events is not significant overhead.
Events are concise, containing only the information needed. EventStoreDB is optimised for such operations, and the reads scale well.
Still, you can't disagree that loading a few events will take longer than loading a single one. This is where snapshotting can help.
In Event Sourcing, snapshots are used when the number of events that need to be replayed to restore the state of an aggregate (the logical unit of data) needs to be reduced.
Snapshots are a way of storing the current state of an aggregate at a particular point in time, and can be used to skip over the previous events when loading the aggregate. This can help improve the efficiency and performance of an event-native application.
Let's look at an example to explain this concept in more detail.
Suppose I opened a bank account at the age of 18 and I'm 35 now, seventeen years later. Let's assume that I've been making three transactions a day. If we multiply these numbers (3 × 365 × 17), we get 18,615 transactions.
If we followed the Event Sourcing pattern literally, we'd need to fetch all of these transactions to calculate the current account balance. That won't be efficient. Your first thought on how to make it more efficient may be to cache the latest state somewhere.
Instead of retrieving all these events, we could retrieve one record and use it for our business logic. This is a snapshot.
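In code, loading with a snapshot could look like the sketch below, building on the previous one. The `Snapshot` shape and the `readEventsAfter` callback are assumptions made for illustration:

```typescript
// A snapshot records the state together with the position of the
// last event it includes.
interface Snapshot<TState> {
  state: TState;
  lastEventPosition: number;
}

function getState(
  snapshot: Snapshot<BankAccount> | undefined,
  readEventsAfter: (position: number) => BankAccountEvent[]
): BankAccount {
  // Start from the snapshotted state, or from scratch if there is none.
  const initial = snapshot?.state ?? { accountId: "", balance: 0 };
  const fromPosition = snapshot?.lastEventPosition ?? -1;

  // Only events appended after the snapshot need to be replayed.
  return readEventsAfter(fromPosition).reduce(evolve, initial);
}
```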
Another way to describe a snapshot is with a cash register example.
Is the balance in the cash register calculated based on all transactions since the shop was created?
No.
Usually cashiers create summaries at the end of their shift. They verify whether the state in the POS system is consistent with the actual amount of money in the cash register.
The next employee starts a new shift, at the end of which a separate summary is made. It's the same with a bank account: billing data is opened and closed in a regular cycle. Old data is archived, and we start again with the summarised balance.
Assessing when to take a snapshot is essential. Popular tactics include:
- taking a snapshot after every event,
- taking a snapshot after every n-th event (see the sketch below),
- taking a snapshot when a significant business event occurs (e.g. "end of the business day"),
- taking a snapshot on a time-based schedule.
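As an illustration, here's a sketch of the "every n-th event" tactic, reusing the earlier types; the `SnapshotStore` interface and the frequency value are hypothetical:

```typescript
const SNAPSHOT_FREQUENCY = 100; // hypothetical: snapshot every 100th event

interface SnapshotStore<TState> {
  save(streamId: string, snapshot: Snapshot<TState>): void;
}

// Called after an event has been appended at the given (0-based)
// stream position; persists a snapshot on every n-th event.
function onEventAppended(
  store: SnapshotStore<BankAccount>,
  streamId: string,
  state: BankAccount,
  streamPosition: number
): void {
  if ((streamPosition + 1) % SNAPSHOT_FREQUENCY === 0) {
    store.save(streamId, { state, lastEventPosition: streamPosition });
  }
}
```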
The need to use snapshots may hint at a flaw in the model's design.
Snapshots can be used as a tactical hotfix or optimisation. Adding them should not stop us from evaluating the correctness of the design. We should plan for that, not just settle for the quick win.
We must also remember that we will run into a versioning problem when using snapshots. Our object will be long-lived (as we have not shortened its lifecycle), so the risk of its schema changing is greater.
When we change our business object's structure, we'll need to perform a data migration. As you probably know, that's always complicated, and it can get even worse if we use snapshots as read models. It's pretty tempting to do that when we have the latest version of the entity stored.
Write and read models tend to drift apart as time goes by. Snapshots are an optimisation technique for the write model; it's purely technical. If we conflate that with other concerns, like read models, we introduce coupling that may be hard to untangle.
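One pragmatic mitigation is to stamp each snapshot with a schema version and treat outdated snapshots as missing, falling back to a full replay until a fresh snapshot is written. A sketch building on the earlier types; the version constant is an assumption:

```typescript
const CURRENT_SNAPSHOT_VERSION = 2; // bump whenever the state schema changes

interface VersionedSnapshot<TState> extends Snapshot<TState> {
  schemaVersion: number;
}

// A snapshot written with an older schema is ignored; the next save
// overwrites it with the current schema.
function getStateSafely(
  snapshot: VersionedSnapshot<BankAccount> | undefined,
  readEventsAfter: (position: number) => BankAccountEvent[]
): BankAccount {
  const usable =
    snapshot?.schemaVersion === CURRENT_SNAPSHOT_VERSION ? snapshot : undefined;
  return getState(usable, readEventsAfter);
}
```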
Where you store snapshots is limited only by your imagination and the technologies used in your project. You can save them, for example, as:
- an event appended to the same stream,
- an event in a separate, dedicated snapshot stream,
- a record in a separate database (relational, document, or key-value),
- an entry in a cache or other in-memory storage.
Using a cache or in-memory storage gives us the option of setting a maximum lifetime (TTL). We can easily define that a snapshot will live for only one day; after that, the cache entry is invalidated. This helps reduce the need for migrations. However, once a snapshot has been removed from the cache, we need to rebuild it. Also, my running joke is: if you solved your problem by using a cache, you usually have two issues afterwards.
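A minimal sketch of an in-memory snapshot cache with a TTL, assuming a single process; a distributed cache such as Redis would follow the same idea using its built-in expiry:

```typescript
// In-memory snapshot cache with a time-to-live per entry.
class SnapshotCache<TState> {
  private entries = new Map<
    string,
    { snapshot: Snapshot<TState>; expiresAt: number }
  >();

  constructor(private ttlMs: number) {}

  set(streamId: string, snapshot: Snapshot<TState>): void {
    this.entries.set(streamId, {
      snapshot,
      expiresAt: Date.now() + this.ttlMs,
    });
  }

  get(streamId: string): Snapshot<TState> | undefined {
    const entry = this.entries.get(streamId);
    if (!entry || entry.expiresAt < Date.now()) {
      // Expired or missing: the caller rebuilds the state from events.
      this.entries.delete(streamId);
      return undefined;
    }
    return entry.snapshot;
  }
}

// A snapshot that lives for one day, as in the example above.
const cache = new SnapshotCache<BankAccount>(24 * 60 * 60 * 1000);
```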
Our blog post, Snapshotting Strategies, goes into storing snapshots in more detail and outlines implementations with code examples.
Each data storage model has its specifics. Relational databases have normalisation, document databases are denormalised, and key-value stores have strategies for defining keys. Event stores have their own, too.
Traditionally, we do not pay much attention to the number of operations made on a business object: all of them end up condensed into a single record. In Event Sourcing, thanks to the history of events, we gain auditability and diagnostics. We also have an additional modelling aspect to consider explicitly: the lifecycle over time.
Let's get back to our shopping example. Instead of modelling our stream as all the events that happened for a specific cash register (e.g. transactions), we could break it down into smaller, shorter-lived entities. For example:
- a stream per cashier's shift,
- a stream per business day.
If we ask "business", it may turn out that such a break-down reflects the reality. "Closing the books/end of business day" is a typical pattern for many industries. Very often, our technical assumptions are an oversimplification. That is why it is worth digging down and asking business to bring problems, not solutions.
By modelling the stream as the events of a given cashier's shift, we can simplify the solution: streams will contain fewer events.
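A sketch of what such "closing the books" modelling could look like; the event names and the stream id convention are illustrative, not prescribed:

```typescript
// Hypothetical events for a stream scoped to a single cashier's shift.
type CashierShiftEvent =
  | { type: "ShiftOpened"; cashierId: string; startingFloat: number }
  | { type: "TransactionRegistered"; amount: number }
  | { type: "ShiftClosed"; declaredBalance: number };

// One stream per register per shift keeps each stream short-lived.
function shiftStreamId(registerId: string, shiftNumber: number): string {
  return `cashregister-${registerId}_shift-${shiftNumber}`;
}

// Closing a shift summarises it; the closing balance becomes the
// starting float of the next shift's stream, so we never need to
// replay events across shift boundaries.
function openNextShift(
  previousShiftClose: { declaredBalance: number },
  cashierId: string
): CashierShiftEvent {
  return {
    type: "ShiftOpened",
    cashierId,
    startingFloat: previousShiftClose.declaredBalance,
  };
}
```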
A stream's lifecycle affects not only performance; above all, shorter-lived streams are easier to maintain. I wrote about this in How to (not) do event versioning.
If our stream is short-lived, schema versioning becomes easier. We rarely care about records that have been deleted or archived, so when we deploy changes that introduce a new event schema, we only have to support events with the old schema for as long as the streams that contain them exist.
Thanks to this, we can break our deployment into two steps. First, we deploy a version that supports both schemas and mark the old one as obsolete. Then, once no active stream contains the old schema, we can remove the old code and create a new version.
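The first step is often implemented with an upcaster that translates old-schema events into the new shape at read time. A hypothetical sketch, where a withdrawal event gains a `currency` field; the default currency is an assumed business rule:

```typescript
// Version 1 had no currency; version 2 adds it.
type MoneyWithdrawnV1 = { type: "MoneyWithdrawn"; amount: number };
type MoneyWithdrawnV2 = {
  type: "MoneyWithdrawn";
  amount: number;
  currency: string;
};

// Step one of the deployment: read both schemas, upcasting v1 events
// on the fly. Once no active stream contains v1 events, this upcaster
// and the obsolete schema can be removed in a follow-up release.
function upcast(event: MoneyWithdrawnV1 | MoneyWithdrawnV2): MoneyWithdrawnV2 {
  return "currency" in event
    ? event
    : { ...event, currency: "EUR" }; // assumed default for legacy events
}
```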
If you're going to use the "closing the books" process, how should you approach it?
Sometimes, though, such a split can be artificial. If we were to break our stream down to reflect each working hour, it might turn out that this doesn't match the actual business flow. Streams that are too small also cause more significant management overhead. If we add tight performance requirements on top, we may need to cut every potential overhead.
In this situation, snapshots can help. However, I would suggest treating them as a last resort, when nothing else helps.
My advice, then, is to avoid snapshots whenever you can.
If you need to use them, it is worth going back to the drawing board and analysing your solution. It may turn out that working together with the business can enhance your model and reduce the need for snapshots. It's always worth double-checking and asking more whys.
If you have no choice, use them. However, it will not be painless: snapshots bring a lot of accidental complexity. Treat them as a tactical optimisation, not a long-term strategy. And remember:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.".
Read Snapshotting Strategies for details on how to implement snapshots, along with code examples.