Turning the database inside out with Event Store

A few years ago, when I was just learning about CQRS and Event Sourcing, I read the transcript of a fantastic talk by Martin Kleppmann called Turning the Database Inside Out with Apache Samza, which I found mind-blowing!

In this article I summarize Martin's main points and make the case that what he describes is the CQRS/Event Sourcing view of the world, seen from a database perspective.

To summarize the most significant point in Martin's talk, we need to start from the architecture of most modern relational databases (or non-relational ones, for that matter). These databases first write each change to a transaction log (also called a write-ahead log or commit log; in effect, an event stream) and then, in the case of a relational database, apply the change to the main database. The main database keeps only the latest state, and the log is periodically truncated, throwing away the changes that have already been applied.
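As a rough illustration of that write path, here is a toy sketch in Python. The class name TinyDatabase and its methods are invented for this example and do not correspond to any real database engine; the point is only the order of operations and what truncation discards.

```python
class TinyDatabase:
    """Toy model of a state-based store with a write-ahead log (hypothetical)."""

    def __init__(self):
        self.log = []    # changes are recorded here first
        self.state = {}  # latest value per key

    def write(self, key, value):
        # 1. Record the change in the log before touching the state.
        self.log.append((key, value))
        # 2. Apply the change to the main data model.
        self.state[key] = value

    def truncate_log(self):
        # Applied changes are discarded; only the latest state survives.
        self.log.clear()


db = TinyDatabase()
db.write("cart:42", ["book"])
db.write("cart:42", ["book", "pen"])
db.truncate_log()

print(db.state)  # {'cart:42': ['book', 'pen']}
print(db.log)    # [] : the history of how the cart got here is gone
```

After truncation the database can answer "what is in the cart now?" but not "what happened to the cart?".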

With a relational database, the changes are applied to a data model in something like third normal form, which is optimized for flexibility rather than for read or write performance or ease of use.

However, what if you considered the transaction log as the core database abstraction? What if, rather than truncating the log and deleting the change data, you kept it?

If you “unbundle” the transaction log, you are no longer bound to a single data model to read from; you can create read models optimized for how they are used. Kleppmann covers multiple examples (replication, secondary indexing, caching, and materialized views) to show how this approach excels at addressing these concerns.
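To make the idea concrete, here is a minimal sketch in Python of one retained log feeding two differently shaped read models. The event names (ItemAdded, ItemRemoved) and both functions are assumptions made up for this example, not something taken from Kleppmann's talk.

```python
from collections import defaultdict

# Append-only log of changes; nothing is ever truncated.
log = [
    {"type": "ItemAdded", "cart": "c1", "item": "book"},
    {"type": "ItemAdded", "cart": "c1", "item": "pen"},
    {"type": "ItemRemoved", "cart": "c1", "item": "pen"},
    {"type": "ItemAdded", "cart": "c2", "item": "book"},
]


def cart_contents(events):
    """Read model 1: current items per cart."""
    carts = defaultdict(list)
    for e in events:
        if e["type"] == "ItemAdded":
            carts[e["cart"]].append(e["item"])
        elif e["type"] == "ItemRemoved":
            carts[e["cart"]].remove(e["item"])
    return dict(carts)


def times_added(events):
    """Read model 2: how often each item was added, across all carts."""
    counts = defaultdict(int)
    for e in events:
        if e["type"] == "ItemAdded":
            counts[e["item"]] += 1
    return dict(counts)


print(cart_contents(log))  # {'c1': ['book'], 'c2': ['book']}
print(times_added(log))    # {'book': 2, 'pen': 1}
```

The second model could not be derived from the first: the pen that was added and later removed leaves no trace in the current cart state, but it is still in the log.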

What Kleppmann describes in his talk is a database-oriented description of the Command Query Responsibility Segregation (CQRS) and Event Sourcing patterns, often abbreviated CQRS/ES. It represents the larger vision of the target architecture for Event Store databases: databases that are optimized for append-only writes and that publish streams of events, which are then “projected” into read models optimized for a particular query scenario.

The diagram below illustrates the differences between a traditional “state”-based relational model and the unbundled, “state change”-based event store. Note that the diagram on the right is not a new invention. It is a depiction of CQRS/ES as it was described by Greg Young as early as 2007.

[Diagram: Turning the database inside out]

One key point here is that in an event-sourced system the unit of change is a business event. The changes are first class: they are modeled, and each change is wrapped with its business context. This is different from using bolted-on change data capture to support change streams. Event streams are much easier to reason about and debug because they consist of named events.
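The contrast can be sketched in Python as follows. Both shapes are illustrative assumptions: the dictionary only loosely resembles what a change data capture tool might emit, and the AccountClosedByCustomer event and its fields are hypothetical.

```python
from dataclasses import dataclass

# A change-data-capture style record: it says which column changed,
# but not why it changed.
cdc_record = {
    "table": "accounts",
    "op": "UPDATE",
    "before": {"id": 7, "status": "active"},
    "after": {"id": 7, "status": "closed"},
}


# A business event: the change is named and carries its context.
@dataclass(frozen=True)
class AccountClosedByCustomer:
    account_id: int
    reason: str
    requested_via: str


event = AccountClosedByCustomer(
    account_id=7,
    reason="moved to another bank",
    requested_via="mobile app",
)

print(type(event).__name__)  # AccountClosedByCustomer
print(event.reason)          # moved to another bank
```

A consumer of the first record has to infer intent from a column diff; a consumer of the second can simply read the event name.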

In CQRS/Event Sourcing there is a concept called “projections”: a set of patterns for pulling data from streams into a form that is useful in a particular context. The target could be an in-memory view, another stream, a materialized view, a cache, etc. Position in the event stream is tracked by a monotonically increasing number (a checkpoint in EventStoreDB) that is instrumental in synchronizing persisted read models. A view can subscribe to an event stream and be updated in real time, and if either side goes down it can reliably resynchronize from the position of its last successful synchronization.

When business requirements change, new read models can be created, or existing ones recreated, by replaying the streams.
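The checkpoint mechanism can be sketched in a few lines of Python. This is not the EventStoreDB client API; the stream is a plain list, and the OrderCountProjection class and its event names are assumptions made for illustration.

```python
class OrderCountProjection:
    """Read model that counts orders per customer and remembers its position."""

    def __init__(self):
        self.counts = {}
        self.checkpoint = -1  # position of the last event applied

    def catch_up(self, stream):
        # Apply only the events that come after the stored checkpoint.
        for position in range(self.checkpoint + 1, len(stream)):
            event = stream[position]
            if event["type"] == "OrderPlaced":
                customer = event["customer"]
                self.counts[customer] = self.counts.get(customer, 0) + 1
            self.checkpoint = position


stream = [
    {"type": "OrderPlaced", "customer": "alice"},
    {"type": "OrderPlaced", "customer": "bob"},
]
projection = OrderCountProjection()
projection.catch_up(stream)

# The projection is offline while more events arrive.
stream.append({"type": "OrderPlaced", "customer": "alice"})
stream.append({"type": "OrderCancelled", "customer": "bob"})

# On restart it resumes from its checkpoint; nothing is applied twice.
projection.catch_up(stream)
print(projection.counts)      # {'alice': 2, 'bob': 1}
print(projection.checkpoint)  # 3
```

In a persisted read model the checkpoint would normally be stored together with the view, so that both move forward or neither does. A projection with a fresh checkpoint of -1 replays the whole stream, which is how a read model is built from scratch.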

One key point that is not represented in the diagram above, but is essential to event-sourced systems, is that the event store is being used as an operational “source of truth” database. Although it supports a subscription-like API, it should not be mistaken for a message processing technology. Operational database technologies should guarantee disk writes and have a concurrency model, among other things.

Another, more subtle requirement is the ability to have granular streams. Much of the recent literature on event sourcing emphasizes the read side. Talks and articles on event sourcing (or event-driven architecture) often start with the assumption that a stream of events is already in place. However, getting events properly into the stream is fundamentally important. They must be sequenced and not duplicated, and writes must sustain high throughput. Business rules for a particular state change must be satisfied. To determine whether a state change is valid, an entity’s current state must be “hydrated” before the business rules for the requested change are checked (there are optimizations, but the fundamentals do not change). This requires a stream for each entity, and millions or billions of streams are not uncommon.
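The write side can be sketched as follows, in Python. Everything here is a hypothetical in-memory stand-in rather than a real event store client: the InMemoryEventStore class, the expected_version check (one common form of optimistic concurrency), and the bank account events are all assumptions chosen for the example.

```python
class ConcurrencyError(Exception):
    """Raised when the stream changed since the caller read it."""


class InsufficientFunds(Exception):
    """Raised when a withdrawal would violate the business rule."""


class InMemoryEventStore:
    """Toy store with one stream per entity (hypothetical, not a real client)."""

    def __init__(self):
        self.streams = {}

    def read(self, stream_id):
        return list(self.streams.get(stream_id, []))

    def append(self, stream_id, event, expected_version):
        stream = self.streams.setdefault(stream_id, [])
        # Reject the write if someone else appended in the meantime.
        if len(stream) != expected_version:
            raise ConcurrencyError(stream_id)
        stream.append(event)


def hydrate_balance(events):
    """Rebuild the entity's current state by folding over its own stream."""
    balance = 0
    for e in events:
        if e["type"] == "Deposited":
            balance += e["amount"]
        elif e["type"] == "Withdrawn":
            balance -= e["amount"]
    return balance


def withdraw(store, account_id, amount):
    stream_id = f"account-{account_id}"
    events = store.read(stream_id)
    # Check the business rule against the hydrated state.
    if hydrate_balance(events) < amount:
        raise InsufficientFunds(stream_id)
    store.append(
        stream_id,
        {"type": "Withdrawn", "amount": amount},
        expected_version=len(events),
    )


store = InMemoryEventStore()
store.append("account-1", {"type": "Deposited", "amount": 100}, expected_version=0)
withdraw(store, 1, 30)
print(hydrate_balance(store.read("account-1")))  # 70
```

Because each account has its own stream, hydrating one account reads only that account's events, and the version check only conflicts with writers to the same account. That is why stream granularity matters on the write side.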

Event Store was built as an operational stream database specifically to support this unbundling of the event stream as the core “source of truth” data store with stream subscriptions to enable the wide variety of optimized read models.

In summary, one way to look at the CQRS/ES pattern is as a “deconstruction” or “unbundling” of traditional database technologies. It elevates the transaction log, or event stream, to the core data abstraction and makes real-time streams of these events available for building optimized read models for querying (or for direct use in real-time processing). Martin Kleppmann called this concept Turning the Database Inside Out in his 2014 Strange Loop talk and, at least for me, it fundamentally changed the way I view the future of database technology.
