Event Store Blog

The Story of the Event Store Rust Client

Written by Yorick Laupa | Feb 12, 2021 9:17:50 AM

Before the Rust client became officially supported by EventStoreDB, it started out as a side project. Back in 2018, I wanted to learn Rust. At that time, I was busy writing Haskell code for a major online retailer. I was confident it would never happen at my job, so I needed a plan.

I wanted a good grasp of the Rust programming language, not a shallow understanding. My ideal side project had to address at least these specific subjects:

  • Advanced I/O operations.
  • Parsing (binary or text, it didn't matter).
  • Multithreading along with concurrency constraints.

Rust is a multi-paradigm programming language designed for performance and safety, especially safe concurrency. Putting those subjects at the core of my side project would put Rust's claims to the test.

At that time, my Haskell EventStoreDB TCP client was already four years old. It was a no-brainer: "I'm going to port that code to Rust!", I told myself.

From a technical point of view, it's the perfect project for learning Rust (or any other programming language, for that matter). Simply put, an EventStoreDB TCP client is a full-duplex connection where both ends of the line send protobuf messages while being used by different execution threads.
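To give a feel for the shape of the problem, here is a minimal sketch of a full-duplex TCP connection using today's Tokio (not the client's actual code; the address and payload are placeholders). The socket is split into read and write halves, each driven by its own task:

use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Connect to the server (the address is a placeholder).
    let stream = TcpStream::connect("127.0.0.1:1113").await?;

    // Split the socket so each half can live on its own task.
    let (mut reader, mut writer) = stream.into_split();

    // Writer task: a real client would serialize protobuf messages here.
    let send = tokio::spawn(async move {
        writer.write_all(b"hello").await
    });

    // Reader task: consumes whatever the server sends back.
    let recv = tokio::spawn(async move {
        let mut buf = [0u8; 1024];
        let n = reader.read(&mut buf).await?;
        println!("received {} bytes", n);
        Ok::<_, std::io::Error>(())
    });

    let _ = tokio::join!(send, recv);
    Ok(())
}

Both halves run concurrently, which is exactly where the three subjects above (I/O, parsing and multithreading) meet.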

Haskell and Rust sit at opposite ends of the programming language spectrum. Code in Haskell is about small functions glued together with mathematical abstractions that help you reason about it. You don't have direct memory access, and mutation is forbidden. On the other hand, Rust sends you down into the trenches, where you have to think about stack/heap allocation, pointers and thread safety. Luckily, in both languages, most if not all of the heavy lifting is done by the compiler.

The first weeks were tough. The Rust learning curve is steep, not as steep as Haskell's, but it's a humbling experience nonetheless. Rust's move semantics and ownership system are ruthless beasts for any newcomer. At its core, the Rust TCP client uses Tokio, an event-driven, non-blocking I/O platform for writing asynchronous applications. Tokio is a big library, so big that it alone handles the I/O, the parsing and the multithreading of the client.

Back in 2018, there was no async/await syntax yet, so the asynchronous part of the client was not very readable. Finding the right approach, one that also satisfied the Rust compiler, was difficult. Among all the struggles I faced, the top three issues that caused me the most trouble were:

  • Parsing of incomplete data (incoming messages from the server).
  • Seamless (from a user perspective) connection lifecycle management.
  • Operation/command management.

Haskell has ridiculously intuitive parsing capabilities. You can roll your own parsing API, without any dependency, in less than 50 LoC. Incomplete data doesn't leak into the parsing abstraction, unlike with Tokio here.
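To make the contrast concrete, here is a hedged sketch of the framing code Tokio expects from you, written against today's tokio_util::codec::Decoder API (the 4-byte length prefix is an assumption for illustration). The decoder itself must detect incomplete input and signal it by returning Ok(None):

use bytes::{Buf, BytesMut};
use tokio_util::codec::Decoder;

struct PkgDecoder;

impl Decoder for PkgDecoder {
    type Item = Vec<u8>;
    type Error = std::io::Error;

    fn decode(&mut self, src: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error> {
        if src.len() < 4 {
            // The length prefix hasn't fully arrived: report incomplete data.
            return Ok(None);
        }

        let len = u32::from_le_bytes([src[0], src[1], src[2], src[3]]) as usize;

        if src.len() < 4 + len {
            // The payload hasn't fully arrived yet either.
            return Ok(None);
        }

        src.advance(4); // Drop the length prefix.
        Ok(Some(src.split_to(len).to_vec()))
    }
}

Every one of those early returns is incomplete data surfacing in your parsing code, bookkeeping a Haskell parsing library would handle for you.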

Implementing the connection lifecycle management was also hard, but for different reasons. First, at that time there was no async/await syntax, so the code looked needlessly tricky. Second, it's concurrent code. Luckily, that last point resolved itself once I got acquainted with Tokio's concurrent data structures.
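As a sketch of that idea with today's Tokio (every name here is hypothetical), the connection state can be owned by a single task whose mailbox is an mpsc channel; the rest of the client never touches the socket directly, so there is nothing left to lock:

use tokio::sync::mpsc;

// Hypothetical messages the connection-manager task reacts to.
enum ConnMsg {
    Transmit(Vec<u8>),
    ConnectionClosed,
}

// The manager owns the socket state; everyone else just sends messages.
async fn connection_manager(mut mailbox: mpsc::Receiver<ConnMsg>) {
    while let Some(msg) = mailbox.recv().await {
        match msg {
            ConnMsg::Transmit(_pkg) => {
                // Forward the package to the current socket, buffering it
                // if we happen to be mid-reconnection.
            }
            ConnMsg::ConnectionClosed => {
                // Tear the socket down and schedule a reconnect; callers
                // never observe that the connection was lost.
            }
        }
    }
}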

Minus some Rust specifics, the Haskell and Rust connection lifecycle management implementations are similar. The differences lie in the operation/command management. Back then, the Haskell implementation used a coroutine approach. A coroutine is a computation that can be suspended, resumed, and can yield results.

data Coroutine k o a where
  Yield :: o -> a -> Coroutine k o a
  Await :: (i -> a) -> k i -> a -> Coroutine k o a
  Stop  :: Coroutine k o a

Based on that type declaration, a coroutine is abstract over an asynchronous task (k) and an output (o). (a) is used as a placeholder for our fixpoint, a fancy name for describing our recursive data structure.

A coroutine supports three operations:

  • Yield: a value emitted by our coroutine.
  • Await: a suspended computation. (i) is existentially quantified because its type will be defined at the call site. That asynchronous computation will emit an (i) if successful, or branch out otherwise.
  • Stop: stops the coroutine.

We tie our coroutine abstraction to our Execution abstraction:

data Execution a
  = Proceed a
  | Retry
  | Failed OperationError

Execution allows us to report an operation failure or a retry to the operation manager. There is more to the operation abstraction than what is displayed here. If you want to take a deeper look at it, check out this link.

The state of the coroutine was encapsulated. The operation/command manager drives the coroutine and is responsible for serving the right package from the server to the right coroutine. The key strength of that abstraction was composition: it's easy to create bigger operations/commands out of smaller ones. For example, a catchup subscription was simply the composition of a read stream operation/command followed by a volatile subscription. Coroutines can be composed both vertically (thanks to their monadic nature) and horizontally (like mathematical function composition).

On the other hand, Rust's implementation used a very straightforward approach, similar to a state machine.

let (mailbox, mut input) = create_channel();
let pkg = create_pkg();

bus.send(Msg::Transmit(
    Lifetime::OneTime(pkg),
    mailbox,
)).await;

while let Some(msg) = input.next().await {
    match msg {
        OpMsg::Recv(resp) => { /* handle the server response… */ }
        OpMsg::Failed(error) => { /* handle external errors… */ }
    }
}

The composition story isn't great, but it's faster and puts almost no pressure on memory allocation (Rust has no garbage collector to begin with). Its key weakness is that it's very error-prone, as you aren't allowed to reason with high-level abstractions. Some features, like restarting a command from scratch, are handled at the command level in the Rust implementation, while they are managed at the manager level in its Haskell counterpart.
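For instance, a catchup subscription written in this state-machine style ends up as one command spanning both phases, instead of the composition of two smaller ones. A hypothetical sketch, not the client's actual code:

// The two phases of a catchup subscription fused into one state machine.
enum CatchupState {
    // Still reading the stream's history.
    CatchingUp { next_event_number: u64 },
    // Caught up: events now come from a live (volatile) subscription.
    Live,
}

fn on_event(state: CatchupState, event_number: u64, end_of_stream: bool) -> CatchupState {
    match state {
        CatchupState::CatchingUp { .. } if end_of_stream => {
            // Reached the head of the stream: switch to the live phase.
            CatchupState::Live
        }
        CatchupState::CatchingUp { .. } => {
            // Keep track of our position in the stream's history.
            CatchupState::CatchingUp { next_event_number: event_number + 1 }
        }
        CatchupState::Live => {
            // Dispatch the live event to the user.
            CatchupState::Live
        }
    }
}

Every transition, including any restart logic, has to be spelled out by hand inside the command itself.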

Writing the operation/command management was challenging because of how verbose my first attempt was. From a client's perspective, an EventStoreDB command can fail, but it can also be retried or sent to a different node if you are in a cluster configuration. Some of those commands are composed of smaller ones, like catchup subscriptions. On top of that, my limited understanding of Rust led me into multiple hellish move and ownership situations.

In retrospect, it was a fantastic experience because it deepened my understanding of EventStoreDB even further. I also discovered many design and performance improvements that I later backported to the Haskell client.

I strongly recommend this exercise to anyone interested in learning a new programming language. I'm planning to repeat the experience with the ATS programming language in the future. Stay tuned.