Due to the recent increase in popularity of Kubernetes and its excellence for the hosting and orchestration of stateless workloads, we regularly encounter the request to expound upon Event Store’s suitability for deployment via the purported king of cluster schedulers.
To summarize: to get the most out of your Event Store database, we recommend deploying it via native packages to bare metal systems or isolated virtual machines, for the reasons described below.
We understand that Kubernetes is being sold as a one-size-fits-all solution for orchestration problems, and stateful sets would seem to be the answer. However, there are requirements for the orchestration of distributed databases that Kubernetes does not satisfy on its own. There are also several technical properties to consider that may affect the availability, performance, and reliability of your database.
Linux control groups (cgroups) do not provide perfect isolation for processes utilizing the Linux page cache. Databases require consistent page cache availability to ensure read and write performance. As such, database processes should not be co-located with other processes that make heavy use of the page cache. A more in-depth description of the issue may be found here: https://engineering.linkedin.com/blog/2016/08/don_t-let-linux-control-groups-uncontrolled
A stable, low-latency network is a requirement for well-performing distributed databases. Cluster membership and consensus operations for reads and writes degrade performance and availability when they must be retried or when they time out. Overlay Container Network Interface (CNI) implementations that rely on encapsulation can add latency to network operations, affecting performance. Additionally, CNI upgrades may interrupt network connectivity, affecting the availability of the database.
The Container Storage Interface (CSI) specification does not include a way to declare input/output operations per second (IOPS) constraints for storage classes, so there is no guarantee that a requested volume will perform as expected. This is less of a concern for CSI drivers that target cloud-based persistent volumes, as the volume size is generally tied to the IOPS it provides. For on-premises CSI drivers, you must ensure that a provisioned volume guarantees the IOPS your workload requires. Occasionally, volume re-attachment during process migration may hang or fail, impacting the availability of the database.
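To make the relationship between volume size and IOPS concrete, here is a minimal Python sketch of the sizing arithmetic. It assumes a storage backend that grants a fixed number of IOPS per provisioned GiB; the function name, its parameters, and the ratio used in the example are illustrative assumptions, not values taken from any particular provider or from the CSI specification.

```python
import math


def volume_size_for_iops(required_iops: int, iops_per_gib: float, data_size_gib: int) -> int:
    """Return the smallest volume size (GiB) that holds the data and meets the IOPS target.

    Assumes (hypothetically) that the backend grants `iops_per_gib` IOPS for
    every provisioned GiB, with no burst credits and no upper cap.
    """
    if iops_per_gib <= 0:
        raise ValueError("iops_per_gib must be positive")
    # Size needed purely to reach the IOPS target under the assumed ratio.
    size_for_iops = math.ceil(required_iops / iops_per_gib)
    # The volume must also be large enough to hold the data itself.
    return max(data_size_gib, size_for_iops)


# Example with an assumed ratio of 3 IOPS per GiB: a 200 GiB database that
# needs 3000 IOPS has to be provisioned well beyond its data size.
size = volume_size_for_iops(required_iops=3000, iops_per_gib=3, data_size_gib=200)
```

Under these assumptions the IOPS target, not the data size, determines the size of the volume. Real backends often add baselines, caps, or separately provisioned IOPS, so check the figures for your storage class before relying on a calculation like this.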
The Kubernetes deployment controller does not, on its own, provide the means to perform a rolling update of a distributed database deployment without incurring some downtime. Knowledge of cluster state is required to ensure that quorum requirements are met so that the cluster stays in service during an upgrade.
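The quorum requirement can be illustrated with a short Python sketch of the check an orchestrator would need to make before taking a node down for an upgrade. It assumes a simple majority quorum; the `Node` type and the function names are hypothetical and do not correspond to any Event Store or Kubernetes API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    name: str
    healthy: bool   # currently serving as a cluster member
    upgraded: bool  # already running the new version


def quorum_size(cluster_size: int) -> int:
    """Majority quorum: more than half of the configured cluster size."""
    return cluster_size // 2 + 1


def can_take_node_down(nodes: Sequence[Node], candidate: str) -> bool:
    """True if the cluster keeps a quorum of healthy nodes without `candidate`."""
    remaining_healthy = sum(1 for n in nodes if n.healthy and n.name != candidate)
    return remaining_healthy >= quorum_size(len(nodes))


def next_node_to_upgrade(nodes: Sequence[Node]) -> Optional[str]:
    """Pick the first not-yet-upgraded node that can be restarted safely, if any."""
    for node in nodes:
        if not node.upgraded and can_take_node_down(nodes, node.name):
            return node.name
    return None


# A three-node cluster where one node is still rejoining after its upgrade:
# restarting either remaining node now would drop the cluster below quorum.
cluster = [
    Node("node-0", healthy=False, upgraded=True),
    Node("node-1", healthy=True, upgraded=False),
    Node("node-2", healthy=True, upgraded=False),
]
candidate = next_node_to_upgrade(cluster)  # None: wait for node-0 to rejoin
```

A controller that only tracks whether a pod is running cannot make this decision, because it has no view of which nodes the database itself currently counts as healthy members.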
We are committed to ensuring that Event Store is deployable and runs well on Kubernetes. To achieve this goal, we are working on the following features: