At Internovus, we always strived to be pragmatic: always do the simplest thing that will do the job. This strategy allowed us to minimize the time to market, optimize costs, and overall keep delivering the ever-changing business requirements. However, one time we missed this target badly. In this post, I want to share the story of one of our subdomains that was implemented with simplistic tools, what it led to, and how this project was saved by applying event sourcing.
But first, what is Internovus?
The Business Domain
Internovus was a B2B online marketing company. Its business domain was broad and encompassed the whole marketing flow: Internovus allowed its customers, companies that produced products or services, to outsource all their marketing-related tasks. Each customer would get a tailor-made marketing strategy for its products, optimized advertising campaigns, and even sales agents that contacted prospective leads.
Internovus's principal business objective has always been optimization. We strived to optimize each stage of the marketing process: from acquiring ad spaces with the highest cost/benefit ratio, to assigning leads to agents based on their skills and experience. To fine-tune every aspect of the sales process, we developed our own Customer Relationship Management (CRM) system. The CRM ingested leads, assigned them to agents, and orchestrated the whole sales flow.
The sales agents were compensated for closing sales. Initially, these commissions were calculated manually by sales desks' managers, but later on we received the requirement to implement a commissions module to automate this manual labor.
When the sales desks' managers grew tired of calculating the commissions manually, automating this process became the management's top priority. During all discussions regarding the new module, one statement was consistently repeated: "It's simple!" In fact, there was nothing complex about the manual commission calculation process. Once a month, sales desk managers just entered sales into an Excel spreadsheet and attributed sales agents' commissions by calculating a percentage of each sale.
In other words, we had to multiply each sale amount by a commission percentage and email the monthly payments report to the sales desks' managers.
In addition to "it's simple!" this project had another leitmotiv: "it's urgent!"
Since the requirements seemed simple and we wanted to be pragmatic, we designed a 'pragmatic' solution. The relevant business entities were represented by active record objects. They had no business logic in them, just a bunch of getters and setters for the entities' data. The business logic resided in the controllers, which also orchestrated the transactions. To be even more pragmatic, some calculations were handled in database views.
By no means was that an elegant solution. But hey, we just needed to multiply two numbers, so why not keep it simple? Moreover, this simple design achieved the business goal of going live as fast as possible. However, when it went live…
Everybody Hated It!
The module was supposed to make the CRM users' lives more comfortable. It had to show the agents their current commissions and offload manual work from desk managers. Unfortunately, nobody trusted the module and its calculations. The sales agents claimed that the commissions were underpaid. The managers complained that the commissions were overpaid.
Software engineers investigating the users' complaints hated the system even more than both sales agents and managers combined. It was almost impossible to trace the system's decisions and to unit test its business logic.
There was truth in the users' complaints. With each bug report, we uncovered more and more tacit knowledge employed in the original manual process. For example, a sale is eligible for a commission only after the payment for it has been received. Moreover, a sale's payment status can change, even in the following months. In such a case, the change in the commission should be accounted for retroactively. And so on, and so on. It turned out that the business logic was much more complex than just multiplying two numbers…
Of course, we had to address these issues as fast as possible. Rest assured, the codebase's design didn't improve in the process; the hurried fixes made it much worse.
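The two rules mentioned above (no commission before payment, and retroactive corrections when a payment's status changes) can be sketched in a few lines of Python. This is only an illustration: the `commission_adjustment` function, its parameters, and the flat 5% rate are assumptions made for the example, not the module's actual code.

```python
from decimal import Decimal

# Assumed flat rate, for illustration only.
COMMISSION_RATE = Decimal("0.05")


def commission_adjustment(
    sale_amount: Decimal,
    payment_received: bool,
    already_attributed: Decimal,
) -> Decimal:
    """Return the correction to attribute to the agent in the current period.

    The commission is owed only while the payment counts as received.
    Whatever was attributed in earlier periods is subtracted, so a later
    status change produces a positive or negative retroactive correction.
    """
    owed = sale_amount * COMMISSION_RATE if payment_received else Decimal("0")
    return owed - already_attributed


# Payment arrives: the full commission becomes due.
print(commission_adjustment(Decimal("300"), True, Decimal("0")))
# Payment is reversed in a later month: the earlier commission is taken back.
print(commission_adjustment(Decimal("300"), False, Decimal("15")))
```

Even this toy version needs to know what was attributed in the past, which is exactly the kind of history a "multiply two numbers" design does not keep.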
Despite the many issues, there was one category of stakeholders that were ecstatic about the new module: analysts.
The BI and analysis departments absolutely loved the commissions module. The moment it went live, they became aware of the many fine-tuning opportunities the module provided and wanted to optimize the heck out of it. Instead of a fixed commission percentage, they wanted to try out different approaches. For example, they asked to make the percentage a function of sales' amounts or the number of an agent's monthly sales. Later on, they asked to introduce additional bonus percentages that were unlocked by meeting dynamic sales goals.
The module's simplistic design made it extremely challenging to implement these requirements. The lack of tests didn't contribute to the system's stability or to the trust levels of its users. It was full of bugs, both because of the missing domain knowledge and because the new features were added in a hurry.
Sooner rather than later, the project's technical debt turned into technical bankruptcy. Despite the engineers' willingness to refactor the codebase into a more appropriate solution, the management's priorities were elsewhere, until one bug changed everything.
The Final Straw
The commissions module depended not only on the CRM's data but also on our customers' internal systems; it had to gather information about monetary transactions and their states.
One day, because of a glitch in a client's system, many of its transactions were flagged as declined and then flagged as approved again a few hours later. Then, because of another glitch, this time in the commissions module, additional payments were falsely attributed to the agents that closed those problematic sales. Finally, the sales agents fell in love with the commissions module!
And finally, the management understood how badly the module needed refactoring…
Back To The Drawing Board
From the Domain-Driven Design standpoint, we mistakenly categorized the commissions management subdomain as a supporting one. Therefore, we chose a simple solution for the seemingly simple problem. Looking back, we could have predicted that the commissions calculation logic would change and evolve. After all, the company has always aimed to optimize every aspect of the marketing process. Moreover, since the module implemented an accounting model, it would be safe to assume that we would need a consistent way to track the module's decisions.
As time passed, it became evident that not only was the subdomain much more complex than we initially expected it to be, but it also had a direct effect on the company's bottom line. Therefore, it was definitely one of Internovus's core subdomains, and it had to be implemented as such.
The new implementation had to focus on the following needs:
- Accommodate complex and fast-changing business logic
- Support 100% test coverage of the business logic
- Consistently trace and inspect the system's decisions
- Support BI and analysis by representing the data in multiple models
All these needs are almost screaming 'event sourcing', and that was the direction we took.
Luckily, as we struggled with the previous implementation, we gained lots of domain knowledge and cultivated a robust ubiquitous language. Coming up with an event-based model was almost a no-brainer for us. Moreover, before implementing it, we used the planned commands and events to formulate a comprehensive Gherkin test suite and used it to validate the domain model with business stakeholders. For example:
Scenario: A sale is approved after the monthly commissions were already generated
Given today is 2015/04/10
And a sale of 300 USD was assigned to the sales agent on 2015/02/10
And commissions were generated for the 2015/02 period on 2015/04/01
And the sale was approved on 2015/03/16
When the agent's commissions are generated for the 2015/03 period
Then the sale's commission of $15 is attributed to the agent
The above test's statements correlated with the following events and command:
These tests closely followed the bounded context's ubiquitous language and thus bridged the gap between the problem and solution spaces. Moreover, the business stakeholders were not only able to read and comprehend such tests, but they also helped us reveal wrong assumptions and even proposed scenarios that were missing.
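To make the link between Gherkin steps, events, and commands more tangible, here is a minimal Python sketch of how the scenario above could be expressed against an event-sourced model. The event names (`SaleAssigned`, `SaleApproved`, `CommissionsGenerated`, `CommissionAttributed`), the `generate_commissions` command handler, and the flat 5% rate (inferred from the $15 commission on a 300 USD sale) are assumptions made for illustration; they are not the actual Internovus model.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

RATE = Decimal("0.05")  # assumed: $15 on a 300 USD sale


@dataclass(frozen=True)
class SaleAssigned:
    sale_id: str
    amount: Decimal
    on: date


@dataclass(frozen=True)
class SaleApproved:
    sale_id: str
    on: date


@dataclass(frozen=True)
class CommissionsGenerated:
    period: str  # "YYYY/MM"
    on: date


@dataclass(frozen=True)
class CommissionAttributed:
    sale_id: str
    period: str
    amount: Decimal


def period_of(d: date) -> str:
    return f"{d.year}/{d.month:02d}"


def generate_commissions(history, period, today):
    """Command handler: derive new events from the recorded events only."""
    amounts, approved_on, attributed = {}, {}, set()
    for event in history:
        if isinstance(event, SaleAssigned):
            amounts[event.sale_id] = event.amount
        elif isinstance(event, SaleApproved):
            approved_on[event.sale_id] = event.on
        elif isinstance(event, CommissionAttributed):
            attributed.add(event.sale_id)
    # Attribute every sale approved up to this period and not yet paid out.
    # Zero-padded "YYYY/MM" strings compare correctly as text.
    new_events = [
        CommissionAttributed(sale_id, period, amounts[sale_id] * RATE)
        for sale_id, on in approved_on.items()
        if sale_id not in attributed and period_of(on) <= period
    ]
    new_events.append(CommissionsGenerated(period, today))
    return new_events


# "Given" steps become past events; the "When" step becomes the command.
given = [
    SaleAssigned("sale-1", Decimal("300"), date(2015, 2, 10)),
    CommissionsGenerated("2015/02", date(2015, 4, 1)),
    SaleApproved("sale-1", date(2015, 3, 16)),
]
then = generate_commissions(given, "2015/03", today=date(2015, 4, 10))
assert CommissionAttributed("sale-1", "2015/03", Decimal("15")) in then
```

Because the handler is a pure function of the event history, each Gherkin scenario can run as a plain unit test without a database.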
The migration process to the new implementation had to be gradual. First, we had to migrate the historical data into the event-sourced model. Next, we had to run both implementations side by side and gradually migrate the users.
To import data from the old system, we generated 'synthetic' events that did not represent the exact life cycle but could be projected into the original implementation's state representation. This is what Mathias Verraes calls migrating events from a ghost context.
Of course, projecting the old model from the synthetic events didn't work the first time, or the second or the third. Interestingly, the new implementation was the correct one. The insight provided by the event-based model allowed us to uncover multiple bugs that went under the radar in the old module!
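The round trip described above (legacy state, to synthetic events, back to projected state, then compare) can be sketched as follows. The `LegacySale` row shape, the tuple-based event format, and the function names are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass, replace
from decimal import Decimal


@dataclass(frozen=True)
class LegacySale:
    """Hypothetical shape of a row in the old module's state."""
    sale_id: str
    amount: Decimal
    approved: bool


def synthesize_events(row):
    """Produce events that lead to the row's state, not its real history."""
    events = [("SaleAssigned", row.sale_id, row.amount)]
    if row.approved:
        events.append(("SaleApproved", row.sale_id, None))
    return events


def project(events):
    """Rebuild the legacy state representation from the events."""
    state = {}
    for kind, sale_id, amount in events:
        if kind == "SaleAssigned":
            state[sale_id] = LegacySale(sale_id, amount, approved=False)
        elif kind == "SaleApproved":
            state[sale_id] = replace(state[sale_id], approved=True)
    return state


def mismatches(legacy_rows):
    """Return the legacy rows that the projection fails to reproduce."""
    events = [e for row in legacy_rows for e in synthesize_events(row)]
    projected = project(events)
    return [row for row in legacy_rows if projected.get(row.sale_id) != row]


rows = [
    LegacySale("sale-1", Decimal("300"), approved=True),
    LegacySale("sale-2", Decimal("120"), approved=False),
]
assert mismatches(rows) == []
```

Every mismatch reported by such a comparison is worth investigating in both directions, since the discrepancy may come from the old implementation rather than from the new one.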
After all the issues with both versions were ironed out, and both implementations produced the same state, we gradually migrated the users to the new version. As we finished the migration, a few exciting things happened:
First, both the agents and their managers stopped questioning the system's integrity. Of course, initially there were some inquiries, but as we were able to prove the system's correctness using the events, it didn't take long for the doubts to stop. Finally, all the users trusted the system.
Second, zero bugs in 5+ years.
The new implementation closely followed the ubiquitous language, and thus, the domain experts' mental models. The event-sourcing-based implementation allowed us to track each and every decision the system made. Such granular control over the business logic allowed us to ensure that no assumption was deployed without being confirmed first. Ultimately, the event-based model was met with happy tears in the eyes of our analysts. It provided a level of insight into the business domain that they could only dream about before.
The commissions module's story demonstrates how crucial it is to balance the complexities of the problem and the solution spaces. Many times business people unconsciously downplay a software system's complexity to push the engineers to deliver faster. Of course, this strategy works only in the short term. If your business subdomain deals with monetary transactions, requires in-depth analysis, or has any other direct effect on the company's bottom line, it is probably a core subdomain and should be addressed using the appropriate tools and patterns.