Chapter 8 - The death of the aggregate

continues from Chapter 7 - Focus on the behavior

......

Event sourcing provides a huge advantage: it decouples persistence from the model needed to make a decision. Any message handler can build, on the fly, whatever model it needs, starting from the right event stream.

Let's look at a practical example to understand this better. Assume we need to handle the UpdateCourseCapacityCommand.

This is a representation of the Event Store, where the events belonging to the course aggregate are highlighted in green.

If we use the classic aggregate approach, we need to reconstruct the Course aggregate from the stream of its events. Let's take a different approach instead, where the command handler is responsible for retrieving the model it needs to make a decision. We immediately realize that not all the events belonging to the Course aggregate contain information useful for our purpose.

For example, the fact that the course has been renamed is of no interest when handling the capacity update request.
The command handler can fetch only the events it cares about, which are just two:
  1. the creation event, necessary to verify that the course exists
  2. the CourseCapacityChanged event, since the new capacity must be different from the previous one
completely ignoring the CourseRenamed event.
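
To make this concrete, here is a minimal sketch of such a handler's decision model in Java. The event classes, field names, and the handler class itself are hypothetical illustrations, not something the chapter prescribes:

import java.util.List;

// Hypothetical event types: only the first two are requested by this handler.
interface Event {}
record CourseCreated(String courseId, int initialCapacity) implements Event {}
record CourseCapacityChanged(String courseId, int newCapacity) implements Event {}
record CourseRenamed(String courseId, String newName) implements Event {}

// The decision model: only what handling the capacity update requires.
record CapacityModel(boolean courseExists, int currentCapacity) {}

class UpdateCourseCapacityHandler {
    static CapacityModel buildModel(List<Event> events) {
        boolean exists = false;
        int capacity = 0;
        for (Event e : events) {
            if (e instanceof CourseCreated created) {
                exists = true;
                capacity = created.initialCapacity();
            } else if (e instanceof CourseCapacityChanged changed) {
                capacity = changed.newCapacity();
            }
            // CourseRenamed never shows up here: the handler simply never asks for it.
        }
        return new CapacityModel(exists, capacity);
    }
}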


To support this use case, an Event Store should provide an API to stream the events starting from an arbitrary query, and not only by an aggregate identifier.

events = stream(StreamQuery query)

The stream query is composed of two basic elements.

The first consists of a filter on DomainIdentifiers.

The DomainIdentifier, as the name suggests, represents the identifier of a given instance of a business concept. To keep things simple, we can think of a DomainIdentifier as the pair formed by the business concept's name and the specific instance's identifier.

The second consists of a filter on the types of events of interest.

By combining the filter on DomainIdentifiers with the one on event types, the decision block can obtain precisely the stream of events needed to reconstruct the model required for its decision. Of course, depending on the implementation, both of these filters can support wildcards.
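
As a sketch, the stream query could be modeled like this. The chapter only fixes the two filter elements; the record shapes, the course identifier "c-42", and the Queries holder are illustrative assumptions:

import java.util.Set;

// One possible shape for the stream query: just the two filters described above.
record DomainIdentifier(String concept, String instanceId) {}
record StreamQuery(Set<DomainIdentifier> identifiers, Set<String> eventTypes) {}

class Queries {
    // The query the capacity handler would use for a hypothetical course "c-42":
    static final StreamQuery CAPACITY_QUERY = new StreamQuery(
            Set.of(new DomainIdentifier("course", "c-42")),
            Set.of("CourseCreated", "CourseCapacityChanged"));
}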

Compared to the aggregate solution, the first advantage of this approach is that it wastes fewer resources: less bandwidth transmitting useless events, and less computational load reconstructing a state larger than strictly necessary.

The second advantage is that you don't need to worry about modeling the aggregate. The message handler knows exactly what it needs to load in order to make a decision. It is no longer strictly necessary to start with an upfront modeling phase to define the boundaries of the aggregates.

A third advantage comes from the query being dynamic, which guarantees greater flexibility.
If the requirement changes and I need new information, it can be as simple as modifying the query, for example by including a new event type.
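
For instance, if the capacity-update rule ever needed the current course title as well, extending the hypothetical query sketched above would be enough; no aggregate remodeling is required:

// Same hypothetical query as above, now also interested in rename events:
static final StreamQuery CAPACITY_QUERY_V2 = new StreamQuery(
        Set.of(new DomainIdentifier("course", "c-42")),
        Set.of("CourseCreated", "CourseCapacityChanged", "CourseRenamed"));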

Let's now analyze what happens at write time if two decision blocks are triggered at the same time.

Let's assume we receive the command to update the capacity of a course (UpdateCourseCapacityCommand) and the command to rename the same course (RenameCourseCommand) at the same time. In this case, our instinct immediately tells us that there can be no conflict between these two operations, since they act on two unrelated sets of information.

The two decision blocks can simultaneously load their respective event streams, rebuild their respective state models, and finally publish their respective events.


What happens if the system receives, at the same time, the command to subscribe a student to a course (SubscribeStudentToCourseCommand) and the command to update the capacity (UpdateCourseCapacityCommand) of the same course?

This case is different: this time the risk of collision exists.
Both command handlers can simultaneously load their event streams to reconstruct their models.

Let's assume that the first block to publish an event is the command handler for the capacity update, which publishes the CourseCapacityChanged event. A few milliseconds later, the other command handler publishes the StudentSubscribedToCourse event.



In this case, however, the CourseCapacityChanged event, emitted just a few milliseconds before the StudentSubscribedToCourse event, could invalidate the decision made about the subscription: the latest CourseCapacityChanged event had not been considered when the command handler rebuilt its model.


How do we handle this situation?
Here I am going to introduce the second important capability that an event store could provide. Alongside the simpler append operation, which receives as a parameter the events to store:

append(Event[] events)

let's introduce a new operation, the conditional append:

append(Event[] events, StreamQuery query, EventIdentifier lastEvent)

How does conditional append work?

It's very simple. Besides the events to be appended, the conditional append requires two additional parameters:

  1. the stream query used to rebuild the decision model
  2. the identifier of the last event used to rebuild the decision model

What are these last two parameters for?

They are used to verify that, when I invoke the append, there is no new information that could affect my decision. In other words, they are used to verify that there are no new events matching the query after the last one on which my decision is based.

If the identifier of the last event matching the query does not match the identifier received from the decision block, it means the decision is potentially wrong because it is based on outdated data. I may have decided to accept a student's subscription when, a moment before, the course capacity had been reduced to the point of preventing that subscription.

In this case, the append is rejected because the condition is not satisfied. The decision block can then decide to fail or to try again by reloading the model, which will necessarily be more up to date than before.
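
Putting the two operations together, a decision block could be sketched like this, reusing the names from the earlier sketches. StreamResult, EventIdentifier, the boolean return value, and the use of List instead of the arrays in the signatures above are all assumptions, not an established API:

import java.util.List;

record EventIdentifier(long position) {}
record StreamResult(List<Event> events, EventIdentifier lastEventId) {}
record UpdateCourseCapacity(String courseId, int newCapacity) {}

interface EventStore {
    StreamResult stream(StreamQuery query);
    boolean append(List<Event> events, StreamQuery query, EventIdentifier lastEvent);
}

class CapacityDecisionBlock {
    void handle(UpdateCourseCapacity command, EventStore store) {
        while (true) {
            // Load only the matching events and remember the last one seen.
            StreamResult result = store.stream(Queries.CAPACITY_QUERY);
            CapacityModel model = UpdateCourseCapacityHandler.buildModel(result.events());
            if (!model.courseExists() || model.currentCapacity() == command.newCapacity()) {
                throw new IllegalStateException("command rejected");
            }
            Event decision = new CourseCapacityChanged(command.courseId(), command.newCapacity());
            boolean accepted = store.append(List.of(decision), Queries.CAPACITY_QUERY,
                                            result.lastEventId());
            if (accepted) {
                return; // no newer matching events: the decision stands
            }
            // Otherwise a newer matching event slipped in: reload the model and retry.
        }
    }
}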

The conditional append is our consistency guarantee: the event(s) are appended only if no event matching the query has been appended after the last event used to make the decision to publish the new event(s).

The conditional append represents the guarantee that the decision has been made on the basis of the most up-to-date data.
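
On the store side, the whole guarantee boils down to a single check. Here is a toy, in-memory sketch of it, under the same assumed names as above, and ignoring real-world concerns such as durability and multi-node concurrency:

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// A toy store: every stored event carries its global position, its type name,
// and the DomainIdentifiers it refers to. All names are hypothetical.
class ToyEventStore {
    record StoredEvent(long position, String type, Set<DomainIdentifier> ids) {}

    private final List<StoredEvent> log = new ArrayList<>();

    private static boolean matches(StreamQuery query, StoredEvent e) {
        return query.eventTypes().contains(e.type())
                && e.ids().stream().anyMatch(query.identifiers()::contains);
    }

    synchronized boolean conditionalAppend(String type, Set<DomainIdentifier> ids,
                                           StreamQuery query, EventIdentifier lastEvent) {
        // The check: reject if any event matching the query was appended
        // after the last event the caller's decision was based on.
        boolean newerMatchExists = log.stream()
                .filter(e -> matches(query, e))
                .anyMatch(e -> e.position() > lastEvent.position());
        if (newerMatchExists) {
            return false; // the decision was based on outdated data
        }
        long next = log.isEmpty() ? 0 : log.get(log.size() - 1).position() + 1;
        log.add(new StoredEvent(next, type, ids));
        return true;
    }
}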

Let's go back to the practical example. The first block to invoke the conditional append is the UpdateCourseCapacityCommand handler. Since the last event it loaded was number 592, the Event Store checks that, at the time of the append, there are no events after 592 that would have matched the query. Since there are none, the append is accepted.

When the SubscribeStudentToCourseCommand handler tries to publish its event, it also invokes the conditional append, having also loaded event number 592 as the last event. This time the Event Store finds that there is an event, number 593, following 592 and matching the query. In this case, the Event Store rejects the append.
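
Replaying this scenario against the toy store above, assuming a log already filled up to position 592, and with courseIds, capacityQuery and subscriptionQuery defined along the lines of the earlier sketches, the two calls would behave like this:

// Both handlers loaded event number 592 as the last event of their streams.
EventIdentifier last = new EventIdentifier(592);

// The capacity handler appends first: no event matching its query follows 592,
// so the append is accepted and stored as event 593.
store.conditionalAppend("CourseCapacityChanged", courseIds, capacityQuery, last);         // true

// The subscription handler appends next: its query also filters on
// CourseCapacityChanged, so event 593 matches it and follows 592. Rejected.
store.conditionalAppend("StudentSubscribedToCourse", courseIds, subscriptionQuery, last); // false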

The fourth advantage we obtain over the aggregate is less contention, and this time it comes with no counterpart in terms of complexity. The principle is very simple: the minimal stream, consisting of only the events necessary to handle a given message, being precisely minimal, has a lower probability of colliding with the publication of other events. Before, the contention boundaries were those of the aggregate. Now, we can limit contention by reducing the consistency boundary to a single stream.

The story continues in the next chapter:
Chapter 9 - The Event is just a fact, pure.

... or watch the full story on YouTube
