We (the creators of Galapagos) strongly believe that an Event-Driven Architecture is the best approach to solving the problems that commonly arise in companies with some kind of distributed IT, especially when they introduce DevOps or cross-functional teams and reduce or remove central IT governance functions.
The following is a list of principles we want to enforce in our company to establish such an Event-Driven Architecture. They are meant to support overall company goals such as lower costs and shorter time-to-market, while also ensuring a high level of reliability.
We formulated these principles almost completely in a technology-neutral way. We refer to the underlying technology only in some examples and practical additions.
Based on these principles, we selected Apache Kafka as our technological base, and created Galapagos as the utility that enforces these principles while freeing DevOps teams from having to work through all of this themselves.
Classic inter-application communication channels in companies usually exchange Business Objects, not Events. This leads to data duplication and redundancy, different levels of "truth" about the very same Business Object, and unclear responsibilities for and ownership of data.
Application teams tend to get their required data from the first application they can find that provides this data and is cooperative enough to build them an interface (or to tell them about an existing interface they can use).
This tends to lead to long, complex, opaque flows of data within the company, and makes changes to business processes hard and error-prone.
By basing inter-application communication on Business Events, we avoid much of this trouble. A Business Event usually originates in exactly one application, which in this case is the owner of this Event (but not necessarily of the associated Business Object). Parts (or all) of the current state of the associated Business Object may still be attached to the Event as the Event Payload, so recipients of the Event do not have to look it up in other systems.
To avoid Event Cumulation (i.e. an application has to gather all events for a Business Object type to know the current status) and Business Logic duplication (i.e. an application has to know how to interpret the events to derive the current status of the Business Object), central information-caching applications can be established. These are the natural "owners" of the corresponding Business Object type and can answer all kinds of questions about the Business Objects e.g. via a provided REST API. They collect all corresponding events and "know" how to interpret them.
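The idea of such an information-caching application can be sketched as follows. This is only a minimal illustration under stated assumptions: the event shape (`type`, `order_id`, `payload`), the event type names, and the status values are all hypothetical and not part of these principles or of Galapagos.

```python
from typing import Any, Dict, Iterable, Optional

# Hypothetical event shape: {"type": ..., "order_id": ..., "payload": {...}}
Event = Dict[str, Any]


class OrderStatusCache:
    """Keeps the current state of each order by folding incoming events."""

    # Assumed mapping from event type to resulting status (illustrative only).
    STATUS_BY_EVENT = {
        "order-received": "RECEIVED",
        "order-shipped": "SHIPPED",
        "order-cancelled": "CANCELLED",
    }

    def __init__(self) -> None:
        self._orders: Dict[str, Dict[str, Any]] = {}

    def apply(self, event: Event) -> None:
        """Interpret one event and update the cached Business Object."""
        status = self.STATUS_BY_EVENT.get(event["type"])
        if status is None:
            return  # event types this cache does not know are ignored
        order = self._orders.setdefault(event["order_id"], {})
        order.update(event.get("payload", {}))
        order["status"] = status

    def apply_all(self, events: Iterable[Event]) -> None:
        """Apply events in the order in which they were received."""
        for event in events:
            self.apply(event)

    def get(self, order_id: str) -> Optional[Dict[str, Any]]:
        """Return a copy of the current state, e.g. to serve it via a REST API."""
        order = self._orders.get(order_id)
        return dict(order) if order is not None else None
```

The interpretation logic lives in exactly one place (`STATUS_BY_EVENT` and `apply`), so other applications can ask for the current state instead of re-implementing it.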
Although most IT people think in applications, Business Events do not really belong to an application. Instead, they belong to a Business Domain (see Domain Driven Design). Applications come and go; they get replaced, split, or retired. Business Events, however, usually exist until the underlying Business Process changes fundamentally.
So, Applications do own (generate) Business Events for the time of their existence, but the Event must logically be bound to a Business Domain.
This should also be reflected in the name of the Event Type. An event like Order Received may mean something completely different to the Sales domain than to the Internal Logistics domain. So, when referring to the Event Type, the name of the owning Business Domain must be included.
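One possible way to make the domain part of the name is a small helper that builds a qualified Event Type name. The `<domain>.<event-name>` pattern and the function names below are assumptions made for illustration; the principle only requires that the owning Business Domain is included.

```python
import re


def _slug(text: str) -> str:
    """Lower-case the text and join its words with hyphens."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not words:
        raise ValueError(f"cannot derive a name from {text!r}")
    return "-".join(word.lower() for word in words)


def qualified_event_type(domain: str, event_name: str) -> str:
    """Build '<domain>.<event-name>' so the owning domain is always part of the name."""
    return f"{_slug(domain)}.{_slug(event_name)}"


# The same event name yields two distinct Event Types in two domains.
sales = qualified_event_type("Sales", "Order Received")
logistics = qualified_event_type("Internal Logistics", "Order Received")
```

With this convention, `sales` is `sales.order-received` and `logistics` is `internal-logistics.order-received`, so the two meanings of Order Received cannot be confused.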
A published Business Event Type must contain all events (event instances) of this Type, or it is not a valid Event Type in the context of these principles. An Event Type Order Received (from the Business Domain "Sales") is not valid if it contains only the Business Events of orders received via an online channel, but not, for example, the orders received via a call center. If an application can provide only a subset, the Event Type must be named accordingly, e.g. Online Order Received, to reflect this limitation.
Many IT companies with DevOps or cross-functional teams tend to delegate the coordination of interfaces and their contents to the teams, so two teams A and B have to mutually agree on the interfaces between them for data exchange. A new application C, which also requires information from team A, then has to make its own agreements with this team. It may use the same interface as application B, if team A approves. Team A then has to communicate changes to this interface to both teams, monitor the interface, scale it for added load, handle application downtimes, and so on.
We think that applications should instead just publish their Business Events, and all interested applications can then subscribe to these events. Of course, the initial Event Payload will usually be agreed upon with the first subscriber, but because the focus is on the Business Event instead of a technical driver, and because the rules above are followed, the chances that the published information is generally useful are quite high. Adjustments to the payload can be made later on, but have to adhere to another rule...
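The decoupling described above can be illustrated with a toy in-memory publish-subscribe bus. This is a sketch only: `EventBus` is a hypothetical stand-in for a real broker and not a Kafka client, and the event type name is an assumption.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class EventBus:
    """Toy in-memory stand-in for a broker; not a real messaging client."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a handler without involving the publishing application."""
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict[str, Any]) -> int:
        """Deliver the payload to every current subscriber; return how many got it."""
        handlers = list(self._subscribers.get(event_type, []))
        for handler in handlers:
            handler(dict(payload))  # each subscriber gets its own copy
        return len(handlers)


bus = EventBus()
billing_inbox: List[Dict[str, Any]] = []
bus.subscribe("sales.order-received", billing_inbox.append)
bus.publish("sales.order-received", {"order_id": "1"})
```

The publisher calls `publish` in the same way regardless of how many applications have subscribed, which is the property the principle relies on.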
With regard to the previous section, most of you know that changing an interface used by multiple parties is quite hard. In a classic publish-subscribe scenario, chances are high that you don't even know all your subscribers. Even if you do, coordinating an interface change gets harder with every additional subscriber.
In our decoupled, Event-driven, publish-subscribe based Architecture, this means that all changes to the payload of our Events have to be made in a consumer-compatible way. For example, a property previously marked as required (in whatever schema language) cannot be removed or changed in its semantics, as consumers may rely on it. Additional properties can usually be added without problems, as the (single) provider of the Business Event is responsible for providing them, and older consumers can (and have to, if marked so in the schema) ignore additional properties.
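A check for consumer compatibility could look roughly like the following sketch. It assumes schemas are given as simplified JSON-Schema-like dictionaries with `properties` and `required` keys; the function name and the exact rule set are illustrative assumptions. Changes in the semantics of a property cannot be detected this way and still need human review.

```python
from typing import Any, Dict, List

Schema = Dict[str, Any]


def consumer_incompatibilities(old: Schema, new: Schema) -> List[str]:
    """List the changes in `new` that could break consumers written against `old`."""
    problems: List[str] = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    new_required = set(new.get("required", []))

    # Only previously required properties are guaranteed to consumers.
    for name in old.get("required", []):
        if name not in new_props:
            problems.append(f"required property '{name}' was removed")
        elif name not in new_required:
            problems.append(f"property '{name}' is no longer required")
        elif old_props.get(name, {}).get("type") != new_props[name].get("type"):
            problems.append(f"type of required property '{name}' changed")
    return problems


old_schema = {
    "properties": {"orderId": {"type": "string"}, "total": {"type": "number"}},
    "required": ["orderId", "total"],
}
# Adding a property is fine: older consumers simply ignore it.
new_schema = {
    "properties": {
        "orderId": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["orderId", "total", "currency"],
}
problems = consumer_incompatibilities(old_schema, new_schema)
```

Here `problems` is empty, because every property that consumers could rely on is still present, still required, and still has the same type.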
Technical measures can be taken to avoid having to provide old or flawed data formats forever. For instance, technical representations of the Business Events (e.g. Topics in Apache Kafka) can be marked as "deprecated" in a central listing, with a reference to a new topic that provides the same Business Event with a new payload format. This way, consumers have a given amount of time to asynchronously adapt to the new topic.
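Such a central listing could be modeled along the following lines. This is a minimal sketch with hypothetical names (`TopicRegistry`, `TopicEntry`, `end_of_life`, the topic names); it does not describe how Galapagos or Apache Kafka store this information.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional, Set


@dataclass
class TopicEntry:
    name: str
    deprecated: bool = False
    successor: Optional[str] = None      # topic with the same Business Event, new format
    end_of_life: Optional[date] = None   # assumed deadline for consumers to migrate


class TopicRegistry:
    """Hypothetical central listing of topics and their deprecation state."""

    def __init__(self) -> None:
        self._topics: Dict[str, TopicEntry] = {}

    def register(self, name: str) -> TopicEntry:
        entry = TopicEntry(name)
        self._topics[name] = entry
        return entry

    def deprecate(self, name: str, successor: str, end_of_life: date) -> None:
        """Mark a topic as deprecated and point consumers to its successor."""
        if successor not in self._topics:
            raise KeyError(f"successor topic {successor!r} is not registered")
        entry = self._topics[name]
        entry.deprecated = True
        entry.successor = successor
        entry.end_of_life = end_of_life

    def resolve(self, name: str) -> str:
        """Follow successor references to the topic consumers should migrate to."""
        seen: Set[str] = set()
        entry = self._topics[name]
        while entry.deprecated and entry.successor is not None:
            if entry.name in seen:
                raise ValueError("cycle in successor references")
            seen.add(entry.name)
            entry = self._topics[entry.successor]
        return entry.name


registry = TopicRegistry()
registry.register("sales.order-received")
registry.register("sales.order-received-v2")
registry.deprecate("sales.order-received", "sales.order-received-v2", date(2030, 1, 1))
```

Consumers of the deprecated topic keep working until the assumed `end_of_life` date, and `registry.resolve("sales.order-received")` tells them which topic to move to.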
In today's enterprise IT infrastructures, you can often find one central team responsible for the Enterprise Service Bus or the central messaging infrastructure. Teams wanting to publish data there, or wanting to receive data from another team via this central messaging, usually have to fill out some kind of form (maybe many pages), pass it to that team, and get the desired communication channels and associated rights from this team. This team usually does not look into why these teams want to share data, whether this data is already available elsewhere, or similar questions.
We think that establishing a communication channel to another team, or publishing relevant business events, should be the sole responsibility of the involved teams. Usually they have the competence to know what events they require, or which business events they can publish. When it comes to sensitive data, e.g. business events containing personal data in the payload, the providing team (in the best case, the Business Owner there) has the competence to determine whether a requesting team should have access to their events and associated data, so they should be the approving authority.
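The approval flow described above can be sketched as a small self-service model in which only the owning team may approve access to protected events. All names here (`SelfService`, `EventTopic`, `approval_required`, the team and topic names) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class EventTopic:
    name: str
    owner_team: str
    approval_required: bool = False  # e.g. the payload contains personal data
    subscribers: Set[str] = field(default_factory=set)
    pending: Set[str] = field(default_factory=set)


class SelfService:
    """Hypothetical self-service: no central team sits between the two teams."""

    def __init__(self) -> None:
        self._topics: Dict[str, EventTopic] = {}

    def create_topic(self, name: str, owner_team: str,
                     approval_required: bool = False) -> EventTopic:
        topic = EventTopic(name, owner_team, approval_required)
        self._topics[name] = topic
        return topic

    def request_subscription(self, topic_name: str, team: str) -> str:
        """Subscribe directly, or queue the request for the owning team."""
        topic = self._topics[topic_name]
        if topic.approval_required:
            topic.pending.add(team)
            return "PENDING"
        topic.subscribers.add(team)
        return "APPROVED"

    def approve(self, topic_name: str, approver_team: str, team: str) -> None:
        """Only the team that owns the events may grant access to them."""
        topic = self._topics[topic_name]
        if approver_team != topic.owner_team:
            raise PermissionError(f"{approver_team} does not own {topic_name}")
        if team not in topic.pending:
            raise ValueError(f"{team} has no pending request for {topic_name}")
        topic.pending.remove(team)
        topic.subscribers.add(team)


service = SelfService()
service.create_topic("sales.order-received", owner_team="team-a")
service.create_topic("hr.employee-hired", owner_team="team-a", approval_required=True)
open_result = service.request_subscription("sales.order-received", "team-b")
protected_result = service.request_subscription("hr.employee-hired", "team-b")
```

In this sketch, `open_result` is `"APPROVED"` immediately, while `protected_result` stays `"PENDING"` until `team-a` calls `approve`.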
With great power comes great responsibility. The power of teams to publish or subscribe to events brings the responsibility to carefully consider the event context and the required event payload, and to avoid redundancies in the enterprise IT landscape. The teams must provide appropriate information about their events and related decisions in a suitable format, e.g. in a company-wide Enterprise Architecture tool.