Request-response architectures break at scale. When Service A calls Service B which calls Service C synchronously, you have created a distributed monolith — a system that has all the operational complexity of microservices with none of the independence. Event-driven architecture breaks this coupling by having services communicate through events rather than direct calls.
The core principle: services publish facts about what happened, not commands about what should happen. "OrderPlaced" is an event. "ProcessPayment" is a command. This distinction matters because events allow multiple consumers to react independently without the publisher knowing or caring about downstream behavior.
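The distinction can be made concrete in code. Here is a minimal sketch, in Python, of the two message kinds named above; the field names are illustrative assumptions, not a schema from the original system:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class OrderPlaced:
    """Event: a past-tense fact. Any number of consumers may react."""
    order_id: str
    customer_id: str
    total_cents: int
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

@dataclass(frozen=True)
class ProcessPayment:
    """Command: an instruction aimed at one specific handler."""
    order_id: str
    amount_cents: int

event = OrderPlaced(order_id="ord-1", customer_id="cust-9", total_cents=4999)
command = ProcessPayment(order_id="ord-1", amount_cents=4999)
```

The grammar does real work: an event is immutable history the publisher cannot retract, while a command implies a single responsible recipient and an expectation of action.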
Core Patterns
Event Notification
The simplest pattern. A service publishes a lightweight event — "CustomerCreated", "OrderShipped" — and interested services subscribe. The event carries minimal data (typically just an ID and event type), and consumers fetch additional data if needed. This pattern is easy to adopt and works well for decoupling, but creates chattiness if consumers frequently need to call back for details.
Event-Carried State Transfer
Events carry all the data consumers need. "OrderPlaced" includes the full order details, customer information, and line items. Consumers maintain their own local copy of the data they care about. This eliminates callback chattiness and makes consumers fully autonomous, but means data is duplicated across services and events are larger.
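The two payload styles can be contrasted directly. A hypothetical sketch (the field names are assumptions for illustration):

```python
# Event notification: just enough to trigger a callback for details.
notification = {
    "type": "OrderPlaced",
    "order_id": "ord-1",   # consumers call the order service for the rest
}

# Event-carried state transfer: everything a consumer needs, inline.
state_transfer = {
    "type": "OrderPlaced",
    "order_id": "ord-1",
    "customer": {"id": "cust-9", "email": "a@example.com"},
    "line_items": [
        {"sku": "SKU-1", "qty": 2, "price_cents": 1500},
        {"sku": "SKU-2", "qty": 1, "price_cents": 1999},
    ],
    "total_cents": 4999,
}

# With state transfer, a consumer can build its local copy without
# ever calling back to the order service:
local_orders = {}
local_orders[state_transfer["order_id"]] = state_transfer
```

The larger payload is the price of autonomy: the consumer's local copy stays useful even when the publisher is down.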
Event Sourcing
Instead of storing current state, you store the sequence of events that led to the current state. An account balance is not a row in a database — it is the sum of all deposit and withdrawal events. This gives you a complete audit trail, the ability to reconstruct state at any point in time, and natural support for temporal queries. The trade-off is complexity: read queries require replaying or projecting events, and the event store grows indefinitely.
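The account-balance example above can be sketched as a fold over the event log. This is illustrative, not a production event store:

```python
# State is never stored directly; the balance is derived by replaying events.
events = [
    ("Deposited", 100_00),
    ("Withdrawn", 30_00),
    ("Deposited", 25_00),
]

def balance(log):
    """Replay the full log to derive current state."""
    total = 0
    for kind, amount_cents in log:
        total += amount_cents if kind == "Deposited" else -amount_cents
    return total

def balance_at(log, n):
    """Temporal query: state as of the first n events."""
    return balance(log[:n])

print(balance(events))        # 9500
print(balance_at(events, 2))  # 7000
```

`balance_at` is the temporal query the text mentions; it falls out for free. The projection cost also shows why real systems add snapshots once the log grows.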
| Pattern | Complexity | Coupling | Auditability | Best For |
|---|---|---|---|---|
| Event Notification | Low | Low | Limited | Simple decoupling between services |
| Event-Carried State | Medium | Very low | Good | Autonomous services with local data |
| Event Sourcing | High | Very low | Complete | Financial systems, audit-heavy domains |
| CQRS + Event Sourcing | Very high | Minimal | Complete | High-read, high-write systems at scale |
Apache Kafka in Practice
Kafka dominates event-driven infrastructure for a reason: it provides durable, ordered, high-throughput event streaming with consumer group semantics that allow independent scaling of producers and consumers. But Kafka is not simple to operate, and the teams that treat it as a drop-in message queue discover its complexity the hard way.
- Partition count is hard to change after creation — size partitions for expected peak throughput at topic creation time
- Consumer lag monitoring is non-negotiable — a consumer falling behind is invisible until it becomes a production incident
- Schema evolution must be managed from day one — use a schema registry with compatibility checks or you will break consumers
- Retention policies determine your replay window — set them based on your disaster recovery requirements, not arbitrary defaults
- Exactly-once semantics require idempotent consumers — Kafka provides at-least-once delivery, your consumers must handle duplicates
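The last point above is the one most often skipped. A minimal sketch of an idempotent consumer under at-least-once delivery; the broker is stubbed out, and in a real deployment the processed-ID set would live in durable storage keyed by a deduplication ID carried in each event:

```python
processed_ids = set()   # durable store in production, not process memory
side_effects = []

def handle(event):
    """Process an event's side effect at most once, even if delivered twice."""
    if event["event_id"] in processed_ids:
        return                      # duplicate delivery: safe to drop
    side_effects.append(event["payload"])
    processed_ids.add(event["event_id"])

# At-least-once delivery means the broker may redeliver e1:
deliveries = [
    {"event_id": "e1", "payload": "charge"},
    {"event_id": "e1", "payload": "charge"},   # duplicate
    {"event_id": "e2", "payload": "ship"},
]
for e in deliveries:
    handle(e)

print(side_effects)   # ['charge', 'ship']
```

Note the ordering subtlety: the ID is recorded only after the side effect succeeds, so a crash between the two is re-processed rather than silently lost.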
The AI Agent Event Bus
AI agents are creating a new demand pattern for event-driven architectures. An AI agent that monitors customer behavior, detects patterns, and takes autonomous actions is fundamentally an event consumer and producer. It subscribes to behavioral events, processes them through a model, and publishes action events. The event bus becomes the coordination layer between AI agents and traditional services.
This works well when agents are reactive — responding to events as they occur. It becomes more complex when agents need to maintain state across multiple events (a customer journey spanning days or weeks) or coordinate with other agents. The emerging pattern is to combine event sourcing for state management with an orchestration layer that manages multi-agent coordination.
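The reactive case can be sketched with an in-memory bus. Both the bus and the decision rule are stand-ins: in practice the bus would be Kafka or similar, and the threshold check would be a model call.

```python
from collections import defaultdict

subscribers = defaultdict(list)

def subscribe(topic, fn):
    subscribers[topic].append(fn)

def publish(topic, event):
    for fn in subscribers[topic]:
        fn(event)

actions = []

def churn_agent(event):
    """Reactive agent: consume a behavioral event, decide, emit an action event."""
    if event["logins_last_30d"] < 2:            # stand-in for a model inference
        publish("actions", {"type": "OfferDiscount",
                            "customer_id": event["customer_id"]})

subscribe("behavior", churn_agent)
subscribe("actions", actions.append)           # a downstream service would react here

publish("behavior", {"customer_id": "cust-9", "logins_last_30d": 1})
```

The agent never calls another service directly; it only consumes and produces events, which is exactly what keeps it swappable.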
Migrating to Event-Driven Architecture
Start by mapping every synchronous service-to-service call in your system, then rank the calls by coupling impact: which ones cascade failures across multiple services when they fail?
Do not rip out REST calls. Add event publishing alongside them. Let consumers gradually shift from polling/calling to subscribing.
Establish an event envelope (event type, timestamp, correlation ID, schema version) and a schema registry before the first event is published. Retroactive schema management is painful.
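A possible envelope shape, sketched in Python; the field names are assumptions, not a standard, but the four elements named above are all present:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid

@dataclass(frozen=True)
class EventEnvelope:
    """Wraps every published event with routing and evolution metadata."""
    event_type: str
    payload: dict
    schema_version: int = 1
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

env = EventEnvelope(event_type="OrderPlaced", payload={"order_id": "ord-1"})
```

The correlation ID is what makes distributed tracing possible later, and the schema version is what lets the registry enforce compatibility checks.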
Every consumer must handle duplicate events gracefully. Use deduplication keys or idempotent operations. This is not optional — at-least-once delivery means duplicates will happen.
Invest in observability: distributed tracing across event producers and consumers, consumer lag dashboards, and dead-letter queue monitoring. Without it, debugging an event-driven system is guesswork.
“The biggest mistake teams make with event-driven architecture is not the technology choice — it is treating events as remote procedure calls with extra steps. Events are facts about the past, not requests for the future.”