Microservices Patterns: Saga, CQRS, Event Sourcing, BFF, and More
Microservices patterns are the vocabulary of distributed systems design. Knowing when to apply each one — and when not to — separates an architect who reads pattern books from one who’s shipped production systems.
Problem: A business transaction spans multiple services, each with its own database. You can’t use a distributed ACID transaction.
Solution: A saga is a sequence of local transactions. Each step publishes an event or triggers the next step. If a step fails, compensating transactions undo previous steps.
Choreography-based saga: Services react to events — no central coordinator.
1. OrderService: creates order → publishes OrderCreated
2. InventoryService: listens → reserves stock → publishes StockReserved
3. PaymentService: listens → charges card → publishes PaymentCompleted
4. OrderService: listens → confirms order
Failure at step 3:
3. PaymentService: charge fails → publishes PaymentFailed
2. InventoryService: listens → releases reservation → publishes StockReleased
1. OrderService: listens → cancels order
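The choreographed flow above can be sketched in-process with a toy event bus. The bus, service handlers, and event names here are illustrative stand-ins for Kafka (or any broker) and real services, not a framework API:

```python
class EventBus:
    """Toy synchronous pub/sub; a stand-in for a real message broker."""
    def __init__(self):
        self._handlers = {}
        self.log = []  # every event published, in order

    def subscribe(self, event_type, handler):
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in list(self._handlers.get(event_type, [])):
            handler(payload)


def wire_saga(bus, charge_succeeds=True):
    # InventoryService: reserve stock when an order is created
    bus.subscribe("OrderCreated", lambda o: bus.publish("StockReserved", o))

    # PaymentService: charge the card once stock is reserved
    def charge(order):
        if charge_succeeds:
            bus.publish("PaymentCompleted", order)
        else:
            bus.publish("PaymentFailed", order)
    bus.subscribe("StockReserved", charge)

    # InventoryService: compensate by releasing the reservation
    bus.subscribe("PaymentFailed", lambda o: bus.publish("StockReleased", o))


bus = EventBus()
wire_saga(bus, charge_succeeds=False)
bus.publish("OrderCreated", {"id": 123})
# bus.log now traces the failure path:
# OrderCreated → StockReserved → PaymentFailed → StockReleased
```

Notice there is no coordinator anywhere: each "service" only knows which events it reacts to, which is exactly what makes the overall flow hard to see in one place.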
Orchestration-based saga: A saga orchestrator (a service or workflow engine) explicitly coordinates each step.
SagaOrchestrator:
step 1: call InventoryService.reserve() → success
step 2: call PaymentService.charge() → fails
step 3: call InventoryService.release() (compensate)
→ return failure
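The orchestrated flow can be sketched as a generic runner over (action, compensation) pairs. Service calls are stubbed as callables here; a production orchestrator would make network calls and persist saga state so it can resume after a crash:

```python
def run_saga(steps):
    """steps: list of (action, compensation) callables.

    Runs actions in order. If one raises, runs the compensations of the
    already-completed steps in reverse order and returns False.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for comp in reversed(completed):
                comp()  # compensations must be idempotent
            return False
    return True


calls = []
def reserve(): calls.append("reserve")
def release(): calls.append("release")
def charge():
    calls.append("charge")
    raise RuntimeError("card declined")
def refund(): calls.append("refund")

ok = run_saga([(reserve, release), (charge, refund)])
# ok is False; calls == ["reserve", "charge", "release"]
# refund never runs, because charge never completed
```

The key design point: each step carries its own undo, and only completed steps are compensated.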
When to use which:
- Choreography: fewer services, loose coupling desired, simple failure paths
- Orchestration: many services, complex failure compensation, need visibility into saga state
Real pitfalls:
- Compensating transactions must be idempotent. The network might redeliver a compensation event.
- Partial failures are hard to reason about. What if the compensation itself fails?
- Visibility: Where is the saga in its lifecycle? Orchestration is much easier to observe.
- Saga state must be persisted — if the orchestrator crashes mid-saga, it must be resumable.
Tooling: Temporal.io, AWS Step Functions, Axon Framework (Java), Saga state machines in your DB.
Problem: Service A writes to its database AND publishes an event to Kafka. If the DB write succeeds but Kafka publish fails (or vice versa), you have inconsistency.
Solution: Write the event to an outbox table in the same database transaction as the business data. A separate relay process reads unprocessed outbox rows and publishes them.
BEGIN;
INSERT INTO orders (id, status) VALUES (123, 'PLACED');
INSERT INTO outbox (event_type, payload, processed)
VALUES ('ORDER_CREATED', '{"id": 123}', false);
COMMIT;
-- Both committed atomically, or neither committed
-- Separate process (or Debezium via CDC):
SELECT * FROM outbox WHERE processed = false ORDER BY created_at;
-- For each row: publish to Kafka, then mark processed = true
Key properties:
- The business write and event publication are atomic
- At-least-once delivery — if the relay crashes after publishing but before marking processed, it publishes again. Consumers must be idempotent.
- CDC (Debezium) tailing the outbox table eliminates the polling relay: Debezium streams outbox inserts from the database's transaction log as they commit
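The polling relay can be sketched in a few lines. This uses sqlite3 as a stand-in for Postgres and a `publish` callable as a stand-in for a Kafka producer; the table shape follows the SQL above (with 0/1 for the boolean, as sqlite has no bool type):

```python
import sqlite3

def relay_once(conn, publish):
    """Publish unprocessed outbox rows in insertion order, marking each
    as processed only after a successful publish.

    If the process crashes between publish() and the UPDATE, the row is
    re-published on the next run: this is the at-least-once guarantee,
    and why consumers must be idempotent.
    """
    rows = conn.execute(
        "SELECT rowid, event_type, payload FROM outbox "
        "WHERE processed = 0 ORDER BY rowid").fetchall()
    for rowid, event_type, payload in rows:
        publish(event_type, payload)  # e.g. produce to Kafka, then:
        conn.execute("UPDATE outbox SET processed = 1 WHERE rowid = ?",
                     (rowid,))
        conn.commit()
```

A real relay would loop with a sleep (or be replaced by CDC entirely); the important invariant is the ordering of publish-then-mark, never the reverse.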
When to use: Any time you need to reliably publish events that correspond to database changes. Critical for event sourcing, notification systems, and service integration.
Problem: The data model optimized for writes (normalized, transactional) is not optimal for reads (denormalized, pre-aggregated). Complex reporting queries are slow on the write model.
Solution: Separate the write model (command side) from the read model (query side). They can use different data stores, different schemas, even different technologies.
Write side: Commands → OrderService (Postgres)
Read side: Events from write side → OrderReadModel (projected view, in Elasticsearch or a separate Postgres table)
Query: "All orders for user X with product details"
→ hits denormalized read model → fast, no joins
CQRS doesn’t require event sourcing, though they’re often used together. CQRS just means: the model you write to is different from the model you read from.
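A minimal projector makes the split concrete. This sketch consumes write-side events and maintains a denormalized per-user view in memory; the event shapes are illustrative, and a real projection would write to Elasticsearch or a read-optimized table:

```python
class OrderReadModel:
    """Denormalized view: all orders per user, servable without joins."""
    def __init__(self):
        self.orders_by_user = {}

    def apply(self, event):
        # Called for each event emitted by the write side, in order
        if event["type"] == "OrderCreated":
            self.orders_by_user.setdefault(event["user_id"], []).append(
                {"order_id": event["order_id"], "status": "PLACED",
                 "items": event["items"]})
        elif event["type"] == "OrderCancelled":
            for order in self.orders_by_user.get(event["user_id"], []):
                if order["order_id"] == event["order_id"]:
                    order["status"] = "CANCELLED"

    def orders_for(self, user_id):
        # The query that would need joins against the write model
        return self.orders_by_user.get(user_id, [])
```

The lag between the write side emitting an event and `apply` running is exactly the eventual consistency cost discussed below.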
When to use:
- Complex domain with significantly different read and write patterns
- Read performance requirements can’t be met with the write model
- Multiple read representations needed (same data, different views for different consumers)
- Audit/history requirements (pair with event sourcing)
The cost: Eventual consistency between write and read models. When you write, the read model is updated asynchronously — reads may see slightly stale data. Also: two models to maintain, synchronization logic to build and monitor.
CQRS is not the default. Most CRUD applications don’t need it. Introduce it when the read/write impedance mismatch is causing real problems.
Problem: Traditional systems store current state. You lose history — “how did we get here?” can’t be answered.
Solution: Store the sequence of events that led to the current state. Current state is derived by replaying events.
Events (the source of truth):
1. OrderCreated { id: 1, items: [...] }
2. ItemAdded { item: "SKU-999" }
3. CouponApplied { code: "SAVE20" }
4. OrderPlaced { total: 80.00 }
Current state (derived by replaying events 1–4):
Order { id: 1, status: PLACED, total: 80.00, coupon: "SAVE20", ... }
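Replaying the four events above is a fold over the log. The event shapes and per-type handlers in this sketch are illustrative, not a real event-store API:

```python
def replay(events):
    """Derive current state by applying events in order."""
    state = {}
    for etype, data in events:
        if etype == "OrderCreated":
            state = {"id": data["id"], "items": list(data["items"]),
                     "status": "CREATED", "coupon": None, "total": None}
        elif etype == "ItemAdded":
            state["items"].append(data["item"])
        elif etype == "CouponApplied":
            state["coupon"] = data["code"]
        elif etype == "OrderPlaced":
            state["status"] = "PLACED"
            state["total"] = data["total"]
    return state


log = [
    ("OrderCreated", {"id": 1, "items": ["SKU-001"]}),
    ("ItemAdded", {"item": "SKU-999"}),
    ("CouponApplied", {"code": "SAVE20"}),
    ("OrderPlaced", {"total": 80.00}),
]
state = replay(log)  # status PLACED, total 80.0, coupon SAVE20
```

Time travel falls out for free: `replay(log[:2])` gives the order as it was before the coupon. A snapshot is just a cached `state` at some offset, so only the tail of the log needs replaying.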
What event sourcing gives you:
- Complete audit trail — not just current state, but every change and why
- Time travel — replay to any point in time
- Event replay for new consumers — add a new read model (analytics, cache) by replaying history
- Debugging — reproduce any production issue by replaying events
- Decoupling — consumers subscribe to events, not state changes
The costs:
- Complexity. Querying current state requires event replay or maintaining snapshots. Simple “SELECT * FROM orders” doesn’t work.
- Snapshots needed for large event histories — replaying 100,000 events to get current state is slow. Snapshots checkpoint state at intervals.
- Schema evolution is hard. An event in the log from 3 years ago must still be interpretable today. Event upcasting required.
- Not for everything. Most services don’t need this. Use it for domains where history, auditability, and replayability are first-class requirements (financial ledgers, order management, healthcare records).
Problem: Clients need to call multiple backend services. Logic for auth, rate limiting, routing, and request aggregation is duplicated across services.
Solution: A single entry point that handles cross-cutting concerns and routes to backend services.
Responsibilities:
- Authentication and authorization (validate JWT, check scopes)
- Rate limiting per client/API key
- SSL termination
- Request routing and load balancing
- Response caching for GET requests
- Protocol translation (REST to gRPC)
- Request/response transformation
- Observability (access logs, metrics per endpoint)
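The request path through a gateway can be sketched as: authenticate, rate-limit, then route. Everything here is a toy stand-in (stubbed backends, a fixed per-minute window, no real token validation); real gateways like Kong or Envoy implement this in optimized proxy layers:

```python
import time

class Gateway:
    def __init__(self, routes, rate_limit=100):
        self.routes = routes          # path prefix -> backend callable
        self.rate_limit = rate_limit  # requests per minute per API key
        self.counters = {}            # (api_key, minute) -> request count

    def handle(self, path, api_key):
        # 1. Authentication (a real gateway validates a JWT and scopes)
        if api_key is None:
            return 401, "missing credentials"
        # 2. Rate limiting, fixed one-minute windows per key
        minute = int(time.time() // 60)
        count = self.counters.get((api_key, minute), 0) + 1
        self.counters[(api_key, minute)] = count
        if count > self.rate_limit:
            return 429, "rate limit exceeded"
        # 3. Routing by path prefix
        for prefix, backend in self.routes.items():
            if path.startswith(prefix):
                return 200, backend(path)
        return 404, "no route"
```

Note there is no business logic in `handle`: it never inspects request bodies, only credentials and paths, which is the discipline described in the gotcha below.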
Tools: AWS API Gateway, Kong, Nginx, Envoy, Spring Cloud Gateway, Traefik.
Gotcha: Don’t put business logic in the API Gateway. It should be routing + cross-cutting concerns. If you’re writing conditional logic based on request body content in the gateway, that logic belongs in a service.
Problem: A mobile app and a web app have different data needs. The web app needs rich data; the mobile app needs lightweight responses. Building one API that serves both leads to over-fetching on mobile or under-fetching on web.
Solution: A dedicated backend service per frontend type — a BFF. Each BFF aggregates and shapes data from downstream services specifically for its frontend.
Mobile App → Mobile BFF → UserService, OrderService (aggregated, optimized for mobile)
Web App → Web BFF → UserService, OrderService, RecommendationService (rich, desktop-optimized)
The BFF is owned by the frontend team. They understand their data needs and can evolve their BFF independently. The backend services remain stable.
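A mobile BFF endpoint is mostly aggregation plus field-stripping. In this sketch the downstream clients are stubbed callables and the field choices are illustrative; the point is one round trip for the app and a payload shaped for the mobile screen:

```python
def mobile_order_summary(user_id, user_service, order_service):
    """Aggregate two downstream calls into one slim mobile payload."""
    user = user_service(user_id)     # e.g. GET /users/{id}
    orders = order_service(user_id)  # e.g. GET /orders?user={id}
    return {
        "name": user["name"],  # drop address, preferences, etc.
        "open_orders": [
            # drop line items, totals, shipping details
            {"id": o["id"], "status": o["status"]}
            for o in orders if o["status"] != "DELIVERED"
        ],
    }
```

A web BFF calling the same two services would return the rich versions of these objects, plus recommendations; neither BFF requires the downstream services to change.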
When BFF makes sense:
- Meaningfully different data requirements across client types
- Mobile performance is critical (minimize payload, reduce round trips)
- Frontend team velocity is blocked by backend team changes
When it’s overkill:
- The clients have nearly identical data needs
- You lack the team budget to own N BFF services (each BFF is an additional service to maintain)
Problem: You need to replace a legacy system (the “monolith”) but can’t do a big-bang rewrite.
Solution: Progressively route traffic for specific features from the old system to the new one. The old system is "strangled" as more functionality moves out.
Phase 1: All traffic → Monolith
Phase 2: User auth traffic → New Auth Service; rest → Monolith
Phase 3: Order creation → New Order Service; rest → Monolith
...
Phase N: Monolith retired
Implementation: A facade layer (proxy, API gateway, or feature flag router) sits in front of both systems and routes based on the path, header, or user cohort.
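The facade can be sketched as prefix routing with the monolith as the fall-through. The prefixes and backend handlers here are illustrative; in practice this lives in an API gateway config or a feature-flag-driven proxy:

```python
def make_router(migrated, monolith):
    """migrated: dict of path prefix -> new-service handler.
    Anything not yet migrated falls through to the monolith."""
    def route(path):
        for prefix, handler in migrated.items():
            if path.startswith(prefix):
                return handler(path)
        return monolith(path)
    return route


# Phase 2: only auth has moved out of the monolith
route = make_router({"/auth": lambda p: "auth-service"},
                    monolith=lambda p: "monolith")
# route("/auth/login") -> "auth-service"; route("/orders/1") -> "monolith"
```

Rollback is removing a prefix from `migrated`; retiring the monolith is the day the fall-through never fires.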
Why it works: Each piece is a small, bounded migration that can be tested and validated independently. Rollback is as simple as flipping the router back. There is no big-bang cutover risk.
Problem: Cross-cutting concerns (service discovery, mTLS, retries, metrics) are implemented in every service, in every language. Changing the retry policy requires updating 50 services.
Solution: A sidecar proxy runs alongside each service container. The proxy intercepts all network traffic and handles cross-cutting concerns transparently.
[Service Pod]
├── App container (your code)
└── Envoy sidecar (handles mTLS, retries, circuit breaking, telemetry)
Service mesh (Istio, Linkerd): Orchestrates all sidecars with a control plane. Policy changes propagate to all sidecars without application deployments.
What services gain: mTLS, distributed tracing, circuit breaking, load balancing — all without a single line of application code.
The cost: Sidecar adds latency (~5ms per hop), memory (~50MB per pod), and operational complexity. Worth it at scale; may not be worth it for 3 services.
Problem: A slow downstream dependency consumes all your threads or connections, starving other downstream calls.
Solution: Isolate each dependency into its own resource pool (thread pool or connection pool). A slow dependency only affects its own pool.
Without bulkhead:
All 200 threads shared → SlowService consumes all 200 → FastService gets none → everything fails
With bulkhead:
50 threads for SlowService → 150 threads for FastService
SlowService degrades → FastService unaffected
In Java/Spring: Resilience4j @Bulkhead — configure a semaphore or thread-pool bulkhead per downstream service. Hystrix (now deprecated) called this thread-pool isolation.
Combined with circuit breaker: Bulkhead limits concurrent calls; circuit breaker stops calls when failure rate is high. Used together, they prevent a failing dependency from cascading.
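A semaphore-style bulkhead, roughly what Resilience4j's semaphore @Bulkhead does, can be sketched in a few lines. This version rejects immediately when the pool is full rather than queueing; a thread-pool bulkhead would instead run calls on a bounded executor:

```python
import threading

class Bulkhead:
    """Cap concurrent calls to one dependency so a slow one
    can't consume every thread in the process."""
    def __init__(self, max_concurrent):
        self._sem = threading.Semaphore(max_concurrent)

    def call(self, fn, *args):
        # Fail fast instead of queueing when the pool is exhausted
        if not self._sem.acquire(blocking=False):
            raise RuntimeError("bulkhead full")
        try:
            return fn(*args)
        finally:
            self._sem.release()


slow_pool = Bulkhead(50)   # SlowService gets at most 50 concurrent calls
fast_pool = Bulkhead(150)  # FastService keeps its own capacity
```

When SlowService stalls, its 50 permits fill up and further calls to it fail fast, while `fast_pool` is untouched: exactly the isolation in the diagram above.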