
Microservices vs Monolith: Making the Right Architecture Call

The microservices vs monolith debate is one of the most over-indexed topics in software architecture — teams decompose too early, pay operational costs they’re not ready for, and spend months untangling the mess. The decision framework is simpler than the discourse suggests.


Start With the Questions, Not the Conclusion

When a team says “we want to break our monolith into microservices,” the right response isn’t to approve or reject — it’s to ask:

1. What problem are you trying to solve?

  • Deployment independence? (“The payments team is blocked waiting for the user team to release”)
  • Scale independence? (“Search needs to scale to 100x but billing doesn’t”)
  • Team autonomy? (“12 teams working in one codebase is causing constant conflicts”)
  • Technology heterogeneity? (“We need to use Python for ML but Java for the API”)
  • Reliability isolation? (“A bug in the recommendation engine shouldn’t take down checkout”)

If you can’t answer this specifically, the motivation is likely “microservices are modern” — which is not a reason.

2. What’s the team’s operational maturity? Microservices require: distributed tracing, per-service monitoring, independent CI/CD pipelines, service discovery, centralized logging, network policies, and on-call runbooks for N services instead of 1. Most teams underestimate this by 10x.

3. What’s the team size? Conway’s Law is real: your system architecture mirrors your communication structure. The rough heuristic: one service per team (or per two-pizza team). If you have 5 engineers, you don’t need 15 services.


The Modular Monolith: The Middle Ground You’re Not Considering

Before jumping to microservices, ask: “Have we tried making our monolith modular first?”

A modular monolith has:

  • Clear module boundaries enforced by the package structure or module system
  • Well-defined interfaces between modules (no direct cross-module field access)
  • Independent test suites per module
  • The ability to extract a module into a service later if needed

The modular monolith gives you most of the domain separation benefits without the operational overhead. It’s dramatically underrated.
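Module boundaries only hold if something enforces them. As a minimal sketch (the `app/billing/orders` layout and the `public.py` convention here are illustrative assumptions, not a standard), a small check can flag any import that reaches past another module’s public interface:

```python
# Hypothetical layout: each module exposes one public interface file and
# hides the rest. Other modules may import only `<module>.public`.
#
#   app/
#     billing/
#       public.py      # the only file other modules may import
#       _internal.py   # implementation detail
#     orders/
#       public.py
#       _internal.py

import ast
import pathlib


def boundary_violations(app_root: str) -> list[str]:
    """Return imports that reach into another module's internals."""
    violations = []
    root = pathlib.Path(app_root)
    modules = {p.name for p in root.iterdir() if p.is_dir()}
    for py_file in root.rglob("*.py"):
        own_module = py_file.relative_to(root).parts[0]
        tree = ast.parse(py_file.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.ImportFrom) and node.module:
                parts = node.module.split(".")
                # Cross-module import that bypasses the public interface?
                if (parts[0] in modules and parts[0] != own_module
                        and parts[1:] and parts[1] != "public"):
                    violations.append(f"{py_file}: from {node.module} import ...")
    return violations
```

Running a check like this in CI is what turns “please don’t import internals” from a convention into a boundary; tools such as import-linter (Python) or ArchUnit (Java) do the same job more thoroughly.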

When does the modular monolith break down?

  • Different scaling requirements that can’t be addressed by horizontally scaling the whole app
  • True deployment independence is needed (different teams, different release cycles)
  • Different reliability requirements (one component fails frequently and you don’t want it taking down everything)
  • Genuine technology heterogeneity needs

The Right Size for a Microservice

“What’s the right size?” is the wrong framing. The right framing: what are the right boundaries?

Good service boundaries:

  • Align with a bounded context (DDD) — the service owns a coherent domain concept and its data
  • Own their data — no shared database between services
  • Have minimal coordination requirements — calling another service for every operation signals a misaligned boundary
  • Have independent deployability — can be deployed without coordinating with other services

The seam question: “If I change this service, do I always have to change that other service at the same time?” If yes, they’re too coupled and should probably be one service.

Signs your service is too small (nanoservices):

  • Every business operation requires calling 5+ services
  • Most services are essentially pass-throughs with no logic
  • Network hops dominate your latency
  • A “simple” feature requires deploying 4 services

Signs your service is too large:

  • Multiple teams are working on the same service and blocking each other
  • The service has clear internal sub-domains that have different scaling or reliability requirements
  • Deployments take hours and are risky because the blast radius is huge

Shared Code Across Services: The Coupling Trap

When multiple services share a library, that library becomes a coordination point. The failure mode:

  • common-lib contains the User model, Order model, validation logic
  • Service A updates common-lib to add a field to User
  • Service B, C, D, E all must update their common-lib dependency or the build breaks
  • You’ve recreated the monolith as a distributed dependency graph

What to share vs what not to:

  • Share: Logging libraries, telemetry instrumentation, security token parsing utilities, internal HTTP client wrappers. These are infrastructure concerns, not domain concerns.
  • Don’t share: Domain models, business validation logic, data transfer objects that represent domain concepts. Each service should own its domain types.
  • Prefer duplication over wrong abstraction. Two services having their own User class with slightly different fields is usually better than a shared class that satisfies neither cleanly.

The Operational Costs People Underestimate

Microservices don’t reduce complexity — they trade one kind of complexity for another.

What you gain: Deployment independence, scale independence, team autonomy, technology heterogeneity, fault isolation.

What you pay:

  • Distributed system problems. Every service call can fail, timeout, return stale data, or experience network partition. You need timeouts, circuit breakers, retries, and idempotency everywhere.
  • Observability complexity. A single request now touches 5 services. Without distributed tracing (Jaeger, Zipkin, Tempo), debugging is nearly impossible.
  • Testing complexity. Integration testing a distributed system requires either mocks (fragile) or a real environment (expensive). Contract testing helps but adds process overhead.
  • Data consistency. No cross-service transactions. Saga patterns, eventual consistency, and compensation logic must be designed and tested.
  • Operational overhead. N services means N deployment pipelines, N monitoring dashboards, N on-call runbooks, N certificate renewals, N infrastructure configs.
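To make the first cost concrete, here is a minimal circuit-breaker sketch (thresholds and the in-process design are illustrative; production systems typically use a library or service mesh): after enough consecutive failures it fails fast instead of hammering a struggling downstream service, then allows a probe after a cooldown.

```python
import time


class CircuitBreaker:
    """Minimal sketch: open after N consecutive failures, fail fast while
    open, allow one probe call after a cooldown (half-open)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cooldown elapsed: let one probe through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit
        return result
```

In a monolith this entire class is replaced by a function call that either works or raises; that is the complexity being traded.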

The rule of thumb: Each new service needs someone to own it. If no one has the bandwidth to own it — to monitor it, to be on-call for it, to maintain its runbook — don’t extract it yet.


Distributed Transactions: Saga, Outbox, and 2PC

When an operation spans multiple services, you can’t use a database transaction. The patterns:

Saga Pattern

Break the distributed operation into a sequence of local transactions. If a step fails, execute compensating transactions to undo previous steps.

Choreography-based Saga: Each service publishes events and listens for events from other services. Loosely coupled, but the overall business flow is implicit — hard to see, hard to debug.

OrderService: ORDER_CREATED event →
  InventoryService: INVENTORY_RESERVED event →
    PaymentService: PAYMENT_CHARGED event →
      OrderService: ORDER_CONFIRMED
Failure: PaymentService publishes PAYMENT_FAILED →
  InventoryService: INVENTORY_RELEASED

Orchestration-based Saga: A central orchestrator (or saga coordinator) explicitly tells each service what to do and handles failures.

SagaOrchestrator:
  1. Call InventoryService.reserve() → success
  2. Call PaymentService.charge()   → fails
  3. Call InventoryService.release() (compensate)
  4. Return failure to caller

Orchestration is more visible and debuggable; choreography is more decoupled. For complex multi-step sagas, orchestration is often more maintainable.
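The orchestration flow above can be sketched as a generic saga runner (the reserve/charge/release step names are stand-ins for real service calls): each step pairs an action with a compensation, and on failure the completed steps are undone in reverse order.

```python
def run_saga(steps):
    """Run a list of (action, compensation) callables in order.

    On any failure, execute the compensations for the steps that already
    committed, newest first, then report the failure to the caller.
    """
    done = []  # compensations for steps that succeeded
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception as exc:
        for compensate in reversed(done):
            compensate()  # best-effort undo; real sagas must retry/log this
        return ("failed", str(exc))
    return ("confirmed", None)
```

A production orchestrator also has to persist saga state, so a crash mid-saga can resume or compensate on restart; that durability is most of the real work.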

Outbox Pattern

Guarantees that a database write and a message publication are atomic, without two-phase commit.

BEGIN TRANSACTION
  INSERT INTO orders(id, ...) VALUES (...)
  INSERT INTO outbox(event_type, payload) VALUES ('ORDER_CREATED', {...})
COMMIT

-- Separate relay process, in a loop:
SELECT * FROM outbox WHERE published = false
-- publish each row to the broker (e.g. Kafka), then:
UPDATE outbox SET published = true

The outbox and the business data are in the same database, so they’re committed atomically. The publisher reads from the outbox and delivers to the message broker. At-least-once delivery — consumers must be idempotent.

2PC (Two-Phase Commit)

Theoretically guarantees atomic commit across multiple systems. In practice: the coordinator becomes a single point of failure, blocking locks are held during the prepare phase, and failure scenarios are complex and hard to test. Almost never the right answer in microservices.

The EM stance: Design service boundaries to minimize distributed transactions. If you’re writing a saga for every operation, your service boundaries are wrong.