
Testing Strategy: Test Pyramid, Contract Testing, and Coverage Pragmatics

Testing strategy is an EM-level concern because it directly affects delivery velocity, production reliability, and onboarding speed. Too little testing = production incidents. Too much ceremony = slow CI and frustrated engineers. The goal is the right tests in the right places.


The Test Pyramid for Microservices

The classic test pyramid has unit tests at the base, integration tests in the middle, and end-to-end tests at the top. In microservices, the pyramid shifts slightly because the “integration” layer is where most of the real risk lives.

         /\
        /E2E\        ← Few, slow, high confidence for critical paths
       /------\
      / Service \    ← Medium — test one service with real dependencies
     / Integration\
    /--------------\
   /  Unit Tests    \ ← Many, fast, test logic in isolation
  /------------------\

Unit Tests (Base)

Test a single class or function in isolation. Fast (milliseconds), no I/O, no database, no HTTP.

What belongs in unit tests:

  • Pure business logic — validation rules, calculations, transformations
  • Complex conditional logic — branch coverage
  • Edge cases and error paths
  • Utility functions

What doesn’t belong in unit tests:

  • “Does this Spring bean wire correctly?” — that’s not a unit test, it’s integration
  • “Does this SQL query return the right rows?” — needs a real database
  • Anything that requires mocking more than 2 collaborators — usually a design smell

Mocking: Use sparingly. Heavy mocking creates tests that are coupled to implementation rather than behavior. If you’re mocking 5 dependencies to test a single method, the method probably does too much.
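What a good base-of-the-pyramid test looks like can be shown without any framework at all. The sketch below tests a hypothetical discount rule (the class, thresholds, and rates are invented for illustration): pure logic, plain assertions, no I/O, no mocks.

```java
// Minimal sketch of a pure-logic unit test: no I/O, no framework, no mocks.
// DiscountCalculatorTest and its business rule are hypothetical, for illustration only.
public class DiscountCalculatorTest {
    // Rule under test: 10% off orders of 100.00 or more; negative prices rejected.
    static double applyDiscount(double price) {
        if (price < 0) throw new IllegalArgumentException("price must be non-negative");
        return price >= 100.0 ? price * 0.9 : price;
    }

    public static void main(String[] args) {
        // Boundary and branch cases — exactly what unit tests exist to pin down.
        assert applyDiscount(50.0) == 50.0;    // below threshold: unchanged
        assert applyDiscount(100.0) == 90.0;   // at threshold: 10% off
        assert applyDiscount(200.0) == 180.0;  // above threshold

        // Error path: negative input must be rejected.
        boolean threw = false;
        try { applyDiscount(-1.0); } catch (IllegalArgumentException e) { threw = true; }
        assert threw;
        System.out.println("all assertions passed");
    }
}
```

Every case runs in microseconds, which is what lets you have hundreds of these at the base of the pyramid.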

Integration Tests (Middle)

Test a component with its real dependencies. In microservices, this typically means testing one service against a real database and cache, but with external services mocked or stubbed.

Tools:

  • Testcontainers: Spin up real Postgres, Redis, Kafka in Docker for tests. Tests run against real infrastructure, same version as production. Eliminates the “it works locally but not in prod” class of bugs.
  • Spring Boot Slice Tests: @DataJpaTest spins up only JPA components + in-memory DB. @WebMvcTest tests controllers without the full context. Faster than @SpringBootTest.
  • @SpringBootTest with Testcontainers: Full integration test — the whole application + real DB/cache.

When integration tests are worth more than unit tests:

  • Repository/DAO layer — the actual SQL query behavior is what matters, not the Java code
  • Database migrations — does the migration run without errors? Does the ORM still work after it?
  • Configuration — does the Spring context load correctly with the production config?
  • Request/response mapping — does the HTTP layer serialize/deserialize correctly?

End-to-End Tests (Top)

Test complete user workflows across multiple services. Simulate a real user: create order → process payment → send confirmation.

The cost: Slow (minutes), flaky (dependent on all services being up), expensive to maintain (any service change may break unrelated E2E tests).

When to use: Critical user journeys only. Checkout flow. Login/auth. Core CRUD for your primary entity. Not for every feature.

Alternative: Component tests — test one service from its HTTP boundary with all dependencies (Testcontainers), treating it as a black box. This gives high confidence without cross-service fragility.


Integration vs Unit Tests: When Integration Tests Win

Giving in to the temptation to mock everything produces a large unit test suite that passes while production is on fire. Integration tests catch the issues unit tests miss:

  • ORM mapping issues — your Java entity doesn’t match the DB schema
  • SQL query correctness — the query you wrote doesn’t return what you think
  • Transaction boundaries — two operations that should be atomic aren’t
  • Serialization/deserialization — JSON fields don’t map correctly
  • Database migration behavior — a migration that passes against the in-memory H2 DB in your tests can still fail against the real production database
  • Connection pool exhaustion — tests that don’t clean up connections cause mysterious failures

Rule of thumb: Anything that talks to a database should have an integration test, not a unit test with a mocked repository. The repository mock tests that you called save() — the integration test tests that the data was actually saved correctly.
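That gap can be made concrete without a database. The sketch below (all names invented) contains a deliberately buggy repository: a mock-style "was save() called?" check is satisfied, while reading the data back — what an integration test effectively does — exposes the bug.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: a repository that silently corrupts data on save.
// A "save() was invoked" mock-style check passes; reading the row back does not.
public class MockVsIntegrationDemo {
    interface OrderRepository { void save(String id, int totalCents); Integer findTotal(String id); }

    // Buggy implementation: drops the cents (imagine a bad column mapping).
    static class InMemoryOrderRepository implements OrderRepository {
        private final Map<String, Integer> rows = new HashMap<>();
        public void save(String id, int totalCents) { rows.put(id, totalCents / 100 * 100); } // bug!
        public Integer findTotal(String id) { return rows.get(id); }
    }

    public static void main(String[] args) {
        OrderRepository repo = new InMemoryOrderRepository();
        repo.save("order-1", 1299);

        // Mock-style assertion: "save() was called with order-1" — true, despite the bug.
        // Integration-style assertion: read the data back and check what was stored.
        int stored = repo.findTotal("order-1");
        System.out.println("stored total = " + stored); // 1200, not 1299: the bug is visible
    }
}
```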


Contract Testing with Pact

In a microservices system, service A calls service B’s API. When service B’s team changes the API, service A breaks. How do you catch this before it reaches production?

Consumer-driven contract testing (Pact):

  1. Consumer writes a contract. Service A defines what it uses from service B’s API: the endpoint, request format, response fields it cares about.
  2. Contract is published to a Pact Broker (or Pactflow).
  3. Provider (service B) validates the contract. Service B’s CI runs the consumer’s contract against the real service. If the contract passes, service B can deploy safely. If it breaks, CI fails.

Service A team writes:
  "I call GET /orders/{id} and expect { id, status, total }"

Service B CI runs:
  "Does GET /orders/{id} still return { id, status, total }? YES → ok to deploy"
  "Did we rename 'total' to 'amount'? → contract broken → CI fails"
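Real Pact generates and verifies these contracts automatically through its own API and a broker; purely to illustrate the core check the provider's build performs, here is a plain-Java sketch (not the Pact API — every name is invented): does the current response still contain every field the consumer declared?

```java
import java.util.Map;
import java.util.Set;

// Minimal sketch of the consumer-driven contract idea (NOT the Pact API).
// The consumer publishes the field names it relies on; the provider's CI
// checks its current response shape against that list before deploying.
public class ContractCheckDemo {
    // Contract written by the consumer (service A): "I use these fields of GET /orders/{id}".
    static final Set<String> CONSUMER_CONTRACT = Set.of("id", "status", "total");

    // True when the provider's response contains every field the consumer needs.
    static boolean providerHonoursContract(Map<String, Object> response) {
        return response.keySet().containsAll(CONSUMER_CONTRACT);
    }

    public static void main(String[] args) {
        // Provider's current response: still compatible (extra fields are fine).
        Map<String, Object> ok = Map.of("id", "42", "status", "PAID", "total", 1299, "currency", "EUR");
        System.out.println(providerHonoursContract(ok));      // true — safe to deploy

        // Provider renamed 'total' to 'amount': the contract check fails in CI.
        Map<String, Object> renamed = Map.of("id", "42", "status", "PAID", "amount", 1299);
        System.out.println(providerHonoursContract(renamed)); // false — CI fails
    }
}
```

Note the asymmetry: the provider may add fields freely, but removing or renaming anything a consumer declared breaks the contract.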

When Pact is worth introducing:

  • Multiple teams where the consumer and provider teams are different
  • APIs change frequently and cross-team coordination is a bottleneck
  • You can’t easily run all services together for integration tests

When Pact is overkill:

  • Small team where you own all services — coordinate the change directly
  • The API is very stable — overhead of maintaining contracts exceeds the bug-catching value
  • You already have reliable E2E tests covering the integrations

The EM conversation: “Pact is valuable when ‘did I break someone?’ is a real question. If the answer is always ‘ask the team in Slack,’ Pact adds process that manual coordination can handle. At scale, it replaces manual coordination.”


Coverage: How Much Is Enough?

The honest answer: 100% code coverage mandates are often counterproductive. Coverage measures lines executed, not behavior validated. You can have 100% coverage with tests that assert nothing meaningful.

What coverage does tell you:

  • Areas of the codebase with zero tests — genuine risk
  • Paths that are never executed in tests — good candidates for review

What coverage doesn’t tell you:

  • Whether the tests are testing the right behavior
  • Whether the tested behavior is correct
  • Whether edge cases are handled

The pragmatic threshold:

  • New code should have tests for its intended behavior and error paths
  • A coverage drop on a PR should trigger a review, not a hard failure
  • Business-critical paths (checkout, payments, auth) should have higher coverage than admin utilities
  • Legacy code: don’t mandate coverage; add tests when you touch a file (Boy Scout Rule)
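One way to encode "critical paths get a higher bar" is a per-package rule in JaCoCo's Maven plugin. The fragment below is a sketch — the package name and threshold are placeholders to adapt, not recommendations:

```xml
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>coverage-check</id>
      <goals><goal>check</goal></goals>
      <configuration>
        <rules>
          <!-- Higher bar for the business-critical package only -->
          <rule>
            <element>PACKAGE</element>
            <includes><include>com.example.payments.*</include></includes>
            <limits>
              <limit>
                <counter>LINE</counter>
                <value>COVEREDRATIO</value>
                <minimum>0.85</minimum>
              </limit>
            </limits>
          </rule>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Keeping the rule scoped to one package avoids turning the threshold into a repo-wide hard gate.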

Pushing back on “100% coverage” mandates: “Coverage is a proxy metric, not a goal. We should be asking ‘are the critical behaviors tested?’ not ‘is every line executed?’ I’d rather have 70% coverage with tests that actually validate correctness than 100% coverage with tests that check implementation details.”


Testing Distributed Systems

Testing a distributed system is qualitatively harder than testing a monolith. The failure modes you need to test don’t show up in unit tests: network partitions, timeouts, duplicate messages, out-of-order events.

Testcontainers for realistic integration: Real Kafka, real Postgres, real Redis. Tests reflect what actually runs in production, not in-memory mocks that behave differently.

Chaos testing: Randomly inject failures in a controlled environment — kill a pod, add latency, drop network packets. Chaos Monkey, Chaos Mesh, AWS Fault Injection Simulator. The goal: discover failure modes before users do. Run in pre-prod, not in prod (until you’re mature).

Contract tests for service boundaries: Pact for API contracts. Reduces E2E test dependency.

Consumer-side stub servers: WireMock or MockServer — run a stub that serves canned (or recorded-from-the-real-service) responses in place of the real dependency. Useful for testing a consumer in isolation without the real service running.
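Real projects reach for WireMock here; purely to show the shape of the technique with zero dependencies, this sketch uses the JDK's built-in com.sun.net.httpserver to stand up a stub with one canned response, which the "consumer" then calls over real HTTP (endpoint and payload are invented):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of a consumer-side stub server (WireMock's job), using only the JDK.
// The stub plays the provider's part; the consumer is exercised against it
// without the real service running.
public class StubServerDemo {
    static String fetchFromStub() throws Exception {
        // Start a stub on an ephemeral port that answers GET /orders/42 with a canned body.
        HttpServer stub = HttpServer.create(new InetSocketAddress(0), 0);
        String canned = "{\"id\":\"42\",\"status\":\"PAID\",\"total\":1299}";
        stub.createContext("/orders/42", exchange -> {
            byte[] body = canned.getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        stub.start();
        int port = stub.getAddress().getPort();

        // The "consumer" under test calls the stub exactly as it would the real provider.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:" + port + "/orders/42")).build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        stub.stop(0);
        return response.statusCode() + " " + response.body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchFromStub());
    }
}
```

WireMock adds what this sketch lacks: request matching, recorded responses, fault injection, and verification of what the consumer actually sent.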

The hardest thing to test: “What happens when message X arrives twice?” “What happens when the DB is down for 30 seconds mid-operation?” These scenarios require intentional fault injection in tests.
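The duplicate-delivery case, at least, can be tested deterministically: make the handler idempotent and replay the same message in the test. A minimal sketch (all names invented) keyed on message ID:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: testing "what happens when message X arrives twice?" by making the
// handler idempotent and replaying the same message. Names are hypothetical.
public class DuplicateDeliveryDemo {
    static class PaymentHandler {
        private final Set<String> processed = new HashSet<>();  // message IDs already handled
        final List<String> charges = new ArrayList<>();         // side effects, observable by the test

        // Processing is keyed on the message ID: a redelivery becomes a no-op.
        void handle(String messageId, String orderId) {
            if (!processed.add(messageId)) return; // duplicate — already processed
            charges.add("charged:" + orderId);
        }
    }

    public static void main(String[] args) {
        PaymentHandler handler = new PaymentHandler();
        // The broker redelivers msg-1: the customer must still be charged only once.
        handler.handle("msg-1", "order-7");
        handler.handle("msg-1", "order-7"); // duplicate delivery
        handler.handle("msg-2", "order-8");
        System.out.println(handler.charges); // [charged:order-7, charged:order-8]
    }
}
```

In a real system the `processed` set would live in durable storage (and need expiry), but the test shape is the same: deliver twice, assert the side effect happened once.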

The EM stance on test investment: The most valuable tests are the ones that catch bugs before production and run fast enough to not be skipped. A 30-minute CI pipeline that flaps 20% of the time is worse than a 5-minute pipeline with 80% coverage that everyone trusts. Invest in test stability and speed before coverage percentage.