Microservices patterns are the vocabulary of distributed systems design. Knowing when to apply each one — and when not to — separates an architect who reads pattern books from one who’s shipped production systems.
EM interviews often end with “the harder framing” — questions about judgment, decision-making under pressure, and how you navigate disagreement. These don’t have right answers; they have reasoned answers that demonstrate how you think. Here’s a framework for the most common ones.
As systems grow, the gap between operational data (what your application uses to run) and analytical data (what your business uses to make decisions) widens. Designing the data pipelines that bridge this gap is an EM-level concern.
Testing strategy is an EM-level concern because it directly affects delivery velocity, production reliability, and onboarding speed. Too little testing = production incidents. Too much ceremony = slow CI and frustrated engineers. The goal is the right tests in the right places.
How you deploy code is as important as how you write it. The gap between writing a feature and running it reliably in production is where most engineering organizations lose velocity. This post covers the decisions that shape that gap.
Cloud infrastructure decisions are often more political than technical. The right answer depends on where your team’s expertise is, what your customers require, and what you’re willing to operate. Here’s how to frame these decisions at the EM level.
Security architecture decisions have higher stakes than most — the cost of getting them wrong is a data breach, not a performance degradation. This post covers the trade-offs that come up in EM-level interviews: authentication approaches, identity protocols, and secrets management.
Observability is the ability to understand what’s happening inside your system from the outside — from its outputs. The three pillars (logs, metrics, traces) are complementary tools, each answering different questions. Getting the combination right is what separates systems that you can reason about from systems that require tribal knowledge to debug.
Reliability isn’t about preventing failures — it’s about building systems that fail gracefully, recover quickly, and maintain user trust even when things go wrong. This post covers the patterns that keep systems running under degraded conditions.
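The circuit breaker is a common pattern for failing gracefully. After repeated failures it stops calling a struggling dependency for a cooldown period and serves a fallback instead. Below is a minimal single-threaded sketch. The thresholds, the injectable `clock`, and the fallback callable are illustrative assumptions, not any particular library's API:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Closed -> open after `failure_threshold` consecutive failures;
    open -> half-open once `reset_timeout` seconds have passed;
    half-open -> closed on success, or back to open on failure."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            return fallback()  # fail fast instead of piling load on the dependency
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()
        self.failures = 0  # success closes the circuit
        self.opened_at = None
        return result
```

This sketch does not limit concurrent trial calls in the half-open state. That matters once several threads share one breaker.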
Scaling is not a synonym for “add more servers.” Each scaling lever has different costs, trade-offs, and appropriate circumstances. Reaching for the wrong one wastes money, adds complexity, or misses the actual bottleneck.
Consistency and availability trade-offs show up in nearly every system design discussion. The theory (CAP, PACELC) is well-known; the practical application — knowing which choice to make for a specific use case — is what separates a design-literate engineer from one who just quotes theorems.
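Dynamo-style leaderless replication makes the practical side of this trade-off easy to see. Cassandra's tunable consistency levels follow this model: each operation chooses how many replicas must respond. Here is a minimal sketch of the arithmetic. It ignores sloppy quorums, hinted handoff, and concurrent-write conflicts:

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """Return True if every read quorum must intersect every write quorum.

    With n replicas, a read contacting r of them and a write acknowledged by w
    of them share at least one replica whenever r + w > n (pigeonhole)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


def tolerated_failures(n: int, quorum: int) -> int:
    """Replicas that can be unreachable while a quorum of this size still forms."""
    return n - quorum


# Compare a consistency-leaning and an availability-leaning configuration.
for n, r, w in [(3, 2, 2), (3, 1, 1)]:
    print(
        f"N={n} R={r} W={w}: overlap={quorum_overlaps(n, r, w)}, "
        f"reads survive {tolerated_failures(n, r)} down, "
        f"writes survive {tolerated_failures(n, w)} down"
    )
```

Raising R and W guarantees overlap, so reads reach at least one replica holding the latest acknowledged write. The cost is that fewer replicas can be unreachable. Lowering R and W does the reverse.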
The microservices vs monolith debate is one of the most over-indexed topics in software architecture — teams decompose too early, pay operational costs they’re not ready for, and spend months untangling the mess. The decision framework is simpler than the discourse suggests.
API design decisions have long tails: once you publish an API and clients integrate with it, changing it is expensive. The choice of protocol, versioning strategy, and backwards-compatibility approach should be deliberate, not left to defaults.
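Backwards compatibility also depends on how clients read responses. A common approach is the tolerant reader. The client reads only the fields it needs, ignores unknown ones, and supplies defaults for fields that older responses may lack. The server can then add fields without a new version. Here is a minimal sketch with a hypothetical `Order` payload; the field names and the `"USD"` default are illustrative:

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Order:
    id: str
    amount_cents: int
    currency: str           # hypothetical field added in a later, additive release
    note: Optional[str]


def parse_order(payload: str) -> Order:
    data = json.loads(payload)
    return Order(
        id=data["id"],                             # required since v1: missing = real error
        amount_cents=int(data["amount_cents"]),
        currency=data.get("currency", "USD"),      # default keeps old responses valid
        note=data.get("note"),                     # optional field
    )  # any other keys in `data` are ignored rather than rejected
```

Removing or renaming a required field such as `amount_cents` would still break this client. That is why removals and renames usually go through a new version or a deprecation window.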
The choice between a message queue and an event streaming platform shapes your architecture more than almost any other infrastructure decision. Getting it wrong means rebuilding — not reconfiguring. Here’s how to think through it.
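The core difference is what happens to a message after it is consumed. A queue hands each message to one consumer and then forgets it. A log keeps its events and lets each consumer group track its own offset, so new consumers can read history and existing ones can replay it. The sketch below contrasts the two. It is deliberately simplified and in-memory, with no acknowledgements, partitions, persistence, or concurrency, and it is not modeled on any specific broker's API:

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class WorkQueue:
    """Queue semantics: each message is delivered to one consumer, then removed."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def publish(self, msg: str) -> None:
        self._messages.append(msg)

    def receive(self) -> Optional[str]:
        # Once taken, the message is gone: no other consumer or later replay sees it.
        return self._messages.popleft() if self._messages else None


class EventLog:
    """Stream semantics: an append-only log; each consumer group tracks its offset."""

    def __init__(self) -> None:
        self._events: List[str] = []
        self._offsets: Dict[str, int] = {}

    def append(self, event: str) -> None:
        self._events.append(event)

    def poll(self, group: str, max_events: int = 10) -> List[str]:
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)  # commit the group's new offset
        return batch

    def seek(self, group: str, offset: int) -> None:
        # Replay: rewind one consumer group without affecting the others.
        self._offsets[group] = offset
```

In this model, replay and independent consumer groups come from how events are stored, not from a setting you can flip. That is one way to read the point that switching later means rebuilding rather than reconfiguring.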
Caching is the single highest-leverage performance tool available — and also one of the most common sources of production bugs. The decision isn’t just “should we cache?” — it’s where, how, and what the consistency implications are.
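A minimal cache-aside sketch makes the consistency question concrete. Reads check the cache first and load from the source of truth on a miss. Writers invalidate the key, and a TTL bounds staleness if an invalidation is missed. The `load` callable, the TTL value, and the injectable `clock` are illustrative assumptions. The sketch is single-process and not thread-safe:

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class CacheAside(Generic[K, V]):
    """Read-through on miss, invalidate on write, TTL as a staleness backstop."""

    def __init__(
        self,
        load: Callable[[K], V],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[K, Tuple[V, float]] = {}  # key -> (value, expires_at)

    def get(self, key: K) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                      # fresh hit
        value = self._load(key)                  # miss or expired: go to the source of truth
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: K) -> None:
        # Call after writing to the source of truth so the next read reloads.
        self._entries.pop(key, None)
```

Between a database write and the matching `invalidate` call, readers can still get the old value. That window, with the TTL as a backstop, is the consistency implication to reason about for each cached key.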
NoSQL isn’t a single thing — it’s five different database families with fundamentally different data models, consistency guarantees, and use cases. Using the wrong family (or the wrong database within a family) is a common and costly mistake. Here’s how to think through each one.
SQL is SQL until it isn’t. When you’re making a database selection for a new service, the choice between PostgreSQL, MySQL, and SQL Server comes down to features, ecosystem, operational model, and political reality. Here’s how to reason through it.
“Should we use SQL or NoSQL?” is one of the most common — and most misunderstood — architecture questions. Teams default to NoSQL because it sounds modern or scalable, or to SQL because it’s familiar. Neither is the right reason. The decision should come from your data’s shape, consistency requirements, and access patterns.