System Design Basics

Microservices Patterns: Saga, CQRS, Event Sourcing, BFF, and More

7 April 2026·8 mins

Microservices patterns are the vocabulary of distributed systems design. Knowing when to apply each one — and when not to — separates an architect who reads pattern books from one who’s shipped production systems.

Engineering Leadership Trade-offs: Build vs Buy, Tech Debt, and Rewrite vs Refactor

7 April 2026·8 mins

EM interviews often end with “the harder framing” — questions about judgment, decision-making under pressure, and how you navigate disagreement. These don’t have right answers; they have reasoned answers that demonstrate how you think. Here’s a framework for the most common ones.

Data Pipeline and Analytics: OLTP vs OLAP, Batch vs Streaming, CDC

7 April 2026·6 mins

As systems grow, the gap between operational data (what your application uses to run) and analytical data (what your business uses to make decisions) becomes significant. Understanding how to design data pipelines that bridge this gap is an EM-level concern.

Testing Strategy: Test Pyramid, Contract Testing, and Coverage Pragmatics

7 April 2026·7 mins

Testing strategy is an EM-level concern because it directly affects delivery velocity, production reliability, and onboarding speed. Too little testing leads to production incidents; too much ceremony leads to slow CI and frustrated engineers. The goal is the right tests in the right places.

Build, Deploy, and Release: Trunk-Based Dev, Deployment Strategies, Zero-Downtime DB Migrations

7 April 2026·7 mins

How you deploy code is as important as how you write it. The gap between writing a feature and having it run reliably in production is where most engineering organizations lose velocity. This post covers the decisions that shape that gap.

Cloud and Infrastructure: AWS vs GCP vs Azure, Kubernetes vs Serverless

7 April 2026·6 mins

Cloud infrastructure decisions are often more political than technical. The right answer depends on where your team’s expertise is, what your customers require, and what you’re willing to operate. Here’s how to frame these decisions at the EM level.

Security and Authentication: JWT, OAuth2, and Secrets Management

7 April 2026·7 mins

Security architecture decisions have higher stakes than most — the cost of getting them wrong is a data breach, not a performance degradation. This post covers the trade-offs that come up in EM-level interviews: authentication approaches, identity protocols, and secrets management.

Observability: Logs, Metrics, Traces, and Alerting

7 April 2026·7 mins

Observability is the ability to understand what’s happening inside your system from its external outputs. The three pillars (logs, metrics, traces) are complementary tools, each answering different questions. Getting the combination right is what separates systems you can reason about from systems that require tribal knowledge to debug.

Reliability and Resilience: Circuit Breakers, Retries, SLOs, and Failure Modes

7 April 2026·7 mins

Reliability isn’t about preventing failures — it’s about building systems that fail gracefully, recover quickly, and maintain user trust even when things go wrong. This post covers the patterns that keep systems running under degraded conditions.

Scaling Strategies: A Decision Framework

7 April 2026·6 mins

Scaling is not a synonym for “add more servers.” Each scaling lever has different costs, trade-offs, and appropriate circumstances. Reaching for the wrong one wastes money, adds complexity, or misses the actual bottleneck.

Consistency, Availability, and the CAP/PACELC Trade-off

7 April 2026·6 mins

Consistency and availability trade-offs show up in nearly every system design discussion. The theory (CAP, PACELC) is well-known; the practical application — knowing which choice to make for a specific use case — is what separates a design-literate engineer from one who just quotes theorems.

Microservices vs Monolith: Making the Right Architecture Call

7 April 2026·6 mins

The microservices vs monolith debate is one of the most over-indexed topics in software architecture — teams decompose too early, pay operational costs they’re not ready for, and spend months untangling the mess. The decision framework is simpler than the discourse suggests.

API Design: REST vs GraphQL vs gRPC

7 April 2026·6 mins

API design decisions have long tails: once you publish an API and clients integrate with it, changing it is expensive. The choice of protocol, versioning strategy, and backwards-compatibility approach should be a deliberate decision, not a default.

Messaging and Event-Driven Architecture: Kafka vs RabbitMQ vs SQS

7 April 2026·6 mins

The choice between a message queue and an event streaming platform shapes your architecture more than almost any other infrastructure decision. Getting it wrong means rebuilding — not reconfiguring. Here’s how to think through it.

Caching Strategies: Placement, Patterns, and Pitfalls

7 April 2026·7 mins

Caching is the single highest-leverage performance tool available, and also one of the most common sources of production bugs. The decision isn’t just “should we cache?” but where to cache, how, and with what consistency implications.

NoSQL Families: Choosing the Right Tool

7 April 2026·6 mins

NoSQL isn’t a single thing — it’s five different database families with fundamentally different data models, consistency guarantees, and use cases. Using the wrong family (or the wrong database within a family) is a common and costly mistake. Here’s how to think through each one.

SQL Flavors: Postgres vs MySQL vs SQL Server

7 April 2026·6 mins

SQL is SQL until it isn’t. When you’re making a database selection for a new service, the choice between PostgreSQL, MySQL, and SQL Server comes down to features, ecosystem, operational model, and political reality. Here’s how to reason through it.

SQL vs NoSQL: Making the Right Call

7 April 2026·6 mins

“Should we use SQL or NoSQL?” is one of the most common — and most misunderstood — architecture questions. Teams default to NoSQL because it sounds modern or scalable, or to SQL because it’s familiar. Neither is the right reason. The decision should come from your data’s shape, consistency requirements, and access patterns.