SQL vs NoSQL: Making the Right Call
“Should we use SQL or NoSQL?” is one of the most common — and most misunderstood — architecture questions. Teams default to NoSQL because it sounds modern or scalable, or to SQL because it’s familiar. Neither is the right reason. The decision should come from your data’s shape, consistency requirements, and access patterns.
Use a relational database when:
1. Your data has relationships you’ll query across. If you regularly join orders to users to products to promotions, a relational model with foreign keys and proper indexes is cleaner and faster than assembling that from multiple document fetches or denormalized data.
2. You need ACID transactions that span multiple entities. Transferring money between accounts, reserving inventory while recording an order, updating multiple tables atomically — these are relational databases’ core strength. Multi-document transactions exist in MongoDB, but they carry overhead and require a replica set or sharded cluster; they aren’t available on standalone deployments.
3. Your schema is relatively stable and well-understood. The discipline of a schema is a feature, not a limitation. It catches bugs at write time instead of read time, enforces invariants, and makes the data self-documenting.
4. You have complex reporting or ad-hoc queries. SQL is a powerful, flexible query language. Window functions, CTEs, aggregations, complex joins — replicating any of these against a document store is painful.
5. You value operational maturity. PostgreSQL, MySQL, and SQL Server have decades of tooling, DBA expertise, migration tools, monitoring integrations, and community knowledge. You’ll find an answer to almost any production problem in a Stack Overflow thread.
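The first two points can be sketched in a few lines. This is a minimal illustration using Python's stdlib sqlite3 as a stand-in for any relational database; the schema and values are hypothetical:

```python
import sqlite3

# Hypothetical schema: two accounts, plus a users/orders pair to join across.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY,
                           user_id INTEGER REFERENCES users(id),
                           total INTEGER NOT NULL);
    INSERT INTO accounts VALUES (1, 100), (2, 50);
    INSERT INTO users VALUES (1, 'ada');
    INSERT INTO orders VALUES (10, 1, 30), (11, 1, 12);
""")

# ACID across multiple rows: both updates commit together or neither does.
with conn:  # opens a transaction; commits on success, rolls back on error
    conn.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 40 WHERE id = 2")

# A cross-entity join with an aggregate: one query, no application-side assembly.
row = conn.execute("""
    SELECT u.name, SUM(o.total)
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.id
""").fetchone()
print(row)  # ('ada', 42)
```

If the second `UPDATE` raised, the `with conn:` block would roll back the first one too — that is the multi-entity atomicity the text describes.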
Use a document database (MongoDB, DynamoDB, Firestore) when:
1. Your data naturally maps to a document. A product catalog where each product has different attributes (a t-shirt has size/color, a TV has resolution/refresh-rate) is awkward in a relational schema (nullable columns or EAV tables). In a document store, each product is just a document with whatever fields it needs.
2. You need schema flexibility during rapid iteration. In early product development, the schema changes every sprint. With a document store, adding a new field doesn’t require a migration — it just exists on new documents. (Warning: this also means you accumulate technical debt in the form of old documents missing new fields. Eventually you pay this debt in application code.)
3. Your read access pattern is almost always “get everything for one entity.” If a typical request fetches “the entire user profile” or “the entire order with all line items,” denormalizing into a document is faster than joining five tables.
4. You need horizontal write scale from day one. Document stores typically shard more naturally than relational databases. If your write volume is extreme and you know it from the start, a document store may be the right choice. (That said, Postgres can handle a lot more write throughput than most teams think before sharding becomes necessary.)
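The variable-attribute catalog from point 1 and the single-entity fetch from point 3 look like this in the document model. A sketch with plain dicts standing in for a document store; all field names are hypothetical:

```python
# Each product is a self-contained document carrying only the fields it needs.
# A relational model would need nullable columns or an EAV table for this.
catalog = {
    "tshirt-01": {"name": "Logo Tee", "price": 19,
                  "size": "M", "color": "navy"},
    "tv-4k-55":  {"name": "55-inch TV", "price": 499,
                  "resolution": "3840x2160", "refresh_hz": 120},
}

def get_product(product_id: str) -> dict:
    # One key lookup returns the whole entity -- the "get everything for
    # one entity" access pattern, with no joins and no reassembly.
    return catalog[product_id]

print(sorted(get_product("tv-4k-55")))
```

The trade-off from point 2 applies here too: nothing stops a new document from omitting `price` entirely, so the invariants a schema would enforce move into application code.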
When a team says “we want MongoDB because it’s faster,” the engineering-manager (EM) question is: faster for what?
MongoDB can be faster for simple key-lookup reads of a single document — no join cost. But PostgreSQL with proper indexing on a single-row fetch is comparably fast. “MongoDB is faster” often reflects an experience where someone ran Postgres without indexes, or did a query that would benefit from denormalization, and then compared it to a MongoDB query on a pre-denormalized document. The comparison was unfair.
What to probe:
- What queries are you optimizing?
- Have you profiled the SQL queries and confirmed they’re the bottleneck?
- Is the schema designed to support the access patterns (or is it a normalized academic schema never optimized for production)?
Often the right answer is to optimize the relational queries first. NoSQL introduces significant operational complexity (eventual consistency, no joins, limited transactions) that shouldn’t be accepted without a real need.
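“Profile first” is cheap to demonstrate: the same query goes from a full table scan to an index lookup with one CREATE INDEX. This sketch uses SQLite’s EXPLAIN QUERY PLAN as a lightweight analogue of Postgres’s EXPLAIN; the table and column names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)")
conn.executemany("INSERT INTO events (user_id, payload) VALUES (?, ?)",
                 [(i % 100, "x") for i in range(1000)])

def plan(sql: str) -> str:
    # Last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT payload FROM events WHERE user_id = 42"
before = plan(query)                                   # full table scan
conn.execute("CREATE INDEX idx_events_user ON events(user_id)")
after = plan(query)                                    # index search

print("SCAN" in before, "USING INDEX" in after)  # True True
```

If the “slow SQL” a team is comparing against MongoDB shows a scan here, the fix is an index, not a database migration.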
Using different databases for different parts of your system is valid — but it’s a complexity budget decision.
Legitimate use cases:
- Core transactional data in Postgres; full-text search in Elasticsearch; session/cache in Redis. These are genuinely different access patterns that different stores are optimized for.
- Product catalog in MongoDB; order management in Postgres. Product data is highly variable-schema; order data is structured and transactional.
Warning signs:
- Using polyglot persistence without clear ownership boundaries. If two services share a database, you’ve created coupling. If one service spans two databases with joins between them, you’ve created a nightmare.
- Choosing a NoSQL store for “future flexibility” without a concrete use case. The operational overhead of running and maintaining multiple database systems is real — you need separate backups, monitoring, expertise, and runbooks.
“How do you handle transactions when each service owns its own database?” is a standard EM-level question. The answer:
You don’t get distributed ACID — you use patterns that achieve eventual consistency:
- Saga pattern: Break a distributed transaction into a sequence of local transactions with compensating transactions for rollback. Choreography-based (events trigger next steps) or orchestration-based (a coordinator directs the sequence).
- Outbox pattern: Write the event to a local table in the same database transaction as the business data. A separate process reads and publishes it. Guarantees at-least-once event delivery without two-phase commit.
- Two-phase commit (2PC): Theoretically possible but rarely used in microservices — it requires a coordinator, is slow, and failure modes are complex. Avoid unless you have no alternative.
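The outbox pattern is the easiest of the three to sketch: the business write and the event land in one local transaction, and a relay (a separate process in real life) publishes later. SQLite stands in for the service’s database; the schema and event shape are hypothetical:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER NOT NULL);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         event TEXT NOT NULL, published INTEGER DEFAULT 0);
""")

def place_order(order_id: int, total: int) -> None:
    # Business row and event commit atomically: no two-phase commit, and the
    # event cannot be lost if the process dies right after the commit.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        conn.execute("INSERT INTO outbox (event) VALUES (?)",
                     (json.dumps({"type": "OrderPlaced", "order_id": order_id}),))

def relay(publish) -> None:
    # At-least-once delivery: a crash between publish() and the UPDATE simply
    # re-publishes that event on the next run, so consumers must be idempotent.
    rows = conn.execute("SELECT id, event FROM outbox WHERE published = 0").fetchall()
    for row_id, event in rows:
        publish(json.loads(event))
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()

place_order(1, 99)
sent = []
relay(sent.append)
print(sent)  # [{'type': 'OrderPlaced', 'order_id': 1}]
```
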
The key insight: distributed transactions are usually the wrong framing. Better to ask “can I redesign the service boundaries so one service owns this entire operation?” Service boundary design should minimize cross-service coordination.
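When the boundaries genuinely can’t be redrawn, the orchestration-based saga from above reduces to “run local steps in order; on failure, run compensations in reverse.” A minimal sketch, with hypothetical step names standing in for calls to other services:

```python
def run_saga(steps):
    """Each step is a (do, undo) pair. On any failure, undo the completed
    steps in reverse order. Returns True if every step committed."""
    done = []
    try:
        for do, undo in steps:
            do()
            done.append(undo)
    except Exception:
        for undo in reversed(done):
            undo()  # compensating transaction, e.g. release reserved stock
        return False
    return True

def fail():
    raise RuntimeError("shipping unavailable")

log = []
steps = [
    (lambda: log.append("reserve_inventory"), lambda: log.append("release_inventory")),
    (lambda: log.append("charge_card"),       lambda: log.append("refund_card")),
    (fail,                                    lambda: log.append("never_runs")),
]

result = run_saga(steps)
print(result, log)
# False ['reserve_inventory', 'charge_card', 'refund_card', 'release_inventory']
```

Note what the sketch makes visible: compensations are new forward actions (a refund, not an undo), and until they run the system is observably inconsistent — that window is the eventual consistency the text refers to.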
Sharding adds massive operational complexity. Before reaching for it:
- Optimize queries first. Missing indexes, inefficient queries, full table scans — fix these first.
- Read replicas. Most applications are read-heavy. Adding read replicas handles 80% of scaling needs with minimal risk.
- Connection pooling. Postgres with PgBouncer handles 10,000+ connections on hardware that would crumble under naive direct connections.
- Caching. A well-placed cache eliminates a class of database reads entirely.
- Vertical scaling. Modern cloud instances offer 192 cores and 24TB of RAM. Vertical scaling is unfairly dismissed — it’s often the right answer for another 2–3 years.
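The caching bullet above is usually cache-aside, which fits in a few lines. A sketch where a dict stands in for Redis and sqlite3 for the primary database; the schema and the read-counter instrumentation are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")

cache = {}      # stands in for Redis/Memcached
db_reads = 0    # instrumented to show how many reads the cache absorbs

def get_user(user_id: int) -> str:
    global db_reads
    if user_id in cache:          # cache hit: the database never sees the read
        return cache[user_id]
    db_reads += 1                 # cache miss: read from the database, then populate
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    cache[user_id] = row[0]
    return row[0]

names = [get_user(1) for _ in range(1000)]
print(names[0], db_reads)  # ada 1  -- 999 reads never reached the database
```

The hard part is omitted here on purpose: invalidation. Writes must delete or update the cached entry, or readers see stale data for the TTL window.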
Signals that sharding might be necessary:
- Single-node write throughput is saturated (not read — that’s replicas)
- The dataset is too large for a single node’s storage (though with SSDs and cloud volumes this is rarer than it sounds)
- You have strict data residency requirements (sharding by region)
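Both the write-throughput and data-residency cases reduce to deterministic routing: every shard key maps to exactly one shard. A sketch with a hypothetical shard map; the region names and connection strings are placeholders:

```python
# Hypothetical region shard map: each region's rows live only on its own
# cluster, so residency falls out of routing by construction.
REGION_SHARDS = {
    "eu": "postgres://eu-cluster/app",
    "us": "postgres://us-cluster/app",
}

def shard_by_region(region: str) -> str:
    return REGION_SHARDS[region]

def shard_by_user(user_id: int, n_shards: int = 4) -> int:
    # Naive modulo routing for write scale. Real systems prefer consistent
    # hashing, so adding a shard moves only ~1/n of the keys instead of most.
    return user_id % n_shards

print(shard_by_region("eu"), shard_by_user(42))  # postgres://eu-cluster/app 2
```

The routing function is trivial; the operational complexity the text warns about is everything around it — resharding, cross-shard queries, and per-shard backups — which is exactly what the managed options abstract away.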
Even then, evaluate whether a managed distributed database (Aurora, CockroachDB, PlanetScale) abstracts the sharding complexity before building a custom sharding layer.