API Design: REST vs GraphQL vs gRPC
API design decisions have long tails — once you publish an API and clients integrate with it, changing it is expensive. The choice of protocol, versioning strategy, and backwards compatibility approach should be deliberate, not defaults.
REST is HTTP-native — it uses standard verbs (GET, POST, PUT, PATCH, DELETE), status codes, headers, and content negotiation. It’s stateless, cacheable, and every HTTP client in existence can call it.
REST wins when:
- Your consumers are diverse (mobile apps, third-party developers, browsers, other services)
- You need HTTP caching (GET responses with `Cache-Control`)
- The access patterns map naturally to resources and CRUD
- Team familiarity matters — REST is the most widely understood API style
- You need public or partner APIs where simplicity and documentation matter
REST’s weaknesses:
- Over-fetching: the API returns a `User` object with 30 fields; the client needed 3. Wastes bandwidth and parsing time, especially on mobile.
- Under-fetching: the client needs user + orders + profile. Three round trips unless you build a custom endpoint.
- Versioning drift: Over time, APIs accumulate versions and deprecated fields, and the surface area becomes unwieldy.
For most internal and external APIs, these weaknesses are manageable with thoughtful design (field selection, composite endpoints for common patterns) and don’t justify the complexity of an alternative.
GraphQL is a query language — clients specify exactly what data they need in the shape they need it.
```graphql
query {
  user(id: "123") {
    name
    email
    orders(last: 5) {
      id
      status
      total
    }
  }
}
```
GraphQL wins when:
- Multiple clients with different data needs. Mobile app needs fewer fields; web app needs more. With REST, you build multiple endpoints or bloat the response. With GraphQL, each client requests exactly what it needs.
- BFF (Backend for Frontend) aggregation. A single GraphQL layer aggregates data from multiple backend services. The client doesn’t need to know about backend service topology.
- Rapidly evolving data model. Adding new fields doesn’t break existing queries. Deprecating fields is visible in the schema.
- Complex, nested data relationships. GraphQL resolvers compose naturally for graph-shaped data.
GraphQL’s real costs:
- Caching is harder. REST GET requests are trivially cacheable by URL. GraphQL queries are POST requests with a body — HTTP caching doesn’t apply by default. You need application-level caching (persisted queries, DataLoader for N+1 batching).
- N+1 queries are easy to introduce. A naive GraphQL resolver fetches each item’s related data in a loop. DataLoader batches these, but it must be implemented correctly.
- Error handling is non-standard. GraphQL returns HTTP 200 even when the query partially fails (errors go in the `errors` array). This breaks conventional monitoring that keys on HTTP status codes.
- Security surface: clients can write arbitrarily complex queries. Depth limiting, query complexity budgets, and persisted queries are necessary to prevent abuse.
- Tooling and expertise: The ecosystem is good but smaller than REST. Debugging, federation (Apollo Federation), schema stitching — all add complexity.
The honest EM take: GraphQL is genuinely valuable for consumer-facing APIs where multiple clients (iOS, Android, web) have divergent data needs, or for a BFF aggregation layer. For internal service-to-service communication, it’s rarely the right choice — gRPC or REST is simpler.
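Of the abuse mitigations above, depth limiting is the simplest to illustrate. A production server should walk the parsed query AST (e.g. with graphql-core); this brace-counting sketch ignores strings and comments for brevity:

```python
# Crude depth-limit check: track `{` nesting in the raw query text.
# Illustrative only -- real servers should inspect the parsed AST.

def query_depth(query: str) -> int:
    depth = max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth

MAX_DEPTH = 5

q = 'query { user(id: "1") { orders { items { product { reviews { author } } } } } }'
print(query_depth(q), query_depth(q) <= MAX_DEPTH)  # 6 False
```

A query this shape would be rejected before execution; complexity budgets extend the same idea by weighting list fields and pagination arguments.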
gRPC uses Protocol Buffers (binary serialization) over HTTP/2. It’s contract-first — the .proto file defines the API, and code is generated for both client and server.
```protobuf
service UserService {
  rpc GetUser (UserRequest) returns (UserResponse);
  rpc StreamUserEvents (UserRequest) returns (stream UserEvent);
}
```
gRPC wins when:
- Internal service-to-service communication where performance matters
- You want strongly typed contracts between services to reduce integration bugs
- You want auto-generated client libraries in multiple languages
- You need streaming (server streaming, client streaming, bidirectional streaming)
- Polyglot microservices — generated clients work in Go, Java, Python, etc.
gRPC’s costs:
- Not browser-native — gRPC-Web proxy needed for browser clients (adds complexity)
- Binary protocol means you can’t curl it without tooling (grpcurl, Postman with gRPC support)
- HTTP/2 can be problematic through certain proxies, load balancers, and firewalls
- Protobuf schema evolution requires discipline (don’t reuse field numbers)
- Steeper learning curve than REST for teams new to it
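The field-number discipline looks like this in practice — a sketch with illustrative field names, using proto3's `reserved` keyword to retire a removed field:

```protobuf
// After removing a field, reserve its number (and optionally its name)
// so a later edit can't reuse it with a different type -- old clients
// would misinterpret the wire data. Field names here are illustrative.
message UserResponse {
  reserved 3;               // was: string legacy_email = 3;
  reserved "legacy_email";
  string id = 1;
  string name = 2;
  string email = 4;         // new field gets a fresh number
}
```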
REST vs gRPC for internal services:
- Small team, REST expertise, simple request/response: REST is fine
- Performance-critical inter-service calls, polyglot environment, strict typing: gRPC
- The performance difference (binary vs JSON, HTTP/2 multiplexing) is real but usually not the bottleneck — don’t over-optimize
Versioning is a commitment to support multiple API behaviors simultaneously. Choose your strategy upfront because changing it later is painful.
URL path versioning (`/v1/...`):
- Explicit, discoverable
- Easy to route at an API gateway
- Clients know exactly what version they're using
- Version proliferation: `/v1`, `/v2`, `/v3` require parallel maintenance

Header-based versioning:
- Clean URLs
- Harder to test (can't just change the URL)
- Less discoverable
- Often used for content negotiation-style versioning

No versioning (additive-only evolution):
- Only add fields, never remove them
- Use a `@deprecated` annotation in schemas and documentation
- Set a sunset date and enforce client migration
- Requires disciplined schema evolution (additive-only changes)
- Works well for mature APIs with trusted consumers
Recommendation: URL versioning for public APIs (clarity over elegance). For internal APIs, skip explicit versions and rely on additive-only evolution, since you can coordinate client migrations directly.
When changing an API used by many clients, the risks are:
- Removing a field a client depends on
- Changing a field’s type
- Changing behavior of an existing operation
Safe changes (backwards compatible):
- Adding optional fields to requests
- Adding fields to responses (clients must ignore unknown fields — enforce this)
- Adding new endpoints
- Adding new enum values (with care — some clients break on unknown enums)
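The "ignore unknown fields" rule is worth enforcing in client code, not just in documentation. A tolerant-reader sketch (field and enum names are illustrative):

```python
# Tolerant-reader client: keep only the fields we know, ignore everything
# else, and map unknown enum values to a fallback instead of crashing.
import json

KNOWN_STATUSES = {"PENDING", "SHIPPED", "DELIVERED"}

def parse_order(raw: str) -> dict:
    data = json.loads(raw)
    status = data.get("status", "UNKNOWN")
    return {
        "id": data.get("id"),
        # An unknown enum value (e.g. a newly added "RETURNED" status)
        # degrades gracefully instead of raising.
        "status": status if status in KNOWN_STATUSES else "UNKNOWN",
    }

payload = '{"id": "o1", "status": "RETURNED", "brand_new_field": 42}'
print(parse_order(payload))  # {'id': 'o1', 'status': 'UNKNOWN'}
```

A client written this way survives both new response fields and new enum values, which is what makes additive evolution safe in the first place.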
Breaking changes:
- Removing or renaming fields
- Changing field types
- Changing error codes or response structure
- Changing required/optional semantics
Consumer-driven contract testing (Pact): Publish a contract describing what each consumer uses. CI checks that new API versions don’t violate any published contracts. This is the most rigorous approach for a large consumer base.
Sunset headers: Deprecation: true, Sunset: Sat, 01 Jan 2027 00:00:00 GMT. Programmatic signal to clients to migrate. Monitor usage of deprecated endpoints before removal.
When clients need timely updates from the server, there are four common patterns.
Polling: the client calls /status?id=123 every N seconds. Simple, stateless, easy to scale. The cost: the server fields a stream of requests that mostly return nothing new. Acceptable for low-frequency status checks (job status, slow-changing data).
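A polling loop is trivial, but injecting the fetch function keeps it testable and reusable across endpoints. A sketch with a simulated status endpoint:

```python
# Generic polling: call `fetch` every `interval` seconds until the job
# reports completion or we give up. `fetch` is injected so the same loop
# works for any status endpoint (and is testable without a network).
import time
from typing import Callable

def poll_until_done(fetch: Callable[[], dict],
                    interval: float = 5.0,
                    max_attempts: int = 60) -> dict:
    for _ in range(max_attempts):
        status = fetch()
        if status.get("state") in {"done", "failed"}:
            return status
        time.sleep(interval)
    raise TimeoutError("job did not finish in time")

# Simulated status endpoint: done on the third call.
responses = iter([{"state": "running"}, {"state": "running"}, {"state": "done"}])
print(poll_until_done(lambda: next(responses), interval=0.0))  # {'state': 'done'}
```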
Long Polling: Client makes a request; server holds it open until there’s data to send (or timeout). Reduces unnecessary requests but complicates server-side connection management. Largely superseded by SSE and WebSockets.
Server-Sent Events (SSE): HTTP-based unidirectional push from server to client. Standard EventSource API in browsers. Automatic reconnection. Works through most proxies. Good for: live dashboards, news feeds, notification pushes, progress updates.
WebSockets: Full-duplex, bidirectional. Client and server both push and receive. More complex to scale (stateful connections, sticky sessions or pub/sub fan-out layer). Good for: chat applications, real-time collaborative editing, live gaming, trading platforms.
The decision:
- One-way server-to-client push, browser client: SSE
- Bidirectional real-time communication: WebSocket
- Infrequent updates, simple implementation: polling
- Never use WebSockets just because “it’s faster” for standard request/response — the overhead of connection management outweighs the benefit