
Uber / Ride-Sharing System

Lakshay Jawa
Sharing knowledge on system design, Java, Spring, and software engineering best practices.

1. Hook

Every time someone taps “Request Ride” on Uber, the platform must answer a deceptively hard spatial query in under a second: which of the thousands of nearby drivers is the best match for this rider, given their location, heading, vehicle type, and current workload? Uber processes 25 million trips per day across 70+ countries, with peak demand spikes during commute hours, concerts, and bad weather — all of which arrive simultaneously in the same city blocks.

The core challenge is a moving-object matching problem at planetary scale: drivers broadcast location updates every 4 seconds, ride requests surge from the same areas at the same instant, and the matching decision must be made before the rider gives up and opens a competitor's app. Get the latency wrong and conversion drops; get the matching algorithm wrong and driver utilisation collapses, surge prices spike, and riders churn. Every major architectural decision in this system flows from that single constraint.


2. Problem Statement

Functional Requirements

  1. Riders can request a ride and be matched to a nearby available driver.
  2. Drivers broadcast their real-time location every 4 seconds while the app is open.
  3. The system shows riders an estimated time of arrival (ETA) and an upfront price.
  4. Surge pricing adjusts fares dynamically based on local supply/demand ratio.
  5. Riders and drivers can track each other’s live location during the trip.
  6. Riders can cancel pre-pickup; drivers can cancel or go offline.
  7. Completed trips generate a receipt with fare breakdown and route map.

Non-Functional Requirements

| Requirement | Target |
| --- | --- |
| Ride-match latency | P99 < 1 s from request to driver offer sent |
| Driver location update ingestion | 500 K updates/sec peak globally |
| Driver search radius | Returns nearby drivers within 500 ms |
| Availability | 99.99% (< 53 min downtime/year) |
| ETA accuracy | ≤ 2 min error P90 |
| Surge computation lag | < 30 s to reflect demand change |
| Scale | 25 M trips/day, ~5 M concurrent active drivers peak |

Out of Scope

  • Payment processing and fraud detection.
  • Driver background checks and onboarding.
  • Driver earnings, tips, and promotions.
  • UberEats food delivery (separate dispatch model).

3. Scale Estimation

Assumptions:

  • 25 M trips/day → ~290 trips/sec average, ~870 trips/sec peak (3× average).
  • 5 M drivers active during peak hours; each sends a location ping every 4 seconds.
  • Location update ingestion: 5 M / 4 s = 1.25 M writes/sec (global); assume 40% concentrated in top 10 cities = 500 K writes/sec in peak cluster.
  • Rider-side: 25 M trips, assume 3× open-app sessions per trip (rider opens app, cancels, retries) = 75 M sessions/day ≈ 870 req/sec average, 2,600 req/sec peak.
  • Average trip duration: 15 min → 25 M trips × 15 min × 2 parties = 750 M location-stream minutes/day ≈ 520 K concurrent location streams during peak.
| Metric | Daily | Peak/sec |
| --- | --- | --- |
| Driver location writes | 108 B | 1.25 M |
| Rider match requests | 75 M | 2,600 |
| Active location streams (trip in progress) | — | 520 K concurrent |
| ETA queries (pre-match) | 250 M | 2,900 |
| Storage (location log, 30-day TTL (Time-To-Live)) | 108 B records × 50 B/record ≈ 5.4 TB/day | — |
| Storage, 30-day retention | ~162 TB | — |

Location records are small (driver_id, lat, lng, heading, speed, timestamp = ~50 bytes). Routing graph for a city (~10 M edges) fits in ~2 GB RAM — one replica per region.
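
A quick back-of-envelope check of the headline numbers above (the inputs are the stated assumptions, not measured figures):

// Back-of-envelope arithmetic behind the table above.
public final class ScaleEstimate {
    public static void main(String[] args) {
        long activeDrivers       = 5_000_000L;   // peak concurrent drivers
        int  pingIntervalSec     = 4;            // one GPS ping per driver every 4 s
        long tripsPerDay         = 25_000_000L;
        int  locationRecordBytes = 50;           // driver_id, lat, lng, heading, speed, timestamp

        double writesPerSec  = (double) activeDrivers / pingIntervalSec;   // ≈ 1.25 M/s
        double recordsPerDay = writesPerSec * 86_400;                      // ≈ 108 B records/day
        double bytesPerDay   = recordsPerDay * locationRecordBytes;        // ≈ 5.4 TB/day

        System.out.printf("location writes/sec: %.2f M%n", writesPerSec / 1e6);
        System.out.printf("location records/day: %.0f B%n", recordsPerDay / 1e9);
        System.out.printf("location log/day: %.1f TB%n", bytesPerDay / 1e12);
        System.out.printf("trips/sec average: %.0f%n", tripsPerDay / 86_400.0);
    }
}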


4. High-Level Design

The request lifecycle has four phases: location ingestion → driver search → matching → trip lifecycle.

flowchart TD
    subgraph Client
        A[Rider App]
        B[Driver App]
    end
    subgraph Edge
        C[API Gateway / Load Balancer]
    end
    subgraph CoreServices
        D[Location Service]
        E[Match Service]
        F[Trip Service]
        G[Surge Service]
        H[ETA Service]
        I[Notification Service]
    end
    subgraph Storage
        J[(Location Store\nRedis + Geo index)]
        K[(Trip DB\nCassandra)]
        L[(Routing Graph\nIn-memory per region)]
        M[Kafka\nLocation Stream]
    end

    B -- "GPS ping every 4 s" --> C
    A -- "Ride request" --> C
    C --> D
    C --> E
    D -- "write lat/lng" --> J
    D -- "publish" --> M
    M --> G
    E -- "geo query: drivers near rider" --> J
    E --> H
    H --> L
    E --> F
    F --> K
    F --> I
    I -- "push offer" --> B
    G -- "surge multiplier" --> E

Write path (driver location): Driver App → API Gateway → Location Service → writes to Redis Geo index (for real-time search) and publishes to Kafka (for surge computation and analytics).

Read path (rider match): Rider App → API Gateway → Match Service → queries Redis Geo index for drivers within X km → scores candidates → sends offer via Notification Service → Driver accepts/declines → Trip Service creates trip record.

| Component | Role | Key Tech |
| --- | --- | --- |
| Location Service | Ingests driver GPS pings, updates geo index | Redis GEOADD, Kafka producer |
| Match Service | Finds candidates, scores, dispatches offer | Redis GEORADIUS, scoring engine |
| Trip Service | Manages trip state machine, receipts | Cassandra, event sourcing |
| ETA Service | Computes route + time from driver to pickup | In-memory road graph, Dijkstra/A* |
| Surge Service | Computes supply/demand ratio per Geohash cell | Kafka Streams, sliding window |
| Notification Service | Pushes ride offers, status updates to apps | FCM (Firebase Cloud Messaging) / APNs (Apple Push Notification service), WebSocket |

5. Deep Dive

5.1 Driver Location Indexing with Geohash

A Geohash encodes a latitude/longitude pair into a short alphanumeric string where shared prefix = geographic proximity. Precision 6 (dqcjqc) covers ~1.2 km × 0.6 km, precision 7 covers ~153 m × 153 m — appropriate for city-block-level grouping.
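
To make the prefix property concrete, here is a minimal Geohash encoder: an illustrative sketch of the standard algorithm, not Uber's implementation. The two sample points are under a kilometre apart and share most of their leading characters:

public final class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    public static String encode(double lat, double lng, int precision) {
        double[] latRange = {-90.0, 90.0};
        double[] lngRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true;                 // even bits refine longitude, odd bits latitude
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            double[] range = evenBit ? lngRange : latRange;
            double value   = evenBit ? lng : lat;
            double mid = (range[0] + range[1]) / 2;
            if (value >= mid) { ch = (ch << 1) | 1; range[0] = mid; }
            else              { ch = (ch << 1);     range[1] = mid; }
            evenBit = !evenBit;
            if (++bit == 5) {                   // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(ch));
                bit = 0; ch = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(40.7580, -73.9855, 7));   // Times Square
        System.out.println(encode(40.7527, -73.9772, 7));   // Grand Central (shares leading characters)
    }
}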

Uber’s Location Service maintains a Redis Sorted Set per Geohash cell (or uses Redis’s native GEO commands which are backed by a sorted set with a Geohash score). On every driver ping:

// Java 17 record for a driver location event
record DriverLocation(String driverId, double lat, double lng,
                      double heading, double speedKmh, Instant ts) {}

// Location Service — hot path (called ~1.25 M times/sec across the cluster)
public void updateLocation(DriverLocation loc) {
    // Redis GEO index: GEOADD is an O(log N) sorted-set insert.
    // Note: Redis GEO commands take longitude first, then latitude.
    geoCommands.geoadd("drivers:available",
        loc.lng(), loc.lat(), loc.driverId());

    // Publish to Kafka for surge and analytics (async, fire-and-forget)
    kafkaProducer.send(new ProducerRecord<>("driver-locations",
        loc.driverId(), serialize(loc)));
}

Redis GEORADIUS (or the newer GEOSEARCH) returns drivers within a given radius in O(N + log M) where N is results and M is total entries. At 5 M drivers globally, sharded across 20 Redis nodes by Geohash prefix, each shard holds ~250 K entries — GEOSEARCH on a 2 km radius returns ~50 candidates in < 2 ms.
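
To make the read side concrete, here is a hedged sketch of the candidate lookup, assuming the same Lettuce-style synchronous client (geoCommands) as the write-path snippet above; the exact client API and key layout may differ:

// Match Service: candidate lookup around the rider (sketch; Lettuce-style sync client assumed).
public Set<String> nearbyDrivers(double riderLat, double riderLng, double radiusKm) {
    // GEORADIUS/GEOSEARCH on the shard that owns the rider's Geohash prefix.
    // Redis GEO commands take longitude first, then latitude.
    return geoCommands.georadius("drivers:available",
        riderLng, riderLat, radiusKm, GeoArgs.Unit.km);   // returns driver_ids within the radius
}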

Why Redis over PostGIS? PostGIS with a GiST (Generalized Search Tree) index is accurate, but sustaining 1.25 M relational writes/sec is painful. Redis keeps location data in RAM, trades away durability (acceptable because location data is ephemeral — a stale ping expires in 30 s), and achieves sub-millisecond latency on geo queries. PostGIS is used for analytics batch jobs, not the hot path.

5.2 The Matching Algorithm

After retrieving ~50 driver candidates within radius, the Match Service scores each one:

score = w1 × ETA_seconds⁻¹
      + w2 × driver_acceptance_rate
      + w3 × driver_rating
      - w4 × trip_count_last_hour   (fairness: avoid overloading one driver)

The top-scored available driver receives an offer. If they decline or don’t respond within 15 seconds, the offer goes to the second candidate. This is a sequential offer model (not broadcast) — broadcasting causes all drivers to accept simultaneously, creating a race condition with one winner and many disappointed drivers who just drove toward the pickup.

The offer is sent via WebSocket if the driver app is connected (preferred: < 100 ms RTT (Round-Trip Time)), falling back to FCM push notification (typically 200-800 ms).
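
A sketch of the scoring function and the sequential-offer loop described above; the weight values, the 15-second wait, and the offerAndAwait helper are illustrative assumptions, not Uber's production values:

record Candidate(String driverId, double etaSeconds, double acceptanceRate,
                 double rating, int tripsLastHour) {}

// Illustrative weights; real values are tuned per market and not public.
static final double W1 = 1.0, W2 = 0.5, W3 = 0.3, W4 = 0.2;

static double score(Candidate c) {
    return W1 * (1.0 / c.etaSeconds())
         + W2 * c.acceptanceRate()
         + W3 * c.rating()
         - W4 * c.tripsLastHour();            // fairness: penalise already-busy drivers
}

// Sequential offer: best-scored driver first; move on after a decline or a 15 s timeout.
static Optional<String> dispatch(List<Candidate> candidates) {
    candidates.sort(Comparator.comparingDouble((Candidate c) -> score(c)).reversed());
    for (Candidate c : candidates) {
        // offerAndAwait is a hypothetical helper: push the offer (WebSocket, falling back
        // to FCM) and block up to the timeout for an accept/decline.
        if (offerAndAwait(c.driverId(), Duration.ofSeconds(15))) {
            return Optional.of(c.driverId());
        }
    }
    return Optional.empty();                  // nobody accepted: rider sees "no drivers available"
}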

5.3 Trip State Machine

Trips follow a strict state machine enforced by the Trip Service. Invalid transitions are rejected at the service layer, preventing race conditions from double-accepting or double-completing a trip.

flowchart TD
    A([REQUESTED]) --> B([DRIVER_ASSIGNED])
    B --> C([DRIVER_EN_ROUTE])
    C --> D([ARRIVED])
    D --> E([IN_PROGRESS])
    E --> F([COMPLETED])
    A --> G([CANCELLED_BY_RIDER])
    B --> G
    C --> G
    B --> H([CANCELLED_BY_DRIVER])
    C --> H

Each state transition is written to Cassandra as an immutable event (event sourcing pattern). The current state is derived from the latest event for a given trip_id. This gives a complete audit trail and makes receipt generation trivial (replay all events for the trip).
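
A compact sketch of how the Trip Service can encode the allowed transitions. The enum mirrors the diagram above; the validation method is an assumption about implementation style, not Uber's actual code:

enum TripState {
    REQUESTED, DRIVER_ASSIGNED, DRIVER_EN_ROUTE, ARRIVED,
    IN_PROGRESS, COMPLETED, CANCELLED_BY_RIDER, CANCELLED_BY_DRIVER;

    // Allowed transitions, taken directly from the state diagram.
    private static final Map<TripState, Set<TripState>> NEXT = Map.of(
        REQUESTED,           Set.of(DRIVER_ASSIGNED, CANCELLED_BY_RIDER),
        DRIVER_ASSIGNED,     Set.of(DRIVER_EN_ROUTE, CANCELLED_BY_RIDER, CANCELLED_BY_DRIVER),
        DRIVER_EN_ROUTE,     Set.of(ARRIVED, CANCELLED_BY_RIDER, CANCELLED_BY_DRIVER),
        ARRIVED,             Set.of(IN_PROGRESS),
        IN_PROGRESS,         Set.of(COMPLETED),
        COMPLETED,           Set.of(),
        CANCELLED_BY_RIDER,  Set.of(),
        CANCELLED_BY_DRIVER, Set.of()
    );

    boolean canTransitionTo(TripState next) {
        return NEXT.getOrDefault(this, Set.of()).contains(next);
    }
}

The Trip Service appends an event only if current.canTransitionTo(next) holds, so a second DRIVER_ASSIGNED event for an already-assigned trip is rejected before it ever reaches Cassandra.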

5.4 ETA Computation

ETA is computed using the road graph: nodes are intersections, edges are road segments with a weight of distance / speed_limit × congestion_factor. The graph is loaded into memory per region service instance (~2 GB for a large metro). Dijkstra’s algorithm with a binary-heap priority queue runs a shortest-path query in < 10 ms for intra-city distances.

For real-time congestion, Uber ingests anonymised speed data from all active trips (another stream from Kafka) and updates edge weights every 60 seconds. This is essentially a continuous graph update — weights are adjusted without rebuilding the full graph.
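
A minimal Dijkstra sketch over an adjacency-list road graph, using the edge weight described above (distance / speed_limit × congestion_factor, expressed as seconds per segment); the graph representation is an assumption for illustration:

record Edge(int to, double seconds) {}         // seconds = distance / speed_limit × congestion_factor

static double etaSeconds(List<List<Edge>> graph, int source, int target) {
    double[] dist = new double[graph.size()];
    Arrays.fill(dist, Double.POSITIVE_INFINITY);
    dist[source] = 0.0;

    // Binary-heap priority queue of {node, tentative seconds}, smallest first.
    PriorityQueue<double[]> pq = new PriorityQueue<>(Comparator.comparingDouble(e -> e[1]));
    pq.add(new double[]{source, 0.0});

    while (!pq.isEmpty()) {
        double[] top = pq.poll();
        int node = (int) top[0];
        double d = top[1];
        if (node == target) return d;          // first time the target is settled = shortest path
        if (d > dist[node]) continue;          // stale queue entry, already relaxed via a shorter path
        for (Edge e : graph.get(node)) {
            double candidate = d + e.seconds();
            if (candidate < dist[e.to()]) {
                dist[e.to()] = candidate;
                pq.add(new double[]{e.to(), candidate});
            }
        }
    }
    return Double.POSITIVE_INFINITY;           // no route found (disconnected graph)
}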


6. Data Model

Driver Location (Redis, TTL 30 s)

| Field | Type | Notes |
| --- | --- | --- |
| key | drivers:available (sorted set per shard) | Sharded by Geohash prefix |
| member | driver_id | String |
| score | Geohash integer (derived from lat/lng) | Used by Redis GEO commands |
| Auxiliary hash | driver:{id}:meta | heading, speed, vehicle_type, last_ping_ts |

Drivers who haven’t pinged in 30 seconds are expired from the drivers:available set by a background sweeper that checks last_ping_ts against the 30 s TTL.
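
A hedged sketch of that sweeper; key names follow the data model above, while the scan-and-check pattern and the redis client handle are assumptions:

// Background sweeper: evict drivers whose last ping is older than the 30 s TTL.
void expireStaleDrivers(Collection<String> candidateDriverIds) {
    long cutoffMillis = Instant.now().minusSeconds(30).toEpochMilli();
    for (String driverId : candidateDriverIds) {
        String lastPing = redis.hget("driver:" + driverId + ":meta", "last_ping_ts");
        if (lastPing != null && Long.parseLong(lastPing) < cutoffMillis) {
            redis.zrem("drivers:available", driverId);    // the GEO index is a sorted set underneath
            redis.del("driver:" + driverId + ":meta");
        }
    }
}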

Trip (Cassandra)

| Column | Type | Notes |
| --- | --- | --- |
| trip_id | UUID | Partition key |
| event_seq | timeuuid | Clustering key (ascending) |
| state | TEXT | One of the state machine values |
| rider_id | UUID | |
| driver_id | UUID | Nullable until assigned |
| pickup_lat/lng | DOUBLE | |
| dropoff_lat/lng | DOUBLE | Nullable until trip ends |
| fare_cents | INT | Set at COMPLETED |
| surge_multiplier | DECIMAL | Recorded at request time |
| created_at | TIMESTAMP | |

Secondary index on rider_id and driver_id (Cassandra materialized views) for “my trips” queries.

Surge Cell (in-memory + Redis)

| Field | Type | Notes |
| --- | --- | --- |
| geohash6 | STRING | Partition key |
| active_drivers | INT | Count in cell, updated from Kafka |
| open_requests | INT | Requests in last 5 min sliding window |
| surge_multiplier | DECIMAL | Recomputed every 30 s |
| updated_at | TIMESTAMP | |
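
The actual multiplier function is not public; a minimal sketch assuming a simple demand/supply ratio with a floor and a cap, recomputed every 30 s from the counters above:

// Surge multiplier per Geohash-6 cell: illustrative formula, not Uber's real pricing model.
static double surgeMultiplier(int openRequests, int activeDrivers) {
    if (activeDrivers == 0) return 4.0;                     // no supply at all: apply the cap
    double demandSupplyRatio = (double) openRequests / activeDrivers;
    double multiplier = Math.max(1.0, demandSupplyRatio);   // never discount below 1.0×
    return Math.min(multiplier, 4.0);                       // cap aligned with the > 4.0× on-call alert
}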

7. Trade-offs

Location Storage: Redis Geo vs. H3/PostGIS

| Option | Pros | Cons | When |
| --- | --- | --- | --- |
| Redis GEO | Sub-ms reads/writes, in-memory, native geo commands | Data is ephemeral, complex sharding at 5 M drivers | Real-time matching hot path |
| H3 hexagonal grid | Uniform cell area (avoids Geohash distortion near poles), hierarchical resolution | No native DB support, must build custom index | Analytics, surge zones |
| PostGIS | Rich spatial queries, persistent, SQL joins | Write throughput ceiling ~50 K/s without heroic tuning | Batch analytics, geofence compliance |

Conclusion: Redis GEO for the hot path; H3 for surge zone computation; PostGIS for analytics and geofence (city boundary) enforcement.

Matching: Sequential Offer vs. Broadcast

| Option | Pros | Cons | When |
| --- | --- | --- | --- |
| Sequential | No race condition, predictable, fair to drivers | Slightly higher match latency if top driver declines | Default Uber model |
| Broadcast (all candidates) | Fastest first-accept latency | Race condition, wastes driver attention, unfair | Lyft's early model — abandoned |
| Auction (drivers bid ETA) | Optimal assignment | Complex; adds latency while bids are collected | Academic; not production |

Conclusion: Sequential offer with 15-second timeout. Match latency P99 is bounded by one timeout cycle (~15 s worst case), which is acceptable.

CAP (Consistency, Availability, Partition tolerance) Theorem Stance

Location writes and surge reads tolerate eventual consistency — a driver’s position being 4–8 seconds stale is acceptable. Trip state transitions require strong consistency (you cannot be both REQUESTED and CANCELLED simultaneously) — enforced via Cassandra lightweight transactions (LWT) on state columns for critical transitions, accepting higher write latency for trip records.


8. Failure Modes

| Failure | Effect | Impact | Mitigation |
| --- | --- | --- | --- |
| Redis shard crash | Lost driver locations for a Geohash region | Drivers invisible → no matches in that area | Redis Sentinel / Cluster auto-failover; drivers re-ping every 4 s, state recovers in < 10 s |
| Match Service overload | Request queue backup | Match latency spike, riders see “searching” indefinitely | Circuit breaker; horizontal scale-out; degrade to “best available” without ETA computation |
| Kafka lag on location stream | Surge computation delayed | Surge prices stale | Surge cache has 30 s TTL; stale multiplier displayed with staleness warning; Kafka consumer autoscaling |
| ETA Service graph stale | ETAs wrong during incident/road closure | Driver mismatch, rider frustration | Fallback to straight-line distance × 1.5 heuristic; push map update from ops dashboard |
| Driver app offline mid-trip | No location updates during trip | Rider can’t track driver | Last-known position shown; driver re-pings on reconnect; trip timer continues regardless |
| Thundering herd (concert ends) | 50 K simultaneous requests from one venue | Match Service CPU spike | Request queue with backpressure; pre-warm surge prediction model; geofence-based capacity pre-scaling |
| Hot partition (NYC surge) | One Redis shard overwhelmed | Match latency for NYC | Sub-shard NYC to precision-7 Geohash cells across multiple shards |

9. Security & Compliance

Authentication / Authorization: Riders and drivers authenticate via OAuth2 (Open Authorization 2.0) tokens (JWT (JSON Web Token) with RS256 signing). The API Gateway validates tokens on every request; downstream services trust the gateway-injected X-Rider-Id / X-Driver-Id headers. Drivers must additionally have an active, verified driver profile — enforced by an authorization middleware checking a driver-status cache (Redis, 60 s TTL).

Location Privacy: Driver precise location is visible to the matched rider only — never broadcast to other riders. Pre-match, riders see only an approximate count of nearby drivers (not their IDs or exact positions). Post-trip, precise GPS traces are retained for 90 days for dispute resolution, then aggregated and anonymized.

Input Validation: Latitude/longitude values are range-checked (−90 ≤ lat ≤ 90, −180 ≤ lng ≤ 180) and rate-limited (max 1 update/second per driver to prevent GPS spoofing floods). Riders cannot submit pickup points outside the operating region (enforced against a city geofence polygon).

Fraud — GPS Spoofing: Drivers sometimes fake locations to appear in surge zones. Mitigations: compare GPS position to device accelerometer data (stationary device with moving GPS = flag); cross-reference with cell tower triangulation; ML model detects implausible movement patterns (teleportation).

Encryption: TLS (Transport Layer Security) 1.3 for all API traffic. PII (Personally Identifiable Information) fields (phone, email, trip history) encrypted at rest with customer-managed keys in a KMS (Key Management Service). GDPR (General Data Protection Regulation) right-to-erasure: trip records pseudonymized after 6 months; full deletion on account closure within 30 days.

Rate Limiting: Rider request endpoint: 1 active request per account (enforced via Redis SET NX (Not eXists)). Driver ping endpoint: 1 update/4 s per driver_id. API Gateway enforces per-IP and per-account limits using token bucket.
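
A sketch of the one-active-request-per-rider guard using SET NX with an expiry; the Lettuce-style client and the 120 s TTL are assumptions:

// Enforce "one active ride request per rider": SET key value NX EX <ttl>.
boolean tryAcquireRequestLock(String riderId, String tripId) {
    // NX: only set if absent; EX: auto-expire so a crashed flow cannot block the rider forever.
    String result = redis.set("active_request:" + riderId, tripId,
                              SetArgs.Builder.nx().ex(120));
    return "OK".equals(result);               // null means another request is already active
}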


10. Observability

RED Metrics (Rate, Errors, Duration)

| Service | Rate | Errors | Duration |
| --- | --- | --- | --- |
| Location Service | updates/sec per shard | parse errors, Kafka lag | write latency P99 |
| Match Service | match requests/sec | no-driver-found rate, timeout rate | match latency P99 |
| Trip Service | state transitions/sec | invalid transition rejections | write latency |
| ETA Service | ETA queries/sec | routing failures (no path found) | query latency P99 |

Business Metrics (Alerts)

| Metric | Alert Threshold | Why |
| --- | --- | --- |
| Match success rate | < 85% over 5 min | Demand outstripping supply |
| Driver acceptance rate | < 60% over 5 min | Drivers cherry-picking; pricing issue |
| ETA accuracy | error > 3 min P90 | Graph staleness or routing bug |
| Surge multiplier | > 4.0× in any cell (alert on-call) | Potential PR incident; staffing event |
| Rider cancel rate post-match | > 20% | Long ETA; driver off-route |

Tracing

Every ride request carries a trace_id from the rider app through API Gateway → Match Service → Trip Service. OpenTelemetry (OTel) spans are sampled at 10% normally, 100% on error. Distributed traces stored in Jaeger with 7-day retention. Tail-based sampling ensures all traces for errored or slow (> 2 s) requests are kept.
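
A sketch of span creation in the Match Service using the OpenTelemetry Java API; the span and attribute names are illustrative, not the real instrumentation:

// Match Service: wrap the match flow in an OTel span; context propagation carries trace_id downstream.
Tracer tracer = GlobalOpenTelemetry.getTracer("match-service");

Span span = tracer.spanBuilder("match.dispatch")
        .setAttribute("rider.geohash6", riderGeohash)
        .startSpan();
try (Scope scope = span.makeCurrent()) {
    dispatch(candidates);                     // downstream calls join this trace automatically
} catch (Exception e) {
    span.recordException(e);
    span.setStatus(StatusCode.ERROR);         // errored traces are always kept by tail-based sampling
    throw e;
} finally {
    span.end();
}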


11. Scaling Path

Phase 1 — MVP (0 → 1 K trips/day, single city)

Single-region deployment. Location data in PostgreSQL with PostGIS. Match logic in a monolith. Manual surge pricing. One Kafka cluster. No ETA service — use Google Maps API. Key risk: PostGIS write bottleneck once drivers exceed 10 K.

Phase 2 — Growth (1 K → 100 K trips/day, 3–5 cities)

Migrate location hot path to Redis GEO. Extract Match Service as a microservice. Add ETA service with city road graph loaded in memory. Introduce Geohash-based sharding for Redis. Surge pricing automated via Kafka Streams consumer. Key risk: Redis memory cost at 5 M drivers; at ~200 bytes per record that is roughly 1 GB across the cluster, manageable.

Phase 3 — Scale (100 K → 1 M trips/day, 20+ cities)

Multi-region deployment (US-East, EU, APAC). Redis Cluster per region with consistent hashing across 20 shards. Match Service horizontally scaled behind a load balancer. Trip Service sharded by city_id to bound Cassandra partition sizes. ETA graph updated in near-real-time from speed telemetry. Introduce H3 for surge zone computation. Key risk: cross-region matching for airport trips near city boundaries — solved with a “border zone” broker service.

Phase 4 — Global (1 M+ trips/day, 70 countries)

Active-active multi-region with regional data sovereignty (GDPR for EU data, stored in EU only). Predictive pre-dispatch: ML model predicts ride demand 10 minutes out and pre-positions idle drivers using “Quiet Mode” nudges. Road graph updates pushed from a central graph pipeline (processes OpenStreetMap (OSM) diffs) to regional services in < 5 min. Match Service uses a two-tier approach: first-pass Redis GEOSEARCH narrows to 50 candidates, second-pass ML ranking scores all 50 in < 50 ms using a feature vector (driver rating, acceptance rate, ETA, fairness score). Key risk: ML model feedback loop causing driver clustering; solved with exploration noise.


12. Enterprise Considerations

Build vs Buy:

  • Road routing: Building and maintaining a production-grade routing engine (equivalent to OSRM (Open Source Routing Machine) or Valhalla) is a multi-year investment. Uber built their own (H3 + custom routing) because Google Maps pricing at their scale (250 M ETA queries/day) would cost ~$50 M/year. At Series A, use Google Maps or HERE — switch at scale.
  • Push notifications: Use FCM / APNs. Building a push infrastructure is operational burden with marginal benefit.
  • Maps tile serving: Mapbox or Google for rider-facing maps; internal graph for ETA/routing only.

Multi-Tenancy: Uber operates UberX, UberPool, Uber Black, UberEats Couriers as distinct “products” on the same platform. Products are a property of driver profiles and trip requests. The Match Service filters by vehicle_type and product_id — no separate infrastructure per product. The surge service computes per-product multipliers independently (Pool surge ≠ Black surge).

Brownfield Integration: Enterprises deploying internal ride-sharing (corporate shuttle, hospital transport) integrate via the Uber for Business API. This wraps the same core platform with a corporate billing layer and policy engine (approved pickup/dropoff zones, spending limits).

TCO (Total Cost of Ownership) Ballpark (per 1 M trips/day):

  • Redis cluster (location): ~50 shards × i3.2xlarge = ~$20 K/month
  • Kafka (location stream): 12 brokers × r5.4xlarge = ~$15 K/month
  • Cassandra (trip history): 30 nodes × i3.4xlarge = ~$35 K/month
  • ETA service compute: 100 × c5.2xlarge = ~$25 K/month
  • Total infra: ~$100 K/month for core platform; plus ~$150 K/month for maps/routing API at low scale

Conway’s Law note: Uber’s team structure mirrors the service decomposition — separate teams own Location, Match, Trip, and ETA services. Cross-team coordination happens at Kafka topic contracts, not shared databases.


13. Interview Tips

  • Clarify scope early: Ask whether to include surge pricing, Pool (shared rides), ETA computation, or just the core match flow. Interviewers often want depth on one area, not breadth on all five.
  • Lead with the geo index decision: The most interesting architectural question is how do you find nearby drivers efficiently. Walk through Geohash vs. Redis GEO vs. PostGIS before the interviewer asks — it signals you know the domain.
  • Quantify the write problem first: 1.25 M location writes/sec is the headline constraint. Every subsequent decision (Redis over Postgres, ephemeral over durable, shard by Geohash) flows from that number. Derive it from first principles in front of the interviewer.
  • State machine = strong consistency island: Most of this system is eventually consistent, but trip state is not. Calling this out explicitly (and explaining why you use Cassandra LWT only for trip transitions, not location writes) demonstrates senior-level CAP reasoning.
  • Vocabulary that signals fluency: Geohash, supply-demand ratio, sequential offer vs broadcast, ETA accuracy P90, thundering herd at venue egress, GPS spoofing mitigation, fare upfront pricing vs post-trip metering.

14. Further Reading

  • H3 — Uber’s Hexagonal Hierarchical Spatial Index: https://eng.uber.com/h3/ — the paper behind Uber’s move from Geohash to H3 for surge zones and demand forecasting.
  • Uber Engineering Blog — How Uber Computes ETA: https://eng.uber.com/engineering-routing-engine/ — covers the routing engine architecture, graph partitioning, and real-time traffic integration.
  • OSRM (Open Source Routing Machine): http://project-osrm.org/ — the open-source routing engine used as a reference implementation; studying its Contraction Hierarchies algorithm explains how sub-10 ms routing is achievable on city-scale graphs.
  • Geohash specification: https://en.wikipedia.org/wiki/Geohash — understand precision levels, edge distortion near cell boundaries, and the “neighbour lookup” trick for searching cells adjacent to a query point.
