
System Design: Uber

Uber is the canonical real-time marketplace system design. It forces you to reason about location indexing at scale, distributed locking under contention, event-driven pipelines, and the latency vs. consistency trade-off in a system where being a few seconds slow means a lost ride. This is an architect-level breakdown.


Requirements

Functional:

  • Rider requests a ride from point A to B
  • System finds nearby available drivers
  • Nearest eligible driver is offered the trip; they accept or reject
  • Trip is tracked in real-time (GPS polling)
  • Fare calculated at trip end; payment triggered
  • Driver/rider rating after completion

Non-functional:

  • Match latency < 5s from request to driver notification
  • Location update ingestion: 500K concurrent drivers updating every 5s
  • 99.99% availability for the matching and location services
  • Geo queries must handle millions of driver location records in < 100ms
  • No double-assignment: a driver must only be matched to one rider at a time
  • Eventual consistency acceptable for analytics; strong consistency required for driver reservation

Back of the Envelope

Scale assumptions:

  • 5M active drivers globally, 500K online at peak
  • 20M ride requests/day → ~230 requests/second
  • Each driver sends a location update every 5s → 500K / 5 = 100K location writes/second
  • Each location update = ~100 bytes → ~10 MB/s ingest
  • Trip records: 20M/day × 500 bytes = 10 GB/day → ~3.6 TB/year
  • Driver matching fanout: each request considers ~50 drivers in radius → worst case 230 × 50 = 11,500 candidate evaluations/second (drivers are notified one at a time, so actual push volume is far lower)

Storage sizing:

  • Active driver locations (in-memory): 500K × 64 bytes = 32 MB in Redis (trivially small)
  • Location history (Cassandra, 30-day retention): 500K drivers × 86,400s/day × (1 update/5s) × 100 bytes ≈ 864 GB/day → ~26 TB for the 30-day window
  • Trips table (PostgreSQL): 20M rows/day, retained 2 years = ~14.6B rows → partition by date, archive cold data

Latency budget for a match (target < 5s end-to-end):

Rider API → Match Service:           ~50ms
Geo query for nearby drivers:        ~100ms
Reserve driver + Redis lock:         ~50ms
Push notification to driver:         ~200ms
Driver app ACK:                      ~1-3s (network + human latency)
Total budget:                        ~1.4-3.4s, well inside the 5s target (headroom for one rejection retry)

High-Level Architecture

graph TD
    Rider[Rider App] -->|REST| APIGW[API Gateway]
    Driver[Driver App] -->|WebSocket / REST| APIGW
    APIGW --> RideReqSvc[Ride Request Service]
    APIGW --> LocationSvc[Location Service]
    APIGW --> TripSvc[Trip Service]

    RideReqSvc -->|Publish ride.requested| Kafka[(Kafka)]
    LocationSvc -->|Write location| Redis[(Redis GeoSet)]
    LocationSvc -->|Async persist| Cassandra[(Cassandra\nLocation History)]

    Kafka --> MatchEngine[Match Engine]
    MatchEngine -->|Geo query| Redis
    MatchEngine -->|SETNX lock| Redis
    MatchEngine -->|Write trip| PostgreSQL[(PostgreSQL\nTrips / Users)]
    MatchEngine -->|Push notify| NotifySvc[Notification Service]

    NotifySvc -->|FCM / APNs| Driver
    TripSvc -->|Read/Write trip state| PostgreSQL
    TripSvc -->|Publish trip.* events| Kafka

    Kafka --> AnalyticsSvc[Analytics / Surge Pricing]
    Kafka --> BillingSvc[Billing Service]

Location Service — The Core Indexing Problem

Why Geohash over simple lat/long queries

A naive SELECT * WHERE lat BETWEEN x1 AND x2 AND lon BETWEEN y1 AND y2 requires a 2D index and doesn’t compose well with caching. Geohash encodes a (lat, lon) into a string prefix — nearby locations share a common prefix. This makes proximity search a string prefix query, which maps naturally onto key-value stores.

Precision table:

Geohash length  Cell size        Use case
5 chars         ~5km × 5km       City-level matching
6 chars         ~1.2km × 0.6km   Neighborhood-level
7 chars         ~153m × 153m     Street-level (Uber uses ~6-7)
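To make the prefix property concrete, here is a minimal pure-Python geohash encoder (the standard interleaved lon/lat bisection with base32 output); in production you would use a library rather than rolling your own:

```python
# Minimal geohash encoder: interleave lon/lat bisection bits, emit base32.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, length: int = 7) -> str:
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, ch = [], 0, 0
    even = True  # a geohash starts with a longitude bit
    while len(chars) < length:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch <<= 1
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:  # 5 bits per base32 character
            chars.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(chars)
```

The canonical test point (42.605, -5.603) encodes to `ezs42`, and encoding the same point at higher precision only appends characters — which is exactly why proximity search reduces to a prefix match.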

Redis GeoSet as the location index

Redis GEOADD / GEORADIUS (or GEOSEARCH) uses a sorted set with a 52-bit geohash score. All 500K driver locations fit in ~32 MB.

GEOADD drivers:available <lon> <lat> <driver_id>
GEOSEARCH drivers:available FROMLONLAT <pickup_lon> <pickup_lat> BYRADIUS 2 km ASC COUNT 20

Design decision: Separate sorted sets per availability state: drivers:available, drivers:busy. When a driver accepts, atomically move them: ZREM drivers:available driver_id + ZADD drivers:busy ... in a Lua script (single atomic operation).
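A sketch of what that Lua script could look like, wired up via redis-py's `register_script`. The key names and wiring are illustrative assumptions; the point is that the score (the 52-bit geohash) must be copied along with the member:

```python
# Atomic available -> busy move as a server-side Lua script (redis-py).
# Key names and the register_script wiring are illustrative assumptions.
MOVE_DRIVER_LUA = """
local score = redis.call('ZSCORE', KEYS[1], ARGV[1])
if not score then return 0 end                 -- driver not in source set
redis.call('ZADD', KEYS[2], score, ARGV[1])    -- copy 52-bit geohash score
redis.call('ZREM', KEYS[1], ARGV[1])
return 1
"""

def make_mover(redis_client):
    script = redis_client.register_script(MOVE_DRIVER_LUA)
    def move(driver_id: str, src="drivers:available", dst="drivers:busy") -> bool:
        # The whole script runs atomically on the Redis server;
        # no other command can interleave between ZSCORE, ZADD, and ZREM.
        return script(keys=[src, dst], args=[driver_id]) == 1
    return move
```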

Location update pipeline

100K writes/second is too high to write directly to Redis from each driver app. Instead:

Driver App → Kafka (location.updates topic, keyed by driver_id)
          → Location Consumer (batched) → Redis GEOADD pipeline
          → Cassandra (async, for history/replay)

Kafka partitioning by driver_id ensures per-driver ordering. Consumers batch updates into 100ms windows and flush them through a Redis pipeline, cutting round-trips by 10-50×. This is the canonical high-throughput write fan-in pattern.
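The consumer-side logic can be sketched as two steps: coalesce the window (only the newest fix per driver matters for the live index) and flush in one pipelined round-trip. The `pipe.geoadd` call follows redis-py conventions; treat the wiring as a sketch:

```python
from typing import Iterable

def coalesce_batch(updates: Iterable[tuple]) -> dict:
    """Keep only the newest (lon, lat) per driver within one batch window.
    updates: iterable of (driver_id, lon, lat, ts) tuples."""
    latest = {}
    for driver_id, lon, lat, ts in updates:
        prev = latest.get(driver_id)
        if prev is None or ts >= prev[2]:
            latest[driver_id] = (lon, lat, ts)
    return latest

def flush_to_redis(pipe, latest: dict, key="drivers:available"):
    # One GEOADD per driver, all sent in a single pipelined round-trip.
    for driver_id, (lon, lat, _ts) in latest.items():
        pipe.geoadd(key, (lon, lat, driver_id))
    pipe.execute()
```

Coalescing also drops redundant writes: a driver who reported three times inside the window costs one GEOADD, not three.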


Match Engine — Finding and Reserving a Driver

The matching flow

sequenceDiagram
    participant RideReq as Ride Request Service
    participant Kafka
    participant Match as Match Engine
    participant Redis
    participant DB as PostgreSQL
    participant Notify as Notification Service
    participant DriverApp as Driver App

    RideReq->>Kafka: Publish ride.requested (rideId, pickup, rider)
    Kafka->>Match: Consume event
    Match->>Redis: GEOSEARCH drivers:available BYRADIUS 2km
    Redis-->>Match: [driver_1, driver_2, driver_3, ...]
    Match->>Redis: SET lock:driver:driver_1 rideId NX PX 30000
    Redis-->>Match: OK (lock acquired)
    Match->>DB: INSERT trip (rideId, driverId=driver_1, status=PENDING_ACCEPTANCE)
    Match->>Notify: Push offer to driver_1 (30s timeout)
    Notify->>DriverApp: FCM/APNs push

    alt Driver Accepts (within 30s)
        DriverApp->>Match: POST /trips/{rideId}/accept
        Match->>DB: UPDATE trip status=ACCEPTED
        Match->>Redis: DEL lock:driver:driver_1
        Match->>Redis: ZREM drivers:available driver_1
    else Driver Rejects or Timeout
        Match->>Redis: DEL lock:driver:driver_1
        Match->>DB: UPDATE trip status=SEARCHING (try next driver)
        Note over Match: Loop to next candidate
    end

The double-assignment problem and distributed locking

Problem: Two ride requests are processed concurrently by different Match Engine instances. Both query Redis for nearby drivers and get the same driver in their result set. Both try to assign that driver simultaneously.

Solution: Redis SET NX PX (SETNX with TTL)

SET lock:driver:<driver_id> <ride_id> NX PX 30000
  • NX — only set if key does not exist (atomic test-and-set)
  • PX 30000 — auto-expire after 30 seconds (handles Match Engine crash)

The instance that wins the lock proceeds; the loser moves to the next candidate. This is a try-lock pattern — no blocking, no deadlock risk.
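An in-memory model of the reservation semantics (illustrative, not Redis itself). One refinement worth noting: the release compares the stored ride_id before deleting, so an engine whose lock already expired cannot free a lock now held by a different ride — in real Redis that check-then-delete must itself be a Lua script to stay atomic:

```python
import time

class ReservationStore:
    """In-memory model of SET key value NX PX ttl (sketch, not Redis)."""
    def __init__(self):
        self._data = {}  # key -> (ride_id, expires_at)

    def try_lock(self, driver_id: str, ride_id: str, ttl_ms: int = 30_000) -> bool:
        key = f"lock:driver:{driver_id}"
        now = time.monotonic()
        entry = self._data.get(key)
        if entry and entry[1] > now:          # live lock held by someone else
            return False
        self._data[key] = (ride_id, now + ttl_ms / 1000.0)
        return True

    def release(self, driver_id: str, ride_id: str) -> bool:
        # Compare-and-delete: only the owner may free the lock.
        key = f"lock:driver:{driver_id}"
        entry = self._data.get(key)
        if entry and entry[0] == ride_id:
            del self._data[key]
            return True
        return False
```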

Why not a DB-level lock?

  • SELECT FOR UPDATE on a driver row works but requires a DB round-trip and holds a row lock for the full notification wait (up to 30s). At scale this saturates DB connection pools.
  • Redis lock is O(1), sub-millisecond, and lives in-memory. The DB write happens after the lock is acquired, as a record of intent.

TTL is critical: If the Match Engine dies after acquiring the lock but before releasing it, the driver would be unassignable for 30 seconds — then the lock expires and they’re available again. Set TTL = offer timeout + small buffer.

Driver accept/reject mechanics

Once notified, the driver has a fixed window (typically 15-30s) to respond. The driver app maintains a persistent WebSocket or long-poll connection to the Trip Service.

State transitions:

SEARCHING → PENDING_ACCEPTANCE (lock acquired, driver notified)
          → ACCEPTED           (driver taps Accept)
          → SEARCHING          (driver rejects, or timeout, or no-ACK)
          → CANCELLED          (rider cancels while SEARCHING)
          → NO_DRIVERS_FOUND   (exhausted all candidates in radius)

Rejection cascade: When a driver rejects, the Match Engine picks the next-best driver from the original candidate list (sorted by ETA, not just distance). If the list is exhausted, expand the search radius (2km → 4km → 8km) and re-query.

Fairness: Drivers who recently rejected are penalized in ranking for a short cooldown (Redis sorted set with score = last_rejection_time). This prevents “cherry-picking” nearby high-surge rides.
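The cascade plus radius expansion can be sketched as a single loop. `geo_search` and `offer` are stand-ins for the Redis query and the lock/notify/ACK round-trip; the radii follow the text:

```python
# Rejection cascade: walk ranked candidates, widening the radius when a
# ring is exhausted. geo_search(pickup, radius_km) returns candidates
# (assumed sorted by ETA); offer(driver_id) returns True on acceptance.
def find_driver(pickup, geo_search, offer, radii_km=(2, 4, 8)):
    tried = set()
    for radius in radii_km:
        for driver_id in geo_search(pickup, radius):
            if driver_id in tried:        # wider ring includes inner ring
                continue
            tried.add(driver_id)
            if offer(driver_id):          # lock + notify + wait for ACK
                return driver_id
    return None  # NO_DRIVERS_FOUND
```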


Trip Lifecycle State Machine

SEARCHING
    │
    ▼
PENDING_ACCEPTANCE ──── reject/timeout ──► SEARCHING
    │
    ▼ accept
DRIVER_EN_ROUTE
    │
    ▼ driver arrives
DRIVER_ARRIVED
    │
    ▼ trip starts
IN_PROGRESS
    │
    ▼ destination reached
COMPLETING ──► (fare calc + payment trigger)
    │
    ▼
COMPLETED

Any state ──► CANCELLED (by rider pre-pickup, or driver pre-start)

Each state transition is:

  1. Validated against allowed transitions (prevent illegal state jumps)
  2. Written to PostgreSQL with a status_updated_at timestamp
  3. Published as a Kafka event (trip.driver_en_route, trip.started, trip.completed, etc.)
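Step 1 is typically a small lookup table keyed by the current state. A minimal sketch, with the transition map taken from the lifecycle diagram above:

```python
# Allowed transitions per the lifecycle diagram; an illegal jump fails fast.
ALLOWED = {
    "SEARCHING":          {"PENDING_ACCEPTANCE", "CANCELLED", "NO_DRIVERS_FOUND"},
    "PENDING_ACCEPTANCE": {"DRIVER_EN_ROUTE", "SEARCHING", "CANCELLED"},
    "DRIVER_EN_ROUTE":    {"DRIVER_ARRIVED", "CANCELLED"},
    "DRIVER_ARRIVED":     {"IN_PROGRESS", "CANCELLED"},
    "IN_PROGRESS":        {"COMPLETING"},
    "COMPLETING":         {"COMPLETED"},
}

def transition(current: str, target: str) -> str:
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    # Here: UPDATE trips SET status = target, status_updated_at = NOW();
    # then publish the corresponding trip.* event to Kafka.
    return target
```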

Kafka Event Pipelines

Kafka is the nervous system — it decouples producers from consumers and enables replay, which is critical for billing and analytics correctness.

Topics and consumers:

Topic                   Producers               Consumers
─────────────────────── ─────────────────────── ──────────────────────────────
location.updates        Driver apps             Location Service → Redis + Cassandra
ride.requested          Ride Request Service    Match Engine
trip.status_changed     Trip Service            Notification Svc, Analytics, Billing
trip.completed          Trip Service            Billing Service, Rating Service
surge.computed          Analytics Service       Pricing Service, Driver App
driver.availability     Driver app              Match Engine (remove from pool)

Partitioning strategy:

Topic                 Partition Key        Reason
────────────────────  ───────────────────  ──────────────────────────────────────────────
location.updates      driver_id            Ordering per driver; consumer owns all updates for a driver
ride.requested        geohash(pickup, 4)   Co-locate requests in the same city on the same Match Engine instance
trip.status_changed   trip_id              All state changes for a trip processed in order

Consumer group design:

  • Match Engine: single consumer group, each partition processed sequentially (preserves request ordering per region)
  • Analytics: separate consumer group, can lag behind without affecting ride ops
  • Billing: separate consumer group, processes trip.completed with at-least-once delivery + idempotency key

Effectively exactly-once billing: delivery is at-least-once, so the Billing Service uses trip_id as an idempotency key. Consuming a duplicate trip.completed event makes the charge a no-op (idempotent upsert).
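The idempotent consumer reduces to an insert-or-ignore keyed on trip_id. A sketch with a dict standing in for a table with a UNIQUE constraint (in PostgreSQL this would be INSERT ... ON CONFLICT (trip_id) DO NOTHING):

```python
# Idempotent charge keyed by trip_id: a redelivered Kafka event is a no-op.
def bill_trip(charges: dict, trip_id: str, fare_cents: int) -> bool:
    """Returns True if a new charge was created, False on a duplicate event."""
    if trip_id in charges:           # already billed: redelivered event
        return False
    charges[trip_id] = fare_cents    # insert-or-ignore in the real DB
    return True
```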


Database Decisions

PostgreSQL — Source of Truth for Trips and Users

Why: Trips require ACID transactions. When a trip moves to COMPLETED, we must atomically: finalize fare, mark driver available, trigger billing event. Any partial write here is catastrophic.

Key tables:

-- trips: heavily read/written by match engine and trip service
CREATE TABLE trips (
    trip_id     UUID PRIMARY KEY,
    rider_id    UUID NOT NULL,
    driver_id   UUID,
    status      trip_status NOT NULL,  -- enum
    pickup_lat  DOUBLE PRECISION,
    pickup_lon  DOUBLE PRECISION,
    dropoff_lat DOUBLE PRECISION,
    dropoff_lon DOUBLE PRECISION,
    fare_cents  INT,
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    updated_at  TIMESTAMPTZ DEFAULT NOW()
);

-- Partial index: only active trips need fast lookup by driver
CREATE INDEX idx_trips_driver_active
    ON trips (driver_id)
    WHERE status NOT IN ('COMPLETED', 'CANCELLED');

Sharding decision: Shard by rider_id hash for writes (rider is the initiator, queries are rider-centric). Driver queries (what’s my current trip?) use a secondary index or a separate read replica. At extreme scale, split the trips table into a separate micro-DB per region.

Redis — Hot Operational State

Key Pattern            Data Structure       TTL   Purpose
─────────────────────  ───────────────────  ────  ─────────────────────────
drivers:available      Sorted Set (GeoSet)  None  Live driver locations
drivers:busy           Sorted Set (GeoSet)  None  In-trip drivers
lock:driver:<id>       String               30s   Reservation lock
trip:offer:<trip_id>   Hash                 35s   Pending offer state
driver:cooldown:<id>   String               120s  Post-rejection cooldown
surge:<geohash6>       String               60s   Surge multiplier per zone

Redis cluster: Geo operations must be on the same shard (Redis Cluster restricts cross-slot operations). Use a consistent hash tag: {region}:drivers:available so all geo keys for a region land on one shard.

Cassandra — Location History and Analytics

Why Cassandra: Location history is pure time-series — write-heavy, rarely read (mostly for compliance, replay, and driver behavior analytics). Cassandra’s LSM-tree storage makes sequential writes extremely fast; it handles 125K writes/second trivially on a 6-node cluster.

CREATE TABLE driver_location_history (
    driver_id   UUID,
    recorded_at TIMESTAMP,
    lat         DOUBLE,
    lon         DOUBLE,
    speed       FLOAT,
    heading     SMALLINT,
    PRIMARY KEY (driver_id, recorded_at)
) WITH CLUSTERING ORDER BY (recorded_at DESC)
  AND default_time_to_live = 2592000;  -- 30 days

Cassandra trade-off: No joins, no aggregations, limited query flexibility. All queries are by (driver_id, time_range). For analytical queries (e.g., heatmaps), data is streamed from Cassandra to a data warehouse (Spark + S3) via Kafka.

Summary: DB per concern

System concern          Storage         Reasoning
──────────────────────  ──────────────  ─────────────────────────────────────
Trip state / Users      PostgreSQL      ACID, relational, low cardinality
Active driver locations Redis GeoSet    Sub-millisecond geo queries, in-memory
Distributed locks       Redis SETNX     Atomic test-and-set, TTL-based safety
Location history        Cassandra       Time-series, write-optimized, huge volume
Event streaming         Kafka           Decoupling, replay, fan-out
Surge / config          Redis           Low-latency reads, TTL-based expiry
Analytics               S3 + Spark      Batch processing, cheap storage

Key Concurrency Problems and Solutions

Problem 1: Race condition in driver status update

When a trip completes, the system must mark the driver available again. If two concurrent events (trip.completed + driver.app_offline) arrive simultaneously, the driver can end up stuck in the busy state.

Solution: All driver status changes go through a single Kafka partition (keyed by driver_id). One consumer processes them sequentially — no concurrent writes for the same driver.

Problem 2: Stale location data

A driver’s Redis location is 5s old. They may have moved. Matching on stale data sends riders notifications for drivers who are now out of range.

Solution: Attach a location_updated_at timestamp to each driver (kept in a companion Redis hash — the geo sorted set's score already encodes the position, so it cannot carry the timestamp). The Match Engine filters out drivers whose location is > 30s stale and treats them as unavailable until they send a fresh update.
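A minimal sketch of that staleness filter, pure Python (the timestamp store and the 30s threshold follow the text; names are illustrative):

```python
import time

STALE_AFTER_S = 30  # drivers silent longer than this are skipped

def fresh_candidates(candidates, last_update_ts, now=None):
    """Filter geo-query results to drivers with a recent location fix.
    last_update_ts: dict driver_id -> unix timestamp of the last update
    (the companion hash described above)."""
    now = time.time() if now is None else now
    return [d for d in candidates
            if now - last_update_ts.get(d, 0.0) <= STALE_AFTER_S]
```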

Problem 3: Surge pricing feedback loop

Analytics computes surge for zone X, pricing service applies a 2× multiplier. This attracts all nearby drivers to zone X, which then drops demand/supply ratio, which zeroes out surge, which disperses drivers. The system oscillates.

Solution: Surge is computed with a smoothing window (exponential moving average over 5 minutes). Surge changes are rate-limited: max ±0.25× per minute per zone. This dampens oscillation without sacrificing responsiveness.
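The dampened update is a one-liner worth of math: an exponential moving average, a floor at 1.0×, and a per-minute clamp. The α and step values follow the text; treat this as a sketch of the policy, not production tuning:

```python
ALPHA = 0.3       # EMA weight on the newest demand/supply observation
MAX_STEP = 0.25   # max multiplier change per zone per minute (rate limit)

def next_surge(prev_multiplier: float, raw_demand_supply: float) -> float:
    # Smooth the raw signal so one noisy minute doesn't swing the price...
    smoothed = ALPHA * raw_demand_supply + (1 - ALPHA) * prev_multiplier
    smoothed = max(1.0, smoothed)  # surge never drops below 1.0x
    # ...then rate-limit the published value to damp oscillation.
    low, high = prev_multiplier - MAX_STEP, prev_multiplier + MAX_STEP
    return max(low, min(high, smoothed))
```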

Problem 4: Driver app network flakiness

A driver accepts a trip but the ACK is lost. Their app shows the trip; the server still shows PENDING_ACCEPTANCE and times out, reassigning the ride.

Solution: Idempotent accept endpoint. The driver app retries with the same trip_id. Server checks current state: if already ACCEPTED for this driver, return success. If reassigned to another driver, return 409 Conflict with current trip state.
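The handler's decision table is small enough to sketch directly; the dict-based trip store and (status, body) return shape are illustrative stand-ins for the real service:

```python
# Idempotent accept handler: a retried ACK for the same driver succeeds,
# an accept from the wrong driver returns a conflict with current state.
def accept_trip(trips: dict, trip_id: str, driver_id: str):
    trip = trips.get(trip_id)
    if trip is None:
        return 404, {"error": "unknown trip"}
    if trip["status"] == "ACCEPTED" and trip["driver_id"] == driver_id:
        return 200, trip              # retry of a lost ACK: still success
    if trip["status"] == "PENDING_ACCEPTANCE" and trip["driver_id"] == driver_id:
        trip["status"] = "ACCEPTED"
        return 200, trip
    return 409, trip                  # reassigned or otherwise conflicting
```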

Problem 5: Hot geohash cell

All ride requests in Times Square at 2am land in the same geohash cell, funneling through the same Kafka partition and Match Engine instance — a hot partition.

Solution: Use a virtual partition layer. Hash (geohash + random_suffix_0_to_N) to create N virtual partitions per cell. Match Engine instances each own a subset. Requests are spread across instances; they all query the same Redis geo index (which is globally consistent).
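The virtual-partition key derivation can be sketched as follows (the suffix count and hashing choice are assumptions; any stable hash of a per-ride identifier works):

```python
import hashlib

N_VIRTUAL = 8  # virtual partitions per geohash cell (tuning assumption)

def partition_key(pickup_geohash: str, ride_id: str) -> str:
    """Spread one hot cell across N_VIRTUAL Kafka keys. All suffixed keys
    share the geohash prefix, and every Match Engine instance queries the
    same Redis geo index, so correctness is unaffected."""
    suffix = int(hashlib.md5(ride_id.encode()).hexdigest(), 16) % N_VIRTUAL
    return f"{pickup_geohash}#{suffix}"
```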


Trade-off Decision Record

1. WebSocket vs. long-poll for driver app communication

Option     Pro                                  Con
─────────  ───────────────────────────────────  ──────────────────────────────────────────────
WebSocket  Low latency, full-duplex, real-time  Complex load balancing (sticky sessions or pub-sub), harder mobile reconnect
Long-poll  Simple, HTTP-compatible              Higher latency, 2× connections per update cycle

Decision: WebSocket for active trips, polling for idle drivers — saves connection overhead for the ~80% of drivers who are idle at any moment.

2. Geohash vs. QuadTree for spatial indexing

Option           Pro                                         Con
───────────────  ──────────────────────────────────────────  ─────────────────────────────────────────────
Geohash + Redis  Operationally simple, Redis handles scale   Boundary problem (drivers near a cell border may be missed)
QuadTree         Adaptive density, no boundary edge cases    Custom in-memory service, more operational complexity

Decision: Geohash + Redis with boundary expansion — query the pickup cell plus all 8 neighbors; covers boundary cases while staying operationally simple.

3. Strong vs. eventual consistency for driver location

Option                Pro                       Con
────────────────────  ────────────────────────  ─────────────────────────────────
Strong consistency    Always accurate           Latency and write amplification
Eventual consistency  High throughput, simple   May match a driver who has moved

Decision: Eventual consistency with a staleness filter — a slightly stale match is acceptable, and the driver's ETA calculation corrects for the actual position.

4. Redis lock vs. DB optimistic lock for driver reservation

Option                       Pro                               Con
───────────────────────────  ────────────────────────────────  ───────────────────────────────────────────
Redis SETNX                  Sub-millisecond, no DB pressure   Redis can fail; lock state lost on restart
DB SELECT FOR UPDATE         Durable, ACID                     Holds a connection + row lock for 15-30s; scales poorly
DB optimistic (version CAS)  No lock held                      Retry storms under high contention

Decision: Redis SETNX as the primary path, with the DB write as confirmation — fast reservation; the DB is the durable record, Redis is the coordination mechanism.

Surge Pricing Pipeline

Kafka: trip.requested + trip.completed
       ↓
Surge Compute Service (Flink / Spark Streaming)
  - Count requests vs. available drivers per geohash6 cell, 1-min window
  - Compute demand/supply ratio
  - Apply smoothing (exponential moving average, α = 0.3)
  ↓
Redis: SET surge:<geohash6> <multiplier> EX 60
  ↓
Pricing Service reads Redis on fare calculation
Driver App: GET /surge?geohash=<cell> → reads Redis directly (< 5ms)

Surge data has a 60s TTL in Redis. If the compute service dies, surge decays to 1.0× naturally — safe degradation.


Failure Modes and Resilience

Redis failure:

  • Location data: re-populated from driver apps within one polling cycle (5s). Fast recovery.
  • Locks: Match Engine detects Redis unavailability; falls back to DB-level advisory lock (pg_try_advisory_lock). Higher latency but correct.

Match Engine crash mid-lock:

  • Redis TTL releases the lock after 30s. Trip stays in PENDING_ACCEPTANCE. A background sweeper job (runs every 10s) finds trips stuck in this state for > 35s and re-queues them to Kafka as a new ride.requested event.

Kafka partition leader failure:

  • Kafka’s built-in leader election (< 30s). Consumers stall briefly, then resume. Location updates buffer in producer; trip requests are retried by Ride Request Service with exponential backoff.

PostgreSQL primary failure:

  • RDS Multi-AZ: automatic failover ~60-120s. During failover, Match Engine falls back to Redis for trip state reads (cache-aside). Writes queue in Kafka and are applied post-recovery.