
YouTube — Video Upload, Transcoding & Global Delivery

Lakshay Jawa

1. Hook

Every minute, creators upload 500 hours of video to YouTube — roughly 720,000 hours of raw footage per day that must be validated, transcoded into 10+ adaptive formats, and made globally available before viewers ever click play. Unlike Netflix (a closed catalogue of licensed titles transcoded offline), YouTube is a live upload platform: a creator in Lagos hits “publish” and expects global playback within minutes. The upload pipeline, transcoding infrastructure, and two-tier CDN (Content Delivery Network) that make this possible are among the most complex media-engineering systems on the planet. On the consumption side, 2 billion+ logged-in users watch over 1 billion hours of video daily — a recommendation challenge that dwarfs most advertising systems in latency sensitivity and business impact. If the recommendation model serves the wrong video, engagement drops; if the transcoder stalls, creators lose monetisation time.


2. Problem Statement

Functional Requirements

  1. Creators can upload videos (any container/codec, up to 12 hours / 256 GB).
  2. Uploaded videos are transcoded into multiple resolutions and codecs (AVC / VP9 / AV1) within minutes of upload.
  3. Viewers can stream any video at adaptive bitrates across devices worldwide.
  4. Viewers can search by keyword, title, channel, or topic.
  5. Viewers can post, vote on, and read threaded comments.
  6. The homepage and sidebar serve personalised video recommendations in < 200 ms.

Non-Functional Requirements

| Attribute | Target |
|---|---|
| Transcode completion (p95) | < 5 min after upload (standard HD), < 30 min (4K HDR) |
| Playback start latency (p95) | < 2 s |
| Video availability after publish | < 10 min globally |
| Platform availability | 99.99% (< 53 min downtime/year) |
| Search freshness | < 15 min after publish |
| Comment write latency | < 500 ms |
| Recommendation API latency (p99) | < 200 ms |

Out of Scope

  • Live streaming (YouTube Live is a separate RTMP/WebRTC ingest path)
  • Creator monetisation and ad serving
  • Copyright detection (Content ID is a separate fingerprinting system)
  • Billing and channel memberships

3. Scale Estimation

Assumptions:

  • 2B logged-in users; 500M Daily Active Users (DAU).
  • 500 hours of video uploaded per minute → ~30,000 minutes/min raw upload.
  • Average upload size: 1 GB per 5-minute video → ~200 MB per video-minute of raw footage.
  • Average transcoded output per video: 10 format/resolution variants, ~500 MB total.
  • 1B hours of watch time per day; average session 6 minutes → ~10B video views/day.
  • Comment volume: 100M comments/day.

| Metric | Calculation | Result |
|---|---|---|
| Upload writes (raw) | 500 hrs/min × 60 min/hr × 200 MB/video-min | ~6 TB/min raw ingest |
| Transcoder output/day | 720,000 video-hours × 6 GB/video-hour (500 MB per 5-min video) | ~4.3 PB/day |
| Stored video (5-year corpus) | 500 hrs/min × 526,000 min/yr × 5 yr × 6 GB/video-hour | ~8 EB |
| Read QPS (video manifest) | 10B views / 86,400 s | ~115,000 QPS |
| Peak CDN egress | 500M concurrent viewers × 4 Mbps avg | ~2 Pbps at peak |
| Comment writes/s | 100M / 86,400 | ~1,160 writes/s |
| Comment reads/s | ×50 read/write ratio | ~58,000 reads/s |
| Search QPS | 500M DAU × 4 searches/day / 86,400 | ~23,000 QPS |
| Recommendation calls/s | 500M DAU × 10 impressions/day / 86,400 | ~58,000 QPS |

The CDN egress (~2 Pbps) is the dominant constraint, pushing YouTube to operate a two-tier CDN with ISP-embedded edge caches, similar in philosophy to Netflix’s OCA (Open Connect Appliances).


4. High-Level Design

The system decomposes into four independent planes: the upload & transcode plane (raw ingest → encoding DAG → object storage), the streaming plane (manifest API → CDN tier → ABR (Adaptive Bitrate) player), the engagement plane (comments, likes, view counts), and the discovery plane (search, recommendations).

```mermaid
flowchart TD
    subgraph Creator["Creator"]
        UP["Upload Client\n(browser / app)"]
    end
    subgraph Ingest["Upload & Transcode Plane"]
        ULS["Upload Service\n(resumable, chunked)"]
        RAW["Raw Object Store\n(GCS)"]
        TQ["Transcode Job Queue\n(Pub/Sub)"]
        TC["Transcode Workers\n(FFmpeg fleet)"]
        ENC["Encoded Segments\n(GCS — DASH / HLS)"]
        META["Video Metadata DB\n(Spanner)"]
    end
    subgraph Stream["Streaming Plane"]
        MAPI["Manifest API"]
        L1["Tier-1 Edge Cache\n(ISP PoP)"]
        L2["Tier-2 Regional Cache\n(Google PoP)"]
        ORI["Origin Servers\n(GCS signed URL)"]
    end
    subgraph Discovery["Discovery Plane"]
        SRCH["Search Service\n(inverted index)"]
        REC["Recommendation Service\n(two-tower model)"]
        FEED["Homepage API"]
    end
    subgraph Engage["Engagement Plane"]
        CMT["Comment Service"]
        VCNT["View Count Service\n(Bigtable)"]
    end

    UP -->|"Resumable PUT (chunks)"| ULS
    ULS --> RAW
    RAW --> TQ
    TQ --> TC
    TC --> ENC
    TC --> META
    ENC --> L2
    L2 --> L1

    MAPI --> L1
    L1 -->|cache miss| L2
    L2 -->|cache miss| ORI
    ORI --> ENC

    FEED --> REC
    FEED --> SRCH
    CMT --> META
    UP -->|"publish event"| SRCH
    UP -->|"publish event"| REC
```

Write path (upload): Creator client uploads in 5 MB chunks via a resumable upload protocol to the Upload Service. Once all chunks land in raw object storage, a job message is pushed to the Transcode Queue. A Transcode Worker picks up the job, runs parallel FFmpeg processes for each output format, and writes encoded MPEG-DASH (Dynamic Adaptive Streaming over HTTP) and HLS (HTTP Live Streaming) segments back to object storage. The Metadata DB is updated; a publish event triggers Search indexing and Recommendation model updates.

Read path (streaming): The player fetches a manifest (MPD or M3U8) from the Manifest API, which returns pre-signed segment URLs pointing at the nearest Tier-1 edge cache (embedded at the ISP). Cache misses fall through to Tier-2 Google PoP caches, then to origin object storage. The ABR player selects the appropriate bitrate variant every few seconds based on measured throughput.
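
The key detail here is that the Manifest API hands out CDN-prefixed, short-TTL signed URLs, so clients can never bypass the cache tier (see also the common mistake in section 13). A minimal sketch of the URL signing, in which the CDN host, path layout, and hmacSign() helper are illustrative assumptions rather than YouTube's actual scheme:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Build signed, CDN-prefixed segment URLs for a DASH/HLS manifest.
List<String> segmentUrls(String videoId, String format, int segmentCount) {
    long expiry = Instant.now().plus(Duration.ofHours(1)).getEpochSecond(); // 1-hour TTL
    List<String> urls = new ArrayList<>();
    for (int i = 0; i < segmentCount; i++) {
        String path = "/v/" + videoId + "/" + format + "/seg-" + i + ".m4s";
        String sig = hmacSign(path + "?exp=" + expiry); // assumed keyed-HMAC helper;
        urls.add("https://edge.example-cdn.net" + path  // the edge verifies the same
            + "?exp=" + expiry + "&sig=" + sig);        // signature before serving
    }
    return urls;
}
```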

| Component | Technology | Role |
|---|---|---|
| Upload Service | Custom gRPC (Google Remote Procedure Call) + GCS resumable API | Chunked ingest, deduplication, virus scan |
| Raw & Encoded Store | Google Cloud Storage (GCS) | Durable object storage, region-replicated |
| Transcode Queue | Google Pub/Sub | Durable job fan-out to transcode workers |
| Transcode Workers | FFmpeg on preemptible VMs / custom ASICs | Parallel encode: H.264, VP9, AV1 at 360p–4K |
| Video Metadata DB | Cloud Spanner | Globally consistent video metadata, channel info |
| View Count Store | Cloud Bigtable | High-write counter sharding, eventual consistency |
| Comment Store | Spanner + Bigtable | Threaded comments, votes, moderation state |
| CDN Tier-1 (Edge) | Google Edge Cache at ISP PoP | Last-mile segment delivery, 90%+ hit rate |
| CDN Tier-2 (Regional) | Google PoP (~180 global) | Region-level cache, reduces origin load |
| Search | Internal inverted index (Google Search infra) | Full-text search, autocomplete, query intent |
| Recommendation | Two-tower deep neural network (DNN) | Candidate retrieval + ranking at < 200 ms |

5. Deep Dive

5.1 Upload Pipeline — Resumable Chunked Upload

Large uploads (multi-GB raw files) fail frequently on mobile. YouTube uses a resumable upload protocol: the client first POSTs metadata to obtain an upload session URI, then streams 5 MB chunks with byte-range headers. If the connection drops, the client queries the upload service for the last confirmed byte offset and resumes from there. Chunks are written to raw GCS as they are confirmed; only when all chunks are acknowledged does the upload service finalise the object and emit a VideoUploaded event.

```java
// Simplified resumable upload session handler (Spring MVC sketch).
// SessionStore, GcsClient, PubSubPublisher, ContentRange, VideoUploadedEvent,
// and SessionNotFoundException are assumed helpers.
import java.io.InputStream;
import java.time.Instant;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RestController;

record UploadSession(
    String sessionId,
    String videoId,
    String rawGcsPath,
    long totalBytes,
    long confirmedBytes,
    Instant expiresAt
) {}

@RestController
public class UploadController {

    private final SessionStore sessionStore; // session state (e.g. Redis-backed)
    private final GcsClient gcsClient;       // wrapper over GCS resumable writes
    private final PubSubPublisher pubSub;    // pipeline event publisher

    UploadController(SessionStore s, GcsClient g, PubSubPublisher p) {
        this.sessionStore = s; this.gcsClient = g; this.pubSub = p;
    }

    @PutMapping("/upload/{sessionId}")
    public ResponseEntity<Void> uploadChunk(
            @PathVariable String sessionId,
            @RequestHeader("Content-Range") String contentRange,
            InputStream body) {

        UploadSession session = sessionStore.get(sessionId)
            .orElseThrow(() -> new SessionNotFoundException(sessionId));

        var range = ContentRange.parse(contentRange); // e.g. bytes 0-5242879/209715200
        if (range.start() != session.confirmedBytes()) {
            // Out-of-order or duplicate chunk: tell the client where to resume.
            return resumeIncomplete(session.confirmedBytes());
        }

        gcsClient.writeChunk(session.rawGcsPath(), range.start(), body, range.length());
        long updated = session.confirmedBytes() + range.length();
        sessionStore.updateConfirmed(sessionId, updated);

        if (updated >= session.totalBytes()) {
            // All bytes acknowledged: hand the video to the transcode pipeline.
            pubSub.publish("video-uploaded", new VideoUploadedEvent(session.videoId()));
            return ResponseEntity.ok().build();
        }
        return resumeIncomplete(updated);
    }

    // 308 "Resume Incomplete" per Google's resumable-upload convention; the Range
    // header is omitted while nothing is confirmed (avoids a malformed "bytes=0--1").
    private static ResponseEntity<Void> resumeIncomplete(long confirmedBytes) {
        ResponseEntity.BodyBuilder resp = ResponseEntity.status(308);
        if (confirmedBytes > 0) {
            resp.header("Range", "bytes=0-" + (confirmedBytes - 1));
        }
        return resp.build();
    }
}
```

5.2 Transcoding DAG

Each upload triggers a transcoding DAG (Directed Acyclic Graph) with the following stages (the per-format encodes in stage 2 run in parallel):

  1. Demux & validate — container inspection (MP4/MOV/MKV), codec detection, duration, resolution.
  2. Per-format encode — for each of ~10 output profiles (360p/AVC, 480p/AVC, 720p/VP9, 1080p/VP9, 1080p/AV1, 1440p/AV1, 2160p/AV1 + audio variants), FFmpeg runs on a dedicated preemptible VM.
  3. Segment & package — output is split into 2-second DASH segments and 6-second HLS segments; manifests (MPD and M3U8) are generated.
  4. Pre-warm CDN — encoded segments for the lowest bitrates (360p, 480p) are immediately pushed to Tier-2 regional caches so the video is playable before 4K encoding finishes.

YouTube’s internal transcoder was later extended with custom ASICs (Application-Specific Integrated Circuits) to encode AV1 — a compute-heavy codec — at 10× lower cost per video-minute than GPU-based FFmpeg.
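
The stage-2 fan-out maps naturally onto the job queue: one Pub/Sub message per output profile, so a slow 4K/AV1 encode never blocks the 360p rendition that makes the video playable. A minimal sketch, in which the profile names and TranscodeJob shape are illustrative and pubSub is an assumed publisher wrapper:

```java
import java.util.List;

record TranscodeJob(String videoId, String rawGcsPath, String profile) {}

// One message per output profile; each is consumed by an independent worker VM.
static final List<String> OUTPUT_PROFILES = List.of(
    "360p-avc", "480p-avc", "720p-vp9", "1080p-vp9",
    "1080p-av1", "1440p-av1", "2160p-av1");

void fanOutTranscodeJobs(String videoId, String rawGcsPath) {
    for (String profile : OUTPUT_PROFILES) {
        pubSub.publish("transcode-jobs", new TranscodeJob(videoId, rawGcsPath, profile));
    }
}
```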

5.3 Adaptive Bitrate Streaming

The DASH player measures available throughput every 2 seconds. It maintains a buffer goal (e.g., 30 seconds ahead). If the buffer is healthy, it requests a higher bitrate segment next; if bandwidth drops, it switches to a lower bitrate without stalling. This bitrate decision lives entirely on the client — the server just needs to serve segments fast.
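
A toy version of that client-side decision rule, assuming an ascending bitrate ladder and a throughput estimate; real players (dash.js, ExoPlayer) use considerably more sophisticated buffer- and throughput-based controllers:

```java
// Pick the highest rung that fits within a safety fraction of measured throughput.
// ladderKbps must be sorted ascending; the lowest rung is the unconditional fallback.
static int pickBitrateKbps(int[] ladderKbps, double measuredKbps, double bufferSeconds) {
    double safety = bufferSeconds >= 30 ? 0.9 : 0.7; // thin buffer: be conservative
    int chosen = ladderKbps[0];
    for (int rung : ladderKbps) {
        if (rung <= measuredKbps * safety) chosen = rung;
    }
    return chosen;
}
```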

The manifest API returns a signed URL per segment, valid for 1 hour, pointing at the nearest Tier-1 ISP cache. Segment files are immutable (uniquely keyed by video ID + format + segment timestamp), making CDN caching trivial.

5.4 View Count at Scale

Naive view count increments against a single counter row would saturate any database. YouTube shards the counter:

  • Write path: each view event is written as a counter increment to a Bigtable row keyed by videoId#shardId (100 shards per video). Writers round-robin across shards (see the sketch after this list).
  • Read path (approximate): a background job periodically sums the 100 shard rows and writes the aggregate to a separate videoId#total row. Display queries read the aggregate; real-time shard reads are only done for fraud detection.
  • Write amplification cap: heavily viral videos may spike to millions of writes/s. Rate-limited batching at the app layer collapses nearby events into one Bigtable increment per 100 ms window per shard.
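
A minimal sketch of the shard write and the periodic aggregation, assuming a thin BigtableWrapper over the real client's increment/read/put calls:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sharded view counter: spread increments across N rows so no row becomes hot.
class ViewCounter {
    static final int SHARDS = 100;
    private final AtomicLong roundRobin = new AtomicLong();
    private final BigtableWrapper bigtable; // assumed thin client wrapper

    ViewCounter(BigtableWrapper bigtable) { this.bigtable = bigtable; }

    // Write path: round-robin across 100 shard rows.
    void recordView(String videoId) {
        long shard = roundRobin.getAndIncrement() % SHARDS;
        bigtable.increment(videoId + "#" + shard, "cnt", "v", 1L);
    }

    // Background job: sum shard rows into the display row (approximate by design).
    void aggregate(String videoId) {
        long total = 0;
        for (int s = 0; s < SHARDS; s++) {
            total += bigtable.readLong(videoId + "#" + s, "cnt", "v");
        }
        bigtable.put(videoId + "#total", "cnt", "v", total);
    }
}
```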

6. Data Model

Video Metadata (Cloud Spanner)

| Column | Type | Notes |
|---|---|---|
| video_id | STRING(22) | Base64url random, PK |
| channel_id | STRING(22) | FK → channel table |
| title | STRING(200) | Full-text indexed externally |
| description | STRING(5000) | |
| status | ENUM | UPLOADING / PROCESSING / LIVE / DELETED |
| duration_s | INT64 | Seconds |
| formats_ready | ARRAY<STRING> | e.g. ["360p","720p"], updated per transcode job |
| thumbnail_url | STRING(500) | GCS URL |
| published_at | TIMESTAMP | Index for feed freshness |
| view_count_approx | INT64 | Updated by batch aggregation |
| tags | ARRAY<STRING> | Propagated to search index |

Indexes: (channel_id, published_at DESC) for channel feed; published_at DESC for trending; status for ops dashboards.

Comment Thread (Spanner)

| Column | Type | Notes |
|---|---|---|
| comment_id | STRING(22) | PK |
| video_id | STRING(22) | FK, interleaved in video parent table |
| parent_comment_id | STRING(22) | NULL for top-level |
| author_id | STRING(22) | |
| body | STRING(10000) | |
| like_count | INT64 | |
| created_at | TIMESTAMP | |
| moderation_state | ENUM | PENDING / APPROVED / REMOVED |

Interleaving on video_id means Spanner co-locates all comments for a video on the same split, making “load top comments for video” a single-range scan.

View Count (Bigtable)

| Row key | Column family | Qualifier | Value |
|---|---|---|---|
| videoId#shardN | cnt | v | INT64 delta |
| videoId#total | cnt | v | INT64 aggregate |

7. Trade-offs

7.1 DASH vs HLS Delivery

| Aspect | DASH | HLS |
|---|---|---|
| Standard | ISO MPEG-DASH (open) | Apple proprietary |
| DRM (Digital Rights Management) | Widevine / PlayReady natively | FairPlay (Apple), SAMPLE-AES |
| Browser support | Chrome, Firefox, Edge (via MSE) | Safari native, others via JS |
| Segment size | 2 s (lower latency) | 6 s (safer for CDN caching) |

Conclusion: YouTube serves both formats. DASH for Android/Chrome/desktop; HLS for iOS/Safari. The transcoder produces both from the same encoded segments.

7.2 AV1 vs VP9 vs H.264

| Codec | Compression gain over H.264 | Encode cost | Decode support |
|---|---|---|---|
| H.264 / AVC | — (baseline) | 1× (baseline) | Universal |
| VP9 | ~30% better | — | Chrome, Android, smart TVs |
| AV1 | ~50% better | 20× (SW) / 2× (ASIC) | Chrome, Android 10+, some TVs |

Conclusion: YouTube encodes all three. H.264 is the safe fallback; VP9 is the default for desktop/Android; AV1 is rolled out where decode support exists and saves ~50% bandwidth (which at 2 Pbps is enormous). The custom AV1 ASIC investment was justified by the CDN bandwidth savings alone.
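
In practice this reduces to a per-request codec pick: serve the best codec the client can decode, falling back to H.264. A sketch using the common av01/vp9/avc1 codec identifiers:

```java
import java.util.List;
import java.util.Set;

// Preference order encodes the bandwidth ranking: AV1 > VP9 > H.264.
static String pickCodec(Set<String> clientDecoders) {
    for (String codec : List.of("av01", "vp9", "avc1")) {
        if (clientDecoders.contains(codec)) return codec;
    }
    return "avc1"; // universal fallback
}
```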

7.3 Push vs Pull Pre-warm on CDN

| Strategy | Latency after publish | Origin load | Storage waste |
|---|---|---|---|
| Push (pre-warm all formats) | Near-zero | Low | High — most videos get < 100 views |
| Pull (lazy on first request) | First viewer pays cache-fill latency | Spiky | None |
| Hybrid (push lowest bitrates only) | Near-zero for 360p/480p | Moderate | Low |

Conclusion: YouTube pre-warms only the two lowest bitrate variants immediately after encode. Higher bitrates are pulled on demand and cached at Tier-2. The vast majority of views happen on popular videos that will fill Tier-1 quickly via pull.
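
A sketch of the hybrid policy, assuming a per-profile encode-complete callback and a hypothetical regionalCache client for the Tier-2 push:

```java
import java.util.List;
import java.util.Set;

// Push only the cheapest renditions; higher bitrates are pulled on first request.
static final Set<String> PREWARM_PROFILES = Set.of("360p-avc", "480p-avc");

void onProfileEncoded(String videoId, String profile, List<String> segmentPaths) {
    if (!PREWARM_PROFILES.contains(profile)) return; // lazy-pull path for higher bitrates
    for (String region : regionalCache.regions()) {
        regionalCache.prefetch(region, segmentPaths); // assumed Tier-2 cache API
    }
}
```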

7.4 Eventual vs Strong Consistency for View Counts

Strong consistency on a global counter requires distributed transactions — latency in the hundreds of milliseconds. YouTube tolerates approximate counts (±0.5%) in exchange for < 5 ms write latency. The exact count is reconciled nightly for monetisation payouts where accuracy matters.


8. Failure Modes

| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| Transcode Worker | VM preempted mid-job | Video stuck in PROCESSING | Pub/Sub nack → retry with exponential backoff; dead-letter queue (DLQ) after 5 attempts; ops alert on DLQ depth |
| CDN Tier-1 node | ISP cache node goes down | Viewers in that ISP fall back to Tier-2; latency spike | Anycast routing automatically fails over to nearest healthy Tier-2 PoP; no creator action needed |
| View Count Bigtable | Hot partition on viral video | Write latency spike, potential throttling | 100-shard key design caps per-shard write rate; Bigtable auto-splits hot tablets |
| Recommendation Service | Model serving latency spike | Homepage shows stale/generic recs | Circuit breaker returns pre-computed top-N popular videos as fallback within 50 ms SLA |
| Metadata Spanner | Regional outage | Publish/update writes fail; reads degrade | Spanner multi-region config; reads from any replica; writes wait for quorum (< 100 ms added latency in normal operation) |
| Upload Service | Thundering herd — viral event causes upload spike | Upload latency, transcode queue depth grows | Pub/Sub backpressure; transcode fleet autoscales on queue depth metric; priority queue for monetised channels |
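
The Recommendation Service row is a classic circuit-breaker pattern. A minimal hand-rolled sketch, in which recModel and popularCache are assumed clients; production code would more likely use a library such as Resilience4j:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// After repeated failures, serve the pre-computed popular list for a cool-down
// window instead of calling the model at all.
class RecsWithFallback {
    private static final int FAILURE_THRESHOLD = 5;
    private static final Duration COOL_DOWN = Duration.ofSeconds(30);

    private final RecModelClient recModel;   // assumed model-serving client
    private final PopularCache popularCache; // pre-computed top-N store

    private int consecutiveFailures = 0;
    private Instant openUntil = Instant.MIN;

    RecsWithFallback(RecModelClient recModel, PopularCache popularCache) {
        this.recModel = recModel;
        this.popularCache = popularCache;
    }

    synchronized List<String> homepageRecs(String userId) {
        if (Instant.now().isBefore(openUntil)) {
            return popularCache.topN(50);          // circuit open: cheap, always-fast path
        }
        try {
            List<String> recs = recModel.rank(userId);
            consecutiveFailures = 0;               // success closes the breaker
            return recs;
        } catch (Exception e) {
            if (++consecutiveFailures >= FAILURE_THRESHOLD) {
                openUntil = Instant.now().plus(COOL_DOWN); // trip the breaker
            }
            return popularCache.topN(50);
        }
    }
}
```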

9. Security & Compliance

AuthN/AuthZ (Authentication/Authorisation):

  • Upload requires OAuth 2.0 bearer token (Google Identity); creator’s channel_id is embedded in the session and validated on every chunk.
  • Private and unlisted videos: manifest API checks ACL (Access Control List) before returning segment URLs; signed URLs expire in 1 hour.

Encryption:

  • In transit: TLS (Transport Layer Security) 1.3 for all client-facing connections; internal GCS traffic uses Google-managed encryption.
  • At rest: GCS server-side AES (Advanced Encryption Standard)-256 encryption; CMEK (Customer-Managed Encryption Keys) available for enterprise.

Content Safety:

  • Every upload is scanned by the Content Safety API (CSAM hash matching, known-bad signature DB) before the VideoUploaded event fires.
  • A machine learning classifier runs asynchronously and may suppress public visibility pending human review.

Input Validation:

  • File type validation (magic bytes, not just extension) before raw write to GCS.
  • Metadata fields (title, description, tags) are HTML-escaped and length-capped server-side.
  • Rate limiting per channel_id: 50 uploads/hour, enforced at the Upload Service via Redis sliding window.
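
A sketch of that per-channel sliding window; this in-memory version shows the algorithm, while the real service would back the window with a Redis sorted set so that all upload-service replicas share state:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sliding-window limiter: allow at most `limit` events per `window` per key.
class SlidingWindowLimiter {
    private final int limit;
    private final Duration window;
    private final Map<String, Deque<Instant>> events = new ConcurrentHashMap<>();

    SlidingWindowLimiter(int limit, Duration window) {
        this.limit = limit;
        this.window = window;
    }

    synchronized boolean allow(String channelId) {
        Instant cutoff = Instant.now().minus(window);
        Deque<Instant> q = events.computeIfAbsent(channelId, k -> new ArrayDeque<>());
        while (!q.isEmpty() && q.peekFirst().isBefore(cutoff)) q.pollFirst(); // evict expired
        if (q.size() >= limit) return false; // over quota: reject the upload
        q.addLast(Instant.now());
        return true;
    }
}

// Usage: new SlidingWindowLimiter(50, Duration.ofHours(1)).allow(channelId)
```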

GDPR (General Data Protection Regulation) / Privacy:

  • Right-to-erasure: deleting a channel triggers an async job that removes video data from GCS, purges CDN caches, and tombstones the Spanner row. Comment bodies are replaced with [deleted] within 30 days.
  • PII (Personally Identifiable Information) in comments is subject to NLP-based scanning; comments may be redacted under GDPR Article 17 requests.

Audit Trail:

  • All upload, publish, and deletion events are appended to an immutable audit log (BigQuery streaming insert with insertId deduplication), retained 7 years for compliance.
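
The insertId is what makes retried appends safe: BigQuery's streaming-insert API treats rows sharing an insertId (within a short window) as best-effort duplicates and drops them. A sketch with the Google Cloud Java client; the dataset, table, and row shape are illustrative:

```java
import java.util.Map;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.TableId;

// Append an audit event; eventId doubles as the dedup insertId, so a retry
// after a timeout does not produce a second audit row.
void appendAuditEvent(String eventId, String videoId, String action) {
    BigQuery bq = BigQueryOptions.getDefaultInstance().getService();
    bq.insertAll(InsertAllRequest.newBuilder(TableId.of("audit", "video_events"))
        .addRow(eventId, Map.of(
            "video_id", videoId,
            "action", action,
            "ts", java.time.Instant.now().toString()))
        .build());
}
```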

10. Observability

RED Metrics (Rate, Errors, Duration):

| Signal | Metric | Alert Threshold |
|---|---|---|
| Upload Rate | upload_sessions_started / min | Drop > 20% from baseline → PagerDuty |
| Transcode Error Rate | transcode_jobs_failed / total | > 0.5% → P2 alert |
| Transcode Duration (p95) | transcode_duration_p95_s | > 600 s for 1080p → P2 |
| Playback Start Rate | playback_success / total_attempts | < 99% → P1 alert |
| CDN Hit Rate | cache_hits / total_segment_requests | < 90% → investigate origin load |
| Recommendation Latency (p99) | rec_api_latency_p99_ms | > 250 ms → scale recommendation pods |

Saturation Metrics:

  • Transcode queue depth per priority tier (target: < 1,000 pending jobs)
  • Bigtable tablet CPU utilisation (target: < 70%)
  • CDN Tier-1 egress utilisation per ISP node (target: < 80%)

Business Metrics:

  • Creator upload-to-live latency (p50 / p95 / p99) — creator satisfaction KPI (Key Performance Indicator)
  • Viewer rebuffering ratio per region per device type
  • Recommendation CTR (Click-Through Rate) and watch time per served impression

Tracing:

  • Distributed trace spans (OpenTelemetry) propagated through Upload → Pub/Sub → Transcode Worker → CDN Pre-warm chain. Sampled at 1% in production; 100% for failed jobs.
  • Jaeger / Cloud Trace with 15-day retention; critical paths instrumented with structured logging correlated by trace_id.

11. Scaling Path

Phase 1 — MVP (< 1K uploads/day, < 100K views/day)

  • Single upload service, FFmpeg on a few VMs, PostgreSQL for metadata, single-region object storage.
  • No CDN — serve video directly from object storage signed URLs.
  • What breaks first: FFmpeg queue depth grows linearly with upload rate; manual scaling needed above ~50 concurrent uploads.

Phase 2 — 10K uploads/day, 10M views/day

  • Introduce Pub/Sub for transcode job fan-out; autoscale transcode fleet on queue depth.
  • Add a CDN in front of object storage to absorb read traffic; cache hit rate reduces origin egress 80%+.
  • Migrate metadata to Spanner for global consistency and scale.
  • What breaks first: view counts on popular videos hammer a single row → introduce Bigtable shard counter.

Phase 3 — 100K uploads/day, 100M views/day

  • Two-tier CDN (ISP PoP caches + regional PoP); pre-warm popular/trending videos.
  • Introduce VP9 encoding; AV1 R&D begins.
  • Recommendation model replaces heuristic popularity-based ranking; candidate retrieval separated from ranking for latency.
  • What breaks first: search index freshness degrades under ingestion load → dedicate search indexing pipeline with 5-min SLO.

Phase 4 — 500+ hours/min uploads, 1B+ daily views (current scale)

  • Custom AV1 ASICs for encoding at ~10× lower cost per video-minute than GPU-based FFmpeg (see 5.2).
  • Tens of thousands of ISP PoP nodes globally; Anycast routing for lowest RTT (Round-Trip Time).
  • Recommendation model: multi-task DNN (Deep Neural Network) trained on watch time, click-through, satisfaction survey signals.
  • Structured concurrency for upload session management (Java 21 virtual threads handle millions of simultaneous upload sessions cheaply).
  • What breaks first: global metadata consistency latency at Spanner becomes visible for creator dashboards → introduce read caching layer with 30-second TTL (Time-To-Live) for analytics queries.

12. Enterprise Considerations

Brownfield Integration: An enterprise deploying a private YouTube-like platform (internal training videos, compliance recordings) would integrate with existing IAM (Identity and Access Management) providers (Okta, Azure AD) via SAML (Security Assertion Markup Language) 2.0 or OIDC (OpenID Connect). The video metadata store needs to feed existing search (Elasticsearch) and DMS (Document Management Systems) for compliance discovery.

Build vs Buy:

| Component | Build | Buy |
|---|---|---|
| Transcoding | FFmpeg (open-source) + custom orchestration | AWS Elemental MediaConvert, Mux |
| CDN | Internal ISP PoP nodes (YouTube scale only) | Cloudflare, Fastly, Akamai |
| Object Storage | — | GCS / S3 (effectively buy) |
| Recommendation | Build (core IP) | AWS Personalize, Vertex AI |
| Search | Build for YouTube scale | Elasticsearch for < 100M videos |

Multi-Tenancy:

  • Namespace all storage paths by tenantId/videoId for data isolation.
  • Separate transcode queues per tenant tier (premium channels get priority queue).
  • Rate limits and storage quotas enforced per channelId at the API gateway.

TCO (Total Cost of Ownership) Ballpark: At YouTube’s scale, CDN egress dominates cost. Moving from VP9 to AV1 saves ~$500M/year in CDN bandwidth cost at 2 Pbps egress and $0.01/GB rates (order-of-magnitude estimate). Custom ASIC investment of ~$50M amortised over 5 years pays back in < 3 months of bandwidth savings.

Conway’s Law: YouTube’s separate teams for Upload, Transcode, CDN, Recommendation, and Search directly mirror the service decomposition. The upload → transcode → CDN pre-warm boundary is an organisational boundary as much as an architectural one.


13. Interview Tips

  • Start with the upload pipeline: Most interviewers care more about “how does video get processed” than the CDN. Walk through chunked upload → Pub/Sub → transcoding DAG → segment storage → CDN pre-warm in the first 10 minutes.
  • Clarify encoding requirements early: Ask “do we need multi-resolution support?” — this drives the transcoding DAG complexity. If yes, explain parallel per-format encoding as a fan-out pattern.
  • Call out the view count hot partition problem: This is a canonical high-write counter problem. Mention counter sharding (N shards per video, periodic aggregation) before the interviewer brings it up.
  • Recommendation system depth: If asked to go deep, describe the two-tower architecture — a user embedding tower and a video embedding tower trained jointly, ANN (Approximate Nearest Neighbours) retrieval for candidate generation, then a ranking model. Mention that retrieval and ranking are split for latency budget reasons.
  • Common mistake: Designing the streaming path to return raw GCS URLs from the metadata DB. The Manifest API should return CDN-prefixed, signed, short-TTL URLs so clients never bypass the cache layer.
  • Fluency vocabulary: Use “DASH manifest”, “ABR switching”, “transcode DAG”, “CDN pre-warm”, “counter shard”, “DLQ (Dead-Letter Queue)”, “ANN retrieval”, “two-tower DNN”.

14. Further Reading

Related system designs in this series:

  • Netflix — Video Streaming Platform
  • Uber / Ride-Sharing System
  • WhatsApp / Chat Messaging System
  • Instagram
  • Twitter / Social Media Feed

Twitter / Social Media Feed

1. Hook # Twitter at peak serves 600K tweet reads per second while simultaneously processing tens of thousands of new tweets. The naive approach — querying who you follow, then fetching all their tweets, then sorting — collapses instantly at scale. The real architecture is a masterclass in the write-amplification vs read-latency trade-off, and the edge cases (Lady Gaga following Justin Bieber, or vice versa) reveal why no single strategy wins.