1. Hook #
Every minute, creators upload 500 hours of video to YouTube — roughly 720,000 hours of raw footage per day that must be validated, transcoded into 10+ adaptive formats, and made globally available before viewers ever click play. Unlike Netflix (a closed catalogue of licensed titles transcoded offline), YouTube is a live upload platform: a creator in Lagos hits “publish” and expects global playback within minutes. The upload pipeline, transcoding infrastructure, and two-tier CDN (Content Delivery Network) that make this possible are among the most complex media-engineering systems on the planet. On the consumption side, 2 billion+ logged-in users watch over 1 billion hours of video daily — a recommendation challenge that dwarfs most advertising systems in latency sensitivity and business impact. If the recommendation model serves the wrong video, engagement drops; if the transcoder stalls, creators lose monetisation time.
2. Problem Statement #
Functional Requirements #
- Creators can upload videos (any container/codec, up to 12 hours / 256 GB).
- Uploaded videos are transcoded into multiple resolutions and codecs (AVC / VP9 / AV1) within minutes of upload.
- Viewers can stream any video at adaptive bitrates across devices worldwide.
- Viewers can search by keyword, title, channel, or topic.
- Viewers can post, vote on, and read threaded comments.
- The homepage and sidebar serve personalised video recommendations in < 200 ms.
Non-Functional Requirements #
| Attribute | Target |
|---|---|
| Transcode completion (p95) | < 5 min after upload (standard HD), < 30 min (4K HDR) |
| Playback start latency (p95) | < 2 s |
| Video availability after publish | < 10 min globally |
| Platform availability | 99.99% (< 53 min downtime/year) |
| Search freshness | < 15 min after publish |
| Comment write latency | < 500 ms |
| Recommendation API latency (p99) | < 200 ms |
Out of Scope #
- Live streaming (YouTube Live is a separate RTMP/WebRTC ingest path)
- Creator monetisation and ad serving
- Copyright detection (Content ID is a separate fingerprinting system)
- Billing and channel memberships
3. Scale Estimation #
Assumptions:
- 2B logged-in users; 500M Daily Active Users (DAU).
- 500 hours of video uploaded per minute → ~30,000 minutes/min raw upload.
- Average upload size: 1 GB per 5-minute video → ~200 MB/min of raw video.
- Average transcoded output: ~10 format/resolution variants totalling ~500 MB per hour of source video.
- 1B hours of watch time per day; average session 6 minutes → ~10B video views/day.
- Comment volume: 100M comments/day.
| Metric | Calculation | Result |
|---|---|---|
| Upload writes (raw) | 500 hrs/min × 60 min/hr × 200 MB/min | ~6 TB/min raw ingest |
| Transcoder output/day | 720,000 video-hours × 500 MB/video-hr | ~360 TB/day |
| Stored video (5-year corpus) | 500 hrs/min × 526,000 min/yr × 5 yr × 500 MB | ~660 PB |
| Read QPS (video manifest) | 10B views / 86,400 s | ~115,000 QPS |
| Peak CDN egress | 500M concurrent viewers × 4 Mbps avg | ~2 Pbps at peak |
| Comment writes/s | 100M / 86,400 | ~1,160 writes/s |
| Comment reads/s | ×50 read/write ratio | ~58,000 reads/s |
| Search QPS | 500M DAU × 4 searches/day / 86,400 | ~23,000 QPS |
| Recommendation calls/s | 500M DAU × 10 impressions/session / 86,400 | ~58,000 QPS |
The CDN egress (~2 Pbps) is the dominant constraint, pushing YouTube to operate a two-tier CDN with ISP-embedded edge caches, similar in philosophy to Netflix’s OCA (Open Connect Appliances).
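These figures are easy to sanity-check in code. A minimal back-of-envelope sketch in Java; the constants simply restate the assumptions above and are illustrative, not measured:

```java
// Back-of-envelope sanity check for the headline numbers above (illustrative constants).
public final class ScaleEstimate {
    public static void main(String[] args) {
        double uploadHoursPerMin = 500;                        // creator ingest rate
        double rawMbPerVideoMin = 200;                         // 1 GB per 5-minute video
        double rawIngestTbPerMin = uploadHoursPerMin * 60 * rawMbPerVideoMin / 1_000_000;
        double viewsPerDay = 10e9;                             // 1B watch-hours / 6-minute sessions
        double manifestQps = viewsPerDay / 86_400;
        double peakEgressPbps = 500e6 * 4 / 1e9;               // 500M concurrent viewers x 4 Mbps
        System.out.printf("raw ingest  ~%.0f TB/min%n", rawIngestTbPerMin);   // ~6 TB/min
        System.out.printf("manifest    ~%.0f QPS%n", manifestQps);            // ~115,000 QPS
        System.out.printf("peak egress ~%.0f Pbps%n", peakEgressPbps);        // ~2 Pbps
    }
}
```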
4. High-Level Design #
The system decomposes into four independent planes: the upload & transcode plane (raw ingest → encoding DAG → object storage), the streaming plane (manifest API → CDN tier → ABR (Adaptive Bitrate) player), the engagement plane (comments, likes, view counts), and the discovery plane (search, recommendations).
flowchart TD
subgraph Creator["Creator"]
UP["Upload Client\n(browser / app)"]
end
subgraph Ingest["Upload & Transcode Plane"]
ULS["Upload Service\n(resumable, chunked)"]
RAW["Raw Object Store\n(GCS)"]
TQ["Transcode Job Queue\n(Pub/Sub)"]
TC["Transcode Workers\n(FFmpeg fleet)"]
ENC["Encoded Segments\n(GCS — DASH / HLS)"]
META["Video Metadata DB\n(Spanner)"]
end
subgraph Stream["Streaming Plane"]
MAPI["Manifest API"]
L1["Tier-1 Edge Cache\n(ISP PoP)"]
L2["Tier-2 Regional Cache\n(Google PoP)"]
ORI["Origin Servers\n(GCS signed URL)"]
end
subgraph Discovery["Discovery Plane"]
SRCH["Search Service\n(inverted index)"]
REC["Recommendation Service\n(two-tower model)"]
FEED["Homepage API"]
end
subgraph Engage["Engagement Plane"]
CMT["Comment Service"]
VCNT["View Count Service\n(Bigtable)"]
end
UP -->|"Resumable PUT (chunks)"| ULS
ULS --> RAW
RAW --> TQ
TQ --> TC
TC --> ENC
TC --> META
ENC --> L2
L2 --> L1
MAPI --> L1
L1 -->|cache miss| L2
L2 -->|cache miss| ORI
ORI --> ENC
FEED --> REC
FEED --> SRCH
CMT --> META
UP -->|"publish event"| SRCH
UP -->|"publish event"| REC
Write path (upload): Creator client uploads in 5 MB chunks via a resumable upload protocol to the Upload Service. Once all chunks land in raw object storage, a job message is pushed to the Transcode Queue. A Transcode Worker picks up the job, runs parallel FFmpeg processes for each output format, and writes encoded MPEG-DASH (Dynamic Adaptive Streaming over HTTP) and HLS (HTTP Live Streaming) segments back to object storage. The Metadata DB is updated; a publish event triggers Search indexing and Recommendation model updates.
Read path (streaming): The player fetches a manifest (MPD or M3U8) from the Manifest API, which returns pre-signed segment URLs pointing at the nearest Tier-1 edge cache (embedded at the ISP). Cache misses fall through to Tier-2 Google PoP caches, then to origin object storage. The ABR player selects the appropriate bitrate variant every few seconds based on measured throughput.
| Component | Technology | Role |
|---|---|---|
| Upload Service | Custom gRPC (Google Remote Procedure Call) + GCS resumable API | Chunked ingest, deduplication, virus scan |
| Raw & Encoded Store | Google Cloud Storage (GCS) | Durable object storage, region-replicated |
| Transcode Queue | Google Pub/Sub | Durable job fan-out to transcode workers |
| Transcode Workers | FFmpeg on preemptible VMs / custom ASICs | Parallel encode: H.264, VP9, AV1 at 360p–4K |
| Video Metadata DB | Cloud Spanner | Globally consistent video metadata, channel info |
| View Count Store | Cloud Bigtable | High-write counter sharding, eventual consistency |
| Comment Store | Spanner + Bigtable | Threaded comments, votes, moderation state |
| CDN Tier-1 (Edge) | Google Edge Cache at ISP PoP | Last-mile segment delivery, 90%+ hit rate |
| CDN Tier-2 (Regional) | Google PoP (~180 global) | Region-level cache, reduces origin load |
| Search | Internal inverted index (Google Search infra) | Full-text search, autocomplete, query intent |
| Recommendation | Two-tower deep neural network (DNN) | Candidate retrieval + ranking at < 200 ms |
5. Deep Dive #
5.1 Upload Pipeline — Resumable Chunked Upload #
Large uploads (multi-GB raw files) fail frequently on mobile. YouTube uses a resumable upload protocol: the client first POSTs metadata to obtain an upload session URI, then streams 5 MB chunks with byte-range headers. If the connection drops, the client queries the upload service for the last confirmed byte offset and resumes from there. Chunks are written to the raw GCS object at their byte offsets as they arrive; only when the final chunk is acknowledged does the upload service mark the object complete and emit a VideoUploaded event.
// Simplified resumable upload session handler
public record UploadSession(
    String sessionId,
    String videoId,
    String rawGcsPath,
    long totalBytes,
    long confirmedBytes,
    Instant expiresAt
) {}

@PutMapping("/upload/{sessionId}")
public ResponseEntity<Void> uploadChunk(
    @PathVariable String sessionId,
    @RequestHeader("Content-Range") String contentRange,
    InputStream body) {

  UploadSession session = sessionStore.get(sessionId)
      .orElseThrow(() -> new SessionNotFoundException(sessionId));

  var range = ContentRange.parse(contentRange); // e.g. bytes 0-5242879/209715200

  if (range.start() != session.confirmedBytes()) {
    // Out-of-order or duplicate chunk: report the last confirmed offset so the client resumes there.
    // (If confirmedBytes == 0, a production handler would omit the Range header entirely.)
    return ResponseEntity.status(308) // 308 Resume Incomplete
        .header("Range", "bytes=0-" + (session.confirmedBytes() - 1))
        .build();
  }

  // Append this chunk at its byte offset in the raw object, then advance the confirmed watermark.
  gcsClient.writeChunk(session.rawGcsPath(), range.start(), body, range.length());
  long updated = session.confirmedBytes() + range.length();
  sessionStore.updateConfirmed(sessionId, updated);

  if (updated >= session.totalBytes()) {
    // Final chunk acknowledged: kick off the transcode pipeline.
    pubSub.publish("video-uploaded", new VideoUploadedEvent(session.videoId()));
    return ResponseEntity.ok().build();
  }
  return ResponseEntity.status(308)
      .header("Range", "bytes=0-" + (updated - 1))
      .build();
}

5.2 Transcoding DAG #
Each upload triggers a transcoding DAG (Directed Acyclic Graph) with the following stages; validation and packaging are sequential, while the ~10 per-format encodes fan out in parallel:
- Demux & validate — container inspection (MP4/MOV/MKV), codec detection, duration, resolution.
- Per-format encode — for each of ~10 output profiles (360p/AVC, 480p/AVC, 720p/VP9, 1080p/VP9, 1080p/AV1, 1440p/AV1, 2160p/AV1 + audio variants), FFmpeg runs on a dedicated preemptible VM.
- Segment & package — output is split into 2-second DASH segments and 6-second HLS segments; manifests (MPD and M3U8) are generated.
- Pre-warm CDN — encoded segments for the lowest bitrates (360p, 480p) are immediately pushed to Tier-2 regional caches so the video is playable before 4K encoding finishes.
YouTube’s internal transcoder was later extended with custom ASICs (Application-Specific Integrated Circuits) to encode AV1 — a compute-heavy codec — at 10× lower cost per video-minute than GPU-based FFmpeg.
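A minimal sketch of the per-format fan-out on a transcode worker, assuming an ffmpeg binary on the PATH; the profile list, output naming, and thread-pool sizing are illustrative, and the real DAG adds demux/validation, packaging, and retry stages around this step:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Fan-out: one ffmpeg process per output profile, run in parallel (profiles are illustrative).
public final class TranscodeFanOut {

    record Profile(String name, String codec, String scale, String bitrate, String ext) {}

    static final List<Profile> PROFILES = List.of(
        new Profile("360p-avc",  "libx264",    "640x360",   "700k",  "mp4"),
        new Profile("720p-vp9",  "libvpx-vp9", "1280x720",  "1800k", "webm"),
        new Profile("1080p-av1", "libaom-av1", "1920x1080", "2500k", "webm"));

    public static void main(String[] args) throws Exception {
        String input = args[0]; // local copy of the raw upload
        ExecutorService pool = Executors.newFixedThreadPool(PROFILES.size());
        try {
            List<Callable<Integer>> jobs = PROFILES.stream()
                .map(p -> (Callable<Integer>) () -> encode(input, p))
                .toList();
            for (Future<Integer> result : pool.invokeAll(jobs)) {
                if (result.get() != 0) {
                    throw new IllegalStateException("encode failed; nack the Pub/Sub job for retry");
                }
            }
            // All profiles done: segment + package (DASH/HLS), then publish the transcode-complete event.
        } finally {
            pool.shutdown();
        }
    }

    static int encode(String input, Profile p) throws Exception {
        Process proc = new ProcessBuilder(
                "ffmpeg", "-y", "-i", input,
                "-c:v", p.codec(), "-s", p.scale(), "-b:v", p.bitrate(),
                input + "." + p.name() + "." + p.ext())
            .inheritIO()
            .start();
        return proc.waitFor();
    }
}
```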
5.3 Adaptive Bitrate Streaming #
The DASH player measures available throughput every 2 seconds. It maintains a buffer goal (e.g., 30 seconds ahead). If the buffer is healthy, it requests a higher bitrate segment next; if bandwidth drops, it switches to a lower bitrate without stalling. This bitrate decision lives entirely on the client — the server just needs to serve segments fast.
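The decision itself is small. A simplified buffer-and-throughput heuristic; the bitrate ladder, safety factor, and buffer threshold below are illustrative (production players such as dash.js or Shaka use richer algorithms):

```java
// Pick the next segment's bitrate from measured throughput and current buffer level.
public final class AbrSelector {
    private static final int[] LADDER_KBPS = {400, 1_000, 2_500, 5_000, 12_000}; // 360p..4K (illustrative)
    private static final double SAFETY = 0.7;          // trust only ~70% of the measured throughput
    private static final double MIN_BUFFER_S = 10.0;   // below this, protect against a stall

    public static int nextBitrateKbps(double measuredKbps, double bufferSeconds) {
        if (bufferSeconds < MIN_BUFFER_S) {
            return LADDER_KBPS[0];                       // stall avoidance beats quality
        }
        int chosen = LADDER_KBPS[0];
        for (int rung : LADDER_KBPS) {
            if (rung <= measuredKbps * SAFETY) {
                chosen = rung;                           // highest rung that fits the budget
            }
        }
        return chosen;
    }
}
```

With a 30-second buffer and ~6 Mbps measured throughput this picks 2,500 kbps rather than 5,000, because only 70% of the estimate is trusted.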
The manifest API returns a signed URL per segment, valid for 1 hour, pointing at the nearest Tier-1 ISP cache. Segment files are immutable (content-addressed by video ID + format + timestamp), making CDN caching trivial.
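A sketch of how the Manifest API might mint short-TTL segment URLs. This uses a generic HMAC token rather than GCS's actual signed-URL format; the host, path layout, and TTL are assumptions, and the CDN edge would validate the same HMAC before serving:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.HexFormat;

// Mint a short-TTL, CDN-prefixed segment URL (generic HMAC scheme, not GCS's real signed-URL format).
public final class SegmentUrlSigner {
    private final byte[] secret;
    private final String cdnHost;   // hypothetical edge hostname handed out per region

    public SegmentUrlSigner(byte[] secret, String cdnHost) {
        this.secret = secret;
        this.cdnHost = cdnHost;
    }

    public String sign(String videoId, String format, int segmentIndex) throws Exception {
        String path = "/videoplayback/" + videoId + "/" + format + "/seg-" + segmentIndex + ".m4s";
        long expires = Instant.now().plusSeconds(3600).getEpochSecond();   // 1-hour TTL
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        String sig = HexFormat.of().formatHex(
            mac.doFinal((path + expires).getBytes(StandardCharsets.UTF_8)));
        return cdnHost + path + "?expires=" + expires + "&sig=" + sig;
    }
}
```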
5.4 View Count at Scale #
Naive view count increments against a single counter row would saturate any database. YouTube shards the counter:
- Write path: each view event is published to a Bigtable row keyed by `videoId#shardId` (100 shards per video). Writers round-robin across shards (sketched after this list).
- Read path (approximate): a background job periodically sums the 100 shard rows and writes the aggregate to a separate `videoId#total` row. Display queries read the aggregate; real-time shard reads are only done for fraud detection.
- Write amplification cap: heavily viral videos may spike to millions of writes/s. Rate-limited batching at the app layer collapses nearby events into one Bigtable increment per 100 ms window per shard.
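A minimal in-memory sketch of the shard-and-aggregate pattern; the map stands in for Bigtable rows, and the shard count matches the 100-shard key design above:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Sharded view counter: writers round-robin across 100 shard rows; a periodic job folds them into #total.
// The map stands in for Bigtable rows (rowKey -> counter); production code issues Bigtable increments instead.
public final class ShardedViewCounter {
    private static final int SHARDS = 100;
    private final ConcurrentHashMap<String, LongAdder> rows = new ConcurrentHashMap<>();
    private final AtomicLong spread = new AtomicLong();

    public void recordView(String videoId) {
        int shard = (int) (spread.getAndIncrement() % SHARDS);           // round-robin shard pick
        rows.computeIfAbsent(videoId + "#shard" + shard, k -> new LongAdder()).increment();
    }

    // Periodic aggregation job: sum the shard rows and publish the approximate total for display reads.
    public long aggregate(String videoId) {
        long total = 0;
        for (int shard = 0; shard < SHARDS; shard++) {
            LongAdder row = rows.get(videoId + "#shard" + shard);
            if (row != null) {
                total += row.sum();
            }
        }
        LongAdder aggregateRow = new LongAdder();
        aggregateRow.add(total);
        rows.put(videoId + "#total", aggregateRow);                      // display queries read this row
        return total;
    }
}
```

Swapping the map for Bigtable increments keeps the same key scheme; the nightly reconciliation mentioned in section 7.4 can replay raw events rather than trusting the approximate row.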
6. Data Model #
Video Metadata (Cloud Spanner) #
| Column | Type | Notes |
|---|---|---|
| `video_id` | STRING(22) | Base64url random, PK |
| `channel_id` | STRING(22) | FK → channel table |
| `title` | STRING(200) | Full-text indexed externally |
| `description` | STRING(5000) | |
| `status` | ENUM | UPLOADING / PROCESSING / LIVE / DELETED |
| `duration_s` | INT64 | Seconds |
| `formats_ready` | `ARRAY<STRING>` | e.g. ["360p","720p"], updated per transcode job |
| `thumbnail_url` | STRING(500) | GCS URL |
| `published_at` | TIMESTAMP | Index for feed freshness |
| `view_count_approx` | INT64 | Updated by batch aggregation |
| `tags` | `ARRAY<STRING>` | Propagated to search index |
Indexes: (channel_id, published_at DESC) for channel feed; published_at DESC for trending; status for ops dashboards.
Comment Thread (Spanner) #
| Column | Type | Notes |
|---|---|---|
| `comment_id` | STRING(22) | PK |
| `video_id` | STRING(22) | FK, interleaved in video parent table |
| `parent_comment_id` | STRING(22) | NULL for top-level |
| `author_id` | STRING(22) | |
| `body` | STRING(10000) | |
| `like_count` | INT64 | |
| `created_at` | TIMESTAMP | |
| `moderation_state` | ENUM | PENDING / APPROVED / REMOVED |
Interleaving on video_id means Spanner co-locates all comments for a video on the same split, making “load top comments for video” a single-range scan.
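A sketch of the "top comments" read with the Spanner Java client; the table and column names follow the model above, and the exact query shape is an assumption:

```java
import com.google.cloud.spanner.DatabaseClient;
import com.google.cloud.spanner.ResultSet;
import com.google.cloud.spanner.Statement;

// Top-comments read: a single-range scan because comments are interleaved under the video row.
public final class CommentReader {
    public static void printTopComments(DatabaseClient db, String videoId) {
        Statement stmt = Statement.newBuilder(
                "SELECT comment_id, body, like_count FROM comments "
                    + "WHERE video_id = @videoId AND moderation_state = 'APPROVED' "
                    + "ORDER BY like_count DESC LIMIT 20")
            .bind("videoId").to(videoId)
            .build();
        try (ResultSet rs = db.singleUse().executeQuery(stmt)) {   // a bounded-staleness read also works here
            while (rs.next()) {
                System.out.printf("%s (%d likes)%n",
                    rs.getString("comment_id"), rs.getLong("like_count"));
            }
        }
    }
}
```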
View Count (Bigtable) #
| Row key | Column family | Qualifier | Value |
|---|---|---|---|
| `videoId#shardN` | `cnt` | `v` | INT64 delta |
| `videoId#total` | `cnt` | `v` | INT64 aggregate |
7. Trade-offs #
7.1 DASH vs HLS Delivery #
| | DASH | HLS |
|---|---|---|
| Standard | ISO MPEG-DASH (open) | Apple-controlled (specified in RFC 8216) |
| DRM (Digital Rights Management) | Widevine / PlayReady natively | FairPlay (Apple), SAMPLE-AES |
| Browser support | Chrome, Firefox, Edge (via MSE) | Safari native, others via JS |
| Segment size | 2 s (lower latency) | 6 s (safer for CDN caching) |
Conclusion: YouTube serves both formats. DASH for Android/Chrome/desktop; HLS for iOS/Safari. The transcoder produces both from the same encoded segments.
7.2 AV1 vs VP9 vs H.264 #
| Codec | Compression gain over H.264 | Encode cost | Decode support |
|---|---|---|---|
| H.264 / AVC | — (baseline) | 1× | Universal |
| VP9 | ~30% better | 5× | Chrome, Android, smart TVs |
| AV1 | ~50% better | 20× (SW) / 2× (ASIC) | Chrome, Android 10+, some TVs |
Conclusion: YouTube encodes all three. H.264 is the safe fallback; VP9 is the default for desktop/Android; AV1 is rolled out where decode support exists and saves ~50% bandwidth (which at 2 Pbps is enormous). The custom AV1 ASIC investment was justified by the CDN bandwidth savings alone.
7.3 Push vs Pull Pre-warm on CDN #
| Strategy | Latency after publish | Origin load | Storage waste |
|---|---|---|---|
| Push (pre-warm all formats) | Near-zero | Low | High — most videos get < 100 views |
| Pull (lazy on first request) | First viewer pays cache-fill latency | Spiky | None |
| Hybrid (push lowest bitrates only) | Near-zero for 360p/480p | Moderate | Low |
Conclusion: YouTube pre-warms only the two lowest bitrate variants immediately after encode. Higher bitrates are pulled on demand and cached at Tier-2. The vast majority of views happen on popular videos that will fill Tier-1 quickly via pull.
7.4 Eventual vs Strong Consistency for View Counts #
Strong consistency on a global counter requires distributed transactions — latency in the hundreds of milliseconds. YouTube tolerates approximate counts (±0.5%) in exchange for < 5 ms write latency. The exact count is reconciled nightly for monetisation payouts where accuracy matters.
8. Failure Modes #
| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| Transcode Worker | VM preempted mid-job | Video stuck in PROCESSING | Pub/Sub nack → retry with exponential backoff; dead-letter queue (DLQ) after 5 attempts; ops alert on DLQ depth |
| CDN Tier-1 node | ISP cache node goes down | Viewers in that ISP fall back to Tier-2; latency spike | Anycast routing automatically fails over to nearest healthy Tier-2 PoP; no creator action needed |
| View Count Bigtable | Hot partition on viral video | Write latency spike, potential throttling | 100-shard key design caps per-shard write rate; Bigtable auto-splits hot tablets |
| Recommendation Service | Model serving latency spike | Homepage shows stale/generic recs | Circuit breaker returns pre-computed top-N popular videos as fallback within 50 ms SLA (sketched after this table) |
| Metadata Spanner | Regional outage | Publish/update writes fail; reads degrade | Spanner multi-region config; reads from any replica; writes wait for quorum (< 100 ms added latency in normal operation) |
| Upload Service | Thundering herd — viral event causes upload spike | Upload latency, transcode queue depth grows | Pub/Sub backpressure; transcode fleet autoscales on queue depth metric; priority queue for monetised channels |
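The recommendation fallback row above can be sketched as a timeout-with-fallback; a full circuit breaker would additionally stop calling the unhealthy dependency for a cooldown period. The 50 ms budget comes from the table; the client interface and the source of the popular list are assumptions:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Serve personalised recs, but never block the homepage: fall back to popular videos after 50 ms.
public final class HomepageRecs {
    private final RecClient recClient;              // remote ranking service (may be slow under incident)
    private final List<String> popularFallback;     // refreshed periodically by a batch job

    public HomepageRecs(RecClient recClient, List<String> popularFallback) {
        this.recClient = recClient;
        this.popularFallback = popularFallback;
    }

    public List<String> recsFor(String userId) {
        return CompletableFuture
            .supplyAsync(() -> recClient.rank(userId))
            .completeOnTimeout(popularFallback, 50, TimeUnit.MILLISECONDS)  // latency spike -> generic recs
            .exceptionally(t -> popularFallback)                            // errors also degrade gracefully
            .join();
    }

    public interface RecClient {
        List<String> rank(String userId);
    }
}
```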
9. Security & Compliance #
AuthN/AuthZ (Authentication/Authorisation):
- Upload requires OAuth 2.0 bearer token (Google Identity); creator’s channel_id is embedded in the session and validated on every chunk.
- Private and unlisted videos: manifest API checks ACL (Access Control List) before returning segment URLs; signed URLs expire in 1 hour.
Encryption:
- In transit: TLS (Transport Layer Security) 1.3 for all client-facing connections; internal GCS traffic uses Google-managed encryption.
- At rest: GCS server-side AES (Advanced Encryption Standard)-256 encryption; CMEK (Customer-Managed Encryption Keys) available for enterprise.
Content Safety:
- Every upload is scanned by the Content Safety API (CSAM hash matching, known-bad signature DB) before the `VideoUploaded` event fires.
- A machine-learning classifier runs asynchronously and may suppress public visibility pending human review.
Input Validation:
- File type validation (magic bytes, not just extension) before raw write to GCS.
- Metadata fields (title, description, tags) are HTML-escaped and length-capped server-side.
- Rate limiting per channel_id: 50 uploads/hour, enforced at the Upload Service via Redis sliding window.
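A sketch of the per-channel sliding-window check described above, assuming a Redis sorted set per channel and the Jedis 4.x client; key naming and the non-atomic check-then-add are simplifications (a Lua script or MULTI would close that race):

```java
import java.util.UUID;
import redis.clients.jedis.JedisPooled;

// 50-uploads-per-hour sliding window per channel, backed by a Redis sorted set of timestamps.
public final class UploadRateLimiter {
    private static final long WINDOW_MS = 3_600_000;   // 1 hour
    private static final int LIMIT = 50;
    private final JedisPooled redis;

    public UploadRateLimiter(JedisPooled redis) {
        this.redis = redis;
    }

    public boolean tryAcquire(String channelId) {
        String key = "upload_rl:" + channelId;
        long now = System.currentTimeMillis();
        redis.zremrangeByScore(key, 0, now - WINDOW_MS);         // evict entries outside the window
        if (redis.zcard(key) >= LIMIT) {
            return false;                                        // over quota: reject with 429
        }
        redis.zadd(key, now, now + "-" + UUID.randomUUID());     // record this upload attempt
        redis.expire(key, WINDOW_MS / 1000);                     // let idle keys age out
        return true;
    }
}
```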
GDPR (General Data Protection Regulation) / Privacy:
- Right-to-erasure: deleting a channel triggers an async job that removes the video data from GCS, purges CDN caches, and tombstones the Spanner row. Comment bodies are replaced with `[deleted]` within 30 days.
- PII (Personally Identifiable Information) in comments is scanned with NLP; comments may be redacted under GDPR Article 17 requests.
Audit Trail:
- All upload, publish, and deletion events are appended to an immutable audit log (BigQuery streaming insert with `insertId` deduplication), retained 7 years for compliance.
10. Observability #
RED Metrics (Rate, Errors, Duration):
| Signal | Metric | Alert Threshold |
|---|---|---|
| Upload Rate | `upload_sessions_started` / min | Drop > 20% from baseline → PagerDuty |
| Transcode Error Rate | `transcode_jobs_failed` / total | > 0.5% → P2 alert |
| Transcode Duration (p95) | `transcode_duration_p95_s` | > 600 s for 1080p → P2 |
| Playback Start Rate | `playback_success` / total_attempts | < 99% → P1 alert |
| CDN Hit Rate | `cache_hits` / `total_segment_requests` | < 90% → investigate origin load |
| Recommendation Latency (p99) | `rec_api_latency_p99_ms` | > 250 ms → scale recommendation pods |
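As an illustration, the transcode signals above could be emitted with Micrometer; the metric names mirror the table, while the tag set and wiring are assumptions:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.time.Duration;

// Emit the transcode RED signals: one timer with percentiles, one failure counter.
public final class TranscodeMetrics {
    private final Timer duration;
    private final Counter failures;

    public TranscodeMetrics(MeterRegistry registry) {
        this.duration = Timer.builder("transcode_duration")
            .publishPercentiles(0.5, 0.95, 0.99)       // backs the transcode_duration_p95_s alert
            .tag("profile", "1080p")
            .register(registry);
        this.failures = Counter.builder("transcode_jobs_failed")
            .tag("profile", "1080p")
            .register(registry);
    }

    public void record(Duration jobDuration, boolean succeeded) {
        duration.record(jobDuration);
        if (!succeeded) {
            failures.increment();
        }
    }
}
```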
Saturation Metrics:
- Transcode queue depth per priority tier (target: < 1,000 pending jobs)
- Bigtable tablet CPU utilisation (target: < 70%)
- CDN Tier-1 egress utilisation per ISP node (target: < 80%)
Business Metrics:
- Creator upload-to-live latency (p50 / p95 / p99) — creator satisfaction KPI (Key Performance Indicator)
- Viewer rebuffering ratio per region per device type
- Recommendation CTR (Click-Through Rate) and watch time per served impression
Tracing:
- Distributed trace spans (OpenTelemetry) propagated through Upload → Pub/Sub → Transcode Worker → CDN Pre-warm chain. Sampled at 1% in production; 100% for failed jobs.
- Jaeger / Cloud Trace retains traces for 15 days; critical paths are instrumented with structured logging correlated by `trace_id`.
11. Scaling Path #
Phase 1 — MVP (< 1K uploads/day, < 100K views/day)
- Single upload service, FFmpeg on a few VMs, PostgreSQL for metadata, single-region object storage.
- No CDN — serve video directly from object storage signed URLs.
- What breaks first: FFmpeg queue depth grows linearly with upload rate; manual scaling needed above ~50 concurrent uploads.
Phase 2 — 10K uploads/day, 10M views/day
- Introduce Pub/Sub for transcode job fan-out; autoscale transcode fleet on queue depth.
- Add a CDN in front of object storage to absorb read traffic; cache hit rate reduces origin egress 80%+.
- Migrate metadata to Spanner for global consistency and scale.
- What breaks first: view counts on popular videos hammer a single row → introduce Bigtable shard counter.
Phase 3 — 100K uploads/day, 100M views/day
- Two-tier CDN (ISP PoP caches + regional PoP); pre-warm popular/trending videos.
- Introduce VP9 encoding; AV1 R&D begins.
- Recommendation model replaces heuristic popularity-based ranking; candidate retrieval separated from ranking for latency.
- What breaks first: search index freshness degrades under ingestion load → dedicate search indexing pipeline with 5-min SLO.
Phase 4 — 500+ hours/min uploads, 1B+ daily views (current scale)
- Custom AV1 ASICs cut per-video-minute encode cost roughly 10× versus software encoding, bringing AV1 down to ~2× the cost of H.264.
- Tens of thousands of ISP PoP nodes globally; Anycast routing for lowest RTT (Round-Trip Time).
- Recommendation model: multi-task DNN (Deep Neural Network) trained on watch time, click-through, satisfaction survey signals.
- Structured concurrency for upload session management (Java 21 virtual threads handle millions of simultaneous upload sessions cheaply; sketched after this list).
- What breaks first: global metadata consistency latency at Spanner becomes visible for creator dashboards → introduce read caching layer with 30-second TTL (Time-To-Live) for analytics queries.
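The virtual-thread point above is cheap to sketch (Java 21); the per-session processing body is a placeholder:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One virtual thread per in-flight upload session: blocking on slow mobile networks is cheap (Java 21).
public final class UploadSessionRunner {
    private final ExecutorService sessions = Executors.newVirtualThreadPerTaskExecutor();

    public void handle(String sessionId) {
        sessions.submit(() -> {
            // Placeholder for the real per-session loop: read chunk, write to GCS, advance the watermark.
            // Millions of these can be parked concurrently without exhausting platform threads.
            processChunksBlocking(sessionId);
        });
    }

    private void processChunksBlocking(String sessionId) {
        // blocking I/O here is fine on a virtual thread
    }
}
```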
12. Enterprise Considerations #
Brownfield Integration: An enterprise deploying a private YouTube-like platform (internal training videos, compliance recordings) would integrate with existing IAM (Identity and Access Management) providers (Okta, Azure AD) via SAML (Security Assertion Markup Language) 2.0 or OIDC (OpenID Connect). The video metadata store needs to feed existing search (Elasticsearch) and DMS (Document Management Systems) for compliance discovery.
Build vs Buy:
| Component | Build | Buy |
|---|---|---|
| Transcoding | FFmpeg (open-source) + custom orchestration | AWS Elemental MediaConvert, Mux |
| CDN | Internal ISP PoP nodes (YouTube scale only) | Cloudflare, Fastly, Akamai |
| Object Storage | GCS / S3 (effectively buy) | — |
| Recommendation | Build (core IP) | AWS Personalize, Vertex AI |
| Search | Build for YouTube scale | Elasticsearch for < 100M videos |
Multi-Tenancy:
- Namespace all storage paths by `tenantId/videoId` for data isolation.
- Separate transcode queues per tenant tier (premium channels get a priority queue).
- Rate limits and storage quotas enforced per `channelId` at the API gateway.
TCO (Total Cost of Ownership) Ballpark: At YouTube’s scale, CDN egress dominates cost. Moving from VP9 to AV1 saves ~$500M/year in CDN bandwidth cost at 2 Pbps egress and $0.01/GB rates (order-of-magnitude estimate). Custom ASIC investment of ~$50M amortised over 5 years pays back in < 3 months of bandwidth savings.
Conway’s Law: YouTube’s separate teams for Upload, Transcode, CDN, Recommendation, and Search directly mirror the service decomposition. The upload → transcode → CDN pre-warm boundary is an organisational boundary as much as an architectural one.
13. Interview Tips #
- Start with the upload pipeline: Most interviewers care more about “how does video get processed” than the CDN. Walk through chunked upload → Pub/Sub → transcoding DAG → segment storage → CDN pre-warm in the first 10 minutes.
- Clarify encoding requirements early: Ask “do we need multi-resolution support?” — this drives the transcoding DAG complexity. If yes, explain parallel per-format encoding as a fan-out pattern.
- Call out the view count hot partition problem: This is a canonical high-write counter problem. Mention counter sharding (N shards per video, periodic aggregation) before the interviewer brings it up.
- Recommendation system depth: If asked to go deep, describe the two-tower architecture — a user embedding tower and a video embedding tower trained jointly, ANN (Approximate Nearest Neighbours) retrieval for candidate generation, then a ranking model. Mention that retrieval and ranking are split for latency budget reasons.
- Common mistake: Designing the streaming path to return raw GCS URLs from the metadata DB. The Manifest API should return CDN-prefixed, signed, short-TTL URLs so clients never bypass the cache layer.
- Fluency vocabulary: Use “DASH manifest”, “ABR switching”, “transcode DAG”, “CDN pre-warm”, “counter shard”, “DLQ (Dead-Letter Queue)”, “ANN retrieval”, “two-tower DNN”.
14. Further Reading #
- Paper: Covington et al., Deep Neural Networks for YouTube Recommendations (RecSys 2016) — the canonical candidate-generation + ranking paper, still the foundation of modern video recommendations.
- Engineering Blog: YouTube Engineering — AV1 at Scale — covers the ASIC investment and encoding cost savings.
- RFC: RFC 8216 — HTTP Live Streaming (HLS) specification, covering segment format, manifest structure, and encryption.