1. Hook #
At peak, Netflix accounts for roughly 15% of global internet downstream traffic — hundreds of terabits per second flowing to subscribers in 190+ countries. What makes this feasible is not raw bandwidth: it is a carefully engineered pipeline that converts every raw title into over 1,200 encoded video files before a single subscriber presses play, then serves those files from ISP-embedded appliances called Open Connect Appliances (OCAs) rather than from a traditional cloud CDN. The streaming experience you see — where the picture quality silently improves while you watch — is ABR (Adaptive Bitrate) streaming dynamically switching between those pre-encoded variants based on your network conditions. Behind the personalised rows on the homepage sits a recommendation engine that runs 45+ algorithms to surface the title you are most likely to start watching in the next 30 seconds. Each of these subsystems operates at a scale where a 0.1% drop in streaming reliability leaves 250,000 subscribers unable to watch at that moment.
2. Problem Statement #
Functional Requirements #
- Users can browse a personalised catalogue and play any licensed title.
- Video playback must start within 2 seconds and maintain seamless quality at varying bandwidths (ABR).
- Users can resume playback from the last-watched position across devices.
- Personalised homepage rows (Continue Watching, Top Picks, Because You Watched…).
- Search by title, genre, actor, or keyword.
- Content ingestion: studios submit raw files; Netflix transcodes and distributes to the CDN before launch date.
Non-Functional Requirements #
| Attribute | Target |
|---|---|
| Playback start latency (p95) | < 2s |
| Rebuffering ratio | < 0.1% of playback time |
| Global availability | 99.99% (< 53 min downtime/year) |
| CDN cache hit rate | > 95% of stream bytes |
| Recommendation freshness | < 24h after viewing |
| Scale | 250M subscribers, ~100M concurrent streams at peak |
Out of Scope #
- Live streaming (Netflix Live is a separate delivery path with different latency constraints)
- Content licensing and rights management
- Studio-side DRM (Digital Rights Management) key management
- Billing and subscription management
3. Scale Estimation #
Assumptions:
- 250M subscribers; ~40% active on any given evening peak (~100M concurrent).
- Average stream bitrate: 4 Mbps (mix of HD and UHD content).
- Average viewing session: 90 minutes.
- Catalogue: 15,000 titles; each title encoded into ~1,200 files averaging 50 GB/file.
- 95% of bytes served from OCA edge; 5% origin fallback.
| Metric | Calculation | Result |
|---|---|---|
| Peak concurrent streams | 250M × 40% | ~100M streams |
| Peak egress bandwidth | 100M × 4 Mbps | ~400 Tbps |
| OCA-served bandwidth | 400 Tbps × 95% | ~380 Tbps |
| Origin egress | 400 Tbps × 5% | ~20 Tbps |
| Catalogue storage (encoded) | 15,000 × 1,200 × 50 GB | ~900 PB |
| New title ingest/day | 10 titles × 1,200 files × 50 GB | ~600 TB/day transcode output |
| Playback event writes/s | 100M streams × 1 event/30s | ~3.3M writes/s |
| Recommendation API calls/s | 250M × 5 opens/day / 86,400 | ~14,500 rec calls/s |
The 400 Tbps peak egress is the dominant engineering constraint. No single cloud provider can serve this economically; hence Netflix operates its own CDN embedded inside ISP networks.
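As a sanity check, the table's arithmetic can be reproduced in a few lines (class and method names here are illustrative, not part of any real system):

```java
public class ScaleEstimate {
    // All figures mirror the assumptions stated above.
    static final double SUBSCRIBERS = 250e6;
    static final double PEAK_ACTIVE_SHARE = 0.40;
    static final double AVG_BITRATE_MBPS = 4.0;

    /** Peak concurrent streams: subscribers × peak-active share. */
    public static double peakStreams() {
        return SUBSCRIBERS * PEAK_ACTIVE_SHARE; // ~100M
    }

    /** Peak egress in Tbps: streams × Mbps ÷ 1e6. */
    public static double peakEgressTbps() {
        return peakStreams() * AVG_BITRATE_MBPS / 1e6; // ~400 Tbps
    }

    /** Playback event writes/s at one heartbeat every 30 seconds. */
    public static double eventWritesPerSec() {
        return peakStreams() / 30.0; // ~3.3M/s
    }
}
```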
4. High-Level Design #
The system decomposes into three independent planes: the content ingestion plane (raw upload → transcode → CDN pre-position), the streaming plane (manifest generation → OCA selection → ABR playback), and the discovery plane (personalisation, search, and browse).
```mermaid
flowchart TD
  subgraph Studio["Studio / Content Partner"]
    RAW["Raw Mezzanine File<br/>(ProRes / IMF)"]
  end
  subgraph Ingest["Content Ingestion Plane"]
    CIP["Content Ingestion<br/>Service (Hollow)"]
    CHUNKER["Media Chunker<br/>2-second segments"]
    ENCODER["Distributed Encoder<br/>AWS + Spot Fleet"]
    S3O["S3 Origin Store<br/>900 PB encoded files"]
    PREPOS["Pre-Position Service<br/>OCA Seeder"]
  end
  subgraph OCA["Open Connect CDN"]
    OCA1["OCA Appliance<br/>ISP-embedded cache"]
    OCA2["OCA Cluster<br/>Regional PoP"]
  end
  subgraph API["API Layer"]
    GW["API Gateway<br/>Zuul — Auth · Rate · Route"]
    PLAY["Playback Service<br/>Manifest · License · OCA URL"]
    REC["Recommendation<br/>Service (Meson)"]
    SRCH["Search Service<br/>Elasticsearch"]
    RES["Resume Point Service"]
  end
  subgraph Client["Client"]
    APP["Netflix App<br/>ABR Player (ExoPlayer / Custom)"]
  end
  subgraph Data["Data Layer"]
    CASS["Cassandra<br/>Viewing history · Resume points"]
    EVT["Kafka<br/>Playback events stream"]
    FLINK["Flink<br/>Real-time viewing aggregation"]
    EVE["EV Cache (memcached)<br/>Rec rows · Homepage"]
  end
  RAW --> CIP
  CIP --> CHUNKER
  CHUNKER --> ENCODER
  ENCODER --> S3O
  S3O --> PREPOS
  PREPOS -->|"push popular titles"| OCA1
  PREPOS -->|"push popular titles"| OCA2
  APP -->|"1 — GET /api/manifest"| GW
  GW --> PLAY
  PLAY -->|"pick nearest OCA"| OCA2
  PLAY -->|"DRM license"| PLAY
  PLAY -->|"signed manifest URL"| APP
  APP -->|"2 — fetch segments direct"| OCA1
  APP -->|"3 — playback heartbeat"| GW
  GW --> RES
  RES --> CASS
  GW --> EVT
  EVT --> FLINK
  FLINK --> CASS
  APP -->|"GET /home"| GW
  GW --> REC
  REC -->|"cached rows"| EVE
  REC -->|"history"| CASS
```
Component Reference #
| Component | Technology | Responsibility |
|---|---|---|
| Content Ingestion Service | Java + Hollow (in-memory dataset) | Receive studio files, validate, chunk into 2-second GoP-aligned segments |
| Distributed Encoder | VMAF-optimised FFmpeg on AWS Spot | Produce 1,200+ renditions across codecs (H.264, H.265, AV1) and resolutions |
| Open Connect Appliance | Custom FreeBSD servers running NGINX, 100–280 TB SSD per unit | ISP-embedded edge cache serving 95%+ of stream bytes, avoiding public internet |
| Playback Service | Java microservice, Zuul gateway | Generate ABR manifest (DASH/HLS), select nearest OCA, issue DRM license token |
| ABR Player | ExoPlayer (Android), custom (iOS/TV) | Real-time bandwidth estimation; switches bitrate without rebuffering |
| Recommendation Service (Meson) | Python ML + Java serving, EV Cache | Run 45+ algorithms, merge, rank, and cache personalised homepage rows |
| EV Cache | Memcached clusters, multi-AZ replicated | Cache recommendation rows, metadata, and frequently-read user state |
| Viewing History (Cassandra) | Apache Cassandra, wide rows by user_id | Resume points, playback history, watch progress for 250M users |
| Flink Pipeline | Apache Flink on Kafka streams | Aggregate real-time viewing events for A/B metrics and recommendation freshness |
5. Deep Dive #
5.1 Content Encoding Pipeline #
The raw studio file arrives as a ProRes or IMF (Interoperable Master Format) package. Netflix’s pipeline, called Archer, performs per-title encoding optimisation: instead of encoding every title at fixed bitrate ladders, it analyses scene complexity (using VMAF — Video Multimethod Assessment Fusion) and assigns each shot its optimal QP (Quantisation Parameter). A simple talking-head scene needs far fewer bits than a fast-action chase sequence to look identical to the human eye.
The chunker splits the mezzanine into 2-second GoP (Group of Pictures) aligned segments. These boundaries are intentional: the ABR player can switch renditions only at GoP boundaries without visual glitches. After chunking, thousands of AWS Spot instances encode each 2-second chunk independently, then segments are stitched back in order. This parallel encoding reduces a feature film from 24 hours of single-machine encoding to under 30 minutes.
Output: ~1,200 files per title (6+ codecs × 20+ bitrate rungs × audio variants × subtitle tracks), stored in S3. Popular titles are immediately pre-positioned to OCAs globally.
```java
// Simplified segment assignment for parallel encoding.
// Title and Codec are domain types defined elsewhere in the pipeline.
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

record EncodeJob(String titleId, int segmentIndex, Duration offset,
                 Codec codec, int targetBitrate) {}

class EncodeJobPlanner {
    /** One job per (segment × codec × bitrate rung); each is independently encodable. */
    public List<EncodeJob> explodeJobs(Title title, List<Codec> codecs,
                                       List<Integer> bitrateRungs) {
        var jobs = new ArrayList<EncodeJob>();
        for (var segment : title.segments()) {       // 2-second chunks
            for (var codec : codecs) {
                for (var bitrate : bitrateRungs) {
                    jobs.add(new EncodeJob(
                            title.id(), segment.index(),
                            segment.offset(), codec, bitrate));
                }
            }
        }
        return jobs; // dispatched to the Spot fleet via SQS
    }
}
```

5.2 Open Connect CDN #
Most CDNs sit in hyperscaler datacentres and traffic traverses the public internet to reach subscribers. Netflix instead installs OCA appliances inside ISP and IXP (Internet Exchange Point) racks under a free-of-charge agreement: ISPs get to offload upstream transit costs; Netflix pays only hardware.
Each OCA holds 100–280 TB of SSD content. A pre-positioning algorithm (the OCA seeder) runs nightly: it predicts what each OCA’s subscriber base will watch tomorrow based on historical demand and pushes that content proactively over an internal backbone, so that when a subscriber presses play, the segment is already local.
OCA selection by the Playback Service:
- Identify subscriber’s ISP and geography.
- Query the OCA steering service for the top-3 candidate appliances by latency (via BGP-level proximity).
- Issue a manifest whose segment URLs point to that OCA.
- If the OCA misses (< 5% of the time), fall back to S3 origin.
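The steps above can be sketched as a small steering function. `OcaCandidate`, its fields, and the origin URL are hypothetical stand-ins, not Netflix's actual steering API:

```java
import java.util.Comparator;
import java.util.List;

public class OcaSteering {
    /** A candidate edge appliance with measured proximity and health (illustrative). */
    public record OcaCandidate(String host, double latencyMs,
                               boolean hasContent, boolean healthy) {}

    static final String ORIGIN_URL = "https://origin.example.net"; // S3 origin fallback

    /**
     * Pick the lowest-latency healthy OCA that already holds the content;
     * fall back to origin if none qualifies (the <5% miss path).
     */
    public static String selectSegmentBase(List<OcaCandidate> top3) {
        return top3.stream()
                .filter(c -> c.healthy() && c.hasContent())
                .min(Comparator.comparingDouble(OcaCandidate::latencyMs))
                .map(c -> "https://" + c.host())
                .orElse(ORIGIN_URL);
    }
}
```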
5.3 ABR (Adaptive Bitrate) Streaming #
The manifest served to the player is a DASH (Dynamic Adaptive Streaming over HTTP) MPD or an HLS (HTTP Live Streaming) playlist listing all renditions and their segment URLs. The player’s ABR algorithm (BOLA — Buffer Occupancy based Lyapunov Algorithm) makes a switching decision every 2-second segment:
- If buffer > 20 seconds: step up to next bitrate rung.
- If buffer < 10 seconds: step down immediately.
- Bandwidth estimate is an exponential moving average of the last 5 segment download speeds.
The result: a user on a fluctuating LTE connection never sees a rebuffering spinner — they silently watch at 720p instead of 4K until the network recovers.
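A minimal sketch of the threshold rules above, assuming a simple EMA bandwidth estimator. The production BOLA algorithm optimises a Lyapunov utility function rather than fixed thresholds, and all names here are illustrative:

```java
public class AbrController {
    public enum Switch { UP, DOWN, HOLD }

    private double bandwidthEmaKbps = 0;
    // Smoothing factor roughly equivalent to averaging the last 5 samples.
    private static final double ALPHA = 2.0 / (5 + 1);

    /** Fold a new segment's measured throughput into the running estimate. */
    public void observeThroughput(double kbps) {
        bandwidthEmaKbps = bandwidthEmaKbps == 0
                ? kbps
                : ALPHA * kbps + (1 - ALPHA) * bandwidthEmaKbps;
    }

    /** Threshold rules from the text: <10 s buffer steps down, >20 s may step up. */
    public Switch decide(double bufferSeconds, double nextRungKbps) {
        if (bufferSeconds < 10) return Switch.DOWN;
        if (bufferSeconds > 20 && bandwidthEmaKbps > nextRungKbps) return Switch.UP;
        return Switch.HOLD;
    }
}
```

Note the asymmetry: stepping down is immediate (rebuffering is the worst outcome), while stepping up additionally requires the bandwidth estimate to support the higher rung.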
5.4 Recommendation Engine #
Netflix’s Meson orchestration layer runs 45+ individual recommendation algorithms in parallel: collaborative filtering, content-based similarity, trending-in-country, continue-watching, because-you-watched, and more. Each produces a ranked list of titles. Meson merges these lists into homepage rows using a stacking policy (diversity constraints prevent the same genre appearing three rows in a row).
Candidate generation uses a two-tower neural network: a user tower encodes a subscriber’s viewing history and demographic features into a 256-dimensional embedding; a title tower encodes content features. ANN (Approximate Nearest Neighbour) search over the title tower retrieves the top-500 candidate titles in milliseconds.
Re-ranking applies a GBM (Gradient Boosted Machine) that considers context (time of day, device, session length) and short-term signals (what you watched last night) to produce the final ordered list cached in EV Cache for 30 minutes.
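The candidate-generation step can be illustrated with a brute-force scorer over title embeddings; a production system substitutes an ANN index for the linear scan, and `topK` here is a hypothetical helper:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

public class CandidateRetrieval {
    /** Dot product between user and title embeddings of equal dimension. */
    static double score(double[] user, double[] title) {
        double s = 0;
        for (int i = 0; i < user.length; i++) s += user[i] * title[i];
        return s;
    }

    /** Brute-force top-K by score; a real system replaces this with ANN search. */
    public static List<Integer> topK(double[] user, double[][] titles, int k) {
        return IntStream.range(0, titles.length).boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> -score(user, titles[i])))
                .limit(k)
                .toList();
    }
}
```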
6. Data Model #
6.1 Viewing History & Resume Points (Cassandra) #
Table: viewing_history

```
Partition key:  user_id (UUID)
Clustering key: viewed_at DESC
Columns:
  title_id       UUID
  profile_id     UUID       -- Netflix supports multiple profiles per account
  resume_offset  DURATION   -- seconds from start
  watched_pct    FLOAT      -- 0.0–1.0
  device_type    TEXT
  viewed_at      TIMESTAMP
```

Wide rows by user_id allow fast LIMIT 500 scans for a user’s recent history without cross-partition joins. TTL (Time-To-Live) of 2 years keeps storage bounded.
6.2 Content Metadata (MySQL + Hollow) #
Canonical metadata (title, genre, cast, ratings) lives in MySQL (ACID for rights and metadata updates). Netflix Hollow replicates a snapshot to an in-memory read-only dataset on every JVM in the fleet, so metadata reads are sub-microsecond with zero network hops.
6.3 Playback Event Stream (Kafka) #
| Field | Type | Notes |
|---|---|---|
| user_id | UUID | Partitioned by user_id for ordering |
| session_id | UUID | Groups events for one viewing session |
| title_id | UUID | |
| event_type | ENUM | start, pause, seek, quality_change, stop |
| playback_position | DURATION | Current position in stream |
| rendition_bitrate | INT | kbps, for QoE monitoring |
| rebuffer_count | INT | Rebuffering events since last heartbeat |
| ts | TIMESTAMP | Client-side wall clock |
Kafka topic playback-events has 1,024 partitions (scales to ~3M events/s). Flink jobs consume this stream to update Cassandra resume points and feed real-time A/B experiment metrics.
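Per-user ordering follows from routing every event for a user to the same partition. A minimal sketch of that routing (Kafka's actual default partitioner hashes keys with murmur2, not Java's `hashCode`):

```java
public class EventRouting {
    static final int PARTITIONS = 1024; // matches the playback-events topic

    /** All events for one user land on the same partition, preserving their order. */
    public static int partitionFor(String userId) {
        // floorMod guards against negative hashCode values.
        return Math.floorMod(userId.hashCode(), PARTITIONS);
    }
}
```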
7. Trade-offs #
7.1 Push (Pre-position) vs Pull (On-demand fetch) for CDN #
| Option | Pros | Cons | When |
|---|---|---|---|
| Pre-position (Push) | Zero cache-miss latency, predictable ISP traffic | Storage cost at edge, wasted storage for unpopular titles | Netflix: popular catalogue |
| On-demand (Pull) | No wasted storage, always fresh | First-viewer pays origin latency; unpredictable burst | Long-tail content, live events |
| Hybrid | Optimises for popularity distribution | Complexity in prediction model | Netflix’s actual approach |
Decision: Netflix pre-positions the top ~20% of titles that account for ~80% of streams; the long tail is served on-demand from S3 origin through OCA pull.
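That popularity cut-off can be sketched as a greedy selection over predicted demand; `TitleDemand` and `selectForEdge` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PrePosition {
    public record TitleDemand(String titleId, double predictedStreams) {}

    /**
     * Greedily take the most popular titles until they cover the target share
     * of predicted demand (~80% of streams from ~20% of titles in practice).
     */
    public static List<String> selectForEdge(List<TitleDemand> demand, double targetShare) {
        double total = demand.stream().mapToDouble(TitleDemand::predictedStreams).sum();
        var sorted = new ArrayList<>(demand);
        sorted.sort(Comparator.comparingDouble(TitleDemand::predictedStreams).reversed());
        var picked = new ArrayList<String>();
        double covered = 0;
        for (var t : sorted) {
            if (covered >= targetShare * total) break;
            picked.add(t.titleId());
            covered += t.predictedStreams();
        }
        return picked;
    }
}
```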
7.2 DASH vs HLS #
| Attribute | DASH | HLS |
|---|---|---|
| Segment format | fMP4 (fragmented MP4) | MPEG-TS or fMP4 |
| DRM support | Widevine, PlayReady natively | FairPlay (Apple only) |
| Latency | ~6-10s live; near-zero for VOD | Similar |
| Adoption | Android, Smart TVs, Web | iOS, macOS, Safari required |
Decision: Netflix serves DASH on most devices (better codec flexibility and DRM) and HLS on Apple devices (platform requirement).
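The routing decision reduces to a platform check; a sketch with illustrative platform strings:

```java
public class StreamingProtocol {
    public enum Protocol { DASH, HLS }

    /** Apple platforms require HLS + FairPlay; everything else gets DASH. */
    public static Protocol forDevice(String platform) {
        return switch (platform.toLowerCase()) {
            case "ios", "tvos", "macos", "safari" -> Protocol.HLS;
            default -> Protocol.DASH;
        };
    }
}
```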
7.3 Monolithic vs Microservice Encoding Pipeline #
Early Netflix encoded on monolithic on-premise servers. Moving to a distributed Spot fleet reduced encoding cost by 60% (Spot is 70-90% cheaper than On-Demand) at the cost of fault-tolerance complexity: any Spot instance can be reclaimed with 2 minutes notice, so each segment is encoded idempotently and checkpointed to S3. If an instance is reclaimed, the job is retried on a different instance with no data loss.
7.4 CAP: Availability over Consistency for Viewing History #
Cassandra’s tunable consistency allows Netflix to choose QUORUM for critical writes (resume point) and ONE for reads. If a Cassandra node is unavailable, the player starts from the beginning rather than returning an error — an availability-over-consistency choice that errs on the side of a working product.
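The degrade-to-zero read path can be sketched as follows, with the Cassandra call abstracted behind a `Supplier` (a hypothetical wrapper, not the real driver API):

```java
import java.util.Optional;
import java.util.function.Supplier;

public class ResumePoint {
    /**
     * Read the resume offset; on any failure or miss, degrade to offset 0
     * (start from the beginning) instead of surfacing an error to the player.
     */
    public static long readOffsetSeconds(Supplier<Optional<Long>> cassandraRead) {
        try {
            return cassandraRead.get().orElse(0L);
        } catch (RuntimeException unavailable) {
            return 0L; // availability over consistency
        }
    }
}
```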
8. Failure Modes #
| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| OCA appliance | Hardware failure | Subscribers in that ISP see playback start failures | Playback Service fails over to secondary OCA; origin fallback via S3 |
| Playback Service | All instances unhealthy | No new streams can start | Multi-region active-active; Hystrix (now Resilience4j) circuit breaker |
| Cassandra cluster | Network partition | Resume point read fails | Return offset=0 (start from beginning) — availability > consistency |
| Kafka consumer lag | Flink falls behind | Resume points stale by minutes | Lag alerting at 30s; Flink auto-scales consumer group; DLQ (Dead Letter Queue) for malformed events |
| Recommendation service | Cold start / model crash | Homepage shows stale or generic rows | EV Cache serves last-good cached rows for up to 1 hour; fallback to globally-popular titles |
| Thundering herd | New popular title released | Millions simultaneously request OCA for uncached segments | Pre-position runs 24h before release; jitter added to manifest TTL to spread OCA fetch |
| Hot partition in Cassandra | Celebrity account (shared profile_id) | Single partition overwhelmed | Profile-level sharding; write-behind with Kafka buffer |
| S3 origin slow | CDN miss path degraded | Long start times for long-tail content | S3 Transfer Acceleration; multi-region S3 replication |
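The TTL-jitter mitigation from the thundering-herd row above is small enough to show whole; base and jitter values here are illustrative:

```java
import java.util.concurrent.ThreadLocalRandom;

public class ManifestTtl {
    /**
     * Base TTL ± up to jitterSeconds, so caches that were filled at the same
     * moment do not all refetch from the OCA at the same moment.
     */
    public static long withJitter(long baseSeconds, long jitterSeconds) {
        long delta = ThreadLocalRandom.current().nextLong(-jitterSeconds, jitterSeconds + 1);
        return baseSeconds + delta;
    }
}
```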
9. Security & Compliance #
Authentication & Authorisation: Users authenticate via OAuth2 tokens; the API Gateway validates tokens using a shared JWK (JSON Web Key) set. Device-level tokens are rotated on each session. Profile-level access within an account uses a separate scoped token.
DRM: Netflix uses a multi-DRM approach — Widevine (Google) for Android/Chrome/Smart TVs, PlayReady (Microsoft) for Windows/Xbox, and FairPlay (Apple) for iOS/macOS. The Playback Service issues a short-lived (4-hour TTL) license token per session; the player cannot cache the decryption key beyond that window.
Encryption in Transit: All client traffic uses TLS 1.3 (Transport Layer Security). OCA-to-origin traffic uses mTLS (mutual TLS) with certificate pinning.
Encryption at Rest: S3 objects encrypted with AES-256 (Advanced Encryption Standard) using per-title KMS (Key Management Service) keys. Cassandra at-rest encryption using TDE (Transparent Data Encryption).
Input Validation: The Playback Service validates all manifest requests: title must be licensed for subscriber’s country; profile must belong to the account; device fingerprint must match the issued token.
PII (Personally Identifiable Information) / GDPR: Viewing history is subject to GDPR right-to-erasure. Netflix implements crypto-shredding: each user’s Cassandra data is encrypted with a per-user DEK (Data Encryption Key) stored in Vault; erasure deletes the DEK, making historical rows unreadable without needing to delete Cassandra rows.
Content Security: Watermarking embeds a per-subscriber invisible forensic watermark in video frames at encode time, enabling piracy tracing.
Audit Logging: All Playback Service decisions (OCA selected, DRM license issued) are written to an immutable audit log in S3 for SOC 2 (System and Organisation Controls 2) compliance.
Rate Limiting: API Gateway enforces per-account rate limits on manifest requests (max 50/minute) to prevent credential-sharing automation.
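A per-account limit of 50 manifest requests/minute maps naturally onto a token bucket. A minimal sketch with an injected clock for testability (the gateway's actual limiter implementation is not public):

```java
public class AccountRateLimiter {
    private final double capacity;      // max burst, e.g. 50 manifest requests
    private final double refillPerSec;  // e.g. 50.0 / 60 for 50 per minute
    private double tokens;
    private long lastNanos;

    public AccountRateLimiter(double capacity, double refillPerSec, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSec = refillPerSec;
        this.tokens = capacity;
        this.lastNanos = nowNanos;
    }

    /** Returns true if the request is admitted; caller supplies the current time. */
    public synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSec = (nowNanos - lastNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSec * refillPerSec);
        lastNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```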
10. Observability #
RED Metrics (Rate, Errors, Duration) #
| Service | Rate | Error | Duration |
|---|---|---|---|
| Playback Service | Manifest requests/s | 5xx rate | p99 manifest latency |
| OCA | Bytes served/s | Cache miss rate | Segment fetch latency (p95) |
| Recommendation | Homepage loads/s | Rec service error rate | Row generation latency |
| Cassandra | Reads + writes/s | Timeout rate | p99 read/write latency |
Streaming Quality (QoE — Quality of Experience) #
| Metric | Alert Threshold | Why |
|---|---|---|
| Rebuffering ratio | > 0.1% of playback time | Leading indicator of churn |
| Video start failure rate | > 0.5% of play attempts | Upstream playback issues |
| Startup latency (p95) | > 2s | Direct UX impact |
| Bitrate switches down | > 2 per session avg | Network or OCA instability |
Business Metrics #
- Stream completions (> 90% viewed) — content quality proxy
- Engagement per session — recommendation effectiveness
- OCA cache hit ratio — cost efficiency signal
Tracing #
Netflix uses Edgar (an internal OpenTelemetry-based system) to trace the full playback path: client → Zuul → Playback Service → OCA selection → segment fetch. Tail-based sampling at 1% of sessions, 100% on sessions with rebuffering events.
Alerting #
- PagerDuty on-call for rebuffering ratio spike (3-minute rolling average > threshold).
- Automated OCA health probes every 30 seconds; unhealthy OCA removed from steering table within 90 seconds.
11. Scaling Path #
Phase 1 — MVP (< 1,000 concurrent streams) #
Single-region. Nginx/FFmpeg on EC2, videos in S3, PostgreSQL for user state. A monolithic Java API serves everything. Manual encoding jobs. No CDN — serve directly from S3 with CloudFront as a simple cache. What breaks first: S3 egress cost and latency as concurrent streams grow.
Phase 2 — Regional Scale (1K → 100K concurrent streams) #
Decompose into Playback, Recommendation, and User microservices. Add CloudFront CDN. Replace PostgreSQL with Cassandra for user history (write amplification kills RDBMS at this scale). Begin automated distributed encoding on Spot. Add Kafka for playback event streaming. What breaks first: CloudFront CDN cache hit rate drops below 80% for long-tail content; costs spike.
Phase 3 — National Scale (100K → 10M concurrent streams) #
Deploy Open Connect Appliances in the top-20 ISP partners. Add EV Cache (memcached) layer for recommendation rows. Implement Hollow for metadata. Build the pre-positioning system. Shard Cassandra to 3 regional clusters. What breaks first: OCA coverage gaps; subscribers without an OCA-partnered ISP see high origin egress.
Phase 4 — Global Scale (10M → 100M+ concurrent streams) #
Full global OCA rollout (1,000+ appliances). Multi-region active-active API layer. Per-title VMAF-optimised encoding. AV1 codec for 30% bandwidth reduction. A/B framework drives every algorithm decision. Chaos Engineering (Chaos Monkey, Chaos Kong for region failures) is standard practice. What breaks now: encoding pipeline throughput for growing catalogue; recommendation model freshness at this data volume.
12. Enterprise Considerations #
Brownfield Integration: Enterprises migrating legacy VOD (Video-on-Demand) platforms to this architecture should start with the playback service and CDN layer first — these deliver the biggest latency and cost wins — before investing in the recommendation engine (highest ML complexity).
Build vs Buy:
| Component | Build | Buy |
|---|---|---|
| ABR encoding | Netflix built Archer + VMAF (title-level optimisation is unique) | FFmpeg is open-source; AWS Elemental for managed encoding |
| CDN | Netflix built OCA (ISP partnerships justify it at 400 Tbps) | CloudFront / Fastly / Akamai — correct choice for < 10 Tbps |
| Recommendation | Build two-tower + GBM on your own data | AWS Personalize, Google Recommendations AI |
| DRM | Integrate Widevine + FairPlay SDKs | AWS Elemental MediaConvert for DRM packaging |
Multi-Tenancy: A multi-tenant streaming platform (e.g., an OTT (Over-The-Top) SaaS provider serving multiple media companies) must isolate encoding pipelines per tenant, enforce per-tenant content security policies, and prevent cross-tenant DRM key exposure. Separate Cassandra keyspaces per tenant; shared Kafka with per-tenant topic prefixes.
Vendor Lock-in: The OCA appliance strategy creates hardware lock-in (custom appliance hardware and ISP agreements). For a smaller operator, Fastly or Cloudflare Stream avoids lock-in at higher per-GB cost. Netflix’s 400 Tbps justifies the capex; at 10 Tbps the economics flip.
TCO (Total Cost of Ownership) Ballpark: At 10 Tbps peak egress, CloudFront costs ~$0.0085/GB ≈ $8.5M/month; OCA amortised hardware + hosting ≈ $1.5M/month at this scale. OCA breaks even at roughly 5 Tbps sustained.
Conway’s Law: Netflix’s org mirrors the architecture — separate teams own encoding, CDN, playback, and recommendations. The API contract between Playback Service and OCA is the boundary where team autonomy ends. Any shared-nothing boundary reduces coordination tax.
13. Interview Tips #
- Clarify “streaming” scope early. VOD (Video on Demand) and live streaming are architecturally distinct. Confirm which one before drawing any diagram. VOD = pre-encoded assets + CDN; live = ingest → transcode → ultra-low-latency delivery path.
- Drive toward the CDN. Interviewers expect you to articulate why serving from origin is untenable at Netflix scale and what a CDN buys you. Name OCA or at least the concept of ISP-embedded edge nodes.
- Show the encoding pipeline. Candidates who skip directly to “store in S3 and stream” miss the dominant complexity: you must pre-encode 1,200 renditions before a single play is possible. Explain why GoP-aligned 2-second segments are required for ABR switching.
- Name ABR explicitly. “The player dynamically switches quality” is not enough. Name the acronym ABR, explain that it requires the manifest to list all renditions, and that the player makes a switching decision every segment.
- Common mistake: Designing a single-bitrate streaming system. Any senior interviewer will immediately ask “what happens when the user’s network degrades?” — the answer is ABR, not “it buffers.”
- Fluency vocabulary: GoP (Group of Pictures), VMAF (Video Multimethod Assessment Fusion), ABR (Adaptive Bitrate), DASH (Dynamic Adaptive Streaming over HTTP), HLS (HTTP Live Streaming), OCA (Open Connect Appliance), DRM (Digital Rights Management), QoE (Quality of Experience), EV Cache, Hollow, fMP4 (fragmented MP4), OTT (Over-The-Top).
14. Further Reading #
- Netflix Tech Blog — “Optimizing the Netflix Streaming Experience with Data Science” — covers the QoE metrics and rebuffering model.
- DASH Industry Forum — DASH-IF Interoperability Guidelines — canonical specification for the manifest format and segment addressing.
- “Bola: Near-Optimal Bitrate Adaptation for Online Videos” (SIGCOMM 2016) — the algorithm behind Netflix’s ABR player switching logic.
- Netflix Tech Blog — “Open Connect Everywhere” — details the OCA deployment model and ISP partnership economics.
- “A Survey of Adaptive Video Streaming with Reinforcement Learning” (IEEE 2020) — covers RL-based ABR approaches that Netflix and others have experimented with.