
Twitter / Social Media Feed

Author: Lakshay Jawa
Sharing knowledge on system design, Java, Spring, and software engineering best practices.

1. Hook

Twitter at peak serves 600K tweet reads per second while simultaneously processing tens of thousands of new tweets. The naive approach — querying who you follow, then fetching all their tweets, then sorting — collapses instantly at scale. The real architecture is a masterclass in the write-amplification vs read-latency trade-off, and the edge cases (Lady Gaga following Justin Bieber, or vice versa) reveal why no single strategy wins.


2. Problem Statement

Functional Requirements

  1. Users can post tweets (text, images, videos).
  2. Users can follow/unfollow other users.
  3. A user’s home timeline shows tweets from people they follow, reverse-chronologically.
  4. Users can like, retweet, and reply to tweets.
  5. Search for tweets by keyword or hashtag.

Non-Functional Requirements

| Attribute | Target |
|---|---|
| Timeline read latency (p99) | < 300ms |
| Write availability | 99.95% |
| Read availability | 99.99% |
| Timeline freshness | < 5s from tweet creation |
| Scale | 300M DAU, 500M tweets/day |

Out of Scope

  • Direct messages
  • Trending topics algorithm
  • Ad insertion
  • Abuse/spam detection

3. Scale Estimation

Assumptions:

  • 300M DAU; each user reads timeline ~10×/day, posts ~0.5 tweet/day.
  • Average user follows ~200 accounts; average follower count ~200.
  • Celebrity accounts: up to 100M followers (e.g., Katy Perry).
  • Tweet object: ~1KB (text + metadata). Media stored separately on object storage.
| Metric | Calculation | Result |
|---|---|---|
| Tweet writes | 300M × 0.5 / 86,400 | ~1,750 writes/s |
| Tweet reads (timeline) | 300M × 10 / 86,400 | ~34,700 reads/s |
| Fan-out events (write) | 1,750 × 200 avg followers | ~350,000 writes/s |
| Storage per day | 500M tweets × 1KB | ~500 GB/day |
| Storage per year | 500GB × 365 | ~180 TB/year |
| Timeline cache size | 300M users × 800 tweets × 8B (tweet ID) | ~1.9 TB |
| Bandwidth (reads) | 34,700 × 1KB | ~34 MB/s |

Fan-out is the dominant load: ~350K cache writes/second on average, with substantially higher peaks. This is the central problem the entire architecture is designed around.
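These estimates are plain arithmetic. A short sketch that reproduces them from the stated assumptions (class and method names are illustrative, not from any production codebase):

```java
// Back-of-envelope check for the scale estimates above.
// All inputs come from the stated assumptions; outputs are unrounded.
public class ScaleEstimate {
    static final long DAU = 300_000_000L;
    static final double TWEETS_PER_USER_PER_DAY = 0.5;
    static final int READS_PER_USER_PER_DAY = 10;
    static final int AVG_FOLLOWERS = 200;
    static final long SECONDS_PER_DAY = 86_400L;

    static long tweetWritesPerSecond() {
        // 150M tweets/day spread over a day -> ~1,736/s (quoted as ~1,750)
        return Math.round(DAU * TWEETS_PER_USER_PER_DAY / SECONDS_PER_DAY);
    }

    static long timelineReadsPerSecond() {
        // 3B timeline loads/day -> ~34,722/s (quoted as ~34,700)
        return DAU * READS_PER_USER_PER_DAY / SECONDS_PER_DAY;
    }

    static long fanoutWritesPerSecond() {
        // each tweet write amplifies into ~200 timeline cache writes
        return tweetWritesPerSecond() * AVG_FOLLOWERS;
    }

    public static void main(String[] args) {
        System.out.printf("writes/s=%d reads/s=%d fanout/s=%d%n",
            tweetWritesPerSecond(), timelineReadsPerSecond(), fanoutWritesPerSecond());
    }
}
```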


4. High-Level Design

The system has two completely separate paths that are optimised independently: a write path that fans out new tweets to follower timelines, and a read path that assembles and serves a pre-built timeline in under 300ms.

flowchart TD
    subgraph CL["Client Layer"]
        MOB["Mobile / Web Client"]
    end
    subgraph AL["API Layer"]
        GW["API Gateway\nAuth · Rate Limit · Routing"]
    end
    subgraph WP["Write Path (async fan-out)"]
        TW["Tweet Service\nValidate · Store · Publish"]
        KF["Kafka\ntopic: tweet.created"]
        FO["Fan-out Service\nWorker Pool"]
    end
    subgraph RP["Read Path (pre-built timelines)"]
        TL["Timeline Service\nFetch · Merge · Respond"]
        HY["Hydration Service\nBatch-fetch tweet objects"]
    end
    subgraph SL["Storage Layer"]
        CAS["Cassandra\nTweet Store (canonical)"]
        RD["Redis Cluster\nTimeline Cache\nSorted Sets per user"]
        SGS["Social Graph Service\nFollower Sets"]
        S3["S3 + CDN\nMedia Storage"]
    end
    MOB -->|"POST /tweet"| GW
    MOB -->|"GET /timeline"| GW
    GW -->|"write request"| TW
    GW -->|"read request"| TL
    TW -->|"1. persist tweet"| CAS
    TW -->|"2. publish event"| KF
    TW -->|"media upload URL"| S3
    KF -->|"consume events"| FO
    FO -->|"lookup followers"| SGS
    FO -->|"ZADD tweet_id\nnormal users only"| RD
    TL -->|"ZRANGE latest 800"| RD
    TL -->|"pull celebrity tweets\nauthor follower_count > 10K"| CAS
    TL -->|"merged tweet ID list"| HY
    HY -->|"batch MGET by tweet_id"| CAS

Component Reference

API Gateway — Nginx / Envoy
  • Role: Single entry point. Validates JWT, enforces rate limits, routes to Tweet Service or Timeline Service, terminates TLS.
  • Key design decision: Rate limiting is enforced here — not inside microservices — so internal services never see unauthenticated or burst traffic.
  • Failure behaviour: Horizontally scaled behind a hardware LB. No local state; stateless restart.

Tweet Service — Java / Go microservice
  • Role: Validates tweet content (280 chars, media type), assigns a TIMEUUID tweet_id, writes to Cassandra synchronously, then publishes a TweetCreatedEvent to Kafka. Returns the tweet_id to the client immediately — fan-out is async.
  • Key design decision: Kafka publish is fire-and-forget from the client's perspective. If Kafka is temporarily unavailable, the tweet is still persisted in Cassandra — a separate reconciliation job can backfill fan-out events from the WAL.
  • Failure behaviour: Cassandra write failure → 503 to client (tweet not created). Kafka publish failure → tweet exists but fan-out is delayed; reconciliation catches up within minutes.

Kafka (tweet.created) — Apache Kafka (MSK)
  • Role: Durable, ordered, replayable event log. Decouples the Tweet Service (synchronous, user-facing) from the Fan-out Service (async, high-throughput).
  • Key design decision: Partitioned by author_id so all tweets from one author go to the same partition, preserving order for follower timelines. Retention set to 30 days (covers the GDPR deletion SLA). Consumer lag is the primary fan-out health signal — it tells you how stale timelines are getting.
  • Failure behaviour: Broker failure → Kafka replication (RF=3) handles this transparently. Consumer group offset commits happen after successful Redis writes, so crashes replay without data loss.

Fan-out Service — Java 21 (virtual threads) worker pool
  • Role: Consumes tweet.created events. For each event, fetches the author's follower list from the Social Graph Service. For every follower (below the celebrity threshold of 10,000), writes the tweet_id into that follower's Redis sorted-set timeline using ZADD. Celebrities are skipped entirely — their tweets are pulled at read time.
  • Key design decision: The 10,000-follower threshold is tunable. Each Redis write is a pipeline command — a single fan-out event for a 5,000-follower user is batched into one Redis pipeline call per shard, not 5,000 individual round-trips.
  • Failure behaviour: Worker crash → Kafka offset not committed → event replayed. Redis writes are idempotent (ZADD NX), so replay is safe. If a follower's Redis key doesn't exist, the write is a no-op; the timeline is reconstructed fresh on their next login.

Social Graph Service — Redis Cluster + MySQL
  • Role: Maintains bidirectional follow relationships. Exposes two sets per user: followers:{userId} (who follows this user) and following:{userId} (who this user follows). Redis is the hot path for fan-out lookups. MySQL is the source of truth for follow counts, privacy settings, and audit.
  • Key design decision: Follow/unfollow events are written to MySQL first (ACID), then asynchronously synced to Redis. This means there is a short window (typically <1s) where the Redis follower set is stale — acceptable for fan-out purposes.
  • Failure behaviour: Redis miss → fall back to a MySQL query. MySQL unavailable → follower lookups fail → fan-out backlog grows in Kafka; self-heals once MySQL recovers. A circuit breaker prevents cascading pressure on MySQL during a Redis outage.

Timeline Cache — Redis Cluster (sorted sets)
  • Role: Stores each user's home timeline as a sorted set: timeline:{userId} → { tweet_id → unix_timestamp_ms }, capped at 800 entries via ZREMRANGEBYRANK on every write. Stores only tweet IDs — not full tweet objects, which are fetched separately by the Hydration Service. This keeps memory usage predictable.
  • Key design decision: Sorted sets are chosen over lists because timeline merging (combining pre-built IDs with celebrity tweets) requires ordering by score (timestamp). A list would require an O(N log N) re-sort on every celebrity merge. TTL of 7 days for inactive users — active users stay hot indefinitely.
  • Failure behaviour: Node failure → Redis Cluster promotes a replica within seconds, with a short window of reads falling through to DB reconstruction (expensive: queries Social Graph + Cassandra per followed account). Request coalescing (singleflight pattern) prevents a thundering herd on cache miss.

Timeline Service — Java microservice
  • Role: Entry point for all timeline reads. Fetches up to 800 tweet IDs from the user's Redis sorted set, separately queries Cassandra for recent tweets from any celebrity accounts the user follows (authors with follower_count > 10K), merges both lists by timestamp, and passes the top-N tweet IDs to the Hydration Service.
  • Key design decision: Identifying which followed accounts are celebrities by checking each following entry's follower count in the Social Graph Service is expensive for users following thousands of accounts. Optimisation: maintain a separate celebrity_following:{userId} set that is updated asynchronously on follow/unfollow.
  • Failure behaviour: Redis miss (cold start) → fall back to full timeline reconstruction, a blocking, expensive path. Mitigation: pre-warm the timeline asynchronously on the user login event (Kafka user.login topic) before the HTTP response is returned.

Hydration Service — Java microservice + local L1 cache
  • Role: Takes a list of tweet IDs and returns full tweet objects. Performs a batch MGET against a tweet object cache (Redis or Memcached) first; for IDs not found in cache, issues a parallel multi-get against Cassandra, then assembles the full list. Also enriches each tweet with the author's display name and avatar (from a separate User Service).
  • Key design decision: Popular tweets (viral content) are fetched thousands of times per second. A local in-process L1 cache (Caffeine, max 10K entries, 30s TTL) catches these hot objects before they reach Redis or Cassandra. Write-through on tweet creation — the Hydration Service pre-warms its L1 when the Tweet Service publishes.
  • Failure behaviour: Cassandra unavailable → return a partial response with cached tweet objects only. Fail open with a degraded timeline rather than a 503 — users see fewer tweets, not an error page.

Tweet Store — Apache Cassandra 4.x
  • Role: Canonical store for all tweet content. Partitioned by author_id, clustered by tweet_id DESC (TIMEUUID), so "get this author's last 20 tweets" is a single partition read — efficient for both profile pages and celebrity tweet pulls at read time. TTL on tweets: none (tweets are permanent unless deleted by the user or moderation).
  • Key design decision: High-volume authors (news bots, major accounts) can create hot partitions. Mitigation: composite partition key (author_id, bucket) where bucket = YYYYMM. This distributes writes across time buckets while keeping reads efficient (at most 1-2 buckets per celebrity pull).
  • Failure behaviour: Node failure → Cassandra's RF=3 quorum masks it; read repair and hinted handoff handle short outages. Multi-DC replication for regional HA; writes use LOCAL_QUORUM, reads use LOCAL_ONE for latency.

Media Store — S3 + CloudFront / Cloudflare
  • Role: Images and videos are never stored inline in tweets. The Tweet Service generates a pre-signed S3 upload URL, returns it to the client, and the client uploads directly to S3. The media URL stored in Cassandra is the CDN URL — never a raw S3 URL. CDN edge nodes cache media aggressively (immutable cache-control headers, since media is content-addressed by hash).
  • Key design decision: Media is immutable: once uploaded, it doesn't change, which allows an infinite CDN TTL. CSAM scanning (PhotoDNA) runs asynchronously after upload — media is served from the CDN but soft-deleted from tweet display if flagged within minutes.
  • Failure behaviour: S3 outage → CDN serves cached media; an origin-shield layer (CloudFront regional edge cache) absorbs most traffic before hitting S3. New uploads fail gracefully: the client retries with exponential backoff.

5. Deep Dive

Fan-out Service — The Heart of the System

The fan-out service is where the hardest engineering lives. Its job sounds simple: when a tweet is created, write its ID into every follower’s timeline. The problem is scale — a user with 1 million followers generates 1 million Redis writes from a single tweet. With 1,750 tweets per second average, and many of those from accounts with large followings, the fan-out service needs to sustain 350,000+ Redis writes per second with minimal lag.

The solution is parallelism at every level. Fan-out workers run as a pool of Java 21 virtual threads — each Kafka consumer thread can handle hundreds of concurrent Redis pipeline calls without blocking an OS thread. Each fan-out event’s Redis writes are batched into a single pipeline per Redis shard, not sent as individual commands. This collapses 5,000 round-trips into roughly 30 pipeline calls (one per Redis shard that owns those follower IDs).
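The per-shard batching step can be sketched as a grouping operation. This is a simplification: the shard count of 30 and modulo placement are illustrative only, whereas a real deployment would use Redis Cluster's hash-slot mapping:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Sketch of per-shard batching: group follower IDs by the Redis shard that
// owns them, so each shard receives one pipeline call instead of one call
// per follower. SHARDS = 30 is illustrative; real clusters use slot mapping.
public class ShardBatcher {
    static final int SHARDS = 30;

    public static Map<Integer, List<Long>> groupByShard(List<Long> followerIds) {
        return followerIds.stream()
            .collect(Collectors.groupingBy(id -> (int) Math.floorMod(id, (long) SHARDS)));
    }

    public static void main(String[] args) {
        // 5,000 follower writes collapse into at most 30 pipeline calls
        List<Long> followers = LongStream.range(0, 5_000).boxed().toList();
        System.out.println(groupByShard(followers).size());
    }
}
```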

The celebrity threshold (default: 10,000 followers) is the key architectural escape valve. When an author exceeds it, the fan-out service stops writing their tweets into follower timelines entirely. Instead, their tweets are pulled at read time by the Timeline Service. This trade-off is asymmetric: for a user with 50 million followers, pushing to all timelines would cost 50 million Redis writes per tweet. Pulling means at most one Cassandra query per follower’s timeline load — and only when they’re actively reading. For a celebrity who tweets 5 times a day to 50 million followers, the push cost is 250 million Redis writes/day. The pull cost (assuming 10% of followers actively read daily): 5 million Cassandra reads/day. Pull wins by 50×.
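The push-versus-pull asymmetry above is straightforward arithmetic; a worked sketch using the figures from the paragraph (method names are illustrative):

```java
// Worked version of the push-vs-pull cost comparison for a celebrity account.
public class CelebrityCost {
    // Push: one Redis write per follower per tweet.
    static long pushCostPerDay(long followers, int tweetsPerDay) {
        return followers * tweetsPerDay;
    }

    // Pull: one read-time fetch per *active* reader per day.
    static long pullCostPerDay(long followers, double dailyActiveRatio) {
        return (long) (followers * dailyActiveRatio);
    }

    public static void main(String[] args) {
        long push = pushCostPerDay(50_000_000L, 5);    // 250,000,000 writes/day
        long pull = pullCostPerDay(50_000_000L, 0.10); // 5,000,000 reads/day
        System.out.println(push / pull);               // 50x advantage for pull
    }
}
```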

The fan-out is eventually consistent by design. A tweet from a user with 500 followers reaches all 500 follower timelines within a few seconds under normal load. During Kafka consumer lag events (fan-out backlog > 100K events), the freshness SLO degrades gracefully — timelines are stale but correct. The system never corrupts or loses a tweet; it just delivers it later.

// Fan-out worker — Java 21, virtual threads, pipeline batching
import java.time.Duration;
import java.util.List;
import java.util.concurrent.StructuredTaskScope;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FanoutWorker implements Runnable {

    private static final Logger log = LoggerFactory.getLogger(FanoutWorker.class);
    private static final int CELEBRITY_THRESHOLD = 10_000;
    private static final int TIMELINE_CAP = 800;

    private final KafkaConsumer<Long, TweetEvent> kafkaConsumer;
    private final FollowerRepository followerRepo;
    private final TimelineCache timelineCache;

    public FanoutWorker(KafkaConsumer<Long, TweetEvent> kafkaConsumer,
                        FollowerRepository followerRepo,
                        TimelineCache timelineCache) {
        this.kafkaConsumer = kafkaConsumer;
        this.followerRepo = followerRepo;
        this.timelineCache = timelineCache;
    }

    @Override
    public void run() {
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            kafkaConsumer.poll(Duration.ofMillis(100)).forEach(record -> {
                TweetEvent event = record.value();
                // Production would check the author's follower *count* first
                // to avoid materialising a celebrity's full follower list.
                List<Long> followers = followerRepo.getFollowers(event.authorId());

                if (followers.size() > CELEBRITY_THRESHOLD) {
                    return; // pull model — Timeline Service handles this at read time
                }

                // Batch writes per Redis shard via pipeline
                scope.fork(() -> {
                    timelineCache.pipelinedZAdd(followers, event.tweetId(),
                        event.createdAtMs(), TIMELINE_CAP);
                    return null;
                });
            });
            scope.join().throwIfFailed();
            // Commit only after every pipeline write in the batch succeeded
            kafkaConsumer.commitSync();
        } catch (Exception e) {
            // Offset not committed — the batch is replayed; ZADD NX makes replay safe
            log.error("Fan-out failed, offset not committed", e);
        }
    }
}

Timeline Cache — Why Sorted Sets, Not Lists

Redis offers several data structures that could store a user’s timeline. The choice matters because it affects both memory cost and merge performance.

A Redis List (LPUSH/LRANGE) offers O(1) prepend and O(N) range reads. It’s the obvious choice for an ordered feed. But lists have a critical flaw: they can’t be efficiently merged with an externally ordered set. When the Timeline Service needs to combine the pre-built list (from Redis) with celebrity tweets (from Cassandra), it needs to merge two time-sorted sequences. With a list, you’d have to load all entries, merge in memory, and re-sort — O(N log N) per request.

A Redis Sorted Set (ZADD/ZRANGE) uses the tweet’s unix timestamp as the score. Adding a new tweet is O(log N). Fetching the latest 20 is ZRANGE timeline:{userId} 0 19 REV — O(log N + M). The merge operation for celebrity tweets becomes a simple merge of two pre-sorted sequences (like merge-sort’s merge step), which is O(M) where M is the number of celebrity tweets to interleave. This is the decisive advantage.
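The celebrity merge is the standard merge step over two newest-first sequences. A minimal sketch, with an illustrative Entry record rather than any production type:

```java
import java.util.ArrayList;
import java.util.List;

// Merge two timelines that are already sorted newest-first, as in merge
// sort's merge step: O(N + M), no re-sort. Entry is an illustrative
// (tweetId, timestamp) pair.
public class TimelineMerge {
    public record Entry(long tweetId, long timestampMs) {}

    public static List<Entry> merge(List<Entry> cached, List<Entry> celebrity, int limit) {
        List<Entry> out = new ArrayList<>(limit);
        int i = 0, j = 0;
        while (out.size() < limit && (i < cached.size() || j < celebrity.size())) {
            // Take from the cached timeline while its head is at least as new
            boolean takeCached = j >= celebrity.size()
                || (i < cached.size()
                    && cached.get(i).timestampMs() >= celebrity.get(j).timestampMs());
            out.add(takeCached ? cached.get(i++) : celebrity.get(j++));
        }
        return out;
    }
}
```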

The cap at 800 entries is enforced on every write:

ZADD timeline:{userId} {timestamp} {tweet_id}
ZREMRANGEBYRANK timeline:{userId} 0 -801   # trim to newest 800

The 800-entry cap is chosen to cover 2-3 scroll sessions (most users read 20-50 tweets per session) while keeping per-user memory bounded at 800 × 8 bytes = 6.4KB. For 300M active users, that’s ~1.9TB of sorted set data across the Redis cluster — manageable with memory-optimised instances.
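The ZADD-then-trim semantics can be modelled in memory to make the capping behaviour concrete. This is a sketch of the capping logic only (it does not model score updates for an existing member), not the Redis client code:

```java
import java.util.Comparator;
import java.util.TreeSet;

// In-memory model of the capped timeline: each zadd is followed by the
// equivalent of ZREMRANGEBYRANK 0 -(CAP+1), keeping only the CAP newest.
public class CappedTimeline {
    public record Entry(long tweetId, long score) {}

    private final int cap;
    // Ordered lowest score (oldest) first, like a Redis sorted set.
    private final TreeSet<Entry> set = new TreeSet<>(
        Comparator.comparingLong(Entry::score).thenComparingLong(Entry::tweetId));

    public CappedTimeline(int cap) { this.cap = cap; }

    public void zadd(long tweetId, long timestampMs) {
        set.add(new Entry(tweetId, timestampMs));
        while (set.size() > cap) {
            set.pollFirst(); // trim lowest-ranked (oldest) entries
        }
    }

    public int size()    { return set.size(); }
    public long newest() { return set.last().tweetId(); }
    public long oldest() { return set.first().tweetId(); }
}
```

After 1,000 inserts into an 800-entry timeline, only the newest 800 IDs remain, mirroring the trim command above.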

Social Graph Service — The Fan-out Enabler

The Social Graph Service is often underestimated in system design discussions. It’s not just a “followers table.” At Twitter’s scale, it’s a multi-layer system:

Redis layer (hot path): Each user’s follower set is stored as a Redis Set: followers:{userId} → {Set of follower userIds}. The Fan-out Service calls SMEMBERS followers:{authorId} to fetch all follower IDs in a single round trip (O(N) in the set size, but one command rather than N). For a user with 10,000 followers, this returns 10,000 IDs in one Redis call.

MySQL layer (source of truth): All follow/unfollow operations write to MySQL first (follows table with an index on followee_id). MySQL enforces referential integrity (can’t follow a deleted account) and accurate follow counts. After a MySQL commit, the Social Graph Service asynchronously updates Redis. The Redis set can be slightly stale (by seconds) — acceptable.

Why not a graph database? Neo4j or similar graph DBs excel at multi-hop traversals (“friends of friends”). We don’t need multi-hop — we only need one-hop fan-out (get all direct followers). Redis Sets give us exactly that in a single command, at roughly 10× lower latency than a graph DB query.

One subtle problem: the unfollow edge case. When a user unfollows someone, the Fan-out Service must stop writing that author’s tweets into the unfollower’s timeline. The unfollow event triggers an async cleanup — the last N tweet IDs from that author are removed from the user’s Redis timeline. This cleanup can be delayed by minutes, meaning the user may briefly still see tweets from someone they just unfollowed. This is an intentional product trade-off, not a bug.

Read Path — Timeline Assembly in Under 300ms

The read path has a strict 300ms p99 SLA, which means every call in the chain needs to be fast and parallelised. Here’s the detailed flow with latency budget:

| Step | Operation | Target Latency |
|---|---|---|
| 1 | Redis ZRANGE: fetch 800 tweet IDs | ~2ms |
| 2 | Social Graph: identify celebrity following | ~3ms (cached) |
| 3 | Cassandra: fetch celebrity recent tweets (parallel) | ~10ms |
| 4 | Merge + sort tweet ID lists | ~1ms (CPU) |
| 5 | Hydration Service: batch MGET tweet objects | ~15ms (L1 hit) or ~40ms (Cassandra) |
| 6 | User Service: enrich with author display info | ~5ms (cached) |
| | Total (p50) | ~36ms |
| | Total (p99, Cassandra miss) | ~200ms |

Steps 2, 3, and 6 are all executed in parallel via CompletableFuture. The Hydration Service uses a singleflight pattern to prevent N concurrent requests for the same viral tweet ID each triggering a separate Cassandra read — the first request fetches and all others wait for its result.
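The singleflight pattern mentioned above can be sketched with a map of in-flight futures. This is illustrative, not the production implementation; the loader here completes synchronously for simplicity:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Singleflight: concurrent requests for the same key share one in-flight
// load instead of each hitting the backing store separately.
public class SingleFlight<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    public CompletableFuture<V> get(K key, Function<K, V> loader) {
        CompletableFuture<V> created = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, created);
        if (existing != null) {
            return existing; // piggyback on the load already in flight
        }
        try {
            created.complete(loader.apply(key));
        } catch (Exception e) {
            created.completeExceptionally(e);
        } finally {
            inFlight.remove(key); // allow later refreshes for this key
        }
        return created;
    }
}
```

A viral tweet ID requested by N concurrent timeline loads then triggers one Cassandra read, with N-1 callers waiting on the same future.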


6. Data Model

Tweet Table (Cassandra)

| Column | Type | Notes |
|---|---|---|
| author_id | BIGINT | Partition key |
| bucket | TEXT | Partition key (YYYYMM) — prevents hot partitions for prolific authors |
| tweet_id | TIMEUUID | Clustering key DESC — enables reverse-chron scans without sorting |
| content | TEXT | Max 280 UTF-8 characters |
| media_urls | LIST<TEXT> | CDN URLs (never raw S3) |
| reply_to_id | TIMEUUID | NULL if top-level tweet |
| retweet_of_id | TIMEUUID | NULL if original |
| created_at | TIMESTAMP | Denormalised from tweet_id for display |
| is_deleted | BOOLEAN | Soft delete; content cleared on GDPR erasure |

Partition strategy: Composite key (author_id, bucket) where bucket = YYYYMM. A prolific author tweeting 100 times/day generates 36,500 tweets/year — a single author_id partition would grow unboundedly and create a hotspot during celebrity read-time pulls. With monthly buckets, the Timeline Service queries at most 1-2 buckets per celebrity pull.

Why TIMEUUID as clustering key? TIMEUUID embeds the creation timestamp, so ORDER BY tweet_id DESC is equivalent to reverse-chronological without a separate sort. Cassandra stores clustering columns in sorted order on disk, making this a sequential scan with no sort step.

Counter columns are separate tables: like_count and retweet_count are Cassandra COUNTER type, which must live in their own table (Cassandra limitation). Counter updates are commutative and idempotent — exactly what you need for concurrent like operations from millions of users.

Timeline Cache (Redis)

Key:   timeline:{userId}
Type:  Sorted Set
Score: unix timestamp (milliseconds)
Value: tweet_id (string representation of TIMEUUID)
Cap:   800 entries — enforced via ZREMRANGEBYRANK after every ZADD
TTL:   7 days for inactive users (EXPIRE set on cache miss + reconstruction)
       No TTL for active users (kept hot by continuous fan-out writes)

Follower Index (Redis + MySQL)

Redis (hot path):
  followers:{userId}  → SET  { followerId1, followerId2, ... }
  following:{userId}  → SET  { followeeId1, followeeId2, ... }
  celeb_following:{userId} → SET { celebId1, ... }  (subset where follower_count > 10K)

MySQL (source of truth):
  follows(
    follower_id  BIGINT NOT NULL,
    followee_id  BIGINT NOT NULL,
    created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY  (follower_id, followee_id),
    INDEX        idx_followee (followee_id, follower_id)  -- for follower list queries
  )

7. Trade-offs

Fan-out on Write vs Fan-out on Read

| Approach | Pros | Cons | When to Use |
|---|---|---|---|
| Fan-out on Write (push) | O(1) reads; predictable latency; timeline is always ready | Write amplification: 1 tweet → N Redis writes; celebrity accounts make this O(100M) | Normal users with < 10K followers |
| Fan-out on Read (pull) | No write amplification; tweet is always fresh; handles celebrities cleanly | O(F) reads per timeline load where F = number followed; cold timelines are slow | Celebrities (> 10K followers); inactive users |
| Hybrid (Twitter’s actual approach) | Combines O(1) reads for the common case with pull for edge cases | Merge logic complexity; celebrity detection adds a join on every follow relationship | Always — this is the production answer |

Conclusion: Hybrid with a tunable celebrity threshold. The threshold is a knob: lower it (e.g., 1,000) to push more load to read time; raise it (e.g., 100,000) to fan-out more aggressively. Twitter reportedly used different thresholds for different scenarios.

Redis Sorted Set vs List for Timeline

| Option | Pros | Cons |
|---|---|---|
| Sorted Set (ZADD/ZRANGE) | O(log N) insert; natural time-ordered merge with celebrity tweets; range queries by score | ~64 bytes per entry (vs ~24 for list); higher memory cost |
| List (LPUSH/LRANGE) | O(1) prepend; lower memory per entry | No dedup by score; celebrity merge requires full load + re-sort (O(N log N) per read) |

Conclusion: Sorted Set. The merge operation for celebrity tweets is the deciding factor.

Cassandra vs MySQL for Tweets

| Option | Pros | Cons |
|---|---|---|
| Cassandra | Linear horizontal scale; high write throughput (LSM tree); native TTL; tunable consistency | No joins; limited query patterns; no ACID; counter tables are separate |
| MySQL | Full ACID; flexible queries; mature ecosystem; joins | Vertical scale ceiling; sharding is complex; write throughput limited by B-tree rewriting |

Conclusion: Cassandra for tweets (write-heavy, scale-critical, simple access patterns). MySQL for social graph (relational integrity on follow counts, privacy settings).

Kafka vs Direct Redis Write for Fan-out

| Option | Pros | Cons |
|---|---|---|
| Kafka (async) | Decouples tweet write latency from fan-out; replayable on worker crash; handles bursts via buffering | Adds fan-out latency (seconds); operational complexity |
| Direct Redis write (sync) | Immediate timeline freshness; simpler architecture | Tweet write latency includes N Redis writes; celebrity accounts cause multi-second tweet API responses |

Conclusion: Kafka. User-facing write latency (POST /tweet) must be fast and predictable. Async fan-out keeps tweet creation at ~20ms regardless of follower count.


8. Failure Modes

| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| Fan-out Service | Worker crash mid-fan-out | Partial timeline update — some followers see the tweet, others don’t | Kafka consumer group; offset committed only after successful Redis pipeline write. Crashed worker’s batch replays. Idempotent ZADD NX prevents duplicates on replay. |
| Redis Timeline Cache | Node failure | Cache miss storm — Timeline Service falls back to DB reconstruction | Redis Cluster with replicas (RF=2). Singleflight pattern on Timeline Service collapses concurrent misses for the same user into one reconstruction. |
| Social Graph Service (Redis) | Redis unreachable | Fan-out workers can’t fetch follower lists → Kafka backlog grows | Circuit breaker opens; Fan-out Service falls back to MySQL follower query (slower, ~50ms vs ~2ms). Backlog self-heals once Redis recovers. |
| Celebrity tweet pull | Celebrity with 100M followers tweets a viral thread | Spike in Cassandra reads for every timeline load | Cache celebrity’s recent tweets (last 20) in a dedicated Redis key with 30s TTL. Timeline Service reads from this cache before hitting Cassandra. |
| Cassandra hot partition | High-volume author writes to same partition | Write throughput degradation, increased tail latency | Composite partition key (author_id, YYYYMM) distributes writes across time buckets. |
| Kafka consumer lag | Fan-out workers slow (Redis degraded, high throughput burst) | Timeline freshness SLA breached — tweets appear late | Autoscale fan-out worker pool based on consumer lag metric. Alert at lag > 10K; page at > 100K. |
| Timeline reconstruction (cold start) | Inactive user returns after 7 days (TTL expired) | Expensive: query Social Graph + Cassandra for each followed account | Cap reconstruction to last 200 tweets; async pre-warm triggered by user.login Kafka event. Serve partial timeline immediately, stream updates via SSE. |
| Thundering herd on celebrity tweet | Millions of users load timelines within seconds of a viral tweet | Hydration Service hammered for the same tweet_id from millions of requests | L1 in-process Caffeine cache (10K entries, 30s TTL) in Hydration Service absorbs hot IDs. Singleflight collapses parallel requests for the same ID. |

9. Security & Compliance

AuthN/AuthZ: OAuth 2.0 with short-lived JWTs (15-min access token, 30-day refresh). The API Gateway validates the JWT signature and injects userId into the request header downstream — internal services never re-validate the token. Tweets from private accounts are fanned out only to approved followers: the fan-out worker checks the author’s privacy setting before writing to each follower’s timeline. Non-approved followers are skipped entirely.

Encryption: TLS 1.3 for all in-transit data (client to gateway, service to service via mutual TLS). At-rest encryption for Cassandra (AES-256) and Redis (encrypted EBS volumes on AWS). Media on S3 with SSE-S3 (server-side encryption). Encryption keys managed via AWS KMS with automatic 90-day rotation.

Input Validation: Tweet content sanitized server-side (XSS stripping, null byte removal). URLs expanded and checked against a malicious domain blocklist before storage. Media type validated against MIME type (not just file extension). Media scanned for CSAM using PhotoDNA before CDN upload — this is async, but the tweet is held from public display until scan completes (typically < 500ms).

Rate Limiting: Write: 300 tweets per 3 hours per user (token bucket in Redis, keyed by userId). Read: 1,500 timeline requests per 15 minutes per user (Twitter’s actual v2 API limit). These limits are enforced at the API Gateway, not inside microservices, using a shared Redis rate limit store. DDoS protection at edge via Cloudflare (Layer 3/4 scrubbing + Layer 7 WAF rules).
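The write limit (300 tweets per 3 hours) maps naturally onto a token bucket. A minimal in-memory sketch; production keeps the bucket state in the shared Redis store keyed by userId, and the class below is illustrative only:

```java
// Minimal token bucket: capacity 300, refilled linearly over a 3-hour window.
// Time is injected (nowMs) for testability; production uses a wall clock.
public class TokenBucket {
    private final long capacity;
    private final double refillPerMs;
    private double tokens;
    private long lastRefillMs;

    public TokenBucket(long capacity, long windowMs, long nowMs) {
        this.capacity = capacity;
        this.refillPerMs = (double) capacity / windowMs;
        this.tokens = capacity;      // bucket starts full
        this.lastRefillMs = nowMs;
    }

    // Returns true if the request is admitted, false if rate-limited.
    public synchronized boolean tryAcquire(long nowMs) {
        tokens = Math.min(capacity, tokens + (nowMs - lastRefillMs) * refillPerMs);
        lastRefillMs = nowMs;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A burst of 400 posts at the same instant admits exactly 300; tokens then refill gradually over the window.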

PII/GDPR: User deletion triggers async tombstone propagation — tweet content is cleared in Cassandra (is_deleted=true, content=null), then tweet IDs are removed from all follower timelines via a reverse fan-out using the Social Graph. Full deletion SLA: 30 days (Kafka retention must be ≥ 30 days to guarantee the deletion event reaches all consumers). Account deletion events are published to a dedicated user.deleted Kafka topic consumed by every service that holds user data.

Audit Log: All privileged actions (shadow banning, content removal, account suspension) are written to an immutable append-only audit store: a dedicated Kafka topic with cleanup.policy=delete and retention.ms=-1 (infinite retention), archived nightly to S3 with object lock (WORM). This provides a tamper-evident record required for legal holds and SOC 2 compliance.


10. Observability

RED Metrics

| Metric | Alert Threshold |
|---|---|
| Timeline read latency p99 | > 300ms → page |
| Timeline read latency p50 | > 100ms → warn |
| Fan-out lag (Kafka consumer lag) | > 10K events → warn; > 100K → page |
| Tweet write error rate | > 0.1% → page |
| Timeline cache hit rate (Redis) | < 90% → warn; < 80% → page |
| Hydration cache hit rate (L1) | < 60% → warn |

Saturation Metrics

  • Redis memory utilisation per shard: warn at 70% (before eviction), page at 85%
  • Kafka partition throughput: warn at 80% of broker capacity
  • Cassandra read latency p99 per node: warn at 50ms, page at 100ms
  • Fan-out worker thread pool saturation: warn at 80%

Business Metrics

  • Tweets per second (real-time, 1-min rolling window) — deviation > 30% from baseline triggers anomaly alert
  • Timeline impressions per second
  • Fan-out amplification ratio: (fan-out Redis writes / tweets created) — should track average follower count; sudden drop indicates fan-out workers falling behind

Tracing

OpenTelemetry with tail-based sampling (sample 10% of normal requests, 100% of requests where any span exceeds 200ms, 100% of errors). This ensures slow requests are always captured while keeping trace storage costs manageable.

Key trace spans: tweet.write → kafka.publish → fanout.dispatch → redis.pipeline.write, and separately: timeline.read → redis.zrange → celebrity.pull → hydration.batch_get → user.enrich. Jaeger for trace storage; Grafana dashboards for RED metrics and fan-out lag.


11. Scaling Path

Phase 1 — MVP (< 1K RPS reads)

Single Postgres DB, no Redis, no fan-out. Timeline query: SELECT * FROM tweets WHERE author_id = ANY($1) ORDER BY created_at DESC LIMIT 20. Simple, correct, fast enough at low scale.

What breaks first: DB CPU at ~5K timeline reads/s. The author_id = ANY(following_list) query becomes a full index scan when a user follows 200+ accounts.

Phase 2 — 10K RPS reads

Migrate tweets to Cassandra. Add Redis for timeline caching. Implement basic fan-out (synchronous, in-process, called during the POST /tweet handler). Add read replicas for Social Graph MySQL.

What breaks first: Synchronous fan-out adds latency to tweet creation. A user with 10,000 followers causes a 10-second tweet API response. Celebrity accounts are unusable.

Phase 3 — 100K RPS reads

Async fan-out via Kafka. Introduce celebrity threshold (10K followers → pull model). Redis Cluster (sharded by userId mod N). Cassandra multi-DC replication. Introduce Hydration Service as a separate tier.

What breaks first: Redis memory. 300M users × 800 IDs × 8 bytes ≈ 1.9TB of tweet-ID payload alone, before sorted-set overhead. Need aggressive LRU eviction of inactive user timelines (TTL = 7 days). Also: Social Graph Redis starts showing memory pressure from large follower sets.

Phase 4 — 1M+ RPS reads

Tiered timeline storage: hot (Redis, last 800 tweets), warm (Memcached, last 7 days), cold (DynamoDB, full history). Geo-distributed fan-out workers per region (fan-out happens in the region where the tweet was created, replicates to follower regions via cross-region Kafka mirroring). Edge caching of celebrity tweet objects via CDN. Predictive pre-warm: pre-build timelines for users in push-notification cohorts before they open the app.


12. Enterprise Considerations

Brownfield Integration: If migrating from a monolith, use the Strangler Fig pattern — route /timeline to the new service while legacy handles everything else. Timeline data can be bootstrapped from existing MySQL with a one-time backfill job that reads the follows table and reconstructs Redis sorted sets per user. This backfill runs at low priority during off-peak hours to avoid impacting production MySQL.

Build vs Buy:

  • Social Graph: Build (custom follow semantics, privacy rules, celebrity detection logic are too domain-specific for off-the-shelf)
  • Message Queue: Kafka (Confluent Cloud or MSK for managed) — do not build
  • Cache: Redis Cluster (ElastiCache on AWS, or self-managed for cost) — not Memcached (no sorted sets, no cluster-mode atomic operations)
  • Object Storage: S3 / GCS — never build this
  • CDN: Cloudflare or CloudFront for media; Fastly for API edge caching of public timelines
  • Search: Elasticsearch / OpenSearch — feed tweets async via Kafka consumer

Multi-Tenancy: Not applicable for a Twitter-style consumer product. For a B2B social platform (Slack feeds, enterprise activity streams), isolate by workspace at the Redis key (timeline:{workspaceId}:{userId}) and Cassandra partition level. Fan-out workers use separate Kafka consumer groups per workspace tier to ensure one noisy tenant doesn’t delay another’s fan-out.

TCO Ballpark (100K RPS reads):

| Component | Config | Est. Monthly Cost |
|---|---|---|
| Redis Cluster (timeline cache) | 3× r7g.4xlarge (122GB RAM each) | ~$6,000 |
| Cassandra cluster | 6× i4i.4xlarge (NVMe SSD) | ~$12,000 |
| Kafka (MSK) | 3 brokers, m5.2xlarge | ~$3,000 |
| Fan-out workers | 20× c7g.2xlarge | ~$4,000 |
| Social Graph Redis | 2× r7g.2xlarge | ~$2,000 |
| Total (compute) | | ~$27,000/mo |

CDN and data transfer costs add 20-40% on top. At 1M+ RPS, egress becomes the dominant cost line.

Conway’s Law: Fan-out Service, Social Graph Service, Timeline Service, and Hydration Service should each be owned by separate teams. The fan-out amplification ratio (fan-out writes / tweets created) is a shared KPI owned by all four teams — it’s the single number that best describes system health and requires coordination to improve.


13. Interview Tips

  • Always ask the celebrity problem upfront. “What’s the max follower count we need to handle?” This unlocks the hybrid fan-out discussion and signals you know the edge case. Without this, you’ll design a pure push model and get interrupted.
  • Fan-out on write is the obvious first answer; immediately challenge it yourself. Mention write amplification for high-follower accounts, then pivot to the hybrid model. Interviewers want to see you reason through trade-offs, not recite a pattern without friction.
  • Timeline freshness is a hidden non-functional requirement. Ask: “Is eventual consistency acceptable for timeline delivery? How stale can a timeline be?” This opens the discussion of Kafka consumer lag as a freshness SLA.
  • Don’t forget cold start. What happens when an inactive user opens the app after 6 months? Their Redis TTL has expired. Timeline reconstruction from DB is expensive. Discuss async pre-warm on the user.login Kafka event.
  • Vocabulary that signals fluency: fan-out amplification ratio, write-behind cache, sorted set capping, celebrity threshold, timeline hydration, social graph denormalisation, singleflight pattern, tail-based sampling.

14. Further Reading

  • Twitter Engineering Blog — “Storing data at massive scale”: Twitter’s original Gizzard sharding framework and why they moved to Manhattan (internal distributed DB).
  • “Feeding Frenzy: Selectively Materializing Users’ Event Feeds” (SIGMOD 2010) — The canonical academic treatment of social feed materialisation strategies. Directly describes the push/pull hybrid model.
  • Martin Kleppmann, “Designing Data-Intensive Applications” — Chapter 11 covers stream processing for feed systems with Kafka; Chapter 5 covers replication lag in cached timelines.
  • RFC 7519 — JWT spec; relevant for the OAuth 2.0 token lifecycle in the security section.