nSkillHub

System Design: URL Shortener

A URL shortener is a classic system design question. It seems simple — but the interviewer is using it to probe your decisions on hashing, database design, caching, and scaling reads. Here’s the complete design.


Requirements

Functional:

  • Given a long URL, generate a short code (e.g., bit.ly/abc123)
  • Given a short code, redirect to the original URL
  • Custom slugs (user-defined: bit.ly/my-company)
  • Analytics: click counts, unique visitors, referrer, geo
  • Link expiration

Non-functional:

  • Redirects must be fast (p99 < 50ms)
  • Write throughput: ~1,000 URL creations/day (light write)
  • Read throughput: ~1M redirects/day (very read-heavy, ~12 reads/sec average, spikes higher)
  • Short codes must be unique and collision-resistant
  • URLs must not be guessable (prevent enumeration of other users’ links)

Scale reasoning: Read:write ratio is ~1000:1. This is a read-dominated system. The core optimization target is redirect latency.


Key Decisions

Short Code Generation

Option 1: Hash-based (MD5, SHA-256 + truncate)

Hash the long URL, take the first 7 characters of the base62-encoded hash.

MD5("https://example.com/long-path?param=value") → 128-bit digest, written as 32 hex characters
Base62(digest) → about 22 characters; keep the first 7 as the short code

Problem 1: Collisions. Different URLs can hash to the same 7-character prefix, so you must check the DB for an existing code and, on a collision, re-hash with a salt appended.

Problem 2: The same URL submitted by different users gets the same short code. This may or may not be desired.
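A minimal sketch of the hash-based option, assuming MD5 from Python's `hashlib`, a 0-9a-zA-Z alphabet, and a plain dict standing in for the database; `to_base62`, `hash_code`, and `create_hashed` are hypothetical names. It reuses the existing code when the same URL is submitted again, and appends an incrementing salt when the 7-character prefix collides with a different URL.

```python
import hashlib

# Assumed alphabet: digits, then lowercase, then uppercase (62 characters).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base62(n: int) -> str:
    """Encode a non-negative integer using ALPHABET."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def hash_code(long_url: str, salt: int = 0, length: int = 7) -> str:
    """MD5 the URL (plus a salt after the first attempt); keep the first `length` base62 chars."""
    material = long_url if salt == 0 else f"{long_url}#{salt}"
    digest = hashlib.md5(material.encode("utf-8")).digest()
    encoded = to_base62(int.from_bytes(digest, "big"))
    # A 128-bit value almost never encodes to fewer than 7 chars; pad just in case.
    return encoded[:length].rjust(length, ALPHABET[0])


def create_hashed(long_url: str, store: dict) -> str:
    """Return a code for long_url, re-hashing with a salt when the prefix is taken."""
    salt = 0
    while True:
        code = hash_code(long_url, salt)
        owner = store.get(code)
        if owner is None:        # free: claim it
            store[code] = long_url
            return code
        if owner == long_url:    # same URL submitted before: reuse its code
            return code
        salt += 1                # prefix collision with a different URL: try again
```

The dict lookup plays the role of the DB collision check; in Postgres the same loop would run against the `short_code` column.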

Option 2: Sequence-based (Counter + Base62 encoding)

Use a global counter (database sequence or Redis INCR). Encode the integer in base62.

Counter: 12345678
Base62(12345678) = "PNFQ" (4 chars with the alphabet 0-9a-zA-Z; 7 chars cover IDs up to 62^7 ≈ 3.5 trillion)
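Base62 encoding of a counter value, and its inverse, can be sketched as follows. The 0-9, a-z, A-Z ordering is an assumption (services differ), and `encode`/`decode` are hypothetical names.

```python
# Assumed alphabet ordering (0-9, a-z, A-Z); other services may order it differently.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(n: int) -> str:
    """Base62-encode a non-negative counter value."""
    if n < 0:
        raise ValueError("counter values must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def decode(code: str) -> int:
    """Invert encode(): recover the counter value from a short code."""
    n = 0
    for ch in code:
        n = n * 62 + INDEX[ch]
    return n


print(encode(12345678))        # PNFQ
print(decode("PNFQ"))          # 12345678
print(len(encode(62**7 - 1)))  # 7: the largest counter value that still fits in 7 chars
```

Because `decode()` inverts `encode()`, a code maps straight back to its row ID, which is exactly why unobfuscated sequential codes are easy to enumerate.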

Pros: Guaranteed unique (as long as the counter doesn’t overflow), so no collision check is needed.

Cons: Sequential codes are guessable (abc123, abc124…), so they must be obfuscated if privacy matters. The counter also becomes a bottleneck at extreme scale, which is solvable with range allocation (each server claims a block of IDs).
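Range allocation can be sketched like this, assuming a shared counter that supports an atomic add-and-return (a database sequence or Redis `INCRBY` would fit). `CentralCounter` is an in-process stand-in for that shared counter; `RangeAllocator` and the block size of 1,000 are hypothetical.

```python
import threading


class CentralCounter:
    """Stand-in for the shared counter (e.g. a DB sequence or Redis INCRBY)."""

    def __init__(self, start: int = 0):
        self._next = start
        self._lock = threading.Lock()

    def claim(self, size: int) -> tuple:
        """Atomically reserve the half-open ID range [lo, lo + size)."""
        with self._lock:
            lo = self._next
            self._next += size
            return lo, lo + size


class RangeAllocator:
    """Per-server allocator: touches the shared counter once per block, not once per ID."""

    def __init__(self, counter: CentralCounter, block_size: int = 1000):
        self._counter = counter
        self._block = block_size
        self._cur = self._hi = 0   # empty range, so the first call claims a block
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            if self._cur >= self._hi:
                self._cur, self._hi = self._counter.claim(self._block)
            nid = self._cur
            self._cur += 1
            return nid
```

A server that crashes abandons the rest of its block, leaving gaps in the ID space; that is harmless because codes only need to be unique, not dense.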

Option 3: Random + uniqueness check

Generate 7 random base62 characters. Check if it exists. Retry if collision. At low occupancy, collision probability is negligible.

P(collision) with 7 chars, 62^7 ≈ 3.5 trillion possible codes, 1M links stored
≈ 1M / 3.5T ≈ 0.00003% per generation attempt
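The arithmetic above, worked through in Python. The last two lines extend it under stated assumptions: "max 3 retries" from Failure Modes is read as one attempt plus three retries, and 1,000 creations/day comes from the requirements.

```python
SPACE = 62 ** 7            # possible 7-char base62 codes
STORED = 1_000_000         # links already stored (figure from the text)
CREATES_PER_DAY = 1_000    # write volume from the requirements

p_attempt = STORED / SPACE         # chance one random draw hits an existing code
p_all_four_fail = p_attempt ** 4   # assumed: 1 attempt + 3 retries, all colliding
days_to_1m = STORED / CREATES_PER_DAY

print(f"{SPACE:,}")                # 3,521,614,606,208
print(f"{p_attempt * 100:.7f}%")   # 0.0000284%
print(f"{p_all_four_fail:.1e}")    # 6.5e-27
print(days_to_1m)                  # 1000.0
```

At the stated write rate it takes roughly 1,000 days to reach 1M stored links, so the per-attempt figure is an upper bound for the first few years.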

Recommended: Random generation with collision check for small-to-medium scale. Counter-based for high-write-volume services where you need guaranteed uniqueness without retries.
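Random generation with a uniqueness check might look like this. A dict stands in for the links table, `create_random` and `CodeGenerationError` are hypothetical names, and the retry cap follows the "max 3 retries" rule from Failure Modes (read as one attempt plus three retries).

```python
import secrets
import string

# 0-9, a-z, A-Z: 62 characters.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
MAX_RETRIES = 3  # read here as 1 initial attempt + 3 retries


class CodeGenerationError(Exception):
    """Raised when every attempt collided (expected to be extremely rare)."""


def random_code(length: int = 7) -> str:
    # secrets draws from a CSPRNG, so codes are not predictable from earlier ones.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def create_random(long_url: str, store: dict, length: int = 7,
                  max_retries: int = MAX_RETRIES) -> str:
    for _ in range(1 + max_retries):
        code = random_code(length)
        if code not in store:   # the uniqueness check (a DB read in the real system)
            store[code] = long_url
            return code
    raise CodeGenerationError(f"no free code after {1 + max_retries} attempts")
```

Against a real database, the check and the write would be one `INSERT` that catches a unique-constraint violation, so two requests cannot claim the same code between the check and the insert.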

Custom Slugs

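Custom slugs (bit.ly/my-company) need separate handling. On write, check whether the slug already exists (exact match) and reject duplicates; enforce this with a unique constraint rather than a read-then-write check, so two concurrent requests for the same slug cannot both succeed. Store custom slugs in the same table with a flag, or in a separate table if you want different TTL/analytics behavior (in which case new slugs must still be checked against generated codes).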
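A sketch of the custom-slug write path, using Python's built-in `sqlite3` as a stand-in for Postgres with slugs in the same table as generated codes. The slug format (3 to 50 letters, digits, or hyphens, no leading hyphen) and the reserved-word list are hypothetical policies; `create_custom` and `SlugTaken` are hypothetical names.

```python
import re
import sqlite3

# Hypothetical slug policy: 3-50 chars, letters/digits/hyphens, no leading hyphen.
SLUG_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{2,49}")
RESERVED = {"api", "admin", "login"}  # hypothetical paths the service itself uses


class SlugTaken(Exception):
    """The requested slug already exists."""


def init_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS links (
               id         INTEGER PRIMARY KEY,
               short_code TEXT UNIQUE NOT NULL,
               long_url   TEXT NOT NULL,
               is_custom  INTEGER NOT NULL DEFAULT 0
           )"""
    )


def create_custom(conn: sqlite3.Connection, slug: str, long_url: str) -> str:
    if not SLUG_PATTERN.fullmatch(slug) or slug.lower() in RESERVED:
        raise ValueError(f"invalid slug: {slug!r}")
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO links (short_code, long_url, is_custom) VALUES (?, ?, 1)",
                (slug, long_url),
            )
    except sqlite3.IntegrityError:
        # UNIQUE(short_code) rejected it: an exact-match duplicate, custom or generated.
        raise SlugTaken(slug) from None
    return slug
```

Because custom slugs and generated codes share the `short_code` column here, the same constraint also stops a slug from shadowing an existing generated code.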

Redirect Type: 301 vs 302

  • 301 (Permanent): Browser caches the redirect. Subsequent clicks skip your server entirely. Great for performance; terrible for analytics (you don’t see the click).
  • 302 (Temporary): Browsers don’t cache it by default, so every click goes through your server. You see every redirect; latency is slightly higher.

Decision: Use 302 if analytics matter (you want to count every click). Use 301 only if you’ll never change the destination and you want maximum performance with no analytics.


Architecture

                         ┌─────────────────┐
Write path:              │   API Service   │
Client → API Gateway → ──┤  (creates URL)  ├──→ Postgres (primary store)
                         └────────┬────────┘         │
                                  │              Async write
                                  └──────────────→ Analytics DB

                         ┌─────────────────┐
Read path (redirect):    │  Redirect Svc   │
Client → CDN/LB ────→ ──┤  (302 redirect) ├──→ Redis Cache
                         └────────┬────────┘         │ miss
                                  │              ↓
                                  └──────────── Postgres read replica

Components:

API Service: Handles URL creation, custom slug validation, user authentication. Writes to Postgres primary.

Redirect Service: The hot path. Must be as fast as possible. Check Redis first; fall back to Postgres replica on miss; return 302.
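The redirect hot path as a cache-aside lookup, sketched with dict-like stand-ins for Redis and the Postgres read replica (hypothetical, as are `Link` and `RedirectService`). It also folds in two behaviors from Failure Modes: a cache outage degrades to the replica instead of failing, and an expired link returns 410. The built-in `ConnectionError` stands in for whatever error your Redis client raises.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Link:
    long_url: str
    expires_at: Optional[datetime] = None


class RedirectService:
    """Cache first, read replica on a miss, then a 302."""

    def __init__(self, cache, replica):
        self.cache = cache        # hypothetical dict-like stand-in for Redis
        self.replica = replica    # hypothetical dict-like stand-in for the read replica
        self.cache_failures = 0   # a metric to alert on when the cache is degraded

    def _cache_get(self, code: str) -> Optional[Link]:
        try:
            return self.cache.get(code)
        except ConnectionError:   # cache down: fall through to the replica
            self.cache_failures += 1
            return None

    def _cache_put(self, code: str, link: Link) -> None:
        try:
            self.cache[code] = link
        except ConnectionError:
            self.cache_failures += 1

    def resolve(self, code: str, now: Optional[datetime] = None) -> Tuple[int, Dict[str, str]]:
        if now is None:
            now = datetime.now(timezone.utc)
        link = self._cache_get(code)
        if link is None:
            link = self.replica.get(code)
            if link is None:
                return 404, {}
            self._cache_put(code, link)
        if link.expires_at is not None and link.expires_at <= now:
            return 410, {}        # expired but not yet cleaned up
        return 302, {"Location": link.long_url}
```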

Redis Cache: LRU cache of short_code → long_url. The TTL is the link’s remaining lifetime or a default of 24 hours, whichever is shorter. Expected hit rate > 95% for popular links.
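The TTL rule as a small function, assuming timezone-aware expiry times; `cache_ttl_seconds` is a hypothetical helper. Capping the 24-hour default at the remaining lifetime keeps a cache entry from outliving its link, and a result of 0 means the link should not be cached.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_TTL = timedelta(hours=24)


def cache_ttl_seconds(expires_at: Optional[datetime], now: datetime) -> int:
    """Seconds to keep a cache entry: the 24h default, capped by the link's remaining lifetime."""
    default = int(DEFAULT_TTL.total_seconds())
    if expires_at is None:
        return default                       # link never expires
    remaining = int((expires_at - now).total_seconds())
    return max(0, min(default, remaining))   # 0 means: don't cache (already expired)


now = datetime.now(timezone.utc)
print(cache_ttl_seconds(None, now))                       # 86400
print(cache_ttl_seconds(now + timedelta(hours=1), now))   # 3600
```

With redis-py, the result could be passed as `ex=` to `Redis.set()`; skip the write when it is 0, since Redis rejects a non-positive expiry.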

Postgres: Primary store. Schema:

CREATE TABLE links (
    id           BIGSERIAL PRIMARY KEY,
    short_code   VARCHAR(20) UNIQUE NOT NULL,
    long_url     TEXT NOT NULL,
    user_id      BIGINT,
    created_at   TIMESTAMPTZ DEFAULT NOW(),
    expires_at   TIMESTAMPTZ,
    is_custom    BOOLEAN DEFAULT false
);
-- No extra index needed: the UNIQUE constraint on short_code already creates the B-tree index that serves the hot lookup.

Analytics pipeline: Click events (short_code, timestamp, user_agent, ip, referrer) written asynchronously to a queue (Kafka or SQS), consumed by an analytics service that writes to a columnar store (BigQuery, ClickHouse) for aggregation.
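To keep click logging off the redirect's critical path, the handler only enqueues an event and a separate consumer does the slow write. In this sketch an in-process `queue.Queue` and thread stand in for Kafka/SQS and the analytics service, and a `Counter` stands in for the columnar store; `ClickEvent` and `ClickPipeline` are hypothetical. When the buffer is full, the event is dropped rather than slowing the redirect.

```python
import queue
import threading
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickEvent:
    short_code: str
    timestamp: float
    user_agent: str
    ip: str
    referrer: str


class ClickPipeline:
    def __init__(self, maxsize: int = 10_000):
        self._queue = queue.Queue(maxsize=maxsize)
        self._lock = threading.Lock()
        self.dropped = 0
        self.clicks = Counter()   # stand-in for the columnar analytics store
        self._worker = threading.Thread(target=self._consume, daemon=True)
        self._worker.start()

    def record(self, event: ClickEvent) -> None:
        """Called from the redirect handler; never blocks."""
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            with self._lock:
                self.dropped += 1

    def _consume(self) -> None:
        while True:
            event = self._queue.get()
            if event is None:     # shutdown sentinel
                return
            self.clicks[event.short_code] += 1

    def close(self) -> None:
        """Let the consumer drain queued events, then stop it."""
        self._queue.put(None)
        self._worker.join()
```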


Handling Scale

Read path optimization:

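  • CDN caching: Cache the 302 response at the CDN edge so popular links are served from the edge with zero DB/cache load. Set Cache-Control: public, max-age=3600 (no Vary header is needed, since the response does not depend on request headers) or use purge-on-update. Clicks served from the edge never reach the redirect service, so their analytics must come from CDN logs.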
  • Redis as primary cache for non-CDN paths
  • Read replicas for cache misses

Write path:

  • URL creation is low-throughput — Postgres primary handles it easily
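  • Collision check (for random codes) costs one DB read per creation, which is acceptable at this volume. Inserting directly and letting the UNIQUE constraint reject a duplicate saves the read and avoids a check-then-insert race.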

Hot links (celebrity / viral): A single link receiving 10M requests in an hour. Solutions:

  • CDN caching handles this — redirect cached at edge
  • If CDN isn’t caching: Redis with in-memory L1 cache in the redirect service for the top-N links
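One way to build the in-process L1 cache: a small LRU with a short TTL in front of Redis, so a viral code is answered from process memory and staleness is bounded by the TTL. LRU here approximates "top-N" (recently hot links stay resident); `L1Cache`, the capacity, and the 5-second TTL are hypothetical. The class is not thread-safe; a multi-threaded server would wrap it in a lock.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class L1Cache:
    """Bounded LRU of short_code -> long_url with a per-entry TTL."""

    def __init__(self, capacity: int = 1000, ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock
        self._data = OrderedDict()  # code -> (long_url, expires_at)

    def get(self, code: str) -> Optional[str]:
        item = self._data.get(code)
        if item is None:
            return None
        url, expires_at = item
        if self.clock() >= expires_at:    # stale: drop it and let Redis answer
            del self._data[code]
            return None
        self._data.move_to_end(code)      # mark as most recently used
        return url

    def put(self, code: str, url: str) -> None:
        self._data[code] = (url, self.clock() + self.ttl)
        self._data.move_to_end(code)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```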

Failure Modes

  • Redis unavailable: Redirect service falls back to DB. Latency increases but availability preserved. Add alerting on elevated DB hit rate.
  • Postgres primary down: URL creation fails (acceptable — write path). Redirect service continues from replica + cache.
  • Code collision on create: Retry with new random code. Max 3 retries — if still colliding, return error (extremely rare).
  • Expired links: Cleanup job runs nightly to delete/archive expired entries. In the redirect path, check expires_at and return 410 Gone.
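The nightly cleanup job, sketched with `sqlite3` as a stand-in for Postgres. It assumes a hypothetical `links_archive` table with the same columns as `links`, and `expires_at` stored as ISO-8601 UTC text in one fixed format so string comparison matches time order; `cleanup_expired` is a hypothetical name.

```python
import sqlite3


def cleanup_expired(conn: sqlite3.Connection, now_iso: str) -> int:
    """Archive, then delete, every link whose expires_at is at or before now_iso.

    Returns the number of rows removed from links.
    """
    cutoff = (now_iso,)
    with conn:  # one transaction: both statements commit together or not at all
        conn.execute(
            "INSERT INTO links_archive SELECT * FROM links "
            "WHERE expires_at IS NOT NULL AND expires_at <= ?",
            cutoff,
        )
        cur = conn.execute(
            "DELETE FROM links WHERE expires_at IS NOT NULL AND expires_at <= ?",
            cutoff,
        )
    return cur.rowcount
```

In Postgres the two steps can be a single statement: `WITH moved AS (DELETE FROM links WHERE expires_at <= now() RETURNING *) INSERT INTO links_archive SELECT * FROM moved;`. Once a row is archived the redirect path no longer finds it, so keeping the 410 response for such codes would mean consulting the archive as well.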

EM Talking Points

  • Why not just use a UUID for the short code? UUIDs are 36 characters — longer than the original URL path in many cases. Base62-encoded sequential IDs or 7-char random codes are much shorter.
  • Why 302 and not 301? If analytics are a product requirement (click tracking, A/B testing), 301 breaks it. Use 302 by default.
  • How do you prevent abuse? Rate limiting on URL creation per user/IP. URL safety check (scan against known malware/phishing lists — Google Safe Browsing API). Reject known bad domains.
  • Multi-region: If you need global low-latency redirects, deploy the redirect service in multiple regions, each with its own cache and Postgres read replica. CDN makes this less urgent.
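Per-user/IP rate limiting on URL creation could use a token bucket, one common algorithm, swapped in here for illustration. This sketch keeps buckets in process memory; an API service running several instances would keep them in shared storage such as Redis. `TokenBucketLimiter`, the rate, and the burst size are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float
    updated: float


class TokenBucketLimiter:
    """Per-key token bucket: up to `burst` requests at once, refilled at `rate` tokens/sec."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self._buckets: Dict[str, _Bucket] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        bucket = self._buckets.get(key)
        if bucket is None:
            bucket = self._buckets[key] = _Bucket(tokens=self.burst, updated=now)
        # Refill for the time elapsed since the last call, never above the burst size.
        bucket.tokens = min(self.burst, bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False
```

Keying by user ID for authenticated calls and by client IP otherwise matches the "per user/IP" rule; a rejected request would typically get HTTP 429.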