System Design: URL Shortener
A URL shortener is a classic system design question. It seems simple — but the interviewer is using it to probe your decisions on hashing, database design, caching, and scaling reads. Here’s the complete design.
Functional:
- Given a long URL, generate a short code (e.g., bit.ly/abc123)
- Given a short code, redirect to the original URL
- Custom slugs (user-defined: bit.ly/my-company)
- Analytics: click counts, unique visitors, referrer, geo
- Link expiration
Non-functional:
- Redirects must be fast (p99 < 50ms)
- Write throughput: ~1,000 URL creations/day (light write)
- Read throughput: ~1M redirects/day (very read-heavy, ~12 reads/sec average, spikes higher)
- Short codes must be unique and collision-resistant
- URLs must not be guessable (prevent enumeration of other users’ links)
Scale reasoning: Read:write ratio is ~1000:1. This is a read-dominated system. The core optimization target is redirect latency.
Option 1: Hash-based (MD5, SHA-256 + truncate)
Hash the long URL, take the first 7 characters of the base62-encoded hash.
MD5("https://example.com/long-path?param=value") → 128-bit digest
Base62(digest), truncated to 7 chars → e.g. "dnh75Zs"
Problem 1: Collisions. Different URLs can hash to the same 7-character prefix, so every creation must check the DB for a collision and re-hash or append a salt.
Problem 2: The same URL submitted by different users gets the same short code. This may or may not be desired.
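As a sketch (Python, assuming a 0-9a-zA-Z base62 alphabet, which the source doesn't specify), the hash-and-truncate approach looks like:

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def hash_code(long_url: str, length: int = 7) -> str:
    """Hash-based short code: MD5 the URL, base62-encode the digest, truncate.

    MD5 is fine here because the code is an identifier, not a security token,
    but collisions on the truncated prefix must still be checked in the DB.
    """
    digest = hashlib.md5(long_url.encode()).digest()
    return base62_encode(int.from_bytes(digest, "big"))[:length]
```

The truncation is why the DB collision check is unavoidable: two distinct 128-bit digests can share the same first 7 base62 characters.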
Option 2: Sequence-based (Counter + Base62 encoding)
Use a global counter (database sequence or Redis INCR). Encode the integer in base62.
Counter: 12345678
Base62(12345678) = "PNFQ" (4 chars with the alphabet 0-9, a-z, A-Z; codes reach 7 chars as the counter grows, or can be left-padded)
Pros:
- Guaranteed unique (as long as the counter doesn’t overflow); no collision check needed.
Cons:
- Sequential codes are guessable (abc123, abc124…). Must obfuscate if privacy matters.
- The counter becomes a bottleneck at extreme scale, solvable with range allocation (each server claims a range of IDs).
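A minimal sketch of counter-based generation with range allocation. The in-process allocator is a stand-in for a database sequence or Redis INCR; names and block size are illustrative:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

class RangeAllocator:
    """Stand-in for a central sequence: each server claims a block of IDs,
    so the shared counter is hit once per block instead of once per URL."""
    def __init__(self, block_size: int = 1000):
        self.next_block_start = 0
        self.block_size = block_size

    def claim_block(self):
        start = self.next_block_start
        self.next_block_start += self.block_size
        return start, start + self.block_size

class CodeGenerator:
    """Per-server generator that draws from its claimed range."""
    def __init__(self, allocator: RangeAllocator):
        self.allocator = allocator
        self.current, self.end = allocator.claim_block()

    def next_code(self) -> str:
        if self.current >= self.end:  # range exhausted: claim a new block
            self.current, self.end = self.allocator.claim_block()
        code = base62_encode(self.current)
        self.current += 1
        return code
```

Two generators sharing one allocator never emit the same code, which is the property that removes the per-creation collision check.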
Option 3: Random + uniqueness check
Generate 7 random base62 characters. Check if it exists. Retry if collision. At low occupancy, collision probability is negligible.
P(collision) with 7 chars, 62^7 ≈ 3.5 trillion possible codes, 1M links stored
≈ 1M / 3.5T ≈ 0.00003% per generation attempt
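A sketch of random generation with a bounded retry loop; `exists` is a hypothetical callable standing in for the DB uniqueness lookup:

```python
import secrets

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def random_code(length: int = 7) -> str:
    """7 random base62 characters, drawn from a CSPRNG so codes
    are not guessable (unlike sequential counter codes)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def create_code(exists, max_retries: int = 3) -> str:
    """Generate a code, retrying on the (rare) collision.

    exists: callable short_code -> bool, backed by the DB in real code.
    """
    for _ in range(max_retries):
        code = random_code()
        if not exists(code):
            return code
    raise RuntimeError("collision retry budget exhausted")
```

At 1M stored links the loop almost always succeeds on the first attempt, matching the ~0.00003% collision estimate above.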
Recommended: Random generation with collision check for small-to-medium scale. Counter-based for high-write-volume services where you need guaranteed uniqueness without retries.
Custom slugs (bit.ly/my-company) are stored separately. On write: check if slug already exists (exact match). Reject duplicates. Store custom slugs in the same table with a flag, or in a separate table if you want different TTL/analytics behavior.
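A hypothetical slug-validation sketch; the regex, length limits, and reserved-name list are assumptions for illustration, not from the source:

```python
import re

RESERVED_SLUGS = {"api", "admin", "login", "static"}  # illustrative reserved names
SLUG_RE = re.compile(r"^[A-Za-z0-9_-]{3,30}$")        # illustrative charset/length policy

def validate_slug(slug: str, exists) -> tuple[bool, str]:
    """Validate a user-chosen slug before insert.

    exists: callable checking the links table for an exact-match slug.
    Returns (ok, reason).
    """
    if not SLUG_RE.match(slug):
        return False, "invalid characters or length"
    if slug.lower() in RESERVED_SLUGS:
        return False, "reserved"
    if exists(slug):
        return False, "already taken"
    return True, "ok"
```

In production the final uniqueness guarantee comes from the DB's unique constraint, not this pre-check; the pre-check just gives a friendlier error before the insert races.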
- 301 (Permanent): Browser caches the redirect. Subsequent clicks skip your server entirely. Great for performance; terrible for analytics (you don’t see the click).
- 302 (Temporary): Browser does NOT cache. Every click goes through your server. You see every redirect; latency is slightly higher.
Decision: Use 302 if analytics matter (you want to count every click). Use 301 only if you’ll never change the destination and you want maximum performance with no analytics.
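The 301/302 decision can be expressed as a small handler sketch; `lookup` is a hypothetical short_code → long_url resolver (cache plus DB in the real service):

```python
def redirect_response(short_code: str, lookup, permanent: bool = False):
    """Build an HTTP redirect for a short code.

    lookup: callable short_code -> long_url or None (assumed, not a real API).
    Returns (status, headers). 302 by default so every click reaches the
    server and can be counted; 301 lets browsers cache and skip the server.
    """
    long_url = lookup(short_code)
    if long_url is None:
        return 404, {}
    status = 301 if permanent else 302
    return status, {"Location": long_url}
```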
Write path:
  Client → API Gateway → API Service (creates URL) → Postgres (primary store)
                               │
                               └── async write → Analytics DB

Read path (redirect):
  Client → CDN/LB → Redirect Service (302 redirect) → Redis Cache
                               │
                               └── on cache miss → Postgres read replica
Components:
API Service: Handles URL creation, custom slug validation, user authentication. Writes to Postgres primary.
Redirect Service: The hot path. Must be as fast as possible. Check Redis first; fall back to Postgres replica on miss; return 302.
Redis Cache: LRU cache of short_code → long_url. TTL set to match link expiration or a default (24 hours). Expected hit rate > 95% for popular links.
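The cache-aside hot path might look like the following sketch, with dict-like stand-ins for the Redis client and the replica query (the real calls would be `redis.get`/`redis.set(..., ex=ttl)` and a SELECT):

```python
def resolve(short_code: str, cache: dict, db: dict, ttl: int = 86400):
    """Cache-aside lookup: Redis first, Postgres replica on miss.

    cache, db: dict stand-ins for the Redis client and replica query.
    Populates the cache on a miss so the next hit skips the DB.
    """
    url = cache.get(short_code)
    if url is not None:
        return url                       # cache hit: the >95% path
    url = db.get(short_code)
    if url is not None:
        cache[short_code] = url          # real code: cache.set(code, url, ex=ttl)
    return url
```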
Postgres: Primary store. Schema:
CREATE TABLE links (
id BIGSERIAL PRIMARY KEY,
short_code VARCHAR(20) UNIQUE NOT NULL,
long_url TEXT NOT NULL,
user_id BIGINT,
created_at TIMESTAMPTZ DEFAULT NOW(),
expires_at TIMESTAMPTZ,
is_custom BOOLEAN DEFAULT false
);
-- No separate CREATE INDEX needed for the hot lookup: the UNIQUE
-- constraint on short_code already creates the backing btree index.
Analytics pipeline: Click events (short_code, timestamp, user_agent, ip, referrer) written asynchronously to a queue (Kafka or SQS), consumed by an analytics service that writes to a columnar store (BigQuery, ClickHouse) for aggregation.
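A sketch of the asynchronous click-event write; `queue.Queue` stands in for a Kafka/SQS producer, and the event fields mirror those listed above:

```python
import json
import queue
import time

def record_click(q, short_code: str, user_agent: str, ip: str, referrer: str):
    """Fire-and-forget analytics event on the redirect path.

    q: in-process stand-in for a Kafka/SQS producer; real code would be
    something like producer.send("clicks", event) without blocking the 302.
    """
    event = {
        "short_code": short_code,
        "ts": time.time(),
        "user_agent": user_agent,
        "ip": ip,
        "referrer": referrer,
    }
    q.put(json.dumps(event))
```

Keeping this write off the request's critical path is what preserves the p99 < 50ms redirect target.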
Read path optimization:
- CDN caching: Cache the 302 response at the CDN edge. Popular links are served from the edge with zero DB/cache load. Set Cache-Control: max-age=3600 on the redirect response, and purge on update if the destination can change.
- Redis as primary cache for non-CDN paths
- Read replicas for cache misses
Write path:
- URL creation is low-throughput — Postgres primary handles it easily
- Collision check (for random codes) requires one DB read per creation — acceptable
Hot links (celebrity / viral): A single link receiving 10M requests in an hour. Solutions:
- CDN caching handles this — redirect cached at edge
- If CDN isn’t caching: Redis with in-memory L1 cache in the redirect service for the top-N links
- Redis unavailable: Redirect service falls back to DB. Latency increases but availability preserved. Add alerting on elevated DB hit rate.
- Postgres primary down: URL creation fails (acceptable — write path). Redirect service continues from replica + cache.
- Code collision on create: Retry with new random code. Max 3 retries — if still colliding, return error (extremely rare).
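The in-process L1 cache mentioned for hot links could be a small LRU sitting in front of Redis; capacity and eviction policy here are assumptions:

```python
from collections import OrderedDict

class L1Cache:
    """Tiny in-process LRU for the hottest links.

    Sits in front of Redis inside the redirect service so a viral link
    doesn't hammer the shared cache; sketch only, capacity is illustrative.
    """
    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, code: str):
        url = self._data.get(code)
        if url is not None:
            self._data.move_to_end(code)  # mark as recently used
        return url

    def put(self, code: str, url: str) -> None:
        self._data[code] = url
        self._data.move_to_end(code)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```

A real deployment would also need a short TTL or invalidation hook, since an in-process cache cannot see updates made by other instances.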
- Expired links: A cleanup job runs nightly to delete/archive expired entries. In the redirect path, check expires_at and return 410 Gone for expired links.
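The expiry check in the redirect path might be sketched as follows; field names follow the links schema, and `now` is injectable for testing:

```python
from datetime import datetime, timezone

def redirect_with_expiry(link: dict, now=None):
    """Redirect for a fetched link row, honoring expires_at.

    link: dict with long_url and optional expires_at (aware datetime or None).
    Returns (status, headers): 410 Gone past expiry, else a 302 redirect.
    """
    now = now or datetime.now(timezone.utc)
    expires_at = link.get("expires_at")
    if expires_at is not None and expires_at <= now:
        return 410, {}
    return 302, {"Location": link["long_url"]}
```

Checking in the read path matters because the nightly cleanup leaves a window of up to a day where expired rows still exist in the table and cache.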
- Why not just use a UUID for the short code? UUIDs are 36 characters — longer than the original URL path in many cases. Base62-encoded sequential IDs or 7-char random codes are much shorter.
- Why 302 and not 301? If analytics are a product requirement (click tracking, A/B testing), 301 breaks it. Use 302 by default.
- How do you prevent abuse? Rate limiting on URL creation per user/IP. URL safety check (scan against known malware/phishing lists — Google Safe Browsing API). Reject known bad domains.
- Multi-region: If you need global low-latency redirects, deploy the redirect service in multiple regions, replicating the cache and DB reads globally. CDN makes this less urgent.
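Rate limiting on URL creation could start as a fixed-window counter per key; this in-memory sketch is an assumption, and production would typically use Redis INCR with EXPIRE so all API instances share one count:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Per-key fixed-window rate limiter (in-memory sketch).

    allow(key) increments the count for the current window and permits
    the request while the count stays at or under the limit.
    """
    def __init__(self, limit: int, window_s: int):
        self.limit = limit
        self.window_s = window_s
        self.counts = defaultdict(int)

    def allow(self, key: str, now=None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_s)   # bucket timestamps into windows
        self.counts[(key, window)] += 1
        return self.counts[(key, window)] <= self.limit
```

Keys would be user IDs or client IPs; the known fixed-window caveat is a burst straddling a window boundary, which a sliding-window or token-bucket variant smooths out.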