Design a URL shortener in Python — architecture, storage, and scale to 1B URLs
py-sys-002
Model answer
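Requirements: 1B URLs, 50k redirects/sec, 5k writes/sec, plus click analytics.
Short code: 8-char Base62 gives 62^8 ≈ 218 trillion possible codes, far more than the 1B needed.
ID generation: Twitter Snowflake or a distributed counter backed by Redis. INCR is already atomic, so no GETSET is needed; give each DC its own counter or ID range so sequence IDs stay unique across DCs, and reserve IDs in blocks with INCRBY to cut round trips. A full 64-bit Snowflake ID needs up to 11 Base62 characters, so an 8-char code implies the counter approach or a narrower ID layout.
Storage: Cassandra for the shortcode→longURL mapping (write-optimised; with the shortcode as partition key, rows are distributed by a hash of the key; handles 1B rows).
Cache: Redis cluster for hot URLs with LFU eviction. 10GB RAM covers the top 5M URLs at a budget of roughly 2KB per entry.
Redirect tier: FastAPI (uvicorn + httptools) on multiple instances behind Nginx. Each instance has a local in-process cache for the hottest 10k URLs (cachetools.TTLCache, or functools.lru_cache if the lookup function is synchronous; lru_cache on an async function caches the coroutine object, not the result).
Analytics: async fan-out. On redirect, publish to a Redis stream (XADD); a separate consumer group of Python workers (redis-py streams API) aggregates click counts into ClickHouse.
CDN: Cloudflare in front, caching 301 redirects at the edge for the top 1% of URLs. Redirects served from the edge or from a browser's 301 cache never reach the origin, so those clicks must be counted from CDN logs or the redirect switched to 302 where exact counts matter.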
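To make the short-code arithmetic concrete, here is a minimal Base62 sketch. The alphabet ordering (digits, lowercase, uppercase) and the fixed-width zero padding are assumptions for illustration; the answer above does not specify either.

```python
import string

# Assumed alphabet ordering: digits, then lowercase, then uppercase.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62
CODE_LENGTH = 8


def encode(n: int, length: int = CODE_LENGTH) -> str:
    """Encode a non-negative integer as a fixed-width Base62 string."""
    if n < 0 or n >= BASE ** length:
        raise ValueError("ID does not fit in the requested code length")
    chars = []
    for _ in range(length):
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode(code: str) -> int:
    """Inverse of encode()."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


# Number of distinct 8-character codes: 62 ** 8.
capacity = BASE ** CODE_LENGTH
```

Encoding a sequential ID directly makes codes guessable in order; if that matters, the ID can be permuted before encoding, at the cost of a little extra work per write.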
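One way to keep a Redis-backed counter off the hot path is to reserve IDs in blocks, so each writer makes one round trip per block instead of one per URL. The sketch below is an assumption about how that could look, not a prescribed design: `BlockAllocator` and its `reserve` callable are hypothetical names, and `reserve(n)` is expected to behave like Redis INCRBY (atomically add `n`, return the new value).

```python
import threading
from typing import Callable


class BlockAllocator:
    """Hands out unique IDs locally, reserving them from a shared counter in blocks."""

    def __init__(self, reserve: Callable[[int], int], block_size: int = 1000):
        # reserve(n) must atomically add n to the shared counter and return
        # the new value, which is the contract of Redis INCRBY.
        self._reserve = reserve
        self._block_size = block_size
        self._next = 0
        self._end = 0  # exclusive upper bound of the current block
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._end:
                # Current block is exhausted: claim the next one.
                new_end = self._reserve(self._block_size)
                self._end = new_end + 1
                self._next = new_end - self._block_size + 1
            value = self._next
            self._next += 1
            return value
```

With redis-py this could be wired up as `BlockAllocator(lambda n: r.incrby("url:next_id", n))`, where the key name is made up for the example. A process that crashes mid-block leaves a gap in the ID space, which is harmless given how much headroom 8 Base62 characters provide.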
Code example
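import asyncio
import redis.asyncio as redis  # assumed client library (redis-py)
from cachetools import TTLCache
from fastapi import FastAPI, HTTPException
from fastapi.responses import RedirectResponse

app = FastAPI()
_local_cache: TTLCache = TTLCache(maxsize=10_000, ttl=300)  # hottest 10k URLs, 5 min TTL
_redis = redis.Redis(decode_responses=True)  # assumed connection; returns str, not bytes
_tasks = set()  # strong refs so background tasks are not garbage-collected

async def _cassandra_get(code: str):
    """Placeholder for the Cassandra lookup; return the long URL or None."""
    raise NotImplementedError

async def _record_click(code: str) -> None:
    await _redis.xadd("clicks", {"code": code})  # assumed stream and field names

def _fire_and_forget(coro) -> None:
    task = asyncio.create_task(coro)
    _tasks.add(task)
    task.add_done_callback(_tasks.discard)

@app.get("/{code}")
async def redirect(code: str):
    url = _local_cache.get(code)  # in-process tier
    if url is None:
        url = await _redis.get(code)  # Redis tier
        if url is None:
            url = await _cassandra_get(code)  # DB tier
            if url is None:
                raise HTTPException(status_code=404)
            await _redis.setex(code, 86400, url)  # cache for 24 hours
        _local_cache[code] = url
    _fire_and_forget(_record_click(code))  # do not block the redirect on analytics
    return RedirectResponse(url, status_code=301)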
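The consuming side of the analytics fan-out could look roughly like the sketch below. It assumes click events sit in a Redis stream with a `code` field; the stream name, group name, and the `flush` callable that would write to ClickHouse are all hypothetical. `client` is expected to behave like a synchronous `redis.Redis(decode_responses=True)` using the default reply shape of `xreadgroup` (a list of `[stream, [(id, fields), ...]]` pairs).

```python
from collections import Counter

STREAM = "clicks"     # assumed stream name
GROUP = "click-agg"   # assumed consumer group name


def count_clicks(entries) -> Counter:
    """Tally clicks per short code from a list of (message_id, fields) pairs."""
    counts = Counter()
    for _msg_id, fields in entries:
        counts[fields["code"]] += 1
    return counts


def drain_once(client, consumer: str, flush, batch: int = 1000) -> int:
    """Read one batch for this consumer, flush aggregated counts, then ack.

    `flush` is a hypothetical callable that receives {code: count} and
    writes it to the analytics store.
    """
    reply = client.xreadgroup(GROUP, consumer, {STREAM: ">"}, count=batch, block=5000)
    entries = [entry for _stream, items in (reply or []) for entry in items]
    if not entries:
        return 0
    flush(dict(count_clicks(entries)))
    # Ack only after the flush succeeds: if the worker dies first, the
    # messages stay in the group's pending list and can be reclaimed.
    client.xack(STREAM, GROUP, *[msg_id for msg_id, _ in entries])
    return len(entries)
```

Aggregating per batch before writing means the analytics store sees one row per short code per batch rather than one row per click, which suits ClickHouse's preference for fewer, larger inserts. Delivery is at-least-once, so a reclaimed batch can be counted twice unless the flush is made idempotent.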
Follow-up
How would you implement custom expiry — delete a short URL after 30 days — at Cassandra scale?