Design a Python service for processing real-time user activity events with rate limiting
py-mid-006
Your answer
Answer as you would in a real interview — explain your thinking, not just the conclusion.
Model answer
Requirements: ingest user activity events (clicks, page views) at up to 50k/sec, apply per-user rate limiting, and store aggregated stats.

Architecture: a FastAPI ingest endpoint validates each event and publishes it to Kafka (producer batching with linger.ms=5, batch.size=65536). A consumer group of workers reads from Kafka and applies a sliding-window rate limiter backed by a Redis ZSET: timestamps are stored as scores, ZRANGEBYSCORE counts events within the window, and ZADD + ZREMRANGEBYSCORE run inside a Lua script for atomicity. Aggregated stats (events per user per hour) are written to TimescaleDB.

Python-side buffering: an asyncio producer holds events in an internal deque and flushes every 100 events or every 50 ms, whichever comes first.

Key decisions: Kafka provides durable buffering against consumer downtime; Redis-backed rate limiting avoids double-counting across replicas; async I/O prevents thread-per-request blocking.
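The "flush every 100 events or 50 ms" buffering described above can be sketched as a small asyncio helper. This is an illustrative sketch, not a production Kafka client: `BufferedProducer` and `send_batch` are hypothetical names, and the actual publish step would be an aiokafka (or similar) send.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable, Deque, List, Optional


class BufferedProducer:
    """Buffers events and flushes every `max_batch` events or `max_delay`
    seconds, whichever comes first. Hypothetical helper for illustration;
    `send_batch` stands in for the real Kafka publish call."""

    def __init__(
        self,
        send_batch: Callable[[List[dict]], Awaitable[None]],
        max_batch: int = 100,
        max_delay: float = 0.05,
    ) -> None:
        self._send_batch = send_batch
        self._max_batch = max_batch
        self._max_delay = max_delay
        self._buffer: Deque[dict] = deque()
        self._flush_task: Optional[asyncio.Task] = None

    async def publish(self, event: dict) -> None:
        self._buffer.append(event)
        if len(self._buffer) >= self._max_batch:
            # Size trigger: flush immediately.
            await self._flush()
        elif self._flush_task is None:
            # Time trigger: schedule a flush for a partially filled buffer.
            self._flush_task = asyncio.create_task(self._delayed_flush())

    async def _delayed_flush(self) -> None:
        await asyncio.sleep(self._max_delay)
        # Clear the handle first so _flush doesn't cancel the running task.
        self._flush_task = None
        await self._flush()

    async def _flush(self) -> None:
        if self._flush_task is not None:
            # A size-triggered flush supersedes the pending timed flush.
            self._flush_task.cancel()
            self._flush_task = None
        if self._buffer:
            batch = list(self._buffer)
            self._buffer.clear()
            await self._send_batch(batch)
```

In an interview it is worth noting the trade-off explicitly: the 50 ms timer bounds worst-case latency for a trickle of events, while the 100-event cap bounds batch size under load.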
Code example
import time
import uuid

import redis.asyncio as aioredis

# Sliding-window limiter: one ZSET per user, scored by timestamp.
# Trim, count, and insert run in a single Lua script for atomicity.
RATE_LIMIT_LUA = """
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
local count = redis.call('ZCARD', KEYS[1])
if count < tonumber(ARGV[2]) then
    -- Unique member: two events with the same timestamp must not overwrite
    -- each other, or the limiter undercounts.
    redis.call('ZADD', KEYS[1], ARGV[3], ARGV[4])
    redis.call('EXPIRE', KEYS[1], ARGV[5])
    return 0
end
return 1
"""


async def is_rate_limited(
    r: aioredis.Redis,
    user_id: str,
    max_events: int = 100,
    window_seconds: int = 60,
) -> bool:
    key = f"ratelimit:{user_id}"
    now = time.time()
    window_start = now - window_seconds
    # Timestamp plus a random suffix keeps ZSET members unique.
    member = f"{now}:{uuid.uuid4().hex}"
    result = await r.eval(
        RATE_LIMIT_LUA, 1, key,
        window_start, max_events, now, member, window_seconds,
    )
    return bool(result)
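The same sliding-window logic can be mirrored in plain Python for unit tests that don't spin up Redis. This is a test-only sketch (class name hypothetical) and deliberately not the production path: it lives in one process, so it cannot prevent double-counting across replicas the way the Redis version does.

```python
import time
from collections import defaultdict
from typing import DefaultDict, List, Optional


class LocalSlidingWindowLimiter:
    """In-memory mirror of the Redis ZSET limiter, for unit tests only.
    Not shared across processes; the Redis/Lua script is the production path."""

    def __init__(self, max_events: int = 100, window_seconds: int = 60) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events: DefaultDict[str, List[float]] = defaultdict(list)

    def is_rate_limited(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window_start = now - self.window_seconds
        # Drop timestamps at or before the window start
        # (the ZREMRANGEBYSCORE step).
        events = [t for t in self._events[user_id] if t > window_start]
        if len(events) < self.max_events:
            events.append(now)  # the ZADD step: record this event
            self._events[user_id] = events
            return False
        self._events[user_id] = events
        return True
```

Injecting `now` makes the window behavior testable deterministically, without sleeping through real wall-clock time.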
Follow-up
How would you handle a Kafka consumer lag spike — 5 million unprocessed events — without losing data or causing rate limit bypass?