Design a distributed rate limiter for an API gateway handling 100,000 requests per second
cs-sys-001
Your answer
Answer as you would in a real interview — explain your thinking, not just the conclusion.
Model answer
I would use a sliding-window algorithm backed by Redis, with an atomic Lua script that adds a sorted-set entry for the current request, trims entries that have aged out of the window, and returns the current count in a single round trip. All gateway nodes share the same Redis key, so limits are enforced globally regardless of which node handles the request. For sub-millisecond latency at 100k RPS, each node also maintains a local token bucket that refreshes its allowance from Redis every 100ms; this cuts Redis traffic by roughly 95% while staying within a small margin of the true limit. If Redis is unavailable, the gateway fails open (allows traffic) rather than blocking everything, because the risk of a brief burst is lower than the risk of a full outage.
Code example
-- Atomic Redis Lua: sliding-window counter
local key = KEYS[1]
local now = tonumber(ARGV[1])     -- current time, epoch ms
local window = tonumber(ARGV[2])  -- window length, ms
local limit = tonumber(ARGV[3])   -- max requests per window

-- Drop entries that have aged out of the window.
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
local count = redis.call('ZCARD', key)
if count < limit then
  -- A sequence counter keeps members unique when several requests
  -- share the same millisecond timestamp.
  local seq = redis.call('INCR', key .. ':seq')
  redis.call('ZADD', key, now, now .. '-' .. seq)
  redis.call('PEXPIRE', key, window)
  redis.call('PEXPIRE', key .. ':seq', window)
  return 1 -- allowed
end
return 0 -- rate limited
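The per-node token bucket and fail-open behavior from the model answer can be sketched roughly as follows. This is a minimal illustration, not gateway code: `fetch_allowance` is a hypothetical callback standing in for the periodic Redis refresh, and the fallback allowance on error is an assumed tuning knob.

```python
import time


class LocalTokenBucket:
    """Per-node token bucket refreshed periodically from a shared source.

    `fetch_allowance` is a hypothetical callback that asks Redis how many
    tokens this node may spend until the next refresh. If the refresh
    fails, the bucket fails open by granting a fallback local allowance.
    """

    def __init__(self, fetch_allowance, refresh_interval_s=0.1,
                 fallback_allowance=1000):
        self.fetch_allowance = fetch_allowance
        self.refresh_interval_s = refresh_interval_s
        self.fallback_allowance = fallback_allowance
        self.tokens = 0
        self.last_refresh = 0.0

    def allow(self, now=None):
        """Return True if this request may proceed, False if rate limited."""
        now = time.monotonic() if now is None else now
        if now - self.last_refresh >= self.refresh_interval_s:
            try:
                self.tokens = self.fetch_allowance()
            except Exception:
                # Fail open: Redis is unreachable, so grant the fallback
                # allowance rather than rejecting all traffic.
                self.tokens = self.fallback_allowance
            self.last_refresh = now
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False
```

Requests are answered from local memory between refreshes, so the hot path never touches Redis; the trade-off is that the global limit can be exceeded by at most one refresh interval's worth of stale allowances across all nodes.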
Follow-up
How does your design handle multi-region deployments where Redis replication lag could allow limit violations? What is an acceptable trade-off?