Design a distributed rate limiter for an API gateway handling 100,000 requests per second
cs-sys-001
Your answer
Answer as you would in a real interview — explain your thinking, not just the conclusion.
Model answer
I would use a sliding-window algorithm backed by Redis, with an atomic Lua script that adds a sorted-set entry for the current request, trims entries that have aged out of the window, and returns the current count in a single round trip. All gateway nodes share the same Redis key, so limits are enforced globally regardless of which node handles the request. For sub-millisecond latency at 100k RPS, each node also maintains a local token bucket that refreshes its allowance from Redis every 100ms; this cuts Redis traffic by roughly 95% while staying within a small margin of the true limit. If Redis is unavailable, the gateway fails open (allows traffic) rather than blocking everything, because the risk of a brief burst is lower than the risk of a full outage.
Code example
-- Atomic Redis Lua: sliding-window counter
local key = KEYS[1]
local now = tonumber(ARGV[1])     -- current time, epoch ms
local window = tonumber(ARGV[2])  -- window length, ms
local limit = tonumber(ARGV[3])   -- max requests per window

-- Drop entries that have aged out of the window.
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
local count = redis.call('ZCARD', key)
if count < limit then
  -- A sequence counter keeps members unique when several requests
  -- share the same millisecond timestamp.
  local seq = redis.call('INCR', key .. ':seq')
  redis.call('ZADD', key, now, now .. '-' .. seq)
  redis.call('PEXPIRE', key, window)
  redis.call('PEXPIRE', key .. ':seq', window)
  return 1 -- allowed
end
return 0 -- rate limited
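The per-node token bucket and fail-open behavior from the model answer can be sketched roughly as follows. This is a minimal illustration, not gateway code: `fetch_allowance` is a hypothetical callback standing in for the periodic Redis refresh, and the fallback allowance on error is an assumed tuning knob.

```python
import time


class LocalTokenBucket:
    """Per-node token bucket refreshed periodically from a shared source.

    `fetch_allowance` is a hypothetical callback that asks Redis how many
    tokens this node may spend until the next refresh. If the refresh
    fails, the bucket fails open by granting a fallback local allowance.
    """

    def __init__(self, fetch_allowance, refresh_interval_s=0.1,
                 fallback_allowance=1000):
        self.fetch_allowance = fetch_allowance
        self.refresh_interval_s = refresh_interval_s
        self.fallback_allowance = fallback_allowance
        self.tokens = 0
        self.last_refresh = 0.0

    def allow(self, now=None):
        """Return True if this request may proceed, False if rate limited."""
        now = time.monotonic() if now is None else now
        if now - self.last_refresh >= self.refresh_interval_s:
            try:
                self.tokens = self.fetch_allowance()
            except Exception:
                # Fail open: Redis is unreachable, so grant the fallback
                # allowance rather than rejecting all traffic.
                self.tokens = self.fallback_allowance
            self.last_refresh = now
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False
```

Requests are answered from local memory between refreshes, so the hot path never touches Redis; the trade-off is that the global limit can be exceeded by at most one refresh interval's worth of stale allowances across all nodes.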
Follow-up
How does your design handle multi-region deployments where Redis replication lag could allow limit violations? What is an acceptable trade-off?