At which layer should rate limiting live?

The closer to the edge, the better: CDN (Cloudflare WAF), reverse proxy (nginx, HAProxy), API gateway, then the application layer. Each layer blocks a different kind of traffic: the CDN stops volumetric DDoS, the gateway stops API abuse, and the application stops business-logic abuse.

Is rate limiting by IP enough?

No. Mobile clients behind the same carrier IP, office NAT, and proxies all share addresses. Combine several identifiers: IP + user_id + API key + endpoint. For anonymous traffic, apply a light per-IP limit; for authenticated traffic, limit per user_id.
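As an illustration of combining identifiers, here is a minimal sketch of a key builder. The function name `rateLimitKey`, the field names (`apiKey`, `userId`, `ip`, `route`) and the priority order (API key, then user, then IP) are assumptions made for this example, not something the article prescribes.

```javascript
// Hypothetical helper: pick the most specific identity available,
// then scope the key to the endpoint so limits do not bleed across routes.
function rateLimitKey({ apiKey, userId, ip, route }) {
  let identity
  if (apiKey) identity = `key:${apiKey}`          // partner / machine client
  else if (userId) identity = `user:${userId}`    // authenticated user
  else identity = `ip:${ip}`                      // anonymous fallback
  return `rl:${route}:${identity}`
}

console.log(rateLimitKey({ userId: 42, ip: '203.0.113.7', route: '/api/orders' }))
// rl:/api/orders:user:42
console.log(rateLimitKey({ ip: '203.0.113.7', route: '/auth/login' }))
// rl:/auth/login:ip:203.0.113.7
```

Prefixing each identity with its type keeps a user id from colliding with an IP string that happens to look the same.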

How do token bucket and sliding window differ?

Token bucket: allows bursts (tokens refill continuously) and is light on resources. Sliding window: more accurate and does not allow burst spikes, but uses more memory. Public APIs usually use token bucket (better UX); security and billing use sliding window (accuracy).

What should the server return when a user is rate limited?

429 Too Many Requests, plus a Retry-After header (seconds or an HTTP date) and X-RateLimit-Limit/Remaining/Reset. The JSON body should carry a clear error code. Do not return 200 with an error body: it breaks monitoring.

Is distributed rate limiting hard?

Yes. A multi-instance app needs shared state (Redis), and a Lua script keeps the check-and-update atomic. Alternatively, use a managed service: Cloudflare Rate Limiting, AWS WAF, API Gateway. Building your own with Redis Lua is a good balance for a mid-sized team.

Rate Limiting: Token Bucket vs Sliding Window Compared

Rate limiting is the first line of defense for any production API. This article compares 4 common algorithms (fixed window, sliding window, token bucket, leaky bucket), with Redis Lua code that works across multiple instances.

Why rate limit?

Without rate limiting, a single bot script can:

Brute-force passwords: 1000 requests/second against /login
Scrape data: read all 1M users in 1 hour
Run a cost attack: call the /generate-pdf endpoint to burn CPU/memory
Abuse the free tier: create 1000 trial accounts

Rate limiting cannot stop an attacker with effectively unlimited resources (distributed DDoS); that is the job of Cloudflare or another CDN. It does, however, block the vast majority of abuse coming from a single source.

1. Fixed Window: simple but flawed

The counter resets every N seconds. This is the simplest approach:

// 100 requests / 60 seconds
async function fixedWindow(redis, key) {
  const window = Math.floor(Date.now() / 60000)
  const k = `rl:${key}:${window}`
  const count = await redis.incr(k)
  if (count === 1) await redis.expire(k, 60)
  return count <= 100
}

Flaw: it allows a double burst at the window edge. A user can send 100 requests at 10:59:59 and another 100 at 11:00:00, which adds up to 200 requests within roughly one second, double the intended limit.
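The boundary problem can be reproduced without Redis. The sketch below is an in-memory stand-in for the counter above (the factory name `makeFixedWindow` is made up for this example), with the clock passed in explicitly so the edge case is deterministic.

```javascript
// In-memory stand-in for the Redis counter. Old windows are never cleaned up
// here; Redis handles that through the key expiry.
function makeFixedWindow(limit, windowMs) {
  const counts = new Map()
  return function allow(key, nowMs) {
    const window = Math.floor(nowMs / windowMs)
    const k = `${key}:${window}`
    const count = (counts.get(k) || 0) + 1
    counts.set(k, count)
    return count <= limit
  }
}

const allow = makeFixedWindow(100, 60_000)
const boundary = 60_000 * 1000 // an exact window boundary, in ms
let accepted = 0
for (let i = 0; i < 100; i++) if (allow('u1', boundary - 1)) accepted++ // last ms of the old window
for (let i = 0; i < 100; i++) if (allow('u1', boundary)) accepted++     // first ms of the new window
console.log(accepted) // 200, although the two bursts are 1 ms apart
```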

2. Sliding Window: more accurate

Count the requests in the interval "the last N seconds, counted back from now" instead of in a fixed window. This can be implemented with a Redis sorted set:

-- Redis Lua: atomic check + add
-- KEYS[1] = bucket key, ARGV: now (ms), window (ms), limit, request_id
local now    = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit  = tonumber(ARGV[3])
local id     = ARGV[4]

-- Remove expired entries
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, now - window)

local count = redis.call('ZCARD', KEYS[1])
if count >= limit then
  return 0   -- rejected
end

redis.call('ZADD', KEYS[1], now, id)
redis.call('PEXPIRE', KEYS[1], window)
return 1     -- allowed

import { createClient } from 'redis'
const redis = createClient(); await redis.connect()

const SLIDING_WINDOW_LUA = `...` // the Lua script above

async function slidingWindow(key, limit, windowMs) {
  const now = Date.now()
  const id = `${now}-${Math.random()}`
  const ok = await redis.eval(SLIDING_WINDOW_LUA, {
    keys: [`rl:${key}`],
    arguments: [now.toString(), windowMs.toString(), limit.toString(), id],
  })
  return ok === 1
}

Accurate, but memory-hungry: it stores one entry per request. It fits best when the limit is small (≤ 100/min).
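To see the difference from a fixed window, here is a small in-memory sketch that follows the same three steps as the Lua script (drop expired entries, count, record). The name `makeSlidingWindow` is hypothetical and the clock is passed in so the run is deterministic.

```javascript
// In-memory sketch of the sliding-window log. Like the Lua script, it only
// records accepted requests, so each key holds at most `limit` timestamps.
function makeSlidingWindow(limit, windowMs) {
  const logs = new Map() // key -> ascending array of request timestamps (ms)
  return function allow(key, nowMs) {
    const log = logs.get(key) || []
    // Same cutoff as ZREMRANGEBYSCORE 0 (now - window): both ends inclusive
    while (log.length > 0 && log[0] <= nowMs - windowMs) log.shift()
    logs.set(key, log)
    if (log.length >= limit) return false
    log.push(nowMs)
    return true
  }
}

const allowSliding = makeSlidingWindow(100, 60_000)
const edge = 60_000 * 1000
let ok = 0
for (let i = 0; i < 100; i++) if (allowSliding('u1', edge - 1)) ok++
for (let i = 0; i < 100; i++) if (allowSliding('u1', edge)) ok++
console.log(ok) // 100: the second burst is rejected
console.log(allowSliding('u1', edge - 1 + 60_000)) // true: the first burst has aged out
```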

3. Token Bucket: allows reasonable bursts

The bucket has a capacity of N tokens and refills at R tokens/second. Each request costs 1 token. No tokens left → reject.

-- KEYS[1] = bucket key
-- ARGV: capacity, refill_per_sec, now (sec), cost
local capacity = tonumber(ARGV[1])
local refill   = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])
local cost     = tonumber(ARGV[4])

local data = redis.call('HMGET', KEYS[1], 'tokens', 'last')
local tokens = tonumber(data[1]) or capacity
local last   = tonumber(data[2]) or now

-- Refill based on elapsed time
local elapsed = math.max(0, now - last)
tokens = math.min(capacity, tokens + elapsed * refill)

local allowed = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
end

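-- HSET accepts multiple field/value pairs since Redis 4.0; HMSET is deprecated
redis.call('HSET', KEYS[1], 'tokens', tokens, 'last', now)
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refill) * 2)

-- Redis truncates Lua numbers to integers in replies, so tokens arrives rounded down
return { allowed, tokens }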

Token bucket lets a user "save up" tokens and burst later, which makes for better UX on a public API. AWS API Gateway and Stripe, for instance, document token-bucket throttling.
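The "save up and burst" behaviour is easier to see with numbers. The following is a plain JavaScript port of the refill arithmetic in the Lua script, kept in memory and with the clock (in seconds) passed in; `makeTokenBucket` is a name invented for this sketch.

```javascript
// Port of the Lua refill logic: refill by elapsed time, cap at capacity,
// then spend `cost` tokens if enough are available.
function makeTokenBucket(capacity, refillPerSec) {
  const buckets = new Map() // key -> { tokens, last }
  return function take(key, nowSec, cost = 1) {
    const b = buckets.get(key) || { tokens: capacity, last: nowSec }
    const elapsed = Math.max(0, nowSec - b.last)
    b.tokens = Math.min(capacity, b.tokens + elapsed * refillPerSec)
    b.last = nowSec
    const allowed = b.tokens >= cost
    if (allowed) b.tokens -= cost
    buckets.set(key, b)
    return allowed
  }
}

const take = makeTokenBucket(10, 1) // capacity 10, refill 1 token/second
let burst = 0
for (let i = 0; i < 15; i++) if (take('u1', 0)) burst++
console.log(burst)         // 10: a full bucket absorbs a burst of 10
console.log(take('u1', 3)) // true: 3 seconds later, 3 tokens are back
console.log(take('u1', 3)) // true
console.log(take('u1', 3)) // true
console.log(take('u1', 3)) // false: the bucket is empty again
```

The long-run rate is still bounded by the refill rate; capacity only decides how large a single burst can be.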

4. Leaky Bucket: smoothing traffic

Similar to token bucket, but requests go into a queue and drain out at a fixed rate. It fits cases that need smoothing (e.g. an SMS gateway that may send at most 10 msg/sec to its provider).
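A queue-based leaky bucket can be sketched in a few lines. This is one possible shape under stated assumptions, not a reference implementation: the names `makeLeakyBucket`, `offer` and `poll` are invented here, and the clock is passed in so the example is deterministic.

```javascript
// Queue-based leaky bucket: `offer` enqueues or overflows,
// `poll` releases at most one item per interval.
function makeLeakyBucket(queueSize, ratePerSec) {
  const intervalMs = 1000 / ratePerSec // spacing between two outgoing items
  const queue = []
  let nextSendAt = 0 // earliest time the next item may leave
  return {
    offer(item) {
      if (queue.length >= queueSize) return false // bucket is full: drop
      queue.push(item)
      return true
    },
    poll(nowMs) {
      if (queue.length === 0 || nowMs < nextSendAt) return undefined
      nextSendAt = nowMs + intervalMs
      return queue.shift()
    },
  }
}

const bucket = makeLeakyBucket(5, 10) // queue of 5, drain 10 items/second
const queued = []
for (let i = 1; i <= 7; i++) if (bucket.offer(`sms-${i}`)) queued.push(i)
console.log(queued) // 1 through 5: the queue overflows after 5 items

const sentAt = []
for (let t = 0; t <= 450; t += 50) {
  if (bucket.poll(t) !== undefined) sentAt.push(t)
}
console.log(sentAt) // 0, 100, 200, 300, 400: one message per 100 ms
```

In a running service `poll` would typically be driven by a timer, with `Date.now()` as the clock.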

The standard response when a client is rate limited

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1735000800

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded. Retry after 30 seconds.",
    "retry_after": 30
  }
}

The Retry-After header is part of the HTTP standard, so clients and retry libraries that honor it know how long to wait. The X-RateLimit-* headers are not standardized but are widely used (GitHub, Stripe, Twitter API).
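On the client side, Retry-After may arrive either as a number of seconds or as an HTTP date, so a retry helper has to handle both. A minimal sketch follows; the function name `retryAfterMs` is hypothetical, and returning `null` for an unusable value is a choice made for this example.

```javascript
// Convert a Retry-After header value into a delay in milliseconds.
function retryAfterMs(headerValue, nowMs = Date.now()) {
  if (headerValue == null) return null
  const value = String(headerValue).trim()
  if (/^\d+$/.test(value)) return Number(value) * 1000 // delay in seconds
  const dateMs = Date.parse(value)                     // HTTP date form
  if (Number.isNaN(dateMs)) return null                // unparseable: caller decides
  return Math.max(0, dateMs - nowMs)                   // a past date means "retry now"
}

console.log(retryAfterMs('30')) // 30000
const now = Date.parse('Wed, 21 Oct 2015 07:28:00 GMT')
console.log(retryAfterMs('Wed, 21 Oct 2015 07:28:30 GMT', now)) // 30000
console.log(retryAfterMs('soon')) // null
```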

Express middleware you can use right away

function rateLimit({ windowMs, max, keyFn }) {
  return async (req, res, next) => {
    try {
      const key = keyFn(req)
      const ok = await slidingWindow(key, max, windowMs)
      res.setHeader('X-RateLimit-Limit', max)
      if (!ok) {
        // Upper bound: the oldest request leaves the window within windowMs
        res.setHeader('Retry-After', Math.ceil(windowMs / 1000))
        return res.status(429).json({
          error: { code: 'RATE_LIMITED', message: 'Rate limit exceeded' }
        })
      }
      next()
    } catch (err) {
      // Express 4 does not catch rejected promises from async middleware
      next(err)
    }
  }
}

// Apply different tiers per endpoint
app.use('/auth/login', rateLimit({
  windowMs: 60_000, max: 5,
  keyFn: (req) => `login:${req.ip}`
}))

app.use('/api/', rateLimit({
  windowMs: 60_000, max: 100,
  keyFn: (req) => req.user ? `user:${req.user.id}` : `ip:${req.ip}`
}))

Layering multiple rate limits

A single limit is not enough. A production setup should have:

Layer	Limit	Purpose
CDN (Cloudflare)	10k req/min/IP	Block volumetric DDoS
API Gateway	1k req/min/API key	Prevent partner abuse
Application (general)	100 req/min/user	Fair usage
Application (sensitive)	5 req/min/IP for /login	Brute-force protection
Application (expensive)	10 req/hour/user for /generate-report	Cost protection
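The application-level rows of the table can be expressed as data and checked together. The sketch below is one way to do it under assumptions: the field names (`prefix`, `max`, `windowMs`, `by`), the request shape (`path`, `ip`, `userId`) and the `check` callback are all hypothetical; `check` stands for any limiter that returns a boolean.

```javascript
// Application-level limits from the table, most specific first.
const RULES = [
  { name: 'login',   prefix: '/login',           max: 5,   windowMs: 60_000,    by: 'ip' },
  { name: 'report',  prefix: '/generate-report', max: 10,  windowMs: 3_600_000, by: 'user' },
  { name: 'general', prefix: '/',                max: 100, windowMs: 60_000,    by: 'user' },
]

// Every rule whose prefix matches applies to the request.
function rulesFor(path) {
  return RULES.filter((r) => path.startsWith(r.prefix))
}

// A request must pass all matching rules. Anonymous requests fall back to the IP.
function allowRequest(req, check) {
  return rulesFor(req.path).every((r) => {
    const who = r.by === 'user' && req.userId != null
      ? `user:${req.userId}`
      : `ip:${req.ip}`
    return check(`rl:${r.name}:${who}`, r.max, r.windowMs)
  })
}

console.log(rulesFor('/generate-report').map((r) => r.name)) // report, general
console.log(rulesFor('/orders').map((r) => r.name))          // general
```

Note that `every` stops at the first failing rule, and a rule that passed earlier has already counted the request; whether that is acceptable depends on how strict the accounting needs to be.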

Distributed rate limiting

With multiple Node.js instances, in-memory counters are not shared between processes, so a shared store such as Redis is required. Because Redis runs a Lua script atomically, two concurrent requests cannot both slip past the limit between the check and the update.

At very high load (>100k QPS), Redis itself can become the bottleneck. Two remedies: shard by key, or use an approximate counter (count-min sketch), which saves memory at the cost of an acceptable error. See "Backend caching Redis" for a deeper look at production Redis patterns.
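Sharding by key can be as simple as hashing the rate-limit key and taking it modulo the number of Redis nodes. The sketch below uses a 32-bit FNV-1a hash written inline to stay dependency-free; the helper name `shardFor` and the shard names are placeholders, and the hash works on UTF-16 code units, which matches the byte-wise definition only for ASCII keys.

```javascript
// 32-bit FNV-1a: a small non-cryptographic hash, used here only to
// spread keys evenly across shards.
function fnv1a(str) {
  let hash = 0x811c9dc5
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0 // keep the value an unsigned 32-bit int
  }
  return hash
}

// Hypothetical helper: map a rate-limit key to one of `shardCount` Redis nodes.
function shardFor(key, shardCount) {
  return fnv1a(key) % shardCount
}

const shards = ['redis-0', 'redis-1', 'redis-2', 'redis-3'] // placeholder names
const key = 'rl:user:42'
console.log(shards[shardFor(key, shards.length)])
// The same key always maps to the same shard, so its counter stays in one place.
```

Plain modulo sharding remaps most keys when the shard count changes. For rate-limit counters that usually just means some counters restart, but it is worth knowing before resizing.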

Conclusion

Proper rate limiting is not optional; it is baseline security. Token bucket + Redis Lua + layered defense is a solid default for most production APIs. Start with loose limits, monitor for 1-2 weeks, then tighten gradually based on real data. Do not try to fine-tune the numbers on day one.
