What does 'excess: 5.000 by zone' mean in Nginx error logs?

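It is the value of the limit_req leaky-bucket counter for that client at the moment Nginx rejected the request. The value is how many requests this client was ahead of the zone's configured rate, counting the request just received. Nginx stores the counter in thousandths of a request, which is why it is printed with three decimals. The `limiting requests` message is written only when this value exceeds your configured `burst`, so the request in that log line was rejected (with `limit_req_status`, 503 by default). Requests that fit within the burst are accepted, and they are delayed unless `nodelay` applies. Delays are logged separately, one level below rejections.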

Should I use `nodelay` or `delay` with my `limit_req` burst directive?

Use `nodelay` for APIs and any latency-sensitive endpoint. Without it (the default is `delay=0`), Nginx holds every request inside the burst until it conforms to the rate. At `rate=10r/s`, a request admitted at `excess=5` waits 500ms (5 / 10 r/s) before it is proxied. Requests beyond the burst are rejected immediately either way. `nodelay` serves burst requests instantly up to the burst ceiling and hard-rejects beyond it. The burst still refills only at the zone rate. Use `delay=N` (nginx 1.15.7+) when you want the first N excess requests served immediately and the rest of the burst throttled.
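The per-request decision can be sketched in Python. This is a minimal model that assumes the documented limit_req semantics (`delay=N` needs nginx 1.15.7+). The function name and parameters are hypothetical, and `excess` is the value nginx prints in its log.

```python
from typing import Optional


def limit_req_delay_ms(excess: float, rate_rps: float, burst: int,
                       delay: Optional[int] = 0) -> Optional[float]:
    """Hypothetical model of limit_req's per-request decision.

    excess   -- bucket excess for this request, in requests (as logged by nginx)
    rate_rps -- zone rate in requests per second
    burst    -- the limit_req burst parameter
    delay    -- the delay=N parameter; 0 models the default (no nodelay),
                None models nodelay
    Returns None if the request is rejected, otherwise the delay in ms.
    """
    if excess > burst:
        return None  # rejected with limit_req_status (503 by default)
    if delay is None or excess <= delay:
        return 0.0  # within the immediate allowance: served at once
    # Held until the part of the excess above the delay threshold has drained
    return (excess - delay) * 1000 / rate_rps
```

With rate=10r/s and burst=20, a request at excess 5 waits 500 ms by default. The same request waits 0 ms with nodelay and 200 ms with delay=3. A request at excess 21 is rejected.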

Why does changing `limit_req_status` from 503 to 429 matter?

HTTP 503 signals 'Service Unavailable'. Load balancers and monitoring treat it as a server failure, so it may pull your backend out of rotation or trigger alerts. HTTP 429 'Too Many Requests' is the status RFC 6585 defines for rate limiting. It tells clients the problem is their request rate, so they can back off and retry. It also keeps your SLO dashboards clean by separating rate limit events from genuine service failures. RFC 6585 lets a 429 carry a Retry-After header, but limit_req does not add one by itself. If you want clients to honor it, you have to set it yourself.
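A 429 only helps if the client actually backs off. The following is a minimal Python sketch of the wait calculation. It assumes the client reads the optional Retry-After header, which is only present if you add it. When the header is missing, the client falls back to exponential backoff with full jitter. The function name and the base/cap defaults are illustrative choices, not values taken from nginx.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_wait_seconds(status: int, retry_after: Optional[str], attempt: int,
                       base: float = 0.5, cap: float = 30.0) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None if the
    response is not a rate-limit rejection and should not be retried this way."""
    if status != 429:
        return None
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():  # Retry-After: <delay-seconds>
            return float(value)
        try:  # Retry-After: <HTTP-date>
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # unparseable header: fall through to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
    # No usable Retry-After: exponential backoff with full jitter, capped
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Callers sleep for the returned number of seconds and increment `attempt` on each consecutive 429. Full jitter keeps clients that were throttled at the same moment from retrying in lockstep.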

How to Fix Nginx 'limiting requests, excess: 5.000 by zone' Rate Limit Errors in Production

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 10 mins

TL;DR

What broke: Nginx's limit_req is rejecting legitimate requests because the client's burst allowance is used up. excess: 5.000 means that client was 5 requests ahead of the defined rate when this request arrived, which is more than the configured burst allows.
How to fix it: Increase the burst parameter on your limit_req directive and add nodelay to prevent queuing latency from compounding the problem.

The Incident (What Does the Error Mean?)

Raw log output triggering this alert:

2024/01/15 03:42:17 [error] 1234#1234: *98231 limiting requests, excess: 5.000 by zone "api_limit", client: 203.0.113.47, server: api.example.com, request: "POST /v1/checkout HTTP/1.1"

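excess: 5.000 is the value of Nginx's per-key leaky-bucket counter, which here is per client IP. It means this client's requests were 5 ahead of the zone's defined rate at the moment of evaluation, counting the request being evaluated. Nginx keeps the counter in thousandths of a request with millisecond resolution. Each request adds 1, and each elapsed millisecond drains rate/1000 (0.01 at rate=10r/s). When excess exceeds the burst ceiling, the request is rejected with limit_req_status (HTTP 503 by default), whether or not nodelay is set. Requests within the burst are accepted, and they are delayed unless nodelay is set.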

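The zone referenced (api_limit in this example) was defined with a rate like rate=10r/s. The limit_req directive that uses it sets no burst or too small a burst, which leaves almost no tolerance for real-world traffic spikes. A request is rejected only when its excess exceeds burst, so the logged value always falls between burst and burst + 1. An excess of 5.000 therefore points to burst=4. With burst omitted (it defaults to 0), the logged excess never exceeds 1.000.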
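A small Python model reproduces the rejection rule and shows why the logged value lands just above burst. This is a simplified sketch for one client key and one limit_req directive, modelled on the integer accounting in nginx's ngx_http_limit_req_module:

- The counter is stored in thousandths of a request.
- The first request from a key starts at 0.
- Rejected requests leave the counter unchanged.

The function name is hypothetical, and arrival times are in milliseconds.

```python
def simulate_limit_req(arrivals_ms, rate_rps, burst):
    """Replay one key's request times (ms, ascending) through a model of
    limit_req accounting. Returns a list of (accepted, excess_in_requests)."""
    rate = rate_rps * 1000   # drain rate in milli-requests per second
    limit = burst * 1000     # burst in milli-requests
    stored = None            # no bucket exists for this key yet
    last = 0                 # time of the last accepted request
    results = []
    for now in arrivals_ms:
        if stored is None:
            excess = 0  # first request from a key creates the bucket at 0
        else:
            elapsed = max(0, now - last)
            # drain for the elapsed time, then add this request (1000 milli)
            excess = max(0, stored - rate * elapsed // 1000 + 1000)
        if excess > limit:
            # logged as "limiting requests, excess: X.XXX"; state unchanged
            results.append((False, excess / 1000))
            continue
        stored, last = excess, now
        results.append((True, excess / 1000))
    return results
```

With burst=4, six simultaneous requests are accepted at excess 0 through 4, and the sixth is rejected at exactly 5.0, which matches the sample log line. With burst=0, a second request inside the same 100 ms window is rejected at an excess of at most 1.0.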

The Attack Vector / Blast Radius

This is a self-inflicted denial of service vector. The blast radius:

Legitimate users get 503'd during any bursty-but-normal traffic pattern: mobile retries, JS parallel fetches, webhook fan-outs.
Upstream never sees the traffic. If Nginx sits in front of a Node.js/Go API, Nginx generates the 503s before proxying. Your application's circuit breakers and graceful degradation never run, and nothing about the rejected request is logged in your app layer.
Monitoring blind spot. Most teams alert on 5xx at the app layer. Nginx-level rate limit rejections never reach the app, so your APM (Datadog, New Relic) shows zero errors while users are screaming.
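Misconfigured nodelay compounds latency. Without nodelay, requests inside the burst are held for (excess / rate) seconds before being proxied. At rate=10r/s, a request admitted at excess=5 waits 500ms. Under sustained overload the bucket stays full. Admitted requests are slowed down while everything beyond the burst is still rejected with 503, which is the worst of both worlds.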
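One way to close the monitoring blind spot is to turn Nginx's rejection lines into a metric your alerting already understands. The following is a minimal Python sketch that assumes the line format of the log sample shown earlier: "limiting requests", followed by client, server and request fields. The regex, the function names and the per-zone counting are illustrative. Real lines may carry extra trailing fields such as host, which the search ignores.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumed shape of a limit_req rejection line (see the sample log output)
LIMIT_RE = re.compile(
    r'limiting requests, excess: (?P<excess>\d+\.\d{3}) by zone "(?P<zone>[^"]+)", '
    r'client: (?P<client>[^,]+), server: (?P<server>[^,]*), '
    r'request: "(?P<request>[^"]*)"'
)


def parse_limit_line(line: str) -> Optional[dict]:
    """Return the fields of a rejection line, or None for any other line."""
    m = LIMIT_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["excess"] = float(fields["excess"])
    return fields


def rejections_by_zone(lines: Iterable[str]) -> Counter:
    """Count rejected requests per limit_req zone, e.g. to export as a metric."""
    parsed = (parse_limit_line(line) for line in lines)
    return Counter(p["zone"] for p in parsed if p is not None)
```

You can run the error log through `rejections_by_zone` on a schedule or inside your log shipper. The result is a per-zone rejection count you can alert on alongside application 5xx rates.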

How to Fix It (The Solution)

Basic Fix — Increase Burst Headroom

The immediate lever is burst. Set it to absorb your realistic peak concurrency above the steady-state rate.

http {
    # Zone definition — 10 MB shared memory, 10 requests/sec per IP
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

    server {
        location /v1/ {
-           limit_req zone=api_limit;
+           limit_req zone=api_limit burst=20 nodelay;
        }
    }
}

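burst=20 lets a client get up to 20 requests ahead of the zone rate before rejection begins. nodelay serves those requests immediately instead of spacing them out at the zone rate, which at 10r/s would hold the 20th excess request for 2 seconds. The burst slots still refill only at 10r/s, so a client that has used its whole burst is held to the steady rate until the bucket drains.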
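You can size burst from real traffic instead of guessing at 20. One approach, which is an assumption of this guide and not an nginx feature, is to replay a recorded trace of one client's request times through a model of the bucket with unlimited burst. The peak excess in that replay is the smallest burst that would have admitted every request. The sketch below models nginx's integer accounting in thousandths of a request, and the function name is hypothetical. The default $time_local in access logs only has second resolution, so a millisecond trace needs $msec in your log_format.

```python
import math


def min_burst_for(arrivals_ms, rate_rps):
    """Smallest integer burst that admits every request in one key's trace
    (timestamps in ms), under a model of limit_req's leaky-bucket accounting."""
    rate = rate_rps * 1000  # drain rate in milli-requests per second
    stored = None           # bucket not created yet
    last = 0
    peak = 0
    for now in sorted(arrivals_ms):
        if stored is None:
            stored = 0  # the first request from a key starts the bucket at 0
        else:
            # drain for the elapsed time, then add this request (1000 milli)
            stored = max(0, stored - rate * (now - last) // 1000 + 1000)
        last = now
        peak = max(peak, stored)
    # nginx rejects when excess > burst * 1000, so burst must cover the peak
    return math.ceil(peak / 1000)
```

With rate=10r/s, 21 requests arriving in the same millisecond need burst=20. Requests spaced exactly 100 ms apart need no burst at all. Run the function over the busiest legitimate clients in your logs and add some margin.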

Enterprise Best Practice — Tiered Rate Limiting with Status Code Logging

Flat per-IP rate limiting hits legitimate power users and internal services as hard as abusers. Use key-based zones, separate limits per endpoint class and an explicit status code to distinguish abuse from legitimate bursts.

http {
-   limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
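+   # Tier 1: authenticated users keyed by X-User-Id (JWT sub claim, set by the auth proxy)
+   # The proxy must overwrite any client-sent X-User-Id; requests with an empty key are not limited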
+   limit_req_zone $http_x_user_id zone=authed_limit:20m rate=50r/s;
+
+   # Tier 2: Unauthenticated / public endpoints, keyed by IP
+   limit_req_zone $binary_remote_addr zone=public_limit:10m rate=5r/s;
+
+   # Rejections are logged at warn in the error log (delays one level lower, at notice)
+   limit_req_log_level warn;
+   limit_req_status 429;

    server {
        location /v1/public/ {
-           limit_req zone=api_limit;
+           limit_req zone=public_limit burst=10 nodelay;
        }

+       location /v1/authed/ {
+           limit_req zone=authed_limit burst=100 nodelay;
+       }
    }
}

Key changes:

limit_req_status 429 — Return RFC-correct 429 Too Many Requests instead of 503. This lets clients implement exponential backoff correctly.
$http_x_user_id zone key — Rate limit by authenticated identity, not IP. Shared office NATs and CDN egress IPs will no longer cause false positives.
Separate zones per endpoint class — Public endpoints stay locked down; authenticated API consumers get headroom proportional to their tier.

Prevention in CI/CD

Stop misconfigured rate limit zones from reaching production:

1. nginx -t in your deploy pipeline (non-negotiable baseline)

# GitHub Actions step
- name: Validate Nginx Config
  run: docker run --rm -v "$(pwd)/nginx:/etc/nginx:ro" nginx nginx -t

2. gixy static analysis: catches security-relevant misconfigurations that nginx -t misses

pip install gixy
gixy /etc/nginx/nginx.conf
# Flags issues such as SSRF, HTTP response splitting, alias traversal and add_header redefinition; it does not check limit_req burst or nodelay

3. Policy check for limit_req enforcement

Checkov does not scan nginx.conf, so it cannot enforce this rule. Use a small custom check in the pipeline instead. Every limit_req directive must define burst > 0 and nodelay, and any violation fails the build.
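The following is a minimal Python sketch of such a check. It assumes you pass it individual config files, so files pulled in with include must be listed too. It enforces that every limit_req has burst > 0 and nodelay. Comment stripping is naive and treats any # as the start of a comment. Adapt the rule if some locations intentionally use delay=N. The script and function names are hypothetical.

```python
import re
import sys

# limit_req directives only (not limit_req_zone, limit_req_status, ...)
DIRECTIVE = re.compile(r'(?<![\w$])limit_req\s+([^;]*);')


def check_limit_req(conf_text: str) -> list:
    """Return one message per policy violation in an nginx config string."""
    # Naive comment removal: drops everything after '#' on each line
    text = "\n".join(line.split("#", 1)[0] for line in conf_text.splitlines())
    problems = []
    for m in DIRECTIVE.finditer(text):
        lineno = text.count("\n", 0, m.start()) + 1
        params = m.group(1).split()
        burst = 0
        for p in params:
            if p.startswith("burst="):
                burst = int(p.split("=", 1)[1])
        if burst <= 0:
            problems.append(f"line {lineno}: limit_req without burst > 0")
        if "nodelay" not in params:
            problems.append(f"line {lineno}: limit_req without nodelay")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    failures = []
    for path in sys.argv[1:]:
        with open(path) as f:
            failures += [f"{path}: {p}" for p in check_limit_req(f.read())]
    for failure in failures:
        print(failure)
    sys.exit(1 if failures else 0)
```

In the GitHub Actions job, a step that runs something like `python check_limit_req.py nginx/nginx.conf` would fail the build on any violation. The file name is illustrative.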

4. Load test gate in staging — Use k6 or vegeta to replay your P99 traffic spike against staging before every deploy. If Nginx starts returning 429/503 at expected concurrency, the deploy is blocked.

# vegeta: 50 req/s for 30s, then read the status-code breakdown (429 counts as a rejection too)
echo "POST https://staging.api.example.com/v1/checkout" | \
  vegeta attack -rate=50/s -duration=30s | \
  vegeta report --type=text

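If the report shows any 429 or 5xx responses at your expected peak, your burst headroom is still insufficient for real traffic. Keep in mind that burst only absorbs short spikes. A single load generator sustaining 50 req/s against a per-IP rate=10r/s zone gets roughly 10 req/s plus the burst through, and the rest is rejected by design. Replay a realistic mix of client IPs (or user IDs) rather than a single source.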
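vegeta report -type=text only prints the numbers, so blocking a deploy needs a check that actually fails. The following is a minimal Python sketch. It assumes the run is saved with `vegeta report -type=json > report.json`. It also assumes the JSON report exposes its status-code counts under `status_codes`, as current vegeta releases do; confirm this against your version. Connection errors show up as status code 0. The script and function names are hypothetical.

```python
import json
import sys


def blocking_statuses(report: dict) -> dict:
    """Return {status_code: count} for 429s, 5xx and transport errors (code 0)."""
    codes = report.get("status_codes") or {}
    return {code: count for code, count in codes.items()
            if code in ("0", "429") or code.startswith("5")}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        bad = blocking_statuses(json.load(f))
    if bad:
        print(f"rate limiting or failures under load: {bad}")
        sys.exit(1)
    print("no 429/5xx responses")
```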