How to Fix Nginx 504 'Upstream Timed Out' When proxy_read_timeout Is Set to 0
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5 mins
TL;DR
- What broke: proxy_read_timeout 0 is not an "infinite timeout" in Nginx. Nginx accepts 0 as a literal zero-length timeout, so the upstream read times out almost immediately and any non-trivial upstream latency produces 504 upstream timed out.
- How to fix it: Replace proxy_read_timeout 0 with an explicit, tuned value in seconds (e.g., proxy_read_timeout 120s) matched to your upstream's worst-case response time.
The Incident (What Does the Error Mean?)
Raw error from /var/log/nginx/error.log:
2024/01/15 03:42:17 [error] 19#19: *1043 upstream timed out (110: Connection timed out)
while reading response header from upstream, client: 10.0.1.45,
server: api.internal, request: "POST /api/process HTTP/1.1",
upstream: "http://127.0.0.1:8080/api/process",
host: "api.internal"
Nginx is abandoning the upstream read before the backend finishes responding. proxy_read_timeout 0 does not mean "wait indefinitely": Nginx accepts 0 as an ordinary time value and arms a zero-length read timer, so the timeout fires almost as soon as the request has been sent. In practice, any upstream that does not return its response header near-instantly is cut off and the client receives a 504.
Immediate consequence: every request through that location block whose backend needs any processing time (DB queries, LLM calls, file I/O, batch jobs) hard-fails at the proxy layer before the backend can flush its response.
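To reproduce the incident outside production, a throwaway backend that stalls before sending its response header is enough. The Python sketch below is a hypothetical stand-in for the service at 127.0.0.1:8080 in the log; SlowHandler, make_server and DELAY_SECONDS are illustrative names, and the 2-second delay is an arbitrary choice.

```python
import sys
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical delay before the backend sends its response header.
DELAY_SECONDS = 2.0


class SlowHandler(BaseHTTPRequestHandler):
    """Simulates a backend that needs DELAY_SECONDS before it can answer."""

    def _respond(self):
        length = int(self.headers.get("Content-Length", 0))
        if length:
            self.rfile.read(length)  # drain the request body (e.g. a POST payload)
        time.sleep(DELAY_SECONDS)  # nothing reaches nginx during this gap
        body = b'{"status": "done"}\n'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    do_GET = _respond
    do_POST = _respond


def make_server(host="127.0.0.1", port=8080):
    """Build (but do not start) the slow test upstream."""
    return ThreadingHTTPServer((host, port), SlowHandler)


if __name__ == "__main__" and "--serve" in sys.argv:
    make_server().serve_forever()
```

Run it with `python slow_upstream.py --serve` (hypothetical file name) and proxy a location to it. With proxy_read_timeout 0, a POST should come back as 504 with the same "upstream timed out ... while reading response header" entry; with proxy_read_timeout 120s, the same request should succeed after roughly DELAY_SECONDS.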
The Attack Vector / Blast Radius
This is a self-inflicted availability failure, not a security exploit, but the blast radius is severe:
Total endpoint blackout for anything that is not instant. With a near-zero read window, every API endpoint doing real work is unreachable, including webhooks, payment processing and async job triggers: anything that isn't a cache hit.
Retry storms. Clients receiving 504s will retry. If your upstream is already under load, retries amplify the queue depth. Combined with a zero-timeout that kills connections instantly, you get a feedback loop: Nginx kills connections → clients retry → upstream queue grows → more timeouts.
Misleading monitoring. Because the error appears as a timeout, engineers often chase upstream performance (scaling pods, tuning DB) when the actual fault is a single misconfigured directive in Nginx. This wastes hours during an outage.
Kubernetes/ECS health check failures. If your load balancer health checks route through this Nginx instance, cascading 504s can mark healthy backend pods as unhealthy, triggering unnecessary pod restarts or task replacements.
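During the incident, counting timeout entries per request line and upstream in the error log shows at a glance that every endpoint behind the same location fails together, which points at the proxy layer rather than one struggling service. A minimal Python sketch, assuming the default error-log format shown in the raw error above with one entry per line; count_upstream_timeouts and TIMEOUT_RE are hypothetical names.

```python
import re
import sys
from collections import Counter

# Matches "upstream timed out" entries and captures the request line and the
# upstream URL. Assumes nginx's default error-log layout, where "request:"
# is logged before "upstream:".
TIMEOUT_RE = re.compile(
    r'upstream timed out .*?'
    r'request: "(?P<request>[^"]*)".*?'
    r'upstream: "(?P<upstream>[^"]*)"'
)


def count_upstream_timeouts(lines):
    """Return a Counter keyed by (request, upstream) for every timeout entry."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("request"), match.group("upstream"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_timeouts.py /var/log/nginx/error.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (request, upstream), n in count_upstream_timeouts(log).most_common(10):
            print(f"{n:6d}  {request}  ->  {upstream}")
```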
How to Fix It (The Solution)
Basic Fix
Replace the zero value with an explicit timeout in seconds. The proxy_read_timeout directive controls how long Nginx waits between two successive read operations from the upstream, not the total transfer time.
location /api/ {
    proxy_pass http://backend_upstream;
-   proxy_read_timeout 0;
+   proxy_read_timeout 120s;
+   proxy_connect_timeout 10s;
+   proxy_send_timeout 30s;
}
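Because the timer is re-armed after every read, a long streamed response can outlive proxy_read_timeout as long as no single gap does. The following is a simplified Python model of that rule, not Nginx's actual implementation; would_time_out and the sample timings are hypothetical.

```python
from typing import Sequence


def would_time_out(read_times: Sequence[float], timeout: float) -> bool:
    """Simplified model of proxy_read_timeout.

    read_times: seconds after the request was sent at which successive
    chunks of the response arrive (the first entry is the response header).
    The timer restarts after every read, so only the idle gap between
    consecutive reads matters, never the total transfer time.
    """
    previous = 0.0
    for t in read_times:
        if t - previous > timeout:
            return True  # nginx would log "upstream timed out" here
        previous = t
    return False


# Header after 2 s, then a chunk every 50 s: about 152 s in total.
streaming = [2.0, 52.0, 102.0, 152.0]
print(would_time_out(streaming, 120.0))  # False: no single gap exceeds 120 s
print(would_time_out(streaming, 0.0))    # True: the first 2 s wait already fails
print(would_time_out([130.0], 120.0))    # True: header took longer than 120 s
```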
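Rule of thumb: set proxy_read_timeout to your upstream's P99 response time plus a 20% buffer. Because the timer measures the idle gap between reads, time to first byte (or the longest stall in a streamed response) is the number that actually matters; total response time is a safe upper bound. For LLM inference or heavy batch endpoints, 300s to 600s is not unreasonable.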
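The rule of thumb is easy to apply mechanically to latency samples, for example values of $upstream_header_time collected from the Nginx access log before the incident, or backend metrics. A minimal sketch, assuming per-request samples in seconds; suggested_read_timeout is a hypothetical helper.

```python
import math
import statistics
from typing import Iterable


def suggested_read_timeout(latencies_s: Iterable[float], buffer: float = 0.20) -> str:
    """Turn latency samples into a proxy_read_timeout directive.

    Applies "P99 plus a 20% buffer", rounded up to whole seconds. Feed it
    time-to-first-byte samples, or total response times as a conservative
    upper bound.
    """
    samples = list(latencies_s)
    if len(samples) < 2:
        raise ValueError("need at least two latency samples")
    # With n=100, quantiles() returns the 1st..99th percentile cut points.
    p99 = statistics.quantiles(samples, n=100, method="inclusive")[98]
    seconds = max(1, math.ceil(p99 * (1 + buffer)))
    return f"proxy_read_timeout {seconds}s;"
```

For example, samples of 1 s through 100 s in 1 s steps give an inclusive P99 of 99.01 s, so the sketch suggests proxy_read_timeout 119s;.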
Enterprise Best Practice
Do not use a single global timeout for all location blocks. Segment by endpoint class and set the timeouts in each location block. proxy_read_timeout, proxy_connect_timeout and proxy_send_timeout accept only literal time values, not variables, so a map block cannot drive them; per-location directives are the mechanism. Pair this with an upstream block that uses keepalive so backend connections are reused instead of reopened for every request.
http {
    upstream backend_upstream {
        server 127.0.0.1:8080;
+       keepalive 32;
    }
    server {
        listen 443 ssl;
        # ssl_certificate and ssl_certificate_key omitted for brevity
+       # Required for upstream keepalive; inherited by the locations below.
+       proxy_http_version 1.1;
+       proxy_set_header Connection "";
        location /api/health {
            proxy_pass http://backend_upstream;
-           proxy_read_timeout 0;
+           proxy_read_timeout 5s;
+           proxy_connect_timeout 3s;
        }
        location /api/batch {
            proxy_pass http://backend_upstream;
-           proxy_read_timeout 0;
+           proxy_read_timeout 300s;
+           proxy_connect_timeout 10s;
+           proxy_send_timeout 60s;
        }
        location /api/ {
            proxy_pass http://backend_upstream;
-           proxy_read_timeout 0;
+           proxy_read_timeout 60s;
+           proxy_connect_timeout 10s;
+           proxy_send_timeout 30s;
        }
    }
}
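Why proxy_http_version 1.1 + Connection ""? Required for upstream keepalive to function. By default Nginx speaks HTTP/1.0 to upstreams and sends Connection: close, so every upstream connection is closed after a single request and the keepalive pool is never used. Set both in every location that proxies to the keepalive upstream, or once at server level so the locations inherit them (a location that defines any proxy_set_header of its own no longer inherits the server-level ones).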
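If you manage many endpoint classes, rendering the location blocks from a single table keeps the values in one place and lets you reject zero timeouts before Nginx ever sees them. A minimal Python sketch; ENDPOINT_CLASSES, DIRECTIVES and render_locations are hypothetical names, and the values mirror the example config.

```python
from typing import Mapping

# Hypothetical endpoint classes; values in seconds mirror the example config.
ENDPOINT_CLASSES = {
    "/api/health": {"read": 5, "connect": 3},
    "/api/batch": {"read": 300, "connect": 10, "send": 60},
    "/api/": {"read": 60, "connect": 10, "send": 30},
}

DIRECTIVES = {
    "read": "proxy_read_timeout",
    "connect": "proxy_connect_timeout",
    "send": "proxy_send_timeout",
}


def render_locations(classes: Mapping[str, Mapping[str, int]],
                     upstream: str = "backend_upstream") -> str:
    """Render one location block per endpoint class with literal timeouts."""
    blocks = []
    for path, timeouts in classes.items():
        lines = [
            f"location {path} {{",
            f"    proxy_pass http://{upstream};",
            # Needed for the upstream keepalive pool to be used.
            "    proxy_http_version 1.1;",
            '    proxy_set_header Connection "";',
        ]
        for kind, seconds in timeouts.items():
            if seconds <= 0:
                # Refuse to emit a zero (or negative) timeout.
                raise ValueError(f"{path}: {DIRECTIVES[kind]} must be > 0, got {seconds}")
            lines.append(f"    {DIRECTIVES[kind]} {seconds}s;")
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


print(render_locations(ENDPOINT_CLASSES))
```

Prefix locations are matched by longest prefix rather than file order, so the order of the dictionary does not change routing here.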
Prevention in CI/CD
This class of misconfiguration is trivially detectable pre-merge. Add the following gates:
1. nginx -t in your pipeline (minimum viable check)
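# .github/workflows/nginx-lint.yml (step excerpt)
# The mounted directory must contain nginx.conf and everything it includes (e.g. mime.types).
- name: Validate Nginx Config
  run: |
    docker run --rm -v "$PWD/nginx:/etc/nginx:ro" nginx:alpine nginx -t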
This catches syntax errors but will not catch proxy_read_timeout 0 since it is syntactically valid.
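Since nginx -t passes, a cheap complementary gate is a text-level scan for zero-valued proxy timeouts. A minimal Python sketch; the script, regex and function names are our own, it does not follow include directives, and flagging zero send and connect timeouts as well is our choice.

```python
import re
import sys
from pathlib import Path

# Flags proxy_{read,send,connect}_timeout set to zero in any unit (0, 0s, 0ms...).
# Text-level heuristic: commented-out lines are skipped, includes are not followed.
ZERO_TIMEOUT_RE = re.compile(
    r"^[ \t]*(proxy_(?:read|send|connect)_timeout)\s+0+(?:ms|s|m|h|d|w|M|y)?\s*;",
    re.MULTILINE,
)


def find_zero_timeouts(text: str) -> list[tuple[int, str]]:
    """Return (line_number, directive) for every zero-valued proxy timeout."""
    hits = []
    for match in ZERO_TIMEOUT_RE.finditer(text):
        line_no = text.count("\n", 0, match.start()) + 1
        hits.append((line_no, match.group(1)))
    return hits


def main(paths: list[str]) -> int:
    failed = False
    for path in paths:
        for line_no, directive in find_zero_timeouts(Path(path).read_text(encoding="utf-8")):
            print(f"{path}:{line_no}: {directive} is 0; set an explicit value")
            failed = True
    return 1 if failed else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Run it as an extra step next to nginx -t, e.g. `python check_zero_timeouts.py nginx/nginx.conf`; a non-zero exit code fails the job.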
2. gixy — Nginx static security/config analyzer
pip install gixy
gixy /etc/nginx/nginx.conf
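gixy's bundled plugins target security issues (SSRF, HTTP splitting, header redefinition, alias traversal and similar) and do not check timeout values, so enforcing a minimum timeout with gixy means writing a custom plugin.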
3. OPA/Conftest policy for Nginx configs
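Conftest has no built-in parser for nginx.conf, so convert the config to JSON first (for example with NGINX's crossplane tool) and write the policy against that output, where each directive is an object with directive, args, line and, for block directives, a nested block array. The policy uses Conftest's default main namespace:
# policy/nginx_timeouts.rego (input: crossplane JSON)
package main

import rego.v1

# Any zero-valued proxy_read_timeout, wherever it appears.
deny contains msg if {
    walk(input, [_, node])
    node.directive == "proxy_read_timeout"
    node.args[0] in {"0", "0s", "0ms"}
    msg := sprintf("line %d: proxy_read_timeout must not be 0. Use an explicit value >= 30s.", [node.line])
}

# Any location block that does not set proxy_read_timeout itself.
deny contains msg if {
    walk(input, [_, node])
    node.directive == "location"
    count([d | some d in node.block; d.directive == "proxy_read_timeout"]) == 0
    msg := sprintf("line %d: location %s must set proxy_read_timeout explicitly.", [node.line, concat(" ", node.args)])
}
crossplane parse -o nginx.json nginx.conf
conftest test nginx.json --policy policy/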
4. Checkov for Nginx-as-code (Terraform/Helm-managed Nginx)
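If your Nginx config is rendered via Helm values.yaml or a Terraform templatefile(), a Checkov custom policy can block a zero timeout, with one caveat: Checkov's YAML policies evaluate resource attributes, not the nginx.conf text inside a rendered template, so the timeout must be exposed as its own attribute. The check below follows Checkov's metadata/definition layout; the resource type and attribute path are placeholders to adapt:
# .checkov/nginx_timeout_check.yaml
metadata:
  name: NginxProxyReadTimeoutNotZero
  id: CKV_CUSTOM_NGINX_001
  category: GENERAL_SECURITY
definition:
  cond_type: attribute
  resource_types:
    - REPLACE_WITH_RESOURCE_TYPE   # placeholder: the resource that carries the timeout
  attribute: proxy_read_timeout    # placeholder: attribute path to the timeout value
  operator: not_equals
  value: "0"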
Block merges on failure. This misconfiguration has no business reaching production.