How to Fix Nginx 'worker_connections are not enough: accept() failed (24: Too many open files)'
Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 10 mins
TL;DR
- What broke: Nginx hit the OS file descriptor ceiling (ulimit -n). Every new TCP connection requires an fd. When the pool is exhausted, accept() hard-fails and Nginx drops all incoming requests.
- How to fix it: Raise worker_rlimit_nofile in nginx.conf, sync it with the systemd LimitNOFILE override, and recalculate worker_connections to match.
- Use our Client-Side Sandbox below to paste your nginx.conf and auto-refactor the worker and fd directives without sending your config off-box.
The Incident (What Does the Error Mean?)
Raw log output from /var/log/nginx/error.log:
2024/07/15 03:42:17 [alert] 1234#1234: worker_connections are not enough: accept() failed (24: Too many open files)
2024/07/15 03:42:17 [alert] 1234#1234: *8821903 open() "/var/cache/nginx/..." failed (24: Too many open files)
Errno 24 is EMFILE: the process-level open file descriptor table is full. Nginx cannot open a new socket for the incoming connection, so no HTTP response is ever sent: no 502, no 503. The TCP handshake completes at the kernel level, but Nginx cannot accept() the connection, so it waits in the listen queue until a descriptor frees up or the client gives up. Clients see timeouts or connection resets. Monitoring that only checks HTTP status codes will miss this entirely.
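To see EMFILE outside of Nginx, here is a minimal Python sketch that lowers its own soft fd limit and opens /dev/null until the kernel refuses. It is an illustration only: the function name provoke_emfile and the limit of 64 are assumptions made for this example, and it needs a Unix-like system where Python's resource module is available.

```python
import errno
import os
import resource


def provoke_emfile(soft_limit: int = 64) -> OSError:
    """Lower this process's soft fd limit, open fds until the kernel refuses,
    then restore the limit and return the error that was raised."""
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never try to raise the limit here; only lower it (or keep it).
    if old_soft == resource.RLIM_INFINITY:
        target = soft_limit
    else:
        target = min(soft_limit, old_soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, old_hard))
    held = []
    try:
        while True:
            held.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        return exc
    finally:
        # Release everything we opened and put the original limit back.
        for fd in held:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))


if __name__ == "__main__":
    err = provoke_emfile()
    print(err.errno, os.strerror(err.errno))
```

The error returned here carries the same errno that appears in the Nginx log line, which is why the message text matches.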
The Attack Vector / Blast Radius
This is a cascading capacity failure, not a single-point event:
- Each Nginx connection consumes at minimum 1 fd (client socket). Proxying to an upstream doubles it. Serving a file adds another. A single HTTP/1.1 keepalive session with assets can hold 3–5 fds simultaneously.
- Default ulimit -n on most Linux distros is 1024. A single Nginx worker with worker_connections 1024 will saturate this under moderate load: the math doesn't work unless you explicitly raise the OS limit.
- Systemd overrides the shell ulimit. Even if you raise nofile to 65535 in /etc/security/limits.conf, a systemd-managed Nginx process ignores it unless LimitNOFILE is set in the service unit, because limits.conf is applied by PAM at login and a service never passes through a login session. This is the #1 reason the fix appears to work in testing and fails in production.
- Blast radius: 100% of new inbound connections fail. Active keepalive connections already established continue until they close. Traffic spikes (deploys, marketing events, bot crawls) trigger the threshold unpredictably. The worker process does not crash: it keeps running and logging alerts at high frequency, which can itself cause disk I/O pressure.
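The fd arithmetic in the list above can be sketched in a few lines of Python. This is a rough model, not a measurement: the function names and the reserved allowance of 32 fds (for log files, listen sockets, and similar) are assumptions chosen for illustration.

```python
def fds_per_connection(proxied: bool = False, serves_file: bool = False) -> int:
    """Rough per-connection fd cost: one client socket, plus one upstream
    socket when proxying, plus one open file when serving from disk."""
    return 1 + int(proxied) + int(serves_file)


def connections_before_emfile(fd_limit: int, per_conn: int, reserved: int = 32) -> int:
    """Connections a single worker can hold before it runs into fd_limit.
    `reserved` is an assumed allowance for fds not tied to a connection."""
    return max(0, (fd_limit - reserved) // per_conn)


if __name__ == "__main__":
    cost = fds_per_connection(proxied=True)
    print(connections_before_emfile(1024, cost))
```

With the default limit of 1024 and a proxied workload, this model leaves room for fewer than 500 concurrent connections per worker, well short of worker_connections 1024.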
How to Fix It (The Solution)
Step 1 — Diagnose current limits
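# Check a running Nginx worker's actual fd limit
# (pgrep -o would return the master process; worker_rlimit_nofile applies to the workers)
NGINX_PID=$(pgrep -f 'nginx: worker process' | head -n 1)
grep 'open files' "/proc/$NGINX_PID/limits"
# Check current system-wide fd usage
cat /proc/sys/fs/file-nr
# output: <allocated> <allocated but unused> <max>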
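For scripted checks, the same diagnosis can be done by parsing /proc directly. The following Python sketch assumes the Linux /proc layout; the function names are hypothetical, and reading another process's fd directory usually requires root or the same user.

```python
import re
from pathlib import Path


def parse_open_files_limit(limits_text: str) -> tuple[int, int]:
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.
    An 'unlimited' entry is reported as -1."""
    m = re.search(r"^Max open files\s+(\S+)\s+(\S+)", limits_text, re.MULTILINE)
    if m is None:
        raise ValueError("no 'Max open files' row found")

    def to_int(value: str) -> int:
        return -1 if value == "unlimited" else int(value)

    return to_int(m.group(1)), to_int(m.group(2))


def fd_usage(pid: int) -> tuple[int, int]:
    """Return (open fds, soft limit) for a live process on Linux."""
    proc = Path("/proc") / str(pid)
    soft, _hard = parse_open_files_limit((proc / "limits").read_text())
    open_fds = sum(1 for _ in (proc / "fd").iterdir())
    return open_fds, soft
```

Calling fd_usage with a worker PID gives both numbers needed to judge how close that worker is to its ceiling.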
Step 2 — Basic Fix: nginx.conf
# /etc/nginx/nginx.conf
- worker_processes 1;
+ worker_processes auto;         # matches CPU core count
+ worker_rlimit_nofile 65535;    # MUST be set; raises per-worker fd limit
  events {
-     worker_connections 1024;
+     worker_connections 16384;
+     use epoll;                 # Linux: makes epoll explicit (Nginx already selects it by default)
+     multi_accept on;           # accept all pending connections per epoll event
  }
  http {
      keepalive_timeout 65;
+     keepalive_requests 1000;
  }
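Rule of thumb: both limits apply to each worker process, so worker_processes cancels out and the check is per worker: worker_connections × 2 must be less than worker_rlimit_nofile. The factor of 2 leaves room for upstream sockets and open files. With the values above, 16384 × 2 = 32768, comfortably below 65535.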
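The rule of thumb is easy to automate. Here is a minimal Python sketch that pulls both directives out of a config string and reports the per-worker headroom. The helper names are hypothetical, the regex is a simplification that ignores include files, and the default multiplier of 2 simply mirrors the rule above.

```python
import re


def directive(conf_text: str, name: str) -> int:
    """Return the integer value of the first uncommented `name <int>;` directive."""
    m = re.search(rf"^\s*{re.escape(name)}\s+(\d+)\s*;", conf_text, re.MULTILINE)
    if m is None:
        raise KeyError(name)
    return int(m.group(1))


def fd_headroom(conf_text: str, fds_per_conn: int = 2) -> int:
    """Per-worker fds left after worker_connections * fds_per_conn.
    A negative result means the rule of thumb is violated."""
    limit = directive(conf_text, "worker_rlimit_nofile")
    connections = directive(conf_text, "worker_connections")
    return limit - connections * fds_per_conn
```

A positive result means the worker still has spare descriptors at full connection load; zero or below means worker_connections should come down or worker_rlimit_nofile should go up.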
Step 3 — Enterprise Best Practice: Systemd Unit Override
Without this, Step 2 alone will not hold under systemd.
systemctl edit nginx
# /etc/systemd/system/nginx.service.d/override.conf
+ [Service]
+ # A single value sets both the soft and hard limits; use soft:hard to split them
+ LimitNOFILE=65535
systemctl daemon-reload
systemctl restart nginx
# Verify — must show 65535
cat /proc/$(pgrep -o nginx)/limits | grep 'open files'
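systemd accepts LimitNOFILE either as a single value, which sets both the soft and hard limits, or as soft:hard. A small Python sketch for reading that value back out of a drop-in file follows. It treats the unit file as INI-like text, which is an approximation of systemd's real parser, and the function name is hypothetical.

```python
import configparser


def parse_limit_nofile(unit_text: str) -> tuple[str, str]:
    """Return (soft, hard) from the [Service] LimitNOFILE= setting.
    A single value applies to both limits."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(unit_text)
    raw = parser["Service"]["LimitNOFILE"].strip()
    soft, _, hard = raw.partition(":")
    return soft, hard or soft
```

Values are returned as strings because systemd also accepts the keyword infinity in this position.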
Step 4 — OS-Level Kernel Tuning (for >10k concurrent connections)
# /etc/sysctl.conf
+ fs.file-max = 2097152
+ net.core.somaxconn = 65535
+ net.ipv4.tcp_max_syn_backlog = 65535
sysctl -p
Prevention in CI/CD
1. Nginx config linting in your pipeline:
# In your CI step — catches syntax errors before deploy
nginx -t -c /path/to/nginx.conf
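2. Checkov-style policy check for Nginx fd limits (a standalone helper that reuses Checkov's CheckResult enum; it is not a registered Checkov check):
# custom check: ensure worker_rlimit_nofile >= 65535
import re
from checkov.common.models.enums import CheckResult

# Match the directive only at the start of a line, so commented-out copies are ignored
RLIMIT_RE = re.compile(r'^\s*worker_rlimit_nofile\s+(\d+)\s*;', re.MULTILINE)

def check_nginx_rlimit(config_text):
    """Fail when worker_rlimit_nofile is missing or below 65535."""
    match = RLIMIT_RE.search(config_text)
    if not match or int(match.group(1)) < 65535:
        return CheckResult.FAILED
    return CheckResult.PASSED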
3. Ansible enforcement — ensure the systemd override is always present:
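# Assumes handlers named "daemon-reload" and "restart nginx" are defined elsewhere in the play or role
- name: Ensure the Nginx systemd drop-in directory exists
  ansible.builtin.file:
    path: /etc/systemd/system/nginx.service.d
    state: directory
    mode: "0755"
- name: Set Nginx LimitNOFILE via systemd override
  ansible.builtin.copy:
    dest: /etc/systemd/system/nginx.service.d/override.conf
    mode: "0644"
    content: |
      [Service]
      LimitNOFILE=65535
  notify:
    - daemon-reload
    - restart nginx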
4. Alerting — add a Prometheus alert on fd exhaustion before it hits 100%:
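# prometheus/alerts.yml
# nginx_worker_connections_limit is assumed to be a metric you define yourself
# (for example a recording rule holding worker_connections × worker_processes)
- alert: NginxFdExhaustionImminent
  expr: nginx_connections_active / nginx_worker_connections_limit > 0.80
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Nginx fd pool >80% utilized, preemptive action required"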