How to Fix Nginx 'worker_connections are not enough: accept() failed (24: Too many open files)'

Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 10 mins

TL;DR

  • What broke: Nginx hit the per-process file descriptor limit (ulimit -n). Every new TCP connection requires an fd. When the pool is exhausted, accept() fails and the worker cannot take any new connections.
  • How to fix it: Raise worker_rlimit_nofile in nginx.conf, sync it with the systemd LimitNOFILE override, and recalculate worker_connections to match.

The Incident (What Does the Error Mean?)

Raw log output from /var/log/nginx/error.log:

2024/07/15 03:42:17 [alert] 1234#1234: worker_connections are not enough: accept() failed (24: Too many open files)
2024/07/15 03:42:17 [alert] 1234#1234: *8821903 open() "/var/cache/nginx/..." failed (24: Too many open files)

Errno 24 is EMFILE: the process has reached its per-process limit on open file descriptors. Nginx cannot accept() the incoming connection, so no request is ever read and no 502 or 503 is returned. The TCP handshake still completes at the kernel level, and the connection then waits in the listen backlog until a descriptor frees up or the client gives up. Clients see timeouts, or dropped and reset connections once the backlog overflows. Monitoring that only checks HTTP status codes will miss this entirely.
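To see errno 24 outside Nginx, the following minimal sketch lowers the soft fd limit of the current Python process and opens /dev/null until the kernel refuses. The function name exhaust_fds and the limit of 64 are illustrative choices, not part of Nginx, and the resource module exists on Unix-like systems only.

```python
import errno
import os
import resource


def exhaust_fds(limit=64):
    """Lower the soft fd limit, open /dev/null until it fails, return (errno, opened)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may never exceed the hard limit (RLIM_INFINITY means no cap).
    new_soft = limit if hard == resource.RLIM_INFINITY else min(limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    held = []
    try:
        while True:
            held.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        return exc.errno, len(held)
    finally:
        # Release everything and restore the original limit.
        for fd in held:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


err, opened = exhaust_fds(64)
print(errno.errorcode[err], "after", opened, "extra descriptors")
```

The failure is per process: other processes on the same host keep working, which is why a single Nginx worker can be starved while the system-wide fd count looks healthy.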


The Attack Vector / Blast Radius

This is a cascading capacity failure, not a single-point event:

  1. Each Nginx connection consumes at minimum 1 fd (client socket). Proxying to an upstream doubles it. Serving a file adds another. A single HTTP/1.1 keepalive session with assets can hold 3–5 fds simultaneously.
  2. Default ulimit -n on most Linux distros is 1024. A single Nginx worker with worker_connections 1024 will saturate this under moderate load — the math doesn't work unless you explicitly raise the OS limit.
  3. Systemd does not apply the limits from /etc/security/limits.conf. The nofile entries in that file are enforced by pam_limits for login sessions only, so a systemd-managed Nginx process never sees them unless LimitNOFILE is set in the service unit. This is the #1 reason the fix appears to work in testing and fails in production.
  4. Blast radius: 100% of new inbound connections fail. Active keepalive connections already established continue until they close. Traffic spikes (deploys, marketing events, bot crawls) trigger the threshold unpredictably. The worker process does not crash — it keeps running and logging alerts at high frequency, which can itself cause disk I/O pressure.
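The fd arithmetic in items 1 and 2 can be sketched as follows. The helper names and the reserve of 32 descriptors for logs and listening sockets are assumptions made for illustration; measure your own worker before relying on the numbers.

```python
def fds_per_connection(proxied: bool, serves_file: bool) -> int:
    """Descriptors held by one client connection: client socket, upstream socket, open file."""
    return 1 + (1 if proxied else 0) + (1 if serves_file else 0)


def max_clients(fd_limit: int, proxied: bool, serves_file: bool, reserved: int = 32) -> int:
    """Clients one worker can hold before EMFILE, after reserving fds for logs and listeners."""
    usable = max(fd_limit - reserved, 0)
    return usable // fds_per_connection(proxied, serves_file)


# With the default limit of 1024:
print(max_clients(1024, proxied=False, serves_file=False))  # 992
print(max_clients(1024, proxied=True, serves_file=True))    # 330
```

Under these assumptions a worker configured for 1024 connections runs out of descriptors at roughly a third of that figure once it proxies and serves files.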

How to Fix It (The Solution)

Step 1 — Diagnose current limits

# Check the fd limit of a running Nginx worker (pgrep -o would return the master)
NGINX_PID=$(pgrep -f 'nginx: worker process' | head -n 1)
grep 'open files' /proc/$NGINX_PID/limits

# Check current system-wide fd usage
cat /proc/sys/fs/file-nr
# output: <allocated> <allocated-but-unused> <max>
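For scripted checks, the same diagnosis can be done by parsing /proc directly. This is a minimal sketch: the function names are hypothetical, it is Linux-only, and reading /proc/<pid>/fd for another user's process needs root.

```python
import os


def parse_open_files_limit(limits_text: str):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            # The kernel prints 'unlimited' instead of a number when no cap is set.
            to_int = lambda value: None if value == "unlimited" else int(value)
            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' row found")


def fd_usage(pid: int):
    """Return (open_fds, soft_limit) for a live process."""
    with open(f"/proc/{pid}/limits") as fh:
        soft, _hard = parse_open_files_limit(fh.read())
    return len(os.listdir(f"/proc/{pid}/fd")), soft
```

Calling fd_usage on a worker PID during peak traffic shows how close that worker is to its soft limit, which is the number that triggers EMFILE.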

Step 2 — Basic Fix: nginx.conf

# /etc/nginx/nginx.conf

- worker_processes  1;
+ worker_processes  auto;  # matches CPU core count

+ worker_rlimit_nofile 65535;  # MUST be set; raises per-worker fd limit

  events {
-     worker_connections  1024;
+     worker_connections  16384;
+     use epoll;           # Linux: explicit epoll for performance
+     multi_accept on;     # accept all pending connections per epoll event
  }

  http {
      keepalive_timeout  65;
+     keepalive_requests 1000;
  }

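Rule of thumb: both limits apply per worker, so worker_processes cancels out and the check reduces to worker_connections × 2 (upstream fd) < worker_rlimit_nofile. With the values above, 16384 × 2 = 32768, which fits under 65535 with room left for log, cache, and static file descriptors.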
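The rule of thumb can be checked mechanically against a config file. The sketch below is an assumption-laden helper, not an Nginx tool: it reads only the first uncommented occurrence of each directive and ignores include files.

```python
import re


def directive_value(config_text: str, name: str):
    """Return the first integer value of a directive that starts a line, or None."""
    pattern = rf"^\s*{re.escape(name)}\s+(\d+)\s*;"
    match = re.search(pattern, config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def fd_headroom(config_text: str, fds_per_conn: int = 2):
    """Descriptors left per worker after worker_connections * fds_per_conn, or None if unset."""
    conns = directive_value(config_text, "worker_connections")
    rlimit = directive_value(config_text, "worker_rlimit_nofile")
    if conns is None or rlimit is None:
        return None
    return rlimit - conns * fds_per_conn


sample = """
worker_rlimit_nofile 65535;
events {
    worker_connections  16384;
}
"""
print(fd_headroom(sample))  # 32767
```

A negative or missing result means the configuration can exhaust descriptors before it exhausts connection slots.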

Step 3 — Enterprise Best Practice: Systemd Unit Override

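Without this, Step 2 alone may not hold under systemd: the master process keeps the unit's limit, and workers can exceed it only when the master has the privilege to raise its own hard limit.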

systemctl edit nginx
# writes /etc/systemd/system/nginx.service.d/override.conf

+ [Service]
+ LimitNOFILE=65535
# A single value sets both the soft and hard limit; use LimitNOFILE=soft:hard to split them.
systemctl daemon-reload
systemctl restart nginx

# Verify — must show 65535
cat /proc/$(pgrep -o nginx)/limits | grep 'open files'

Step 4 — OS-Level Kernel Tuning (for >10k concurrent connections)

# /etc/sysctl.conf

+ fs.file-max = 2097152
+ net.core.somaxconn = 65535
+ net.ipv4.tcp_max_syn_backlog = 65535
sysctl -p
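To confirm the kernel picked up the new values, the current settings can be read back from /proc/sys. This is a minimal sketch: the WANTED table mirrors the three keys above, the function names are hypothetical, and the root parameter exists only so the check can be pointed at a test directory.

```python
from pathlib import Path

# Targets taken from the sysctl.conf fragment above.
WANTED = {
    "fs.file-max": 2097152,
    "net.core.somaxconn": 65535,
    "net.ipv4.tcp_max_syn_backlog": 65535,
}


def read_sysctl(name: str, root: str = "/proc/sys") -> int:
    """Read an integer sysctl by mapping the dots in its name to path separators."""
    return int(Path(root, *name.split(".")).read_text().split()[0])


def below_target(root: str = "/proc/sys") -> dict:
    """Return {name: current_value} for every key still under its target."""
    current = {name: read_sysctl(name, root) for name in WANTED}
    return {name: value for name, value in current.items() if value < WANTED[name]}
```

An empty dict from below_target() means all three keys are at or above the values in this step.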


Prevention in CI/CD

1. Nginx config linting in your pipeline:

# In your CI step — catches syntax errors before deploy
nginx -t -c /path/to/nginx.conf

2. Custom policy check for Nginx fd limits (a standalone helper that borrows Checkov's CheckResult enum):

# custom check: ensure worker_rlimit_nofile >= 65535
import re

from checkov.common.models.enums import CheckResult

# Match the directive only where it starts a line, so commented-out copies are ignored
RLIMIT_RE = re.compile(r'^\s*worker_rlimit_nofile\s+(\d+)\s*;', re.MULTILINE)

def check_nginx_rlimit(config_text):
    match = RLIMIT_RE.search(config_text)
    if not match or int(match.group(1)) < 65535:
        return CheckResult.FAILED
    return CheckResult.PASSED

3. Ansible enforcement — ensure the systemd override is always present:

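# The copy module does not create parent directories, so create the drop-in directory first
- name: Ensure the Nginx systemd drop-in directory exists
  ansible.builtin.file:
    path: /etc/systemd/system/nginx.service.d
    state: directory
    mode: "0755"

- name: Set Nginx LimitNOFILE via systemd override
  ansible.builtin.copy:
    dest: /etc/systemd/system/nginx.service.d/override.conf
    content: |
      [Service]
      LimitNOFILE=65535
    mode: "0644"
  notify:
    - daemon-reload
    - restart nginx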

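4. Alerting: add a Prometheus alert that fires before the connection slots, and the fds behind them, run out:

# prometheus/alerts.yml
# nginx_worker_connections_limit is assumed to be a metric you publish yourself
# (worker_processes × worker_connections); it is not part of stub_status.
- alert: NginxFdExhaustionImminent
  expr: nginx_connections_active / nginx_worker_connections_limit > 0.80
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Nginx connection slots >80% utilized, preemptive action required"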
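The ratio in the alert expression can also be computed by hand from a stub_status page, which helps when validating the threshold. This sketch assumes the stub_status module is enabled on some location (not covered in this article) and that you pass in worker_processes and worker_connections yourself; the function names are hypothetical.

```python
import re


def active_connections(stub_status_text: str) -> int:
    """Extract the 'Active connections' figure from a stub_status response body."""
    match = re.search(r"Active connections:\s*(\d+)", stub_status_text)
    if match is None:
        raise ValueError("not a stub_status page")
    return int(match.group(1))


def slot_utilization(stub_status_text: str, worker_processes: int, worker_connections: int) -> float:
    """Fraction of configured connection slots in use across all workers."""
    return active_connections(stub_status_text) / (worker_processes * worker_connections)


page = (
    "Active connections: 291 \n"
    "server accepts handled requests\n"
    " 16630948 16630948 31070465 \n"
    "Reading: 6 Writing: 179 Waiting: 106 \n"
)
print(slot_utilization(page, worker_processes=1, worker_connections=300) > 0.80)  # True
```

Because this measures connection slots rather than descriptors, treat it as an early-warning proxy and pair it with a direct fd count on the worker processes.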
