
How to Fix Docker 'Too Many Open Files' ulimit nofile Error in Production

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 5 mins

TL;DR

  • What broke: The container's nofile soft limit (often 1024, inherited from the Docker daemon's defaults) is exhausted. The process can no longer open files, sockets, or pipes, so new connections fail and the service errors out or gets restarted.
  • How to fix it: Override ulimit nofile to 65536:65536 at the daemon, compose, or docker run level.

The Incident (What Does the Error Mean?)

Raw error output from the Docker daemon when a container fails to start:

Error response from daemon: OCI runtime create failed:
container_linux.go:380: starting container process caused:
process_linux.go:545: container init caused:
rootfs_linux.go:76: mounting "/dev/null" caused:
open /dev/null: too many open files

or in application logs:

accept tcp 0.0.0.0:8080: accept: too many open files
ERROR: EMFILE: too many open files, open '/app/config.json'

Immediate consequence: The process cannot open any new file descriptors — no new TCP connections, no log file writes, no IPC pipes. The container is effectively dead to new traffic while existing goroutines/threads spin or panic.
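
The failure is easy to reproduce outside Docker. The sketch below is a minimal illustration, assuming Linux and Python 3; the helper name open_until_emfile and the limit of 64 are hypothetical choices made for the demo. It lowers the soft limit for the current process, opens /dev/null until the kernel refuses, then restores the original limit.

```python
import errno
import os
import resource


def open_until_emfile(soft_limit: int = 64) -> int:
    """Lower the soft nofile limit, then open /dev/null until the kernel refuses.

    Returns how many descriptors were opened before EMFILE was raised.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Changing the soft limit within the hard limit needs no privileges.
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    fds = []
    try:
        while True:
            fds.append(os.open("/dev/null", os.O_RDONLY))
    except OSError as exc:
        if exc.errno != errno.EMFILE:
            raise
        return len(fds)
    finally:
        # Release everything and put the original soft limit back.
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    print("descriptors opened before EMFILE:", open_until_emfile())
```

The count comes out slightly below the limit because stdin, stdout, stderr, and any other descriptors the process already holds count against it too.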


The Attack Vector / Blast Radius

This is a resource exhaustion failure, usually not a code bug. The Linux kernel enforces a per-process file descriptor limit (RLIMIT_NOFILE), which ulimit -n reports and sets. Unless overridden, every container inherits the limits of the Docker daemon process; on many hosts that leaves the soft limit at 1024.

Cascading failure chain:

  1. High-concurrency service (Node.js, Nginx, Java NIO, Go net/http) opens one FD per connection + one per log file + one per DB pool connection.
  2. FD count hits 1024. All accept() syscalls return EMFILE.
  3. Load balancer health checks fail. Container is marked unhealthy.
  4. Orchestrator (ECS, Kubernetes, Swarm) restarts the container.
  5. On restart, in-flight requests are dropped. If the underlying cause (traffic spike, FD leak) isn't fixed, the new container hits the same ceiling within seconds — restart loop.
  6. In a microservices mesh, the failing container backs up upstream callers, triggering timeout storms across dependent services.

FD leak multiplier: If your app has a file descriptor leak (unclosed sockets, missing defer f.Close()), a low nofile limit accelerates the failure timeline from hours to minutes under load.
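
A leak is easy to see by counting descriptors before and after a call. This is a rough sketch, assuming Linux (it reads /proc/self/fd); read_leaky and read_safe are hypothetical helpers written for the comparison, not code from any real service.

```python
import os


def open_fd_count() -> int:
    """Count this process's open descriptors via /proc (Linux only)."""
    return len(os.listdir("/proc/self/fd"))


def read_leaky(path: str) -> tuple[bytes, int]:
    """Hypothetical leaky helper: returns the data and the fd it never closes."""
    fd = os.open(path, os.O_RDONLY)
    return os.read(fd, 16), fd


def read_safe(path: str) -> bytes:
    """Same read, but the descriptor is released on every code path."""
    with open(path, "rb") as f:
        return f.read(16)


if __name__ == "__main__":
    base = open_fd_count()
    for _ in range(100):
        read_safe("/dev/null")
    print("after safe reads: ", open_fd_count() - base, "extra FDs")
    for _ in range(100):
        read_leaky("/dev/null")
    print("after leaky reads:", open_fd_count() - base, "extra FDs")
```

Each leaky call permanently consumes one slot, so the time to failure is roughly the remaining headroom divided by the leak rate. Raising the limit buys time but does not remove the leak.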


How to Fix It

Basic Fix — docker run

- docker run -d myapp:latest
+ docker run -d --ulimit nofile=65536:65536 myapp:latest

Basic Fix — docker-compose.yml

  services:
    app:
      image: myapp:latest
+     ulimits:
+       nofile:
+         soft: 65536
+         hard: 65536

Enterprise Best Practice — Docker Daemon Global Default (/etc/docker/daemon.json)

Setting this at the daemon level ensures every container on the host inherits the correct limit without per-service configuration drift:

  {
-   "log-driver": "json-file"
+   "log-driver": "json-file",
+   "default-ulimits": {
+     "nofile": {
+       "Name": "nofile",
+       "Hard": 65536,
+       "Soft": 65536
+     }
+   }
  }

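After editing daemon.json, restart the daemon. default-ulimits is not among the options Docker applies on a configuration reload, and existing containers keep their old limits until they are recreated:

sudo systemctl restart docker
# Verify with a fresh container:
docker run --rm busybox sh -c 'ulimit -n'
# Expected output: 65536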

Kubernetes (if you migrated from Docker Compose)

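Kubernetes does not honor Docker ulimits keys, and the Pod spec has no ulimit field. nofile is not a sysctl, LimitRange only governs resources such as CPU, memory, and storage, and a ulimit call in an init container changes only that init container's own shell. Containers inherit the limit from the node's container runtime (for example, LimitNOFILE in the containerd systemd unit), so either raise it there or raise the soft limit in the app container's own entrypoint, which works only up to the inherited hard limit:

  apiVersion: v1
  kind: Pod
  spec:
    containers:
    - name: app
      image: myapp:latest
+     # /app/server is a placeholder for the image's real entrypoint
+     command: ['sh', '-c', 'ulimit -n 65536 && exec /app/server']

For a cluster-wide default, set the limit in the node image or runtime configuration rather than per Pod.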

Soft vs Hard limit: Soft limit is the current enforced value; hard limit is the ceiling a process can raise itself to without root. Set both equal in production to eliminate ambiguity.
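
The soft/hard distinction also means an unprivileged process can raise its own soft limit at startup, as long as it stays at or below the hard limit. A minimal sketch using Python's standard resource module; the function name raise_nofile_soft and the 65536 target are assumptions for illustration.

```python
import resource


def raise_nofile_soft(target: int = 65536) -> tuple[int, int]:
    """Raise the soft nofile limit toward target, never above the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no ceiling"; otherwise clamp to the hard limit.
    ceiling = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= ceiling:
        return soft, hard  # already high enough, leave it alone
    resource.setrlimit(resource.RLIMIT_NOFILE, (ceiling, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("nofile (soft, hard):", raise_nofile_soft())
```

If the hard limit is below the target, the call settles for the hard limit. Going higher requires changing the container's limits from outside, using one of the fixes above.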



Prevention in CI/CD

1. Checkov Policy (IaC Scanning)

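Checkov has no built-in nofile check, and its Dockerfile framework does not read the ulimits key in Compose files, so anything here has to be a custom check. Enable it in the pipeline config:

# .checkov.yml
framework:
  - dockerfile
check:
  - CKV_CUSTOM_DOCKER_1  # ID of the custom check defined below
external-checks-dir:
  - checkov/custom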

Write a custom Checkov check:

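# checkov/custom/check_ulimit_nofile.py
# Sketch only: a Dockerfile check sees build instructions, and RUN ulimit affects
# just that build step, so the most it can do is flag such calls as ineffective.
from checkov.common.models.enums import CheckCategories, CheckResult
from checkov.dockerfile.base_dockerfile_check import BaseDockerfileCheck

class UlimitNofileCheck(BaseDockerfileCheck):
    def __init__(self):
        super().__init__(
            name="Ensure nofile is not set with RUN ulimit (no effect at runtime)",
            id="CKV_CUSTOM_DOCKER_1",
            categories=[CheckCategories.GENERAL_SECURITY],
            supported_instructions=["RUN"],
        )

    def scan_resource_conf(self, conf):  # named scan_entity_conf in older releases
        for instruction in conf:
            if "ulimit -n" in instruction["value"]:
                return CheckResult.FAILED, [instruction]
        return CheckResult.PASSED, None

check = UlimitNofileCheck()  # module-level instance registers the check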

2. OPA/Conftest Policy for docker-compose.yml

# policy/ulimit.rego
package compose

deny[msg] {
  service := input.services[name]
  not service.ulimits.nofile.hard
  msg := sprintf("Service '%v' missing ulimits.nofile.hard", [name])
}

deny[msg] {
  service := input.services[name]
  service.ulimits.nofile.hard < 65536
  msg := sprintf("Service '%v' ulimits.nofile.hard must be >= 65536", [name])
}

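Run in CI. Conftest evaluates only the main package by default, so name the compose package explicitly or the policy is silently skipped:

conftest test docker-compose.yml --policy policy/ --namespace compose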

3. Host-Level Baseline (Ansible/Terraform)

Enforce /etc/docker/daemon.json via configuration management so no host is provisioned without the correct default:

# terraform/modules/ec2_docker_host/userdata.sh
cat > /etc/docker/daemon.json <<EOF
{
  "default-ulimits": {
    "nofile": { "Name": "nofile", "Hard": 65536, "Soft": 65536 }
  }
}
EOF
systemctl restart docker  # default-ulimits is not applied by a reload
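
The heredoc above overwrites daemon.json, which drops any existing keys such as log-driver. On hosts that already have a config, a merge is safer. A rough sketch, assuming Python 3 is available on the host; with_nofile_default and update_daemon_json are hypothetical helper names, and the script must run with permission to write the file.

```python
import json
from pathlib import Path


def with_nofile_default(config: dict, limit: int = 65536) -> dict:
    """Return a copy of a daemon.json dict with default-ulimits.nofile set."""
    merged = dict(config)
    ulimits = dict(merged.get("default-ulimits", {}))
    ulimits["nofile"] = {"Name": "nofile", "Hard": limit, "Soft": limit}
    merged["default-ulimits"] = ulimits
    return merged


def update_daemon_json(path: str = "/etc/docker/daemon.json") -> None:
    """Merge the nofile default into the file, preserving all other keys."""
    p = Path(path)
    text = p.read_text() if p.exists() else ""
    current = json.loads(text) if text.strip() else {}
    p.write_text(json.dumps(with_nofile_default(current), indent=2) + "\n")


if __name__ == "__main__":
    # Dry run: show the merged result for a config that only sets log-driver.
    print(json.dumps(with_nofile_default({"log-driver": "json-file"}), indent=2))
```

The daemon still needs a restart afterwards for the new default to apply to newly created containers.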

4. Monitor FD Usage Before You Hit the Wall

# FD count of each container's main process (run on the host as root)
for cid in $(docker ps -q); do
  pid=$(docker inspect --format '{{.State.Pid}}' "$cid")
  echo "$cid PID $pid: $(ls /proc/$pid/fd 2>/dev/null | wc -l) FDs"
done

Alert when any container exceeds 80% of its nofile soft limit (the value the kernel actually enforces), not after it crashes.
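
The 80% rule can be sketched as a small check that compares a process's open descriptor count with the soft limit reported in /proc/<pid>/limits. This assumes Linux and enough privilege to read the target's /proc entries; nofile_usage and over_threshold are hypothetical names, and wiring the result into an alerting system is left out.

```python
import os
from typing import Optional


def nofile_usage(pid: int) -> tuple[int, Optional[int]]:
    """Return (open_fds, soft_limit) for a PID; soft_limit is None if unlimited."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft: Optional[int] = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Fields: "Max", "open", "files", soft, hard, units
                value = line.split()[3]
                soft = None if value == "unlimited" else int(value)
                break
    return open_fds, soft


def over_threshold(pid: int, threshold: float = 0.8) -> bool:
    """True when the process uses at least `threshold` of its soft limit."""
    open_fds, soft = nofile_usage(pid)
    return soft is not None and open_fds >= threshold * soft


if __name__ == "__main__":
    me = os.getpid()
    print(nofile_usage(me), "over 80%:", over_threshold(me))
```

Feeding it the PIDs from the loop above gives a per-container signal. It only covers each container's main process, so child processes would need to be walked separately.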
