How to Fix Docker 'Too Many Open Files' ulimit nofile Error in Production
Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 5 mins
TL;DR
- What broke: Docker's default nofile ulimit (1024/1024) is exhausted. Your container cannot open new file descriptors, sockets, or pipes, causing hard crashes.
- How to fix it: Override ulimit nofile to 65536:65536 at the daemon, compose, or docker run level.
- Action: Use our Client-Side Sandbox above to auto-refactor your docker-compose.yml or daemon.json and get the corrected ulimit block instantly.
The Incident (What Does the Error Mean?)
Raw error output from container logs or host dmesg:
Error response from daemon: OCI runtime create failed:
container_linux.go:380: starting container process caused:
process_linux.go:545: container init caused:
rootfs_linux.go:76: mounting "/dev/null" caused:
open /dev/null: too many open files
or in application logs:
accept tcp 0.0.0.0:8080: accept: too many open files
ERROR: EMFILE: too many open files, open '/app/config.json'
Immediate consequence: The process cannot open any new file descriptors — no new TCP connections, no log file writes, no IPC pipes. The container is effectively dead to new traffic while existing goroutines/threads spin or panic.
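The failure is easy to reproduce outside Docker. The sketch below is illustrative only: the helper name open_until_emfile and the limit of 64 are assumptions, not part of the incident above. It temporarily lowers the process's soft nofile limit, opens /dev/null until the kernel returns EMFILE, then restores the original limit.

```python
import errno
import os
import resource


def open_until_emfile(soft_limit=64):
    """Lower the soft nofile limit, then open /dev/null until EMFILE.

    Returns how many opens succeeded before the kernel refused.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    fds = []
    try:
        while True:
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        if exc.errno != errno.EMFILE:
            raise
        return len(fds)
    finally:
        # Close everything we opened and put the original limit back.
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    print("opens before EMFILE:", open_until_emfile(64))
```

The count is lower than the limit because descriptors that are already open (stdin, stdout, stderr, and anything else the process holds) use up part of the same budget.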
The Attack Vector / Blast Radius
This is a resource exhaustion failure, not a code bug. The Linux kernel enforces a per-process file descriptor limit (RLIMIT_NOFILE, the value shown by ulimit -n). Every container inherits the Docker daemon's default (here nofile: 1024) unless it is explicitly overridden.
Cascading failure chain:
- High-concurrency service (Node.js, Nginx, Java NIO, Go net/http) opens one FD per connection + one per log file + one per DB pool connection.
- FD count hits 1024. All accept() syscalls return EMFILE.
- Load balancer health checks fail. Container is marked unhealthy.
- Orchestrator (ECS, Kubernetes, Swarm) restarts the container.
- On restart, in-flight requests are dropped. If the underlying cause (traffic spike, FD leak) isn't fixed, the new container hits the same ceiling within seconds — restart loop.
- In a microservices mesh, the failing container backs up upstream callers, triggering timeout storms across dependent services.
FD leak multiplier: If your app has a file descriptor leak (unclosed sockets, missing defer f.Close()), a low nofile limit accelerates the failure timeline from hours to minutes under load.
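A leak of this kind can be observed directly on Linux by counting the entries in /proc/self/fd. The following is a minimal sketch, with hypothetical helper names (open_fd_count, leaky_read, safe_read), contrasting a loop that never closes what it opens with one that uses a context manager.

```python
import os


def open_fd_count():
    """Number of file descriptors this process currently holds (Linux /proc)."""
    return len(os.listdir("/proc/self/fd"))


def leaky_read(path, times):
    """Open the file repeatedly and never close it: each open leaks one FD."""
    return [os.open(path, os.O_RDONLY) for _ in range(times)]


def safe_read(path, times):
    """Same work, but the context manager closes each file before the next open."""
    for _ in range(times):
        with open(path, "rb") as f:
            f.read(1)


if __name__ == "__main__":
    before = open_fd_count()
    safe_read(os.devnull, 10)
    print("FDs added by safe_read:", open_fd_count() - before)
    leaked = leaky_read(os.devnull, 10)
    print("FDs added by leaky_read:", open_fd_count() - before)
    for fd in leaked:  # clean up the demo's own leak
        os.close(fd)
```

With a 1024 limit, a leak of one descriptor per request exhausts the budget after roughly a thousand requests. A higher limit only delays the failure, so the leak still has to be fixed.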
How to Fix It
Basic Fix — docker run
- docker run -d myapp:latest
+ docker run -d --ulimit nofile=65536:65536 myapp:latest
Basic Fix — docker-compose.yml
 services:
   app:
     image: myapp:latest
+    ulimits:
+      nofile:
+        soft: 65536
+        hard: 65536
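If many services need the same block, the change can be applied programmatically to an already parsed Compose file. This is a sketch under assumptions: the function name with_nofile_ulimit is hypothetical, and the input is a plain dict such as a YAML parser would produce (YAML loading itself is left out to keep the example dependency-free).

```python
import copy


def with_nofile_ulimit(compose, soft=65536, hard=65536):
    """Return a copy of a parsed Compose mapping in which every service
    that lacks ulimits.nofile gets one. Existing values are left alone."""
    patched = copy.deepcopy(compose)
    for service in patched.get("services", {}).values():
        ulimits = service.setdefault("ulimits", {})
        ulimits.setdefault("nofile", {"soft": soft, "hard": hard})
    return patched


if __name__ == "__main__":
    compose = {"services": {"app": {"image": "myapp:latest"}}}
    print(with_nofile_ulimit(compose))
```

Leaving existing nofile settings untouched is a deliberate choice here: a service that already declares its own limit is assumed to have a reason for it.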
Enterprise Best Practice — Docker Daemon Global Default (/etc/docker/daemon.json)
Setting this at the daemon level ensures every container on the host inherits the correct limit without per-service configuration drift:
 {
-  "log-driver": "json-file"
+  "log-driver": "json-file",
+  "default-ulimits": {
+    "nofile": {
+      "Name": "nofile",
+      "Hard": 65536,
+      "Soft": 65536
+    }
+  }
 }
After editing daemon.json, restart the daemon (default-ulimits is not among the options applied by a live reload):
sudo systemctl restart docker
# Verify:
docker run --rm busybox sh -c 'ulimit -n'
# Expected output: 65536
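Hand-editing daemon.json risks dropping a comma or overwriting existing keys. As a hedged sketch, the merge can be done with a JSON parser instead. The function name merge_default_nofile is an assumption, and the code only transforms text; reading and writing /etc/docker/daemon.json is left to the caller.

```python
import json


def merge_default_nofile(daemon_json_text, soft=65536, hard=65536):
    """Return daemon.json text with default-ulimits.nofile set,
    keeping every other key that was already present."""
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    ulimits = config.setdefault("default-ulimits", {})
    ulimits["nofile"] = {"Name": "nofile", "Hard": hard, "Soft": soft}
    return json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    print(merge_default_nofile('{"log-driver": "json-file"}'))
```

An empty or missing file is treated as an empty configuration, so the same call works on a freshly provisioned host.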
Kubernetes (if you migrated from Docker Compose)
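Kubernetes does not honor Docker ulimits keys, and the Pod spec has no ulimit field. Containers inherit nofile from the node's container runtime, so the durable fix is on the node (for example, LimitNOFILE in the runtime's systemd unit). An init container cannot help: ulimit -n changes only the shell that runs it, and that shell exits before the app container starts. From inside the Pod, the most you can do is adjust the limit in the app container's own entrypoint, which works only if the inherited hard limit is at least 65536 (the /app/server path below is a placeholder for your real entrypoint):
 apiVersion: v1
 kind: Pod
 spec:
   containers:
   - name: app
     image: myapp:latest
+    command: ['sh', '-c', 'ulimit -n 65536 && exec /app/server']
LimitRange does not cover file descriptor limits (it constrains CPU, memory, and storage), so it cannot enforce nofile at namespace level.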
Soft vs Hard limit: Soft limit is the current enforced value; hard limit is the ceiling a process can raise itself to without root. Set both equal in production to eliminate ambiguity.
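Because an unprivileged process may raise its own soft limit up to the hard limit, an application can close the gap itself at startup. The sketch below assumes Linux and uses a hypothetical helper name, raise_nofile_soft_limit. It never lowers the current soft limit and never touches the hard limit.

```python
import resource


def raise_nofile_soft_limit(target=65536):
    """Raise the soft nofile limit toward target, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to raise
    ceiling = target if hard == resource.RLIM_INFINITY else min(target, hard)
    new_soft = max(soft, ceiling)
    if new_soft != soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("nofile (soft, hard):", raise_nofile_soft_limit())
```

This only helps when the hard limit is already high enough. If the container was started with a hard limit of 1024, the process cannot go beyond it without extra privileges.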
Prevention in CI/CD
1. Checkov Policy (IaC Scanning)
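The check below subclasses BaseDockerfileCheck, so it runs against Dockerfile RUN instructions, not Compose files (those are covered by the Conftest policy in step 2). Point Checkov at the custom check directory and select the check by its ID:
# .checkov.yml
external-checks-dir:
  - checkov/custom
check:
  - CKV_CUSTOM_DOCKER_1
Write a custom Checkov check: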
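# checkov/custom/check_ulimit_nofile.py
import re

from checkov.common.models.enums import CheckCategories, CheckResult
from checkov.dockerfile.checks.base_dockerfile_check import BaseDockerfileCheck

class UlimitNofileCheck(BaseDockerfileCheck):
    def __init__(self):
        super().__init__(
            name="Ensure ulimit nofile >= 65536",
            id="CKV_CUSTOM_DOCKER_1",
            categories=[CheckCategories.GENERAL_SECURITY],
            supported_instructions=["RUN"],
        )

    def scan_resource_conf(self, conf):
        # Assumed logic (the original omits it): fail RUN steps that set `ulimit -n` below 65536.
        low = [i for i in conf if any(int(n) < 65536 for n in re.findall(r"ulimit\s+-n\s+(\d+)", i["value"]))]
        return (CheckResult.FAILED, low) if low else (CheckResult.PASSED, None)

check = UlimitNofileCheck()  # Checkov registers a check when it is instantiated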
2. OPA/Conftest Policy for docker-compose.yml
# policy/ulimit.rego
package compose

deny[msg] {
  service := input.services[name]
  not service.ulimits.nofile.hard
  msg := sprintf("Service '%v' missing ulimits.nofile.hard", [name])
}

deny[msg] {
  service := input.services[name]
  service.ulimits.nofile.hard < 65536
  msg := sprintf("Service '%v' ulimits.nofile.hard must be >= 65536", [name])
}
Run in CI:
conftest test docker-compose.yml --policy policy/ --namespace compose  # rules live in package compose, not the default main
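For local experiments or unit tests, the two deny rules above can be mirrored in Python. This is a sketch under assumptions, not a replacement for the policy: nofile_violations is a hypothetical name, the input is an already parsed Compose mapping, and unlike the Rego rules it treats the integer shorthand (nofile: 65536) as the hard limit.

```python
def nofile_violations(compose, minimum=65536):
    """Return one message per service whose ulimits.nofile hard limit
    is missing or below minimum."""
    messages = []
    for name, service in compose.get("services", {}).items():
        nofile = (service.get("ulimits") or {}).get("nofile")
        # Compose accepts either a single integer or a {soft, hard} mapping.
        hard = nofile.get("hard") if isinstance(nofile, dict) else nofile
        if hard is None:
            messages.append(f"Service '{name}' missing ulimits.nofile.hard")
        elif hard < minimum:
            messages.append(
                f"Service '{name}' ulimits.nofile.hard must be >= {minimum}"
            )
    return messages


if __name__ == "__main__":
    compose = {
        "services": {
            "app": {"image": "myapp:latest"},
            "worker": {"ulimits": {"nofile": {"soft": 1024, "hard": 1024}}},
        }
    }
    for message in nofile_violations(compose):
        print(message)
```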
3. Host-Level Baseline (Ansible/Terraform)
Enforce /etc/docker/daemon.json via configuration management so no host is provisioned without the correct default:
# terraform/modules/ec2_docker_host/userdata.sh
cat > /etc/docker/daemon.json <<EOF
{
  "default-ulimits": {
    "nofile": { "Name": "nofile", "Hard": 65536, "Soft": 65536 }
  }
}
EOF
# default-ulimits is not picked up by a live reload, so restart the daemon
systemctl restart docker
4. Monitor FD Usage Before You Hit the Wall
# FD count of each container's main process (run on the host as root)
for id in $(docker ps -q); do
  pid=$(docker inspect --format '{{.State.Pid}}' "$id")
  echo "$id (PID $pid): $(ls /proc/$pid/fd 2>/dev/null | wc -l) FDs"
done
Alert when any container exceeds 80% of its nofile hard limit — not after it crashes.
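The 80% rule can be sketched as a small probe that reads both the open descriptor count and the limit from /proc. Assumptions are labeled here and in the comments: the names nofile_usage and should_alert are hypothetical, the code is Linux-only, and it compares against the soft limit because that is the value the kernel enforces (with soft and hard set equal, as recommended above, the result is the same).

```python
import os


def nofile_usage(pid="self"):
    """Return (open_fds, soft_limit, ratio) for a process, read from Linux /proc.

    For pid="self" the count includes the descriptor used to list the directory.
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: "Max open files" <soft> <hard> <units>
                soft = line.split()[3]
                break
    if soft is None or soft == "unlimited":
        return open_fds, None, 0.0
    return open_fds, int(soft), open_fds / int(soft)


def should_alert(pid="self", threshold=0.8):
    """True when the process holds more than threshold of its soft nofile limit."""
    return nofile_usage(pid)[2] > threshold


if __name__ == "__main__":
    fds, soft, ratio = nofile_usage()
    print(f"{fds} FDs open, soft limit {soft}, usage {ratio:.1%}")
```

Pass the PID reported by docker inspect to check a container's main process from the host. Reading another user's /proc entries generally requires root.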