How to Fix Docker cgroups Memory Limit OOM Kills: Diagnosing and Resolving Kernel Container Terminations
Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 15 mins
TL;DR
- What broke: The Linux kernel's cgroup memory controller enforced a hard limit (memory.limit_in_bytes on cgroup v1, memory.max on cgroup v2) on your container. The process exceeded it. The kernel OOM killer sent SIGKILL. Container is dead.
- How to fix it: Either raise --memory to match actual workload RSS, or tune the application's internal heap/memory settings to stay under the existing limit. Add a soft limit (--memory-reservation) as a reclaim threshold below the hard limit.
- Shortcut: Use our Client-Side Sandbox below to auto-refactor your Docker Compose or docker run flags against this exact OOM scenario.
The Incident (What does the error mean?)
Your kernel logs and Docker daemon surface this as:
KERNEL: Memory cgroup out of memory: Kill process 14821 (java) score 1823 or sacrifice child
Killed process 14821 (java) total-vm:4194304kB, anon-rss:2097152kB, file-rss:32768kB
docker: container killed by kernel OOM killer
exited with code 137
Exit code 137 = 128 + 9 (SIGKILL). This is not a graceful shutdown. There is no cleanup, no connection draining, no checkpoint. The kernel terminates the process the moment the cgroup's usage (memory.usage_in_bytes on v1, memory.current on v2) would exceed the limit (memory.limit_in_bytes on v1, memory.max on v2) and reclaim cannot free enough pages. Any in-flight transactions, open file handles, or write buffers are gone.
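The exit-code arithmetic can be sketched as a small shell helper. The name decode_exit_code is hypothetical and not part of Docker; it only applies the shell convention that a status above 128 means 128 plus a signal number.

```shell
# decode_exit_code is a hypothetical helper, not a Docker command.
# Convention: an exit status above 128 means "terminated by signal (status - 128)".
decode_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    echo "signal $((code - 128))"
  else
    echo "exit $code"
  fi
}

decode_exit_code 137   # prints "signal 9" (SIGKILL)
decode_exit_code 1     # prints "exit 1"
```

A SIGKILL can come from sources other than the OOM killer. To confirm the cause, docker inspect <container_id> --format '{{.State.OOMKilled}}' reports true or false for the container.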
Verify the active cgroup limit on a running container:
# Configured hard limit in bytes (0 means no limit)
docker inspect <container_id> --format '{{.HostConfig.Memory}}'
# Read directly from cgroup v1 (use the full container ID)
cat /sys/fs/cgroup/memory/docker/<container_id>/memory.limit_in_bytes
# cgroup v2 with the systemd cgroup driver (use the full container ID)
cat /sys/fs/cgroup/system.slice/docker-<container_id>.scope/memory.max
# Watch OOM events in real time
dmesg -w | grep -i 'oom\|killed'
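Which of the two paths applies depends on the host's cgroup version. A minimal sketch for checking it, assuming a standard mount at /sys/fs/cgroup; cgroup_version is a hypothetical helper, and hosts in hybrid mode may need a closer look than this check gives.

```shell
# cgroup_version is a hypothetical helper. It relies on one fact:
# a cgroup v2 (unified) mount exposes a cgroup.controllers file at its root.
# The optional argument (mount point) defaults to /sys/fs/cgroup.
cgroup_version() {
  root=${1:-/sys/fs/cgroup}
  if [ -f "$root/cgroup.controllers" ]; then
    echo "v2"
  else
    echo "v1"
  fi
}

cgroup_version   # prints "v2" or "v1" depending on the host
```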
The Attack Vector / Blast Radius
This is a cascading failure vector, not an isolated container death.
- Kubernetes restart loops: If this is a K8s pod, OOMKilled status triggers CrashLoopBackOff. The pod restarts with exponential backoff. Under sustained load, it never recovers. Your Deployment is effectively down.
- Data corruption: Any container writing to a database, message queue, or file without fsync guarantees loses buffered writes. PostgreSQL connections are severed mid-transaction. Kafka producers lose unacknowledged messages.
- Noisy neighbor amplification: On a shared node with swap enabled, if --memory-swap is not explicitly set, Docker lets the container use swap equal to its memory limit, so memory plus swap totals 2x the limit. The container starts thrashing swap before the OOM kill, degrading every other container on the host for minutes before the kill fires.
- Silent recurrence: Without --restart=on-failure or proper liveness probes, the container stays dead. Monitoring that only checks HTTP endpoints misses this entirely if the port never re-binds.
- JVM/Node.js specific trap: Below JDK 8u191, the JVM does not respect cgroup limits by default. It sizes its default heap from total host RAM rather than the container limit. A container with --memory=512m on a 64GB host will have the JVM attempt a multi-GB heap. It will be killed every time.
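The JVM trap comes down to simple arithmetic. A rough sketch, assuming the usual HotSpot default of a maximum heap around one quarter of the memory the JVM detects; default_heap_mb is a hypothetical helper, not a JVM tool.

```shell
# default_heap_mb: hypothetical helper. Assumes the JVM's default maximum heap
# is about 1/4 of the memory it detects (the usual HotSpot default).
default_heap_mb() {
  detected_mb=$1
  echo $((detected_mb / 4))
}

host_mb=$((64 * 1024))   # 64GB host
limit_mb=512             # --memory=512m

default_heap_mb "$host_mb"    # prints 16384: heap sized from host RAM, 32x the limit
default_heap_mb "$limit_mb"   # prints 128: heap sized from the cgroup limit
```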
How to Fix It
Basic Fix — Raise the Limit to Match Actual RSS
First, profile actual usage before changing anything:
# Point-in-time snapshot of current usage (repeat under load to find the peak)
docker stats <container_id> --no-stream --format "table {{.MemUsage}}\t{{.MemPerc}}"
# docker run flags
- docker run --memory=256m --memory-swap=256m myapp:latest
+ docker run --memory=768m --memory-swap=1g --memory-reservation=512m myapp:latest
# docker-compose.yml
  services:
    app:
      image: myapp:latest
      deploy:
        resources:
          limits:
-           memory: 256M
+           memory: 768M
          reservations:
-           memory: 128M
+           memory: 512M
--memory-reservation (soft limit): This does not kill the container. It tells the kernel to reclaim memory from this container first when the host is under memory pressure. It raises no alert on its own, so pair it with monitoring and set it to ~70% of the hard limit as your early-warning threshold.
Enterprise Best Practice — Fix the Application, Not Just the Limit
Raising limits without fixing the root cause is kicking the can down the road. The correct fix is constraining the application's internal allocator to stay under the cgroup ceiling.
JVM (Java/Scala/Kotlin) — the most common offender:
# Dockerfile or entrypoint
- CMD ["java", "-jar", "app.jar"]
+ CMD ["java", "-XX:+UseContainerSupport", "-XX:MaxRAMPercentage=75.0", "-XX:InitialRAMPercentage=50.0", "-XX:+ExitOnOutOfMemoryError", "-jar", "app.jar"]
-XX:+UseContainerSupport (JDK 8u191+, JDK 10+) makes the JVM read cgroup limits instead of host memory. It is enabled by default on those versions, so passing it explicitly mainly documents intent. -XX:MaxRAMPercentage=75.0 caps heap at 75% of the cgroup limit, leaving headroom for metaspace, native threads, and off-heap buffers. -XX:+ExitOnOutOfMemoryError makes the JVM exit on the first OutOfMemoryError, giving you a clean exit code instead of a hung, zombie JVM.
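To make the headroom concrete, here is a small sketch of the split that a given MaxRAMPercentage produces. heap_budget_mb is a hypothetical helper, and it uses integer percentages for simplicity.

```shell
# heap_budget_mb: hypothetical helper. Prints "<heap> <headroom>" in MB for a
# container limit and a MaxRAMPercentage value (integer percent).
heap_budget_mb() {
  limit_mb=$1
  pct=$2
  heap=$((limit_mb * pct / 100))
  echo "$heap $((limit_mb - heap))"
}

heap_budget_mb 768 75   # prints "576 192"
heap_budget_mb 512 75   # prints "384 128"
```

If the non-heap footprint (metaspace, thread stacks, direct buffers) is larger than the second number, the container can still be OOM killed with the heap well under its cap, so a lower percentage may be needed.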
Node.js:
- CMD ["node", "server.js"]
+ CMD ["node", "--max-old-space-size=400", "server.js"]
For a 512MB container, --max-old-space-size=400 (in MB) caps V8's old generation heap, leaving ~100MB for OS, native modules, and Buffer allocations.
Kubernetes — set both requests and limits, never only limits:
  resources:
-   limits:
-     memory: "512Mi"
+   requests:
+     memory: "384Mi"
+   limits:
+     memory: "512Mi"
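If neither a request nor a limit is set, the scheduler places the pod on any node regardless of available memory, and the pod can land on a saturated node and OOM immediately on startup. If only the limit is set, Kubernetes copies it into the request. Stating both makes the scheduling reservation an explicit decision instead of a side effect.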
Prevention in CI/CD
The goal is catching under-provisioned memory limits before they hit production.
1. Checkov — static analysis on Compose and K8s manifests:
pip install checkov
checkov -f docker-compose.yml --check CKV_DOCKER_7,CKV_DOCKER_14
checkov -d ./k8s-manifests/ --check CKV_K8S_12,CKV_K8S_13
CKV_K8S_13 flags missing memory limits. CKV_K8S_12 flags missing memory requests.
2. OPA/Gatekeeper — enforce memory limits at admission:
# Deny pods with no memory limit set
violation[{"msg": msg}] {
  container := input.review.object.spec.containers[_]
  not container.resources.limits.memory
  msg := sprintf("Container '%v' has no memory limit. OOM kills will be uncontrolled.", [container.name])
}
3. Load test with memory profiling in staging — not optional:
# Record memory usage in the background while the load test runs
docker stats --format "{{.MemUsage}}" <container_id> > mem_profile.log &
k6 run --vus 50 --duration 5m load_test.js
# Stop the background recorder once the test finishes
kill $!
Set your --memory limit to peak observed RSS + 25% headroom. Never guess.
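The peak-plus-25% rule can be sketched as a short awk-based helper. recommend_limit_mib is a hypothetical name, and the sketch assumes the log holds clean lines in the docker stats MemUsage form such as "412.3MiB / 768MiB"; a log captured from the streaming mode may contain terminal control sequences that need stripping first. The sample values are made up for illustration.

```shell
# recommend_limit_mib: hypothetical helper.
# Input: a file with one "USED / LIMIT" line per sample, e.g. "412.3MiB / 768MiB".
# Output: peak usage plus 25% headroom, rounded up to a whole MiB.
recommend_limit_mib() {
  awk '
    {
      v = $1
      if (v ~ /GiB$/)      { sub(/GiB$/, "", v); mib = v * 1024 }
      else if (v ~ /MiB$/) { sub(/MiB$/, "", v); mib = v + 0 }
      else if (v ~ /KiB$/) { sub(/KiB$/, "", v); mib = v / 1024 }
      else next
      if (mib > peak) peak = mib
    }
    END {
      limit = peak * 1.25
      r = int(limit)
      if (r < limit) r++
      print r
    }' "$1"
}

# Made-up sample data standing in for mem_profile.log
sample=$(mktemp)
printf '%s\n' '310.2MiB / 768MiB' '498.7MiB / 768MiB' '512MiB / 768MiB' > "$sample"
recommend_limit_mib "$sample"   # prints 640 (peak 512MiB * 1.25)
rm -f "$sample"
```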
4. Alerting — alert on container_memory_working_set_bytes before the kill:
# Prometheus alert rule
- alert: ContainerMemoryNearLimit
  # The "> 0" filter drops containers with no limit (reported as 0), avoiding division by zero
  expr: |
    (container_memory_working_set_bytes / (container_spec_memory_limit_bytes > 0)) > 0.85
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.container }} at {{ $value | humanizePercentage }} of memory limit"
Firing at 85% gives you time to act before the kernel fires at 100%.
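For a quick manual check of the same ratio on a cgroup v2 host, the two kernel files can be read directly. This is a sketch under assumptions: mem_usage_pct is a hypothetical helper, it takes the container's cgroup directory as its argument, and memory.current is total usage, which is not identical to the working-set metric Prometheus reports. The sample values are made up.

```shell
# mem_usage_pct: hypothetical helper for cgroup v2.
# Argument: a cgroup directory containing memory.current and memory.max.
# Prints usage as an integer percentage, or "unlimited" when no limit is set.
mem_usage_pct() {
  dir=$1
  current=$(cat "$dir/memory.current")
  max=$(cat "$dir/memory.max")
  if [ "$max" = "max" ]; then
    echo "unlimited"
    return 0
  fi
  echo $((current * 100 / max))
}

# Made-up sample values in a temporary directory
demo=$(mktemp -d)
echo 460000000 > "$demo/memory.current"
echo 536870912 > "$demo/memory.max"   # 512MiB
mem_usage_pct "$demo"   # prints 85
rm -rf "$demo"
```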