How to Fix Docker cgroups Memory Limit OOM Kills: Diagnosing and Resolving Kernel Container Terminations

Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 15 mins

TL;DR

  • What broke: The Linux kernel's cgroup memory controller enforced the container's hard limit (memory.limit_in_bytes on cgroup v1, memory.max on cgroup v2). The process exceeded it, the kernel could not reclaim enough memory, and the OOM killer sent SIGKILL. The container is dead.
  • How to fix it: Either raise --memory to match the workload's actual memory usage, or tune the application's internal heap/memory settings to stay under the existing limit. Add a soft limit (--memory-reservation) so the kernel reclaims from this container first under host memory pressure, and alert on usage to get early warning.

The Incident (What does the error mean?)

Your kernel logs and Docker daemon surface this as:

KERNEL: Memory cgroup out of memory: Kill process 14821 (java) score 1823 or sacrifice child
Killed process 14821 (java) total-vm:4194304kB, anon-rss:2097152kB, file-rss:32768kB
docker: container killed by kernel OOM killer
exited with code 137

Exit code 137 = 128 + 9 (SIGKILL). This is not a graceful shutdown. There is no cleanup, no connection draining, no checkpoint. When the cgroup's usage (memory.usage_in_bytes on v1, memory.current on v2) reaches the limit (memory.limit_in_bytes on v1, memory.max on v2), the kernel first tries to reclaim pages. If it cannot free enough, the OOM killer terminates the process immediately. In-flight transactions, open file handles, and unflushed write buffers are gone.
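
The 128 + N convention can be decoded mechanically. The following is a minimal sketch, assuming a Linux host where signal 9 is SIGKILL; the function name describe_exit_code is illustrative and not part of Docker or any library.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a readable description."""
    if code > 128:
        # Exit codes above 128 conventionally mean "terminated by signal (code - 128)".
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))  # killed by SIGKILL
print(describe_exit_code(0))    # exited with status 0
```

Exit code 137 alone only proves that SIGKILL was delivered. To confirm the kernel OOM killer was the sender, check docker inspect <container_id> --format '{{.State.OOMKilled}}' as well.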

Verify the active cgroup limit on a running container:

# Configured hard limit in bytes (0 means no limit)
docker inspect <container_id> --format '{{.HostConfig.Memory}}'

# Read directly from cgroup v1 (the path uses the full 64-character container ID)
cat /sys/fs/cgroup/memory/docker/<container_id>/memory.limit_in_bytes

# cgroup v2 with the systemd cgroup driver
cat /sys/fs/cgroup/system.slice/docker-<container_id>.scope/memory.max

# Watch OOM events in real time
dmesg -w | grep -i 'oom\|killed'
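
The two cgroup versions report "no limit" differently, which matters if you script these reads. This is a rough sketch of a reader that handles both; the function names are hypothetical, and treating any v1 value at or above 2**60 bytes as unlimited is an assumption of this example, not a kernel constant.

```python
from pathlib import Path
from typing import Optional

# cgroup v1 reports "no limit" as a very large number instead of a keyword.
# The 2**60 cutoff below is an assumption made for this sketch.
V1_UNLIMITED_THRESHOLD = 2 ** 60


def parse_memory_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when the cgroup is unlimited."""
    value = raw.strip()
    if value == "max":  # cgroup v2 keyword for "no limit"
        return None
    limit = int(value)
    if limit >= V1_UNLIMITED_THRESHOLD:
        return None
    return limit


def read_memory_limit(cgroup_dir: str) -> Optional[int]:
    """Read memory.max (v2) or memory.limit_in_bytes (v1) from a cgroup directory."""
    for filename in ("memory.max", "memory.limit_in_bytes"):
        path = Path(cgroup_dir) / filename
        if path.exists():
            return parse_memory_limit(path.read_text())
    raise FileNotFoundError(f"no memory limit file under {cgroup_dir}")


print(parse_memory_limit("536870912\n"))  # 536870912
print(parse_memory_limit("max\n"))        # None
```

Pass read_memory_limit the directory portion of either path shown above.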

The Attack Vector / Blast Radius

This is a cascading failure vector, not an isolated container death.

  1. Kubernetes restart loops: If this is a K8s pod, OOMKilled status triggers CrashLoopBackOff. The pod restarts with exponential backoff. Under sustained load, it never recovers. Your Deployment is effectively down.
  2. Data corruption: Any container writing to a database, message queue, or file without fsync guarantees loses buffered writes. PostgreSQL connections are severed mid-transaction. Kafka producers lose unacknowledged messages.
  3. Noisy neighbor amplification: On a shared node with swap enabled, if --memory-swap is not explicitly set, Docker lets the container use as much swap as its memory limit, so memory plus swap totals 2x --memory. The container starts thrashing swap before the OOM kill, degrading every other container on the host, potentially for minutes, before the kill fires.
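  4. Silent recurrence: Without --restart=on-failure or proper liveness probes, the container stays dead. With a restart policy, the kill-and-restart cycle can repeat unnoticed: monitoring that only polls an HTTP endpoint sees brief gaps at most, and nothing at all for workers that expose no port. Alert on exit code 137 and OOM events directly.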
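  5. JVM/Node.js specific trap: JVMs older than JDK 8u191 do not respect cgroup limits by default. They size the default maximum heap at one quarter of the host's physical RAM, not the container's limit. A container with --memory=512m on a 64GB host gets a default maximum heap of 16GB, so it is killed as soon as the heap grows past the 512MB limit. Node.js hits the same wall whenever V8's default heap ceiling is higher than the container limit.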
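
The JVM trap in item 5 is easy to quantify. Here is a small worked example, assuming the JVM's documented default of one quarter of physical memory for the maximum heap; the function name is illustrative.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def legacy_default_max_heap(host_ram_bytes: int) -> int:
    """Default max heap of a JVM that ignores cgroups: one quarter of host RAM."""
    return host_ram_bytes // 4


container_limit = 512 * MIB
heap = legacy_default_max_heap(64 * GIB)

print(heap // GIB)              # 16
print(heap // container_limit)  # 32
```

The heap ceiling ends up 32 times larger than the cgroup limit, so the garbage collector never feels pressure before the kernel does.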

How to Fix It

Basic Fix — Raise the Limit to Match Actual RSS

First, profile actual usage before changing anything:

# Point-in-time snapshot of current usage (repeat under load to find the peak)
docker stats <container_id> --no-stream --format "table {{.MemUsage}}\t{{.MemPerc}}"

Then raise the limit:

# docker run flags
- docker run --memory=256m --memory-swap=256m myapp:latest
+ docker run --memory=768m --memory-swap=1g --memory-reservation=512m myapp:latest

# docker-compose.yml
services:
  app:
    image: myapp:latest
    deploy:
      resources:
        limits:
-         memory: 256M
+         memory: 768M
        reservations:
-         memory: 128M
+         memory: 512M

--memory-reservation (soft limit): This does not kill the container. Under host memory pressure, the kernel reclaims memory from containers that are above their reservation first. It raises no alert on its own, so pair it with monitoring. Set it to ~70% of the hard limit.
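
The flag values in the docker run example can be sanity-checked with a little arithmetic. This is a sketch under the assumption, taken from Docker's documentation, that --memory-swap is the combined ceiling for memory plus swap and that -1 means unlimited swap; both helper names are hypothetical.

```python
from typing import Optional


def swap_allowance_mib(memory_mib: int, memory_swap_mib: int) -> Optional[int]:
    """Swap available to the container in MiB; None means unlimited.

    --memory-swap covers memory plus swap, so the swap portion is the
    difference between the two flags.
    """
    if memory_swap_mib == -1:
        return None
    if memory_swap_mib < memory_mib:
        raise ValueError("--memory-swap must be at least --memory")
    return memory_swap_mib - memory_mib


def reservation_ratio(reservation_mib: int, memory_mib: int) -> float:
    """Soft limit expressed as a fraction of the hard limit."""
    return reservation_mib / memory_mib


print(swap_allowance_mib(256, 256))           # 0
print(swap_allowance_mib(768, 1024))          # 256
print(round(reservation_ratio(512, 768), 2))  # 0.67
```

The original 256m/256m configuration allowed no swap at all, while the revised flags allow 256 MiB of swap and place the reservation close to the suggested 70% mark.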


Enterprise Best Practice — Fix the Application, Not Just the Limit

Raising limits without fixing the root cause is kicking the can. The correct fix is constraining the application's internal allocator to stay under the cgroup ceiling.

JVM (Java/Scala/Kotlin) — the most common offender:

# Dockerfile or entrypoint
- CMD ["java", "-jar", "app.jar"]
+ CMD ["java", "-XX:+UseContainerSupport", "-XX:MaxRAMPercentage=75.0", "-XX:InitialRAMPercentage=50.0", "-XX:+ExitOnOutOfMemoryError", "-jar", "app.jar"]

-XX:+UseContainerSupport (JDK 8u191+, JDK 10+; enabled by default in those releases, so the flag only makes it explicit) makes the JVM read cgroup limits instead of host /proc/meminfo. -XX:MaxRAMPercentage=75.0 caps heap at 75% of the cgroup limit, leaving headroom for metaspace, native threads, and off-heap buffers. -XX:+ExitOnOutOfMemoryError makes the JVM exit on the first java.lang.OutOfMemoryError, so you get a clean exit code instead of a hung, zombie JVM.
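
To see what MaxRAMPercentage leaves for everything outside the heap, the split can be computed directly. A minimal sketch, using the 768 MiB limit from the basic fix as an assumed container size; jvm_memory_budget is an illustrative name.

```python
def jvm_memory_budget(container_limit_mib: float, max_ram_percentage: float = 75.0):
    """Split a container limit into (max heap, non-heap headroom), both in MiB."""
    max_heap = container_limit_mib * max_ram_percentage / 100.0
    return max_heap, container_limit_mib - max_heap


heap, headroom = jvm_memory_budget(768)
print(heap, headroom)  # 576.0 192.0
```

If metaspace, thread stacks, and direct buffers together need more than the headroom figure, lower the percentage rather than assuming 75 fits every workload.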

Node.js:

- CMD ["node", "server.js"]
+ CMD ["node", "--max-old-space-size=400", "server.js"]

For a 512MB container, --max-old-space-size=400 (in MB) caps V8's old generation heap, leaving ~110MB for everything else in the process: the young generation, native modules, and Buffer allocations.
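
If several services share a base image, the flag can be derived from each container's limit instead of being hard-coded. This is only a sketch: the helper name and the idea of passing an explicit non-heap reserve are assumptions of this example, not a Node.js convention.

```python
def node_heap_flag(container_limit_mb: int, reserve_mb: int) -> str:
    """Build the V8 old-space flag, keeping reserve_mb outside the heap."""
    if reserve_mb >= container_limit_mb:
        raise ValueError("reserve must be smaller than the container limit")
    return f"--max-old-space-size={container_limit_mb - reserve_mb}"


print(node_heap_flag(512, 112))  # --max-old-space-size=400
```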

Kubernetes — set both requests and limits, never only limits:

resources:
- limits:
-   memory: "512Mi"
+ requests:
+   memory: "384Mi"
+ limits:
+   memory: "512Mi"

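If you set only a limit, Kubernetes copies it into the request, so the scheduler reserves the full 512Mi on the node. An explicit, lower request records what the pod needs at steady state while the limit still caps bursts. If neither is set, the scheduler treats the pod as needing no memory and can place it on a saturated node, where it may OOM immediately on startup.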
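
A quick local check of a resources block can catch missing or inverted values before the manifest is applied. The sketch below assumes the block has already been loaded into a Python dict and handles only integer quantities with the common suffixes; both function names are hypothetical.

```python
import re

# Subset of Kubernetes quantity suffixes: decimal (k, M, G, T) and binary (Ki, Mi, Gi, Ti).
_SUFFIXES = {
    "": 1,
    "k": 10 ** 3, "M": 10 ** 6, "G": 10 ** 9, "T": 10 ** 12,
    "Ki": 2 ** 10, "Mi": 2 ** 20, "Gi": 2 ** 30, "Ti": 2 ** 40,
}


def parse_quantity(quantity: str) -> int:
    """Convert a quantity such as '512Mi' to bytes (integer values only)."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity.strip())
    if not match or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * _SUFFIXES[match.group(2)]


def check_memory_resources(resources: dict) -> list:
    """Return a list of problems found in a container's resources block."""
    problems = []
    request = resources.get("requests", {}).get("memory")
    limit = resources.get("limits", {}).get("memory")
    if request is None:
        problems.append("memory request is not set")
    if limit is None:
        problems.append("memory limit is not set")
    if request and limit and parse_quantity(request) > parse_quantity(limit):
        problems.append("memory request exceeds memory limit")
    return problems


print(check_memory_resources({"requests": {"memory": "384Mi"}, "limits": {"memory": "512Mi"}}))  # []
print(check_memory_resources({"limits": {"memory": "512Mi"}}))  # ['memory request is not set']
```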


Prevention in CI/CD

The goal is catching under-provisioned memory limits before they hit production.

1. Checkov (static analysis on K8s manifests):

pip install checkov
checkov -d ./k8s-manifests/ --check CKV_K8S_12,CKV_K8S_13

CKV_K8S_13 flags missing memory limits. CKV_K8S_12 flags missing memory requests. Confirm the IDs against your installed version with checkov --list. The CKV_DOCKER_* checks target Dockerfiles, not Compose files, so Compose memory limits need a separate check.

2. OPA/Gatekeeper — enforce memory limits at admission:

# Deny pods with no memory limit set
# (package name is illustrative; use the one your ConstraintTemplate expects)
package k8srequiredmemorylimit

violation[{"msg": msg}] {
  container := input.review.object.spec.containers[_]
  not container.resources.limits.memory
  msg := sprintf("Container '%v' has no memory limit. OOM kills will be uncontrolled.", [container.name])
}
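
For a pre-commit hook or unit test, the same rule can be approximated outside the cluster. This is a rough Python equivalent, assuming the pod manifest has already been parsed into a dict; it is not how Gatekeeper evaluates policy, and the function name is hypothetical.

```python
def memory_limit_violations(pod: dict) -> list:
    """Return one message per container that has no memory limit."""
    messages = []
    for container in pod.get("spec", {}).get("containers", []):
        limits = (container.get("resources") or {}).get("limits") or {}
        if not limits.get("memory"):
            messages.append(
                f"Container '{container.get('name')}' has no memory limit. "
                "OOM kills will be uncontrolled."
            )
    return messages


pod = {
    "spec": {
        "containers": [
            {"name": "api", "resources": {"limits": {"memory": "512Mi"}}},
            {"name": "sidecar", "resources": {}},
        ]
    }
}
print(memory_limit_violations(pod))
```

Like the Rego rule, this only inspects spec.containers; init containers would need a second loop.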

3. Load test with memory profiling in staging — not optional:

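# Sample memory in the background while the load test runs, then stop the sampler
while true; do docker stats --no-stream --format "{{.MemUsage}}" <container_id>; sleep 1; done > mem_profile.log &
SAMPLER_PID=$!
k6 run --vus 50 --duration 5m load_test.js
kill "$SAMPLER_PID"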

Set your --memory limit to peak observed RSS + 25% headroom. Never guess.
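
Turning the captured log into a number is a few lines of parsing. The sketch below assumes each line of mem_profile.log looks like "600MiB / 768MiB", the usage / limit pair that docker stats prints for MemUsage; the function names and the sample values are made up for illustration.

```python
import math
import re

_UNITS = {"B": 1, "KiB": 2 ** 10, "MiB": 2 ** 20, "GiB": 2 ** 30, "TiB": 2 ** 40}
# Matches the usage half of "410.2MiB / 768MiB"; search() tolerates leading junk.
_USAGE = re.compile(r"([\d.]+)\s*([KMGT]?i?B)\s*/")


def usage_bytes(line: str):
    """Return the usage figure of one MemUsage line in bytes, or None if unparseable."""
    match = _USAGE.search(line)
    if not match or match.group(2) not in _UNITS:
        return None
    return float(match.group(1)) * _UNITS[match.group(2)]


def recommended_limit_mib(lines, headroom: float = 0.25) -> int:
    """Peak observed usage plus headroom, rounded up to whole MiB."""
    samples = [b for b in (usage_bytes(line) for line in lines) if b is not None]
    if not samples:
        raise ValueError("no usable samples in profile")
    return math.ceil(max(samples) * (1 + headroom) / 2 ** 20)


sample_log = ["410.2MiB / 768MiB", "600MiB / 768MiB", "512MiB / 768MiB"]
print(recommended_limit_mib(sample_log))  # 750
```

In practice, pass the lines of the real file, for example recommended_limit_mib(open("mem_profile.log")).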

4. Alerting — alert on container_memory_working_set_bytes before the kill:

# Prometheus alert rule
- alert: ContainerMemoryNearLimit
  # "> 0" drops containers with no limit set (reported as 0), avoiding division by zero
  expr: |
    (container_memory_working_set_bytes / (container_spec_memory_limit_bytes > 0)) > 0.85
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.container }} at {{ $value | humanizePercentage }} of memory limit"

Firing at 85% gives you time to act before the kernel fires at 100%.
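
The threshold logic is simple enough to unit-test before it goes into a rule file. This sketch mirrors the ratio check in Python and assumes that a container without a limit reports 0 for its limit metric, in which case the ratio is undefined and the check is skipped; the function name is illustrative.

```python
def near_memory_limit(working_set_bytes: int, limit_bytes: int, threshold: float = 0.85) -> bool:
    """True when the working set exceeds the given fraction of the limit."""
    if limit_bytes <= 0:
        # No limit configured: there is nothing to be "near", so never alert.
        return False
    return working_set_bytes / limit_bytes > threshold


MIB = 1024 ** 2
print(near_memory_limit(700 * MIB, 768 * MIB))  # True
print(near_memory_limit(400 * MIB, 768 * MIB))  # False
print(near_memory_limit(400 * MIB, 0))          # False
```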
