How to Fix 'Container Is Not Running' Error After Unexpected Docker Stop
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5–15 mins
TL;DR
- What broke: The container process exited (crash, OOM kill, or bad entrypoint). Docker preserves the dead container in Exited state, making docker exec impossible.
- How to fix it: Identify the exit code via docker inspect, fix the root cause, then restart with docker start <container> or rebuild with a hardened entrypoint.
- Fast path: Use our Client-Side Sandbox below to auto-refactor your docker-compose.yml or Dockerfile: paste your config and get a patched version with restart policies and health checks generated instantly.
The Incident (What Does the Error Mean?)
Raw error:
$ docker exec -it my_app sh
Error response from daemon: Container 3f2a1b9c0d4e is not running
docker exec requires a running PID 1 inside the container. When the container's main process exits for any reason, Docker transitions the container to Exited state. The container filesystem is preserved but no process is alive to attach to. docker exec is dead in the water until PID 1 is alive again.
Confirm the state immediately:
docker ps -a --filter "name=my_app" --format "table {{.Names}}\t{{.Status}}\t{{.ID}}"
The Status column shows Exited (1) or Exited (137); that exit code is your primary diagnostic signal. (docker ps has no separate exit code field, so read the code from Status or from docker inspect.)
Exit code cheat sheet:
| Exit Code | Meaning |
|---|---|
| 0 | Clean exit: your CMD finished normally |
| 1 | Application error / unhandled exception |
| 137 | SIGKILL: OOM killer or docker kill |
| 139 | Segfault |
| 143 | SIGTERM: graceful stop requested |
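The 137, 139, and 143 entries in the cheat sheet follow the shell convention that a process ended by signal N reports exit status 128 + N. The following is a minimal sketch of a decoder, assuming a POSIX shell. The name explain_exit_code is a hypothetical helper, not a Docker command, and the wording of its messages is illustrative.

```shell
#!/bin/sh
# explain_exit_code: hypothetical helper that turns a container exit code
# into a short description. Not part of the Docker CLI.
explain_exit_code() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "clean exit"
    elif [ "$code" -eq 126 ]; then
        echo "command found but could not be invoked"
    elif [ "$code" -eq 127 ]; then
        echo "command not found"
    elif [ "$code" -gt 128 ]; then
        # Exit statuses above 128 encode a signal: status = 128 + signal number
        sig=$((code - 128))
        case "$sig" in
            9)  name=SIGKILL ;;
            11) name=SIGSEGV ;;
            15) name=SIGTERM ;;
            *)  name="signal $sig" ;;
        esac
        echo "terminated by $name"
    else
        echo "application error (exit status $code)"
    fi
}
```

A possible use is explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' my_app)", which feeds the recorded exit code of the stopped container straight into the decoder.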
The Attack Vector / Blast Radius
This is not just an inconvenience. A container in Exited state in production means:
- No exec access: you cannot shell in to gather diagnostics while the incident is live. If you didn't configure logging drivers or a sidecar, you are blind.
- Cascading dependency failures: any service depending on this container (via Docker network DNS) starts throwing connection refused. In a compose stack, this fans out fast.
- OOMKill loop risk: if restart: always is set without fixing the memory leak, Docker respawns the container into a crash loop, consuming host CPU and masking the real problem in a wall of restart logs.
- Data corruption risk: if the container was mid-write to a bind-mounted volume when it was killed (exit 137), you may have partially written files. Check your application's write-ahead log or journal.
- Silent failure in orchestrators: in Swarm or raw Docker on a VM, without an external health monitor, this container sits dead and no alert fires unless you have explicit health checks wired to your alerting stack.
Forensics first — always run this before restarting:
# Get the full state, OOMKilled flag, and exit code
docker inspect my_app | jq '.[0].State'
# Get the last 200 lines of logs from the dead container
docker logs --tail 200 my_app
# Check host-level OOM events
dmesg | grep -i 'oom\|killed process' | tail -20
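Because a rebuild or docker rm discards the stopped container along with its logs, it can help to write the forensic output to disk first. The sketch below assumes a POSIX shell; snapshot_container and the ./forensics output layout are hypothetical choices, while the docker inspect and docker logs invocations are the same ones used above.

```shell
#!/bin/sh
# snapshot_container: hypothetical helper that saves the State object and
# recent logs of a (possibly dead) container before anything restarts it.
# Usage: snapshot_container <container> [output-dir]
snapshot_container() {
    name=$1
    dir=${2:-./forensics}
    mkdir -p "$dir" || return 1
    # State as JSON: includes ExitCode, OOMKilled, StartedAt, FinishedAt
    docker inspect --format '{{json .State}}' "$name" > "$dir/$name.state.json" || return 1
    # Last 200 log lines; stderr is merged so error output is captured too
    docker logs --tail 200 "$name" > "$dir/$name.log" 2>&1 || return 1
    # Print the directory so callers can chain further steps
    echo "$dir"
}
```

Calling snapshot_container my_app before docker start my_app leaves two files behind that survive even if the container is later removed.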
How to Fix It (The Solution)
Basic Fix — Restart the Container
Once you've pulled logs and identified the exit code:
# Restart the existing stopped container
docker start my_app
# Verify it's running
docker ps --filter "name=my_app"
# Attach exec once confirmed running
docker exec -it my_app sh
⚠️ Do not skip the forensics step above. Restarting without root cause analysis puts you back in the same crash loop in minutes.
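A container that crashes on startup can look healthy for a moment right after docker start. One way to catch that is to keep checking its state for a few seconds. This is a rough sketch under the assumption that a short observation window is enough for your app; start_and_verify and its default of 5 checks are hypothetical.

```shell
#!/bin/sh
# start_and_verify: hypothetical helper that starts a stopped container and
# confirms it is still running after a number of one-second checks.
# Usage: start_and_verify <container> [checks]
start_and_verify() {
    name=$1
    checks=${2:-5}
    docker start "$name" >/dev/null || return 1
    i=0
    while [ "$i" -lt "$checks" ]; do
        sleep 1
        running=$(docker inspect --format '{{.State.Running}}' "$name") || return 1
        if [ "$running" != "true" ]; then
            # Report the new exit code so the crash is not mistaken for success
            code=$(docker inspect --format '{{.State.ExitCode}}' "$name")
            echo "$name exited again with code $code" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "$name still running after ${checks}s"
}
```

If the function returns non-zero, go back to the logs rather than retrying the start.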
Enterprise Best Practice — Harden the Container Config
Problem: No restart policy, no health check, no entrypoint guard.
# docker-compose.yml
  services:
    app:
      image: my_app:latest
-     restart: "no"
+     restart: unless-stopped
+     mem_limit: 512m
+     memswap_limit: 512m
+     healthcheck:
+       test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
+       interval: 15s
+       timeout: 5s
+       retries: 3
+       start_period: 10s
      environment:
        - APP_ENV=production
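Once a health check is defined, its result is exposed under .State.Health in docker inspect, so a deploy script can wait on it. The following is a sketch, assuming a POSIX shell; health_status, wait_healthy, and the 3-second poll interval are hypothetical choices.

```shell
#!/bin/sh
# health_status: prints healthy, unhealthy, or starting, and prints "none"
# when the container has no health check configured.
health_status() {
    docker inspect \
        --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$1"
}

# wait_healthy: hypothetical helper that polls until the container reports
# healthy. Fails fast on "unhealthy" or when no health check exists.
# Usage: wait_healthy <container> [tries]
wait_healthy() {
    name=$1
    tries=${2:-20}
    while [ "$tries" -gt 0 ]; do
        status=$(health_status "$name") || return 1
        case "$status" in
            healthy) return 0 ;;
            unhealthy|none)
                echo "$name health: $status" >&2
                return 1 ;;
        esac
        sleep 3
        tries=$((tries - 1))
    done
    echo "$name is still starting" >&2
    return 1
}
```

For example, wait_healthy my_app || docker logs --tail 50 my_app prints recent logs only when the container fails to become healthy.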
Problem: Dockerfile entrypoint exits immediately (classic script-exits-zero issue).
# Dockerfile
- CMD ["./start.sh"]
+ # Ensure PID 1 is a long-running process, not a fire-and-forget script
+ ENTRYPOINT ["./docker-entrypoint.sh"]
+ CMD ["./app-server", "--port", "8080"]
# docker-entrypoint.sh
  #!/bin/sh
  set -e
+ # Trap SIGTERM while init tasks run. The trap does not survive exec:
+ # after the exec below, the app's own signal handling takes over.
+ trap 'echo "SIGTERM received, shutting down"; exit 0' TERM
+
  # Run migrations or init tasks
  ./migrate.sh
- ./app-server &
+ # exec replaces the shell with the app, so PID 1 is now the app, not sh
+ exec "$@"
Using exec "$@" is non-negotiable. Without it, your shell script is PID 1, the app is a child process, and SIGTERM never reaches the app — Docker force-kills after the stop timeout, producing exit 137 every time.
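The effect of exec can be observed without Docker. The sketch below, assuming a POSIX sh on the PATH, prints the PID of a wrapper shell and then the PID of the command it launches. The function names are hypothetical.

```shell
#!/bin/sh
# with_exec: the wrapper replaces itself with the inner command,
# so both lines show the same PID.
with_exec() {
    sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

# without_exec: the inner command runs as a child of the wrapper,
# so the second line shows a different PID. The trailing ":" keeps shells
# from turning the last command into an implicit exec.
without_exec() {
    sh -c 'echo "$$"; sh -c "echo \$\$"; :'
}
```

In a container the first printed PID corresponds to PID 1. Only the with_exec pattern hands that PID, and the signals Docker sends to it, to the application.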
Prevention in CI/CD
1. Lint Dockerfiles in your pipeline with Hadolint:
# .github/workflows/docker-lint.yml
- name: Lint Dockerfile
  # Replace <version> with a pinned release tag of the action
  uses: hadolint/hadolint-action@<version>
  with:
    dockerfile: Dockerfile
    failure-threshold: warning
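Hadolint rule DL3025 flags CMD and ENTRYPOINT written in shell form instead of exec (JSON) form. Rule DL3006 flags base images without an explicit tag. A missing HEALTHCHECK instruction is covered by DL3057, which may not be enabled by default depending on your Hadolint version.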
2. Enforce health checks with OPA/Conftest:
# policy/docker_compose.rego
# Note: OPA 1.0+ expects the form `deny contains msg if { ... }`
package docker_compose

deny[msg] {
    service := input.services[name]
    not service.healthcheck
    msg := sprintf("Service '%v' is missing a healthcheck definition", [name])
}

deny[msg] {
    service := input.services[name]
    not service.restart
    msg := sprintf("Service '%v' has no restart policy", [name])
}
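# Run in CI before deploy. The rules live in package docker_compose,
# so point Conftest at that namespace instead of its default "main".
conftest test docker-compose.yml --policy policy/ --namespace docker_compose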
3. Monitor container state with a dead-simple Prometheus alert:
# alerting rules
- alert: ContainerNotRunning
  expr: time() - container_last_seen{name=~"my_app.*"} > 60
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Container {{ $labels.name }} has been down for over 60s"
4. Set explicit stop_grace_period in compose to avoid SIGKILL races:
services:
  app:
    stop_grace_period: 30s
This gives your app 30 seconds to handle SIGTERM cleanly before Docker escalates to SIGKILL (exit 137). Default is 10 seconds — often not enough for apps draining in-flight requests.
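A longer grace period only helps if the process actually handles SIGTERM. The sketch below, assuming a POSIX sh, contrasts a process with no handler (reported as exit 143) with one that traps the signal, does its cleanup, and exits 0. The function names and the "draining" message are hypothetical.

```shell
#!/bin/sh
# term_without_trap: the child shell sends itself SIGTERM with no handler
# installed, so the caller sees exit status 143 (128 + 15).
term_without_trap() {
    sh -c 'kill -TERM $$'
}

# term_with_trap: the child installs a handler first. The handler stands in
# for draining in-flight work, then exits cleanly with status 0.
term_with_trap() {
    sh -c 'trap "echo draining; exit 0" TERM; kill -TERM $$; :'
}
```

An app that behaves like term_with_trap and finishes inside stop_grace_period shows up as Exited (0) after docker stop. One that ignores the signal or runs past the deadline is killed and shows 137.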