Fixing 'KubeletNotReady: container runtime is down' After Docker Daemon Restart on Self-Managed Kubernetes
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5–15 mins
TL;DR
- What broke: Docker daemon restarted (upgrade, OOM kill, or manual
systemctl restart docker) and kubelet lost its/var/run/dockershim.sockorunix:///var/run/docker.sockCRI connection, permanently stalling the node heartbeat. - How to fix it: Restart kubelet after confirming Docker is healthy; add a systemd
After=andRequires=dependency so this never cascades again. - Fast path: Drop your
kubelet.serviceunit orkubeadmconfig into the Client-Side Sandbox above to auto-generate the corrected dependency block and kubelet flags.
The Incident (What Does the Error Mean?)
Raw signal you'll see on kubectl describe node <node>:
Conditions:
Type Status Reason Message
---- ------ ------ -------
Ready False KubeletNotReady container runtime is down
Events:
Warning NodeNotReady kubelet Node node-01 status is now: NodeNotReady
Kubelet journal (journalctl -u kubelet -n 50 --no-pager):
E0610 03:12:44.821341 1423 kubelet.go:2419] "Error getting node" err="rpc error:
code = Unavailable desc = connection error: desc = \"transport: Error while
dialing dial unix /var/run/dockershim.sock: connect: no such file or directory\""
Immediate consequence: The node stops posting Ready heartbeats to the API server. After node-monitor-grace-period (default 40 s), the node controller marks it NotReady. After pod-eviction-timeout (default 5 min), it begins evicting pods. If this is a single-node control plane, kube-scheduler and kube-controller-manager may also be affected if they run as static pods on that node.
The Attack Vector / Blast Radius
This is a cascading availability failure, not a security exploit, but the blast radius is severe:
- Pod eviction storm: All pods on the affected node get
Terminating. If PodDisruptionBudgets are misconfigured, entire deployments go dark. - Stateful workload data risk: StatefulSets with
ReadWriteOncePVCs cannot reschedule until the node is fully removed from the cluster or the volume is force-detached — this can take 6+ minutes by default. - Control plane self-destruction (kubeadm clusters): If
etcdorkube-apiserverruns as a static pod on this node, the entire cluster API becomes unreachable. You losekubectlaccess entirely. - Horizontal amplification: In clusters using
--container-runtime=docker(pre-1.24) with multiple nodes sharing a Docker registry mirror running on this node, a single daemon restart can trigger NotReady across N nodes simultaneously. - Silent recurrence: Without a systemd dependency fix, every future
dockerpackage upgrade viaapt/yumauto-restart will reproduce this outage.
How to Fix It
Step 1 — Verify Docker is actually up
systemctl is-active docker
docker info 2>&1 | head -5
ls -la /var/run/docker.sock
If docker.sock is missing, Docker is still down. Fix Docker first:
journalctl -u docker -n 100 --no-pager # find the root cause
systemctl start docker
Step 2 — Basic Fix (immediate node recovery)
# Confirm CRI socket is responsive
crictl --runtime-endpoint unix:///var/run/dockershim.sock info
# Restart kubelet — it will re-establish the CRI connection
systemctl restart kubelet
# Watch node recover (should flip Ready within 30s)
kubectl get node <node-name> -w
Step 3 — Enterprise Best Practice (permanent fix via systemd dependency)
The root cause is that kubelet.service has no hard dependency on docker.service. When Docker restarts, kubelet is not restarted alongside it.
Create a systemd drop-in override:
mkdir -p /etc/systemd/system/kubelet.service.d/
cat > /etc/systemd/system/kubelet.service.d/10-docker-dependency.conf << 'EOF'
[Unit]
After=docker.service
Requires=docker.service
[Service]
Restart=always
RestartSec=5s
EOF
systemctl daemon-reload
Diff — kubelet unit before vs. after:
# /etc/systemd/system/kubelet.service.d/10-docker-dependency.conf
[Unit]
Description=kubelet: The Kubernetes Node Agent
-After=network-online.target
-Wants=network-online.target
+After=network-online.target docker.service
+Wants=network-online.target
+Requires=docker.service
[Service]
ExecStart=/usr/bin/kubelet
-Restart=on-failure
-RestartSec=10s
+Restart=always
+RestartSec=5s
For kubeadm clusters on K8s ≥ 1.24 (containerd CRI — dockershim removed):
# /etc/systemd/system/kubelet.service.d/10-containerd-dependency.conf
[Unit]
-After=network-online.target
+After=network-online.target containerd.service
+Requires=containerd.service
[Service]
-Restart=on-failure
+Restart=always
+RestartSec=5s
If you're still on dockershim in 2024: You are running an EOL CRI. Migrate to
containerdorcri-o. Thedockershimwas removed in Kubernetes 1.24.
Verify the dependency graph is wired correctly:
systemctl show kubelet | grep -E 'After=|Requires='
systemd-analyze dot kubelet.service | dot -Tsvg > kubelet-deps.svg
💡 Tired of pasting proprietary configs into ChatGPT? Generic AI tools log your company's ARNs, DB strings, and private keys. StackEngine is a zero-backend, pure Client-Side WASM utility. Drop your failing config into the sandbox above. We redact your secrets locally in the browser and auto-generate the refactored code using your own API key.
Prevention in CI/CD
1. Node conformance test in your image bake pipeline
Run kubeadm upgrade node or node-problem-detector as a post-provision smoke test. Gate AMI/image promotion on a healthy Ready condition.
2. OPA/Gatekeeper policy — enforce containerd as the only permitted CRI
# ConstraintTemplate snippet
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequireContainerdRuntime
metadata:
name: require-containerd-runtime
spec:
match:
kinds:
- apiGroups: [""]
kinds: ["Node"]
parameters:
requiredRuntime: "containerd"
3. Ansible/Terraform provisioner — enforce the drop-in at node bootstrap
# In your Ansible kubelet role: tasks/main.yml
+- name: Install kubelet docker dependency drop-in
+ copy:
+ dest: /etc/systemd/system/kubelet.service.d/10-runtime-dependency.conf
+ content: |
+ [Unit]
+ After=containerd.service
+ Requires=containerd.service
+ [Service]
+ Restart=always
+ RestartSec=5s
+ notify: reload systemd
4. Node Problem Detector — alert before eviction fires
Deploy node-problem-detector with a custom rule targeting container runtime is down in kubelet logs. Fire a PagerDuty alert the moment the condition appears — you have a ~40-second window before the node flips NotReady and a ~5-minute window before pod eviction begins.
# custom-plugin-monitor.json (node-problem-detector config)
{
"rules": [
{
"type": "temporary",
"reason": "ContainerRuntimeDown",
"pattern": "container runtime is down"
}
]
}
5. Checkov IaC scan — flag missing systemd hardening
If you manage node userdata via Terraform templatefile(), add a Checkov custom check asserting the kubelet drop-in is present in the rendered cloud-init script. Fail the terraform plan in CI if it's absent.