Fixing 'KubeletNotReady: container runtime is down' After Docker Daemon Restart on Self-Managed Kubernetes

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5–15 mins


TL;DR

  • What broke: The Docker daemon restarted (package upgrade, OOM kill, or a manual systemctl restart docker) and kubelet lost its runtime connection over /var/run/dockershim.sock or unix:///var/run/docker.sock, stalling the node heartbeat until kubelet is restarted.
  • How to fix it: Confirm Docker is healthy, then restart kubelet; add systemd After= and Requires= dependencies so that a Docker restart also restarts kubelet.

The Incident (What Does the Error Mean?)

Raw signal you'll see on kubectl describe node <node>:

Conditions:
  Type    Status  Reason              Message
  ----    ------  ------              -------
  Ready   False   KubeletNotReady     container runtime is down

Events:
  Warning  NodeNotReady  kubelet  Node node-01 status is now: NodeNotReady

Kubelet journal (journalctl -u kubelet -n 50 --no-pager):

E0610 03:12:44.821341    1423 kubelet.go:2419] "Error getting node" err="rpc error:
  code = Unavailable desc = connection error: desc = \"transport: Error while
  dialing dial unix /var/run/dockershim.sock: connect: no such file or directory\""

Immediate consequence: The node stops posting Ready heartbeats to the API server. After node-monitor-grace-period (default 40 s), the node controller marks it NotReady. After pod-eviction-timeout (default 5 min), it begins evicting pods. If this is a single-node control plane, kube-scheduler and kube-controller-manager may also be affected if they run as static pods on that node.
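
The timeline above can be made concrete with a little arithmetic. The sketch below assumes the two defaults quoted here (40 s grace period, 5 min eviction timeout) and assumes the eviction timer starts when the node is marked NotReady. The helper name outage_timeline is hypothetical.

```shell
# Hypothetical helper: given the epoch second of the last successful heartbeat,
# print when the node is marked NotReady and when pod eviction would begin.
# Defaults mirror the values quoted above; pass your own if the cluster overrides them.
outage_timeline() (
  last_heartbeat=$1
  grace_seconds=${2:-40}       # node-monitor-grace-period
  eviction_seconds=${3:-300}   # pod-eviction-timeout
  not_ready_at=$((last_heartbeat + grace_seconds))
  # Assumption: the eviction timer starts once the node is NotReady
  eviction_at=$((not_ready_at + eviction_seconds))
  echo "not_ready_at=$not_ready_at eviction_at=$eviction_at"
)
```

For example, outage_timeline 1000 prints not_ready_at=1040 eviction_at=1340; in practice you would pass the timestamp of the last heartbeat you can see on the node.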


The Attack Vector / Blast Radius

This is a cascading availability failure, not a security exploit, but the blast radius is severe:

  1. Pod eviction storm: All pods on the affected node are marked Terminating. If PodDisruptionBudgets are misconfigured, entire deployments go dark.
  2. Stateful workload data risk: StatefulSets with ReadWriteOnce PVCs cannot reschedule until the node is fully removed from the cluster or the volume is force-detached — this can take 6+ minutes by default.
  3. Control plane self-destruction (kubeadm clusters): If etcd or kube-apiserver runs as a static pod on this node, the entire cluster API becomes unreachable. You lose kubectl access entirely.
  4. Horizontal amplification: In clusters using --container-runtime=docker (pre-1.24) with multiple nodes sharing a Docker registry mirror running on this node, a single daemon restart can trigger NotReady across N nodes simultaneously.
  5. Silent recurrence: Without the systemd dependency fix, every future Docker package upgrade that auto-restarts the daemon (apt/yum) will reproduce this outage.

How to Fix It

Step 1 — Verify Docker is actually up

systemctl is-active docker
docker info 2>&1 | head -5
ls -la /var/run/docker.sock

If docker.sock is missing, Docker is still down. Fix Docker first:

journalctl -u docker -n 100 --no-pager   # find the root cause
systemctl start docker
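
After systemctl start docker returns, the socket can take a moment to appear. A minimal sketch of a polling check, assuming a POSIX shell; the helper name wait_for_socket and the 30-second default are illustrative choices, not part of Docker or kubelet.

```shell
# Hypothetical helper: wait until a Unix socket exists, or give up after a timeout.
# Usage: wait_for_socket /var/run/docker.sock 30
wait_for_socket() (
  socket_path=$1
  timeout_seconds=${2:-30}
  elapsed=0
  while [ "$elapsed" -lt "$timeout_seconds" ]; do
    # -S is true only if the path exists and is a socket
    if [ -S "$socket_path" ]; then
      echo "socket ready: $socket_path"
      exit 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out after ${timeout_seconds}s waiting for $socket_path" >&2
  exit 1
)
```

A non-zero exit status means Docker is still not serving its socket, so moving on to the kubelet restart would be premature.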

Step 2 — Basic Fix (immediate node recovery)

# Confirm CRI socket is responsive
crictl --runtime-endpoint unix:///var/run/dockershim.sock info

# Restart kubelet — it will re-establish the CRI connection
systemctl restart kubelet

# Watch node recover (should flip Ready within 30s)
kubectl get node <node-name> -w
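
Steps 1 and 2 can be combined so that kubelet is never restarted against a dead runtime. This is a sketch under the assumption that the runtime runs as a systemd unit named docker (pass containerd on newer clusters); recover_kubelet is a hypothetical name.

```shell
# Hypothetical wrapper around Steps 1 and 2: refuse to restart kubelet
# while the container runtime unit is not active.
recover_kubelet() {
  runtime_unit=${1:-docker}
  # is-active --quiet prints nothing and returns non-zero unless the unit is active
  if ! systemctl is-active --quiet "$runtime_unit"; then
    echo "$runtime_unit is not active; fix the runtime first" >&2
    return 1
  fi
  # Only report success if the restart itself succeeded
  systemctl restart kubelet && echo "kubelet restarted"
}
```

Run it as root on the affected node, for example recover_kubelet docker, then watch the node with the kubectl command shown above.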

Step 3 — Enterprise Best Practice (permanent fix via systemd dependency)

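The root cause is that kubelet.service has no hard dependency on docker.service, so when Docker restarts, kubelet is not restarted alongside it. With Requires= plus After=, systemd propagates an explicit stop or restart of docker.service to kubelet and orders kubelet's startup after Docker's.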

Create a systemd drop-in override:

mkdir -p /etc/systemd/system/kubelet.service.d/
cat > /etc/systemd/system/kubelet.service.d/10-docker-dependency.conf << 'EOF'
[Unit]
After=docker.service
Requires=docker.service

[Service]
Restart=always
RestartSec=5s
EOF

systemctl daemon-reload

Diff — kubelet unit before vs. after:

# Effective kubelet.service after merging /etc/systemd/system/kubelet.service.d/10-docker-dependency.conf

 [Unit]
 Description=kubelet: The Kubernetes Node Agent
-After=network-online.target
-Wants=network-online.target
+After=network-online.target docker.service
+Wants=network-online.target
+Requires=docker.service

 [Service]
 ExecStart=/usr/bin/kubelet
-Restart=on-failure
-RestartSec=10s
+Restart=always
+RestartSec=5s

For kubeadm clusters on K8s ≥ 1.24 (containerd CRI — dockershim removed):

# Effective kubelet.service after merging /etc/systemd/system/kubelet.service.d/10-containerd-dependency.conf

 [Unit]
-After=network-online.target
+After=network-online.target containerd.service
+Requires=containerd.service

 [Service]
-Restart=on-failure
+Restart=always
+RestartSec=5s

If you're still on dockershim in 2024: You are running an EOL CRI. Migrate to containerd or cri-o. The dockershim was removed in Kubernetes 1.24.

Verify the dependency graph is wired correctly:

systemctl show kubelet | grep -E 'After=|Requires='
systemd-analyze dot kubelet.service | dot -Tsvg > kubelet-deps.svg
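
The grep above still needs a human to read it. For scripted checks, a small filter can assert that the runtime unit appears in both properties. This is a sketch; has_runtime_dependency is a hypothetical name, and it assumes the one-line-per-property output of systemctl show.

```shell
# Hypothetical check: read `systemctl show <unit> -p After -p Requires` output
# on stdin and confirm the runtime unit is listed under both properties.
has_runtime_dependency() (
  runtime_unit=$1
  props=$(cat)
  for prop in After Requires; do
    # Each property is one line of space-separated unit names
    line=$(printf '%s\n' "$props" | grep "^${prop}=") || {
      echo "no ${prop}= line in input" >&2
      exit 1
    }
    case " ${line#*=} " in
      *" $runtime_unit "*) ;;   # found, check the next property
      *) echo "missing ${prop}=${runtime_unit}" >&2; exit 1 ;;
    esac
  done
)
```

Typical use would be: systemctl show kubelet -p After -p Requires | has_runtime_dependency docker.service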


Prevention in CI/CD

1. Node conformance test in your image bake pipeline

Run the Kubernetes node conformance test, or boot the image with node-problem-detector enabled, as a post-provision smoke test. Gate AMI/image promotion on a healthy Ready condition.
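
One way to express the Ready gate in a pipeline step is sketched below. It assumes kubectl is already configured against the cluster the test node joined; node_is_ready is a hypothetical helper name.

```shell
# Hypothetical gate: succeed only if the node reports Ready=True.
node_is_ready() {
  node_name=$1
  # Pull just the status field of the Ready condition
  status=$(kubectl get node "$node_name" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}') || return 1
  [ "$status" = "True" ]
}
```

A pipeline could call node_is_ready node-01 || exit 1 before promoting the image. If you prefer to block until the condition is met, kubectl wait --for=condition=Ready node/<node-name> --timeout=120s is an alternative.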

2. OPA/Gatekeeper policy — enforce containerd as the only permitted CRI

# Constraint snippet (assumes a ConstraintTemplate defining kind K8sRequireContainerdRuntime is already installed)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequireContainerdRuntime
metadata:
  name: require-containerd-runtime
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Node"]
  parameters:
    requiredRuntime: "containerd"

3. Ansible/Terraform provisioner — enforce the drop-in at node bootstrap

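# In your Ansible kubelet role: tasks/main.yml
+- name: Ensure kubelet drop-in directory exists
+  file:
+    path: /etc/systemd/system/kubelet.service.d
+    state: directory
+    mode: "0755"
+
+- name: Install kubelet container runtime dependency drop-in
+  copy:
+    dest: /etc/systemd/system/kubelet.service.d/10-runtime-dependency.conf
+    mode: "0644"
+    content: |
+      [Unit]
+      After=containerd.service
+      Requires=containerd.service
+      [Service]
+      Restart=always
+      RestartSec=5s
+  notify: reload systemd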

4. Node Problem Detector — alert before eviction fires

Deploy node-problem-detector with a custom rule targeting container runtime is down in kubelet logs. Fire a PagerDuty alert the moment the condition appears — you have a ~40-second window before the node flips NotReady and a ~5-minute window before pod eviction begins.

# node-problem-detector system log monitor config (rules section only)
{
  "rules": [
    {
      "type": "temporary",
      "reason": "ContainerRuntimeDown",
      "pattern": "container runtime is down"
    }
  ]
}
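
Before deploying the rule, it can help to confirm that the pattern actually matches your kubelet logs. The filter below is a stand-in for that check, not node-problem-detector itself; runtime_down_in_logs is a hypothetical name.

```shell
# Hypothetical filter: read kubelet log lines on stdin, print the first line
# containing the rule's pattern, and return 0; return 1 if nothing matches.
runtime_down_in_logs() {
  # -F treats the pattern as a fixed string, -m 1 stops at the first match
  grep -m 1 -F "container runtime is down"
}
```

For example: journalctl -u kubelet -n 200 --no-pager | runtime_down_in_logs && echo "pattern matches"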

5. Checkov IaC scan — flag missing systemd hardening

If you manage node userdata via Terraform templatefile(), add a Checkov custom check asserting the kubelet drop-in is present in the rendered cloud-init script. Fail the terraform plan in CI if it's absent.
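
As a lighter-weight stand-in while a Checkov custom check is being written, a plain grep over the rendered user data gives the same pass/fail signal. This is not Checkov; the helper name check_dropin_present and the assumption that the rendered script has been written to a file are both illustrative.

```shell
# Hypothetical CI gate: fail if the rendered user data lacks the kubelet drop-in.
# Usage: check_dropin_present rendered-cloud-init.yaml
check_dropin_present() {
  rendered_file=$1
  # Require both a drop-in path and a Requires= line naming a container runtime
  if grep -q 'kubelet\.service\.d/.*\.conf' "$rendered_file" &&
     grep -Eq '^[[:space:]]*Requires=(containerd|docker)\.service' "$rendered_file"; then
    echo "ok: kubelet runtime dependency drop-in found"
  else
    echo "FAIL: kubelet runtime dependency drop-in missing in $rendered_file" >&2
    return 1
  fi
}
```

The check only looks for the path and the Requires= line, so it will not catch a drop-in that is present but malformed.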
