Why does 'device or resource busy' appear even after the container has exited?

The container process is gone, but the kernel still has an active reference count on the overlay mount. This happens when a second process — a log shipper, a debug session, a security agent, or a stale containerd-shim — holds an open file descriptor into the container's overlay upper/work directory. The kernel will not allow umount() until all references are released, returning EBUSY to kubelet's GC loop.

Will restarting containerd kill my running pods?

Usually not. With containerd v1.6+, container processes are parented by per-container shim processes rather than by containerd itself, so a brief containerd restart leaves running containers in place (similar in effect to Docker's live-restore). Exec, attach, and log-streaming sessions are dropped, and kubelet cannot reach the runtime until it is back. Always cordon the node first in production to prevent new workloads from scheduling during the restart window.

How do I tell if inode exhaustion is the real cause rather than raw disk space?

Run 'df -i' on the affected node. If the 'IUse%' column shows 100% for the root or containerd filesystem while 'df -h' shows available space remaining, you have inode exhaustion. This is common on nodes running thousands of short-lived containers that each create many small log and metadata files. The fix is the same GC remediation, but prevention requires setting containerLogMaxFiles and containerLogMaxSize in the kubelet config, and potentially formatting a replacement volume with more inodes, that is, a lower bytes-per-inode ratio (for example mkfs.ext4 -i 4096).
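
The comparison described above can be scripted. The following is a minimal sketch, not a definitive check: the function name inode_hotspots and the 90% default threshold are illustrative choices, and it assumes the POSIX-format column layout of 'df -Pi' (IUse% in column 5, mount point in column 6).

```shell
#!/bin/sh
# Print mount points whose inode usage is at or above a threshold (default 90).
# Reads `df -Pi` output on stdin so the parsing can be checked with canned input.
# Usage on a node:  df -Pi | inode_hotspots 90
inode_hotspots() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)
        # Pseudo filesystems report "-" instead of a percentage; skip those.
        if (pct ~ /^[0-9]+$/ && pct + 0 >= t + 0) print $6, pct "%"
    }'
}
```

Any mount point printed here that still has free space in 'df -h' is running out of inodes rather than bytes. Mount points containing spaces are not handled by this sketch.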

How to Fix Kubelet Garbage Collection Failures Caused by Device or Resource Busy Locks

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

What broke: Kubelet's garbage collection loop cannot remove stopped/dead containers because the overlay or devicemapper mount is still held open by a leaked process, a stale bind mount, or a zombie containerd/docker shim.
How to fix it: Identify the blocking PID with lsof or fuser, kill or unmount it, then force-trigger GC or restart the container runtime.

The Incident (What Does the Error Mean?)

Raw kubelet log output:

E0612 03:14:22.871042    1423 image_gc_manager.go:305] Failed to garbage collect required amount of images. Wanted to free 5368709120 bytes, but only found 0 bytes eligible to free.
E0612 03:14:22.871198    1423 container_gc.go:85]  Failed to garbage collect containers: device or resource busy
E0612 03:14:22.871301    1423 kubelet.go:1302] Image garbage collection failed multiple times in a row: failed to garbage collect required amount of images

Immediate consequence: Kubelet's GC loop is stuck. Dead containers accumulate. Overlay mounts pile up under /var/lib/containerd or /var/lib/docker/overlay2. The node's disk fills, the inode table is exhausted, and new pods fail to start with FailedMount or Evicted events. In severe cases the node goes NotReady.

The Attack Vector / Blast Radius

This is a cascading disk exhaustion failure, not a one-pod problem.

Overlay FS leak: Each stopped container leaves an overlay mount. If a process (log forwarder, sidecar shim, nsenter session left open by an operator) holds a file descriptor into that overlay directory, the kernel returns EBUSY on umount2(). Kubelet cannot proceed.
Blast radius escalates fast:
- Disk fills → new image pulls fail → ImagePullBackOff cluster-wide on that node.
- Inode exhaustion hits before raw disk in environments with many small files (log-heavy workloads). df -i will show 100% inode usage while df -h shows free space — a notoriously confusing failure mode.
- Kubelet's eviction manager cannot evict pods fast enough if GC is blocked, triggering a hard node pressure condition.
- If this node is a control plane node, etcd or API server pods may fail to restart after eviction.
Common culprits:
- Fluent Bit / Fluentd tailing logs inside a dead container's overlay directory.
- A kubectl exec or kubectl debug session left open by an operator.
- Stale containerd-shim or docker-containerd-shim processes not reaped after container exit.
- auditd or a security agent (Falco, Sysdig) holding an inotify watch on the overlay mount.
- NFS or CSI volume not unmounted cleanly before container termination.
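
Most of the culprits above come down to some process holding a file descriptor or working directory under the container's overlay path. On minimal node images where lsof and fuser are not installed, the same information can be read from /proc. The sketch below is one way to do that; the function name fd_holders is illustrative, and it needs root to see descriptors owned by other users.

```shell
#!/bin/sh
# List PIDs that hold an open file descriptor or a working directory under a path.
# Scans /proc directly, so it does not depend on lsof or fuser being installed.
# Usage on a node:  sudo sh -c '. ./fd_holders.sh; fd_holders <mount_path>'
fd_holders() {
    target="${1%/}"
    for entry in /proc/[0-9]*/fd/* /proc/[0-9]*/cwd; do
        # Unreadable or vanished entries make readlink fail; skip them.
        link=$(readlink "$entry" 2>/dev/null) || continue
        case "$link" in
            "$target"|"$target"/*)
                pid=${entry#/proc/}
                echo "${pid%%/*}"
                ;;
        esac
    done | sort -un
}
```

Each PID printed can then be inspected with 'ps -fp <PID>' before deciding whether it is safe to stop. This only covers descriptors and working directories; memory-mapped files and mounts held by other mount namespaces are not detected by this sketch.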

How to Fix It

Step 1: Identify the blocking mount and PID

# List all overlay mounts (running and dead containers alike)
awk '$3 == "overlay" {print $2}' /proc/mounts

# Find what is holding each busy mount
# Replace <mount_path> with the path from above
sudo lsof +D <mount_path> 2>/dev/null
sudo fuser -vm <mount_path> 2>/dev/null

# For containerd: list dead containers
sudo crictl ps -a --state Exited
sudo crictl ps -a --state Created

# For docker runtime:
sudo docker ps -a --filter status=exited --filter status=dead

Step 2: Kill the blocking process or force-unmount

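# Stop the offending PID (replace <PID>); try SIGTERM before SIGKILL
sudo kill <PID>
sudo kill -9 <PID>   # only if the process ignores SIGTERM

# If the process is gone but mount is still busy (kernel ref count):
sudo umount -l <mount_path>   # lazy unmount: detaches from the namespace now, cleans up once the last reference is released

# Remove exited containers via crictl (xargs -r skips the call when there are none)
sudo crictl ps -a --state Exited -q | xargs -r sudo crictl rm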
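
When no PID shows up but the mount still reports busy, one thing worth checking before a lazy unmount is whether the mount is still present in another process's mount namespace (a stale bind mount, for example). The sketch below scans /proc/<pid>/mountinfo for an exact mount point match; the function name mount_ns_holders is illustrative, and it assumes the fifth mountinfo field is the mount point, as described in proc(5).

```shell
#!/bin/sh
# List PIDs whose mount namespace still contains a given mount point.
# Usage on a node:  sudo sh -c '. ./mount_ns_holders.sh; mount_ns_holders <mount_path>'
mount_ns_holders() {
    target="${1%/}"
    for info in /proc/[0-9]*/mountinfo; do
        # Field 5 of mountinfo is the mount point; awk exits 0 only on a match.
        awk -v t="$target" '$5 == t { found = 1; exit } END { exit !found }' \
            "$info" 2>/dev/null || continue
        pid=${info#/proc/}
        echo "${pid%%/*}"
    done | sort -un
}
```

If the list contains only processes in the host namespace, the reference is more likely an open descriptor. A PID belonging to a long-running agent or another container suggests the mount was propagated into its namespace and is being kept alive there.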

Basic Fix — Restart the container runtime to reap all stale shims

# containerd
sudo systemctl restart containerd
sudo systemctl restart kubelet

# docker (only on clusters still using dockershim, which was removed in Kubernetes 1.24; migrate off it)
sudo systemctl restart docker
sudo systemctl restart kubelet

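⚠️ Restarting containerd does not stop running containers, but it drops exec, attach, and log-streaming sessions and leaves kubelet without a runtime connection until it is back. Cordon first in production: kubectl cordon <node>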
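
Put together, the cordon and restart steps might look like the sketch below. It is a hedged outline rather than a tested runbook: the function name restart_runtime, the DRY_RUN switch, the 120s timeout, and the final uncordon step are additions for illustration, and it assumes kubectl has cluster access from wherever it runs and that systemctl acts on the affected node.

```shell
#!/bin/sh
# Sketch: cordon, restart the runtime, wait for the node, then uncordon.
# DRY_RUN defaults to 1, which only prints the commands instead of running them.
# Usage:  DRY_RUN=0 restart_runtime <node>
restart_runtime() {
    node="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; else run=""; fi
    # Each step runs only if the previous one succeeded.
    $run kubectl cordon "$node" &&
    $run sudo systemctl restart containerd &&
    $run sudo systemctl restart kubelet &&
    $run kubectl wait --for=condition=Ready "node/$node" --timeout=120s &&
    $run kubectl uncordon "$node"
}
```

Running it once with the default DRY_RUN=1 prints the plan so it can be reviewed before anything is restarted.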

Enterprise Best Practice — Tune kubelet GC thresholds and add eviction safeguards

The default kubelet GC config is too permissive for high-churn workloads. Patch your kubelet config:

# /var/lib/kubelet/config.yaml
  imageGCHighThresholdPercent: 85
- imageGCLowThresholdPercent: 80
+ imageGCLowThresholdPercent: 70

- containerLogMaxSize: "10Mi"
+ containerLogMaxSize: "5Mi"
- containerLogMaxFiles: 5
+ containerLogMaxFiles: 3

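+ evictionHard:
+   # Listing any threshold drops the unlisted defaults, so keep memory.available
+   memory.available: "100Mi"
+   nodefs.available: "10%"
+   nodefs.inodesFree: "5%"
+   imagefs.available: "15%"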
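
To see what the two image GC thresholds mean in bytes, the arithmetic can be worked through by hand. The sketch below assumes kubelet starts image GC once usage reaches the high threshold and then tries to free enough to get back down to the low threshold; the formula reflects my reading of kubelet's image GC manager and should be treated as an assumption, and the function name gc_bytes_to_free and the 100 GiB disk are illustrative.

```shell
#!/bin/sh
# Bytes that image GC would try to free, given filesystem capacity, available
# bytes, and the low threshold percentage. Prints 0 when usage is already
# below the low threshold.
gc_bytes_to_free() {
    capacity="$1"; available="$2"; low_pct="$3"
    target=$(( capacity * (100 - low_pct) / 100 - available ))
    [ "$target" -gt 0 ] || target=0
    echo "$target"
}

# 100 GiB image filesystem with 15% available (85% used), low threshold 80%
gc_bytes_to_free 107374182400 16106127360 80   # prints 5368709120 (5 GiB)
```

The 5368709120 bytes here match the "Wanted to free" figure in the log excerpt earlier, which would be consistent with a 100 GiB disk at the default 85/80 thresholds, although the article does not state the node's disk size. Lowering the low threshold to 70 on the same disk raises the target to 15 GiB per GC pass.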

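For containerd, ensure shim reaping is not disabled. The settings below belong to the legacy v1 runtime plugin (io.containerd.runtime.v1.linux), which was removed in containerd 2.0; on nodes using the runc v2 shim (io.containerd.runc.v2) they do not apply: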

# /etc/containerd/config.toml
  [plugins."io.containerd.runtime.v1.linux"]
-   shim_debug = true
+   shim_debug = false
+   no_shim = false

For Fluent Bit (common culprit) — ensure it does not tail into overlay paths directly:

  [INPUT]
      Name              tail
-     Path              /var/lib/containerd/io.containerd.runtime.v2.task/*/log
+     Path              /var/log/containers/*.log
+     DB                /var/log/flb_kube.db
+     Rotate_Wait       5

Prevention in CI/CD

1. Node-level monitoring — alert before disk fills

# Prometheus alerting rule
- alert: NodeDiskPressureImminent
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/var/lib/containerd"} /
     node_filesystem_size_bytes{mountpoint="/var/lib/containerd"}) < 0.15
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Node {{ $labels.instance }} containerd disk < 15%"

- alert: NodeInodeExhaustion
  expr: |
    node_filesystem_files_free{mountpoint="/"} /
    node_filesystem_files{mountpoint="/"} < 0.05
  for: 2m
  labels:
    severity: critical

2. OPA/Gatekeeper: require an ephemeral-storage limit on every container (container logs and writable layers count against it)

package k8srequiredloglimits

violation[{"msg": msg}] {
  container := input.review.object.spec.containers[_]
  not container.resources.limits["ephemeral-storage"]
  msg := sprintf("Container '%v' must set ephemeral-storage limit", [container.name])
}

3. Checkov / kube-linter in CI pipeline

# Add to your GitHub Actions / GitLab CI
kube-linter lint ./k8s-manifests/ \
  --include unset-cpu-requirements,unset-memory-requirements,no-read-only-root-fs

checkov -d ./k8s-manifests \
  --check CKV_K8S_11,CKV_K8S_13  # CPU/memory limits enforced

4. Periodic node hygiene CronJob

apiVersion: batch/v1
kind: CronJob
metadata:
  name: containerd-gc-assist
  namespace: kube-system
spec:
  schedule: "0 */4 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          hostPID: true
          tolerations:
          - operator: Exists
          containers:
          - name: gc
            image: alpine:3.19
            securityContext:
              privileged: true
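            command:
            - /bin/sh
            - -c
            - |
              # alpine does not ship crictl; with hostPID and privileged set,
              # enter the host mount namespace and use the host's crictl instead.
              # Note: a CronJob pod lands on one node per run, not on every node.
              nsenter -t 1 -m -- sh -c '
                crictl pods --state NotReady -q | xargs -r crictl rmp --force
                crictl ps -a --state Exited -q | xargs -r crictl rm
              ' || true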
          restartPolicy: OnFailure
          nodeSelector:
            kubernetes.io/os: linux

Long-term: If this node runs high-churn batch workloads, dedicate a separate imagefs disk mounted at /var/lib/containerd to isolate container storage from root filesystem pressure entirely.
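
Whether a node already has this separation can be checked by comparing the mount points that back the two paths. The following is a small sketch under the assumption that 'df -P' reports the containing mount point in its last column; the function names mount_point_of and same_filesystem are illustrative.

```shell
#!/bin/sh
# Print the mount point that contains a path (last column of `df -P`).
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $6 }'
}

# Succeeds (exit 0) when both paths live on the same mounted filesystem.
same_filesystem() {
    [ "$(mount_point_of "$1")" = "$(mount_point_of "$2")" ]
}

# Usage on a node:
#   if same_filesystem / /var/lib/containerd; then
#       echo "container storage shares the root filesystem"
#   fi
```

If the check reports a shared filesystem, image and container data compete with everything else on the root disk, which is the situation the dedicated imagefs disk is meant to avoid.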