How to Fix Kubelet Garbage Collection Failures Caused by Device or Resource Busy Locks
Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins
TL;DR
- What broke: Kubelet's garbage collection loop cannot remove stopped/dead containers because the overlay or devicemapper mount is still held open by a leaked process, a stale bind mount, or a zombie containerd/docker shim.
- How to fix it: Identify the blocking PID with lsof or fuser, kill or unmount it, then force-trigger GC or restart the container runtime.
- Shortcut: Use our Client-Side Sandbox below to auto-refactor your kubelet config and generate the exact crictl/nsenter remediation commands for your specific runtime.
The Incident (What Does the Error Mean?)
Raw kubelet log output:
E0612 03:14:22.871042 1423 image_gc_manager.go:305] Failed to garbage collect required amount of images. Wanted to free 5368709120 bytes, but only found 0 bytes eligible to free.
E0612 03:14:22.871198 1423 container_gc.go:85] Failed to garbage collect containers: device or resource busy
E0612 03:14:22.871301 1423 kubelet.go:1302] Image garbage collection failed multiple times in a row: failed to garbage collect required amount of images
Immediate consequence: Kubelet's GC loop is stuck. Dead containers accumulate. Overlay mounts pile up under /var/lib/containerd or /var/lib/docker/overlay2. The node's disk fills, inodes run out, and new pods fail to start with FailedMount or Evicted events. In severe cases the node goes NotReady.
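The byte counts in the image GC message are easier to judge in GiB. The following is a minimal sketch that pulls the "wanted" and "found" values out of log lines shaped like the one above; the function name gc_shortfall is made up for illustration, and the pattern assumes the exact wording shown in this log excerpt.

```shell
# gc_shortfall: read kubelet log lines on stdin and print the image GC
# shortfall in GiB. Hypothetical helper; it matches only the message
# wording shown in the log excerpt above.
gc_shortfall() {
    sed -n 's/.*Wanted to free \([0-9]*\) bytes, but only found \([0-9]*\) bytes.*/\1 \2/p' |
    awk '{ printf "wanted=%.1fGiB found=%.1fGiB\n", $1 / 1073741824, $2 / 1073741824 }'
}
```

One way to feed it, assuming the kubelet runs as a systemd unit named kubelet: journalctl -u kubelet | gc_shortfall. A "found" value of 0 means no image was eligible for removal at all.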
The Attack Vector / Blast Radius
This is a cascading disk exhaustion failure, not a one-pod problem.
Overlay FS leak: Each stopped container leaves an overlay mount. If a process (log forwarder, sidecar shim, nsenter session left open by an operator) holds a file descriptor into that overlay directory, the kernel returns EBUSY on umount2(). Kubelet cannot proceed.
Blast radius escalates fast:
- Disk fills → new image pulls fail → ImagePullBackOff for every pod scheduled to that node.
- Inode exhaustion hits before raw disk in environments with many small files (log-heavy workloads). df -i will show 100% inode usage while df -h shows free space, a notoriously confusing failure mode.
- Kubelet's eviction manager cannot evict pods fast enough if GC is blocked, triggering a hard node pressure condition.
- If this node is a control plane node, etcd or API server pods may fail to restart after eviction.
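Because block usage and inode usage can diverge, it helps to read both in one pass. A minimal sketch, assuming a df that supports the POSIX -P flag together with -i (GNU coreutils does); fs_usage is a hypothetical helper name.

```shell
# fs_usage PATH: print block and inode usage for the filesystem holding PATH.
# Hypothetical helper. Some filesystems report "-" for inode usage.
fs_usage() {
    path=$1
    # df -P prints one line per filesystem; column 5 is Use%.
    blocks=$(df -P "$path" | awk 'NR == 2 { gsub("%", "", $5); print $5 }')
    # df -Pi uses the same layout for inodes; column 5 is IUse%.
    inodes=$(df -Pi "$path" | awk 'NR == 2 { gsub("%", "", $5); print $5 }')
    echo "blocks=${blocks}% inodes=${inodes}%"
}
```

Running fs_usage /var/lib/containerd on an affected node shows at a glance which of the two resources is closer to exhaustion.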
Common culprits:
- Fluent Bit / Fluentd tailing logs inside a dead container's overlay directory.
- A kubectl exec or kubectl debug session left open by an operator.
- Stale containerd-shim or docker-containerd-shim processes not reaped after container exit.
- auditd or a security agent (Falco, Sysdig) holding an inotify watch on the overlay mount.
- NFS or CSI volume not unmounted cleanly before container termination.
How to Fix It
Step 1: Identify the blocking mount and PID
# List every overlay mount on the node (running and dead containers alike)
grep overlay /proc/mounts | awk '{print $2}'
# Find what is holding each busy mount
# Replace <mount_path> with the path from above
sudo lsof +D <mount_path> 2>/dev/null
sudo fuser -vm <mount_path> 2>/dev/null
# For containerd: list dead containers
sudo crictl ps -a --state Exited
sudo crictl ps -a --state Created
# For docker runtime:
sudo docker ps -a --filter status=exited --filter status=dead
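lsof and fuser report processes with open files under a path. A mount can also stay busy when a copy of it still exists in another process's mount namespace, which neither tool shows. The sketch below scans each process's mountinfo for a given mount point; pids_with_mount is a hypothetical helper, and the optional second argument exists only so the function can be pointed at a directory other than /proc.

```shell
# pids_with_mount MOUNT_PATH [PROC_ROOT]
# Print the PID of every process whose mount namespace contains MOUNT_PATH.
# Hypothetical helper; run as root so every /proc/<pid>/mountinfo is readable.
pids_with_mount() {
    target=$1
    proc_root=${2:-/proc}
    for f in "$proc_root"/[0-9]*/mountinfo; do
        [ -r "$f" ] || continue
        # Field 5 of a mountinfo line is the mount point as that process sees it.
        if awk -v t="$target" '$5 == t { found = 1 } END { exit !found }' "$f" 2>/dev/null; then
            pid_dir=${f%/mountinfo}
            echo "${pid_dir##*/}"
        fi
    done
}
```

If the output includes a long-running host service in addition to the runtime itself, that service likely captured the mount when it started and may need a restart before the unmount can succeed.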
Step 2: Kill the blocking process or force-unmount
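# Ask the offending PID to exit (replace <PID>); escalate to SIGKILL only if it ignores SIGTERM
sudo kill <PID>
sudo kill -9 <PID>
# If the process is gone but the mount is still busy (kernel ref count):
sudo umount -l <mount_path> # lazy unmount: detaches from the mount tree now, cleanup finishes when the last reference drops
# Force remove dead containers via crictl (xargs -r skips the call when the list is empty)
sudo crictl ps -a --state Exited -q | xargs -r sudo crictl rm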
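A log forwarder or security agent usually exits cleanly on SIGTERM, so SIGKILL can be held back as a fallback. A minimal sketch of that escalation; stop_pid and its default 10-second grace period are illustrative assumptions, not something kubelet or the runtime provides.

```shell
# stop_pid PID [GRACE_SECONDS]
# Send SIGTERM, wait up to GRACE_SECONDS, then send SIGKILL if still alive.
# Hypothetical helper; run it as a user allowed to signal the target.
stop_pid() {
    pid=$1
    grace=${2:-10}
    # If the signal cannot be delivered, the process is gone or not ours to signal.
    kill -TERM "$pid" 2>/dev/null || return 0
    i=0
    while [ "$i" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited during the grace period
        sleep 1
        i=$((i + 1))
    done
    kill -KILL "$pid" 2>/dev/null
    return 0
}
```

After the process is gone, rerun the lsof or fuser check from Step 1 to confirm nothing else holds the mount.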
Basic Fix — Restart the container runtime to reap all stale shims
# containerd
sudo systemctl restart containerd
sudo systemctl restart kubelet
# docker (if using dockershim — EOL, migrate off this)
sudo systemctl restart docker
sudo systemctl restart kubelet
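⚠️ Restarting containerd normally leaves running containers alive because their shims are separate processes, but the node cannot start, stop, or exec into containers while the runtime is down, and restarting Docker without live-restore stops its containers. Cordon first in production: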
kubectl cordon <node>
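Put together, the restart is a four-step sequence, with an uncordon at the end so the node accepts pods again. The sketch below prints the commands by default and runs them only when DRY_RUN=0 is set; the function names and the DRY_RUN variable are assumptions made for this example, and it covers the containerd case only.

```shell
# _run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
_run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# restart_runtime NODE: cordon, restart containerd and kubelet, then uncordon.
# Hypothetical helper; it stops at the first step that fails.
restart_runtime() {
    node=$1
    _run kubectl cordon "$node" &&
    _run sudo systemctl restart containerd &&
    _run sudo systemctl restart kubelet &&
    _run kubectl uncordon "$node"
}
```

Calling restart_runtime worker-1 with the default settings lists the four commands for review before anything changes on the node.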
Enterprise Best Practice — Tune kubelet GC thresholds and add eviction safeguards
The default kubelet GC config is too permissive for high-churn workloads. Patch your kubelet config:
# /var/lib/kubelet/config.yaml
imageGCHighThresholdPercent: 85
- imageGCLowThresholdPercent: 80
+ imageGCLowThresholdPercent: 70
- containerLogMaxSize: "10Mi"
+ containerLogMaxSize: "5Mi"
- containerLogMaxFiles: 5
+ containerLogMaxFiles: 3
+ evictionHard:
+   nodefs.available: "10%"
+   nodefs.inodesFree: "5%"
+   imagefs.available: "15%"
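Kubelet requires the low image GC threshold to be below the high one, so a quick check before restarting the kubelet can catch a bad edit. A minimal sketch, assuming the two keys appear as top-level "key: value" lines in the config file; check_gc_thresholds is a hypothetical helper, and the fallbacks of 85 and 80 are the kubelet defaults.

```shell
# check_gc_thresholds FILE
# Verify imageGCLowThresholdPercent < imageGCHighThresholdPercent.
# Hypothetical helper; reads only top-level "key: value" lines.
check_gc_thresholds() {
    cfg=$1
    high=$(awk -F': *' '$1 == "imageGCHighThresholdPercent" { print $2 + 0 }' "$cfg")
    low=$(awk -F': *' '$1 == "imageGCLowThresholdPercent" { print $2 + 0 }' "$cfg")
    high=${high:-85}   # kubelet default when the key is absent
    low=${low:-80}     # kubelet default when the key is absent
    if [ "$low" -lt "$high" ]; then
        echo "ok: low=$low high=$high"
    else
        echo "invalid: low=$low must be below high=$high"
        return 1
    fi
}
```

For example, check_gc_thresholds /var/lib/kubelet/config.yaml && sudo systemctl restart kubelet applies the new settings only when the pair is consistent.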
For containerd installs that still use the legacy runtime v1 plugin (removed in containerd 2.0), ensure the shim is not bypassed:
# /etc/containerd/config.toml
[plugins."io.containerd.runtime.v1.linux"]
- shim_debug = true
+ shim_debug = false
+ no_shim = false
For Fluent Bit (common culprit) — ensure it does not tail into overlay paths directly:
[INPUT]
    Name tail
-   Path /var/lib/containerd/io.containerd.runtime.v2.task/*/log
+   Path /var/log/containers/*.log
+   DB /var/log/flb_kube.db
+   Rotate_Wait 5
Prevention in CI/CD
1. Node-level monitoring — alert before disk fills
# Prometheus alerting rule
- alert: NodeDiskPressureImminent
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/var/lib/containerd"} /
     node_filesystem_size_bytes{mountpoint="/var/lib/containerd"}) < 0.15
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Node {{ $labels.instance }} containerd disk < 15%"
- alert: NodeInodeExhaustion
  expr: |
    node_filesystem_files_free{mountpoint="/"} /
    node_filesystem_files{mountpoint="/"} < 0.05
  for: 2m
  labels:
    severity: critical
2. OPA/Gatekeeper — enforce log size limits on all pods
package k8srequiredloglimits

violation[{"msg": msg}] {
  container := input.review.object.spec.containers[_]
  not container.resources.limits["ephemeral-storage"]
  msg := sprintf("Container '%v' must set ephemeral-storage limit", [container.name])
}
3. Checkov / kube-linter in CI pipeline
# Add to your GitHub Actions / GitLab CI
kube-linter lint ./k8s-manifests/ \
  --include unset-cpu-requirements,unset-memory-requirements,no-read-only-root-fs
checkov -d ./k8s-manifests \
  --check CKV_K8S_11,CKV_K8S_13 # CPU/memory limits enforced
4. Periodic node hygiene CronJob
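# Note: each run schedules one pod, so only the node it lands on is cleaned.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: containerd-gc-assist
  namespace: kube-system
spec:
  schedule: "0 */4 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          hostPID: true
          tolerations:
            - operator: Exists
          containers:
            - name: gc
              image: alpine:3.19
              securityContext:
                privileged: true
              command:
                - /bin/sh
                - -c
                - |
                  # alpine does not ship crictl, so enter the host's namespaces via PID 1
                  # and use the host's crictl. Assumes crictl is installed on the node.
                  nsenter -t 1 -m -u -i -n -- sh -c '
                    crictl rmp --force $(crictl pods --state NotReady -q) 2>/dev/null || true
                    crictl rm $(crictl ps -a --state Exited -q) 2>/dev/null || true
                  '
          restartPolicy: OnFailure
          nodeSelector:
            kubernetes.io/os: linux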
Long-term: If this node runs high-churn batch workloads, dedicate a separate imagefs disk mounted at /var/lib/containerd to isolate container storage from root filesystem pressure entirely.
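Whether a node already has that isolation can be checked by comparing the device behind the container storage path with the device behind the root filesystem. A minimal sketch using POSIX df; on_separate_fs is a hypothetical helper, and comparing device names is a simplification that treats bind mounts of the same device as shared.

```shell
# on_separate_fs PATH
# Succeed (exit 0) when PATH is backed by a different device than /.
# Hypothetical helper; returns 2 when PATH does not exist.
on_separate_fs() {
    [ -e "$1" ] || return 2
    # Column 1 of df -P output is the backing device or filesystem name.
    dev_path=$(df -P "$1" | awk 'NR == 2 { print $1 }')
    dev_root=$(df -P / | awk 'NR == 2 { print $1 }')
    [ "$dev_path" != "$dev_root" ]
}
```

For example: on_separate_fs /var/lib/containerd || echo "container storage shares the root filesystem".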