
How to Fix Kubernetes DiskPressure Pod Evictions Caused by Container Log Accumulation

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

  • What broke: Kubernetes worker nodes crossed the imagefs.available or nodefs.available eviction threshold because container log files under /var/log/pods/ or /var/lib/docker/containers/ were never rotated, consuming 80%+ of node disk.
  • How to fix it: Enforce containerLogMaxSize and containerLogMaxFiles in the kubelet config, and immediately purge orphaned log files with find + truncate.

The Incident (What Does the Error Mean?)

Raw kubelet event from kubectl describe node <node-name>:

Conditions:
  Type              Status
  ----              ------
  DiskPressure      True

Events:
  Warning  Evicted   pod/api-server-7d9f4b8c6-xk2p9   
  The node had condition: DiskPressure

Raw eviction event from kubectl get events --field-selector reason=Evicted -A:

WARNING  Evicting pod api-server-7d9f4b8c6-xk2p9 
because it exceeded its local ephemeral storage limit 
or the node is under DiskPressure.

Immediate consequence: The kubelet's eviction manager fires. Pods are terminated without a graceful drain. Deployments restart pods onto other nodes — which are likely suffering the same log bloat — and the eviction cascades cluster-wide. Stateful workloads without PVCs lose ephemeral data permanently.


The Attack Vector / Blast Radius

This is a cascading resource exhaustion failure, not a single-node problem.

Why it escalates fast:

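  1. Log rotation is easy to misconfigure. The kubelet defaults (containerLogMaxSize 10Mi, containerLogMaxFiles 5) only apply when the kubelet manages logs through a CRI runtime. With Docker's json-file driver and no max-size option, or with limits raised too far, a single verbose container (e.g., Istio sidecar, a Java app dumping stack traces) can write gigabytes in hours.
  2. On Docker nodes, entries under /var/log/pods/ are symlinks into /var/lib/docker/containers/; on containerd nodes the log files live directly under /var/log/pods/. Deleting the pod does NOT immediately reclaim disk: the space held by a log file is released only after the runtime removes the container and every process holding the file open has closed it.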
  3. Evicted pods leave orphaned log directories. /var/log/pods/<namespace>_<pod>_<uid>/ persists after eviction. On a busy node cycling through evictions, these accumulate rapidly.
  4. Eviction threshold breach triggers a feedback loop: Eviction → new pod scheduled → new logs → disk fills faster → more evictions. Nodes can become NotReady within minutes.
  5. DaemonSet pods tolerate the node.kubernetes.io/disk-pressure taint, so your log shippers (Fluentd, Filebeat) keep running on the pressured node, or are recreated there if evicted, and continue writing their own buffers to the same full disk.

Blast radius: Full node NotReady, cluster autoscaler spinning up replacement nodes that inherit the same misconfiguration, SLA breach, potential data loss for non-PVC workloads.
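
The orphaned-directory problem in point 3 can be checked with a short script. This is a minimal sketch, assuming the <namespace>_<pod>_<uid> directory naming described above; the function name orphaned_pod_log_dirs and the live_uids.txt file are hypothetical. It only lists candidates and deletes nothing.

```shell
#!/bin/sh
# List pod log directories whose UID is not in a file of live pod UIDs.
# Assumes directories are named <namespace>_<pod>_<uid>.
# Usage: orphaned_pod_log_dirs <log_root> <live_uids_file>
orphaned_pod_log_dirs() {
  log_root=$1
  live_uids_file=$2
  for dir in "$log_root"/*/; do
    [ -d "$dir" ] || continue        # the glob matched nothing
    name=$(basename "$dir")
    uid=${name##*_}                  # text after the last underscore
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF "$uid" "$live_uids_file"; then
      printf '%s\n' "${dir%/}"
    fi
  done
}
```

One way to build the UID file is kubectl get pods -A --field-selector spec.nodeName=<node-name> -o jsonpath='{range .items[*]}{.metadata.uid}{"\n"}{end}' > live_uids.txt, followed by orphaned_pod_log_dirs /var/log/pods live_uids.txt on the node. Review the output before removing anything: static pods may appear under a UID that the API does not report, so they can show up as false positives.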


How to Fix It

Step 0: Immediate Triage (Run This First)

# Identify top disk consumers on the affected node (run via node shell or SSH)
du -sh /var/log/pods/* | sort -rh | head -20
du -sh /var/lib/containerd/* | sort -rh | head -10

# Truncate (do NOT delete: removing a file that is still open does not free its disk space) active log files over 500MB
find /var/log/pods/ -name '*.log' -size +500M -exec truncate -s 0 {} \;

# Delete Failed (including evicted) pod objects so the kubelet can clean up their leftover directories
kubectl delete pod --field-selector=status.phase=Failed -A
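
Because truncation is irreversible, it can help to preview what the find command would touch. The following is a minimal dry-run sketch; the function name list_large_logs is hypothetical, and it assumes the log entries are regular files (on Docker-based nodes, where entries under /var/log/pods/ are symlinks, point it at the target directory instead).

```shell
#!/bin/sh
# Print "<size-in-KiB><TAB><path>" for every *.log file larger than the
# threshold, largest first. Nothing is modified.
# Usage: list_large_logs <dir> <min_size_mib>
list_large_logs() {
  find "$1" -type f -name '*.log' -size +"$2"M -exec du -k {} + | sort -rn
}
```

For example, list_large_logs /var/log/pods 500 shows the files the truncate step above would empty, so the heaviest containers can be identified before their logs are lost.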

Basic Fix: Kubelet Log Rotation Config

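Edit the kubelet config on each affected node. The path varies by distribution: /var/lib/kubelet/config.yaml on kubeadm nodes (rendered from the kubelet-config ConfigMap in kube-system), or a path such as /etc/kubernetes/kubelet-config.yaml elsewhere. Check the kubelet's --config flag to be sure.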

 apiVersion: kubelet.config.k8s.io/v1beta1
 kind: KubeletConfiguration
-# containerLogMaxSize and containerLogMaxFiles not set (defaults: 10Mi / 5)
+containerLogMaxSize: "50Mi"
+containerLogMaxFiles: 3
 evictionHard:
-  nodefs.available: "5%"
+  nodefs.available: "10%"
+  nodefs.inodesFree: "5%"
+imageGCHighThresholdPercent: 75
+imageGCLowThresholdPercent: 70

After editing, restart kubelet:

systemctl daemon-reload && systemctl restart kubelet
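
To sanity-check the chosen values against node disk size, the worst-case log footprint can be estimated as containerLogMaxSize × containerLogMaxFiles × containers per node. The sketch below is a rough upper bound only: it ignores compression of rotated files and any overshoot between rotation checks. The function name and the figure of 200 containers are illustrative assumptions.

```shell
#!/bin/sh
# Rough worst-case log footprint for one node, in MiB:
#   containerLogMaxSize (MiB) x containerLogMaxFiles x containers on the node
# Usage: max_log_footprint_mib <max_size_mib> <max_files> <containers>
max_log_footprint_mib() {
  echo $(( $1 * $2 * $3 ))
}

# 50Mi x 3 files x 200 containers (illustrative container count)
max_log_footprint_mib 50 3 200   # prints 30000
```

If the result is a large fraction of the node's root volume, either lower the per-container limits or size the disk so that logs alone cannot reach the nodefs.available threshold.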

Enterprise Best Practice: Enforce via Ephemeral-Storage Limits + LimitRange + Log-Purger DaemonSet

1. Container-level ephemeral-storage limits in the Pod spec (defense-in-depth; container logs count toward this limit):

 containers:
 - name: api-server
   image: myapp:v2.1.0
+  resources:
+    limits:
+      ephemeral-storage: "2Gi"
+    requests:
+      ephemeral-storage: "512Mi"

2. Namespace-level enforcement via LimitRange (apply one per namespace):

+apiVersion: v1
+kind: LimitRange
+metadata:
+  name: ephemeral-storage-limits
+  namespace: production
+spec:
+  limits:
+  - type: Container
+    max:
+      ephemeral-storage: "4Gi"
+    default:
+      ephemeral-storage: "1Gi"
+    defaultRequest:
+      ephemeral-storage: "256Mi"

3. Node-level log-purging DaemonSet as a safety net for /var/log/pods (it truncates oversized logs with find + truncate rather than running logrotate):

+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: log-purger
+  namespace: kube-system
+spec:
+  selector:
+    matchLabels:
+      app: log-purger
+  template:
+    metadata:
+      labels:
+        app: log-purger  # must match spec.selector.matchLabels
+    spec:
+      tolerations:
+      - operator: Exists  # Run even on tainted/pressured nodes
+      hostPID: true
+      containers:
+      - name: log-purger
+        image: alpine:3.19
+        command:
+        - /bin/sh
+        - -c
+        - |
+          while true; do
+            find /host/var/log/pods -name '*.log' -size +200M \
+              -exec truncate -s 0 {} \;
+            find /host/var/log/pods -maxdepth 3 -type d -empty -delete
+            sleep 300
+          done
+        volumeMounts:
+        - name: varlog
+          mountPath: /host/var/log
+      volumes:
+      - name: varlog
+        hostPath:
+          path: /var/log



Prevention in CI/CD

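1. OPA admission policy to block pods with no ephemeral-storage limit (plain OPA form; under Gatekeeper the rule goes in a ConstraintTemplate as violation[{"msg": msg}] and reads input.review.object):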

package kubernetes.admission

deny[msg] {
  input.request.kind.kind == "Pod"
  container := input.request.object.spec.containers[_]
  not container.resources.limits["ephemeral-storage"]
  msg := sprintf("Container '%v' must define ephemeral-storage limit", [container.name])
}

2. Checkov scan in your pipeline:

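# Note: CKV_K8S_20 (allowPrivilegeEscalation) and CKV_K8S_11 (CPU limits) are general hardening checks;
# failing the pipeline on missing ephemeral-storage limits requires a custom Checkov policy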
checkov -d ./k8s-manifests \
  --check CKV_K8S_20 \
  --check CKV_K8S_11
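
A lightweight pre-check can run alongside the scanner. The sketch below is a crude text match, not a YAML parser: it flags manifest files that declare containers but never mention ephemeral-storage, and it does not verify each container individually. The function name is hypothetical.

```shell
#!/bin/sh
# List YAML files that contain a "containers:" key but no mention of
# ephemeral-storage anywhere in the file. Text match only.
# Usage: manifests_missing_ephemeral_storage <dir>
manifests_missing_ephemeral_storage() {
  find "$1" -type f \( -name '*.yaml' -o -name '*.yml' \) | sort |
  while IFS= read -r file; do
    if grep -q 'containers:' "$file" && ! grep -q 'ephemeral-storage' "$file"; then
      printf '%s\n' "$file"
    fi
  done
}
```

In a pipeline, a non-empty result from manifests_missing_ephemeral_storage ./k8s-manifests can be treated as a failure. The admission policy remains the authoritative gate, since this check can miss multi-container manifests where only one container sets a limit.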

3. Prometheus alerting — fire BEFORE eviction threshold is hit:

- alert: NodeDiskPressureImminent
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/"} /
     node_filesystem_size_bytes{mountpoint="/"}) < 0.15
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Node {{ $labels.instance }} disk below 15% — eviction imminent"
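
The same ratio the alert uses (available bytes divided by filesystem size) can be checked by hand on a node when Prometheus data is not at hand. This is a minimal sketch built on df -P; the function names fs_avail_pct and below_threshold are hypothetical.

```shell
#!/bin/sh
# Print the percentage of space still available on the filesystem that
# contains <path>, i.e. available / size, as a whole number.
# Usage: fs_avail_pct <path>
fs_avail_pct() {
  # df -P prints a header line, then: filesystem, 1024-blocks, used, available, ...
  df -P "$1" | awk 'NR == 2 && $2 > 0 { printf "%d\n", $4 * 100 / $2 }'
}

# Exit status 0 when the available percentage is below <threshold_pct>.
# Usage: below_threshold <path> <threshold_pct>
below_threshold() {
  [ "$(fs_avail_pct "$1")" -lt "$2" ]
}
```

For example, below_threshold / 15 && echo "disk below 15%" mirrors the alert expression for the root filesystem. If the kubelet's data directory sits on a separate mount, pass that path instead.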

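4. Kubelet config drift detection:

# kube-bench runs the CIS kubelet checks, which do not cover log rotation settings
kube-bench run --targets node
# Confirm the live containerLogMaxSize and containerLogMaxFiles values on a node
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz"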

Set containerLogMaxSize: 50Mi and containerLogMaxFiles: 3 as your org-wide baseline. Enforce it via your node bootstrap scripts (Terraform user_data, Ansible, or EKS managed node group launch templates). Any node that drifts from this config should be flagged by your configuration management pipeline before it joins the cluster.
