How to Fix Kubernetes Pod Eviction Due to Low Ephemeral Storage (With Limits & LimitRange)
Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins
TL;DR
- What broke: Kubernetes evicted your pod because the node's ephemeral storage (local disk used for container logs, writable layers, and emptyDir volumes) hit the eviction threshold. Your containers had no ephemeral-storage limits set, so the kubelet had no guardrails and killed the pod to protect the node.
- How to fix it: Set resources.requests.ephemeral-storage and resources.limits.ephemeral-storage on every container in the pod spec, cap any emptyDir volumes with sizeLimit, and tune kubelet eviction thresholds at the node level.
- Use our Client-Side Sandbox above to paste your failing pod/deployment YAML and auto-generate the refactored spec with correct ephemeral storage constraints applied.
The Incident (What Does the Error Mean?)
The raw event from kubectl describe pod <pod-name>:
Status: Failed
Reason: Evicted
Message: The node was low on resource: ephemeral-storage.
Threshold quantity: 10%, available: 8%.
Container <container-name> was using 4Gi, request is 0.
And from kubectl get events --field-selector reason=Evicted:
WARNING Evicted pod/my-app-7d9f8b-xk2p1
The node was low on resource: ephemeral-storage.
Threshold quantity: 10%, available: 8%.
Immediate consequence: The pod is terminated with reason: Evicted and its phase is set to Failed. While the node reports disk pressure, the scheduler will not place new pods on it. A PodDisruptionBudget does not help here, because kubelet node-pressure evictions do not honor PDBs; if your deployment has replicas: 1, this is a full service outage. The evicted pod is not restarted in place. The ReplicaSet controller creates a replacement pod, which can land on the same node once the pressure clears, fill the disk again, and get evicted in a loop.
The Attack Vector / Blast Radius
Ephemeral storage is consumed from three sources the kubelet tracks:
- Container overlay (writable layer): Every docker/containerd write that isn't to a mounted volume hits the node's local disk.
- Container logs: stdout/stderr from your container are written to /var/log/pods/ on the node. A chatty logger with no log rotation will silently eat gigabytes.
- emptyDir volumes: Scratch space mounted into the pod. Unless sizeLimit is set, this is unbounded and counts toward the pod's ephemeral storage usage.
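To see which of these sources is actually consuming disk, a small script run on the node can total file sizes per directory. The sketch below is only an illustration: the helper names dir_usage_bytes and top_consumers are hypothetical, and it measures apparent file size, which can differ from the block usage the kubelet accounts for.

```python
import os
import stat


def dir_usage_bytes(root: str) -> int:
    """Sum the apparent size of regular files under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def top_consumers(paths, limit=5):
    """Return (bytes, path) tuples for existing directories, largest first."""
    usage = [(dir_usage_bytes(p), p) for p in paths if os.path.isdir(p)]
    return sorted(usage, reverse=True)[:limit]
```

On a node, passing the per-pod subdirectories of /var/log/pods/ to top_consumers gives a quick ranking of the chattiest loggers.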
Cascading failure scenario:
- Node disk fills → the soft eviction threshold is crossed and its grace period starts → if usage does not drop, or the hard threshold is reached, the kubelet starts evicting → any pod on that node is a candidate. Pods using more ephemeral storage than they requested go first, then ordering follows pod priority and usage relative to request.
- A single log-spewing pod can evict unrelated, well-behaved pods on the same node. This is a noisy neighbor problem at the infrastructure layer.
- If the node is part of a small cluster (2–3 nodes), cascading evictions across nodes under load can collapse your entire deployment.
- DaemonSets are not immune. A DaemonSet pod evicted for storage pressure will be recreated immediately — and evicted again — creating a tight eviction loop that hammers the kubelet.
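The eviction ordering in this scenario can be sketched as a sort. This is a simplified model of the documented ranking (usage above request first, then pod priority, then how far usage exceeds the request), not the kubelet's actual code; the PodUsage type and the sample pods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    priority: int        # pod priority value; lower is evicted earlier
    request_bytes: int   # ephemeral-storage request (0 if unset)
    usage_bytes: int     # current ephemeral-storage usage


def eviction_order(pods):
    """Return pods sorted from first-evicted to last-evicted (simplified model)."""
    def key(p):
        exceeds_request = p.usage_bytes > p.request_bytes
        return (
            0 if exceeds_request else 1,           # over-request pods go first
            p.priority,                            # then lower priority first
            -(p.usage_bytes - p.request_bytes),    # then largest overage first
        )
    return sorted(pods, key=key)


GI = 2**30
pods = [
    PodUsage("api", priority=1000, request_bytes=1 * GI, usage_bytes=GI // 2),
    PodUsage("batch", priority=0, request_bytes=1 * GI, usage_bytes=2 * GI),
    PodUsage("logger", priority=0, request_bytes=0, usage_bytes=4 * GI),
]
order = [p.name for p in eviction_order(pods)]
```

In this model the zero-request logger is evicted first, and the well-behaved api pod is only at risk if evicting the others does not relieve the pressure.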
How to Fix It (The Solution)
Basic Fix: Set Ephemeral Storage Requests and Limits
The single most impactful change. Without limits.ephemeral-storage, the kubelet has no per-container guardrail and must resort to node-level eviction.
 apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: my-app
 spec:
   replicas: 3
   template:
     spec:
       containers:
         - name: my-app
           image: my-app:1.4.2
           resources:
             requests:
               cpu: "250m"
               memory: "256Mi"
+              ephemeral-storage: "500Mi"
             limits:
               cpu: "500m"
               memory: "512Mi"
+              ephemeral-storage: "1Gi"
For emptyDir volumes used as scratch space:
 volumes:
   - name: scratch
     emptyDir:
-      {}
+      sizeLimit: "512Mi"
⚠️ When limits.ephemeral-storage is set, the kubelet evicts the specific offending pod as soon as a container exceeds its limit, instead of waiting for the whole node to reach the eviction threshold. This is strictly better: surgical eviction vs. collateral damage.
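Before applying a spec, it can help to sanity-check the values. The sketch below parses a subset of Kubernetes quantity strings (binary and decimal suffixes only, no milli-units) and flags a missing request, a missing limit, or a request above the limit. The function names parse_quantity and check_resources are hypothetical helpers, not part of any Kubernetes client.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes; enough for storage values.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}
RESOURCE = "ephemeral-storage"


def parse_quantity(q: str) -> int:
    """Convert a quantity such as '500Mi' or '1G' to bytes."""
    q = q.strip()
    # Try two-letter suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return int(Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(Decimal(q))


def check_resources(resources: dict) -> list:
    """Return a list of problems with a container's ephemeral-storage settings."""
    problems = []
    request = (resources.get("requests") or {}).get(RESOURCE)
    limit = (resources.get("limits") or {}).get(RESOURCE)
    if request is None:
        problems.append("missing request")
    if limit is None:
        problems.append("missing limit")
    if request is not None and limit is not None:
        if parse_quantity(request) > parse_quantity(limit):
            problems.append("request exceeds limit")
    return problems
```

For the values used in the Deployment above, check_resources returns an empty list: 500Mi is below the 1Gi limit.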
Enterprise Best Practice: LimitRange + ResourceQuota + Structured Logging
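Step 1: Enforce defaults namespace-wide with a LimitRange
This ensures any pod created in the namespace without explicit ephemeral-storage limits still gets a sane default. A LimitRange is namespace-scoped and only affects pods created after it exists, so apply one to every namespace you want covered. No more zero-request containers.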
+apiVersion: v1
+kind: LimitRange
+metadata:
+  name: ephemeral-storage-defaults
+  namespace: production
+spec:
+  limits:
+    - type: Container
+      default:
+        ephemeral-storage: "1Gi"
+      defaultRequest:
+        ephemeral-storage: "256Mi"
+      max:
+        ephemeral-storage: "4Gi"
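To make the defaulting concrete, here is a rough Python model of what a container ends up with under this LimitRange. It assumes ephemeral-storage follows the defaulting behavior documented for CPU and memory (a missing limit gets default; a missing request gets defaultRequest, unless the container set its own limit, in which case the request matches that limit). The function name is hypothetical and max validation is left out.

```python
import copy

RESOURCE = "ephemeral-storage"

# Mirrors the 'limits' entry of the LimitRange above.
LIMIT_ITEM = {
    "default": {RESOURCE: "1Gi"},
    "defaultRequest": {RESOURCE: "256Mi"},
}


def apply_limit_range_defaults(container: dict, limit_item: dict) -> dict:
    """Return a copy of the container with defaults filled in (simplified model)."""
    out = copy.deepcopy(container)
    resources = out.setdefault("resources", {})
    limits = resources.setdefault("limits", {})
    requests = resources.setdefault("requests", {})

    limit_was_explicit = RESOURCE in limits
    if not limit_was_explicit and RESOURCE in limit_item.get("default", {}):
        limits[RESOURCE] = limit_item["default"][RESOURCE]

    if RESOURCE not in requests:
        if limit_was_explicit:
            # Assumption: the request is set to match an explicit limit.
            requests[RESOURCE] = limits[RESOURCE]
        elif RESOURCE in limit_item.get("defaultRequest", {}):
            requests[RESOURCE] = limit_item["defaultRequest"][RESOURCE]
    return out


bare = {"name": "my-app", "image": "my-app:1.4.2"}
defaulted = apply_limit_range_defaults(bare, LIMIT_ITEM)
```

A container with no resources block comes out with a 256Mi request and a 1Gi limit, which is the guardrail this step is meant to guarantee.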
Step 2: Cap namespace-level total ephemeral storage with ResourceQuota
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: ephemeral-storage-quota
+  namespace: production
+spec:
+  hard:
+    requests.ephemeral-storage: "20Gi"
+    limits.ephemeral-storage: "40Gi"
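A quick back-of-the-envelope check shows how many containers this quota admits. The sketch assumes every container uses the same request and limit; max_containers is a hypothetical helper.

```python
MI = 2**20
GI = 2**30


def max_containers(quota_requests, quota_limits, per_request, per_limit):
    """Number of identical containers that fit under both quota dimensions."""
    by_requests = quota_requests // per_request
    by_limits = quota_limits // per_limit
    return min(by_requests, by_limits)


# LimitRange defaults from Step 1: 256Mi request, 1Gi limit.
with_defaults = max_containers(20 * GI, 40 * GI, 256 * MI, 1 * GI)

# Basic Fix values: 500Mi request, 1Gi limit.
with_basic_fix = max_containers(20 * GI, 40 * GI, 500 * MI, 1 * GI)
```

With the Step 1 defaults the requests quota alone would allow 80 containers, but the limits quota caps the namespace at 40, so the limits line is the binding constraint.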
Step 3: Fix the root cause — eliminate unbounded log writes
If your containers are logging to stdout at high volume, configure log rotation (with containerd and other CRI runtimes, the kubelet rotates container logs via containerLogMaxSize and containerLogMaxFiles) and switch to structured JSON logging with a log shipper (Fluentd, Vector) that reads from the node and forwards off-disk:
 # /etc/containerd/config.toml (node-level)
 [plugins."io.containerd.grpc.v1.cri".containerd]
-  # No log rotation configured
+[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
+  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
+    # Handled via kubelet config below
# /var/lib/kubelet/config.yaml
+containerLogMaxSize: "50Mi"
+containerLogMaxFiles: 3
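These two settings put an approximate ceiling on log disk usage per container. The arithmetic below is a rough estimate only: rotation is checked periodically, so a fast logger can briefly overshoot, and log_budget_bytes is a hypothetical helper.

```python
MI = 2**20


def log_budget_bytes(max_size_bytes: int, max_files: int, containers: int = 1) -> int:
    """Approximate upper bound on retained log bytes for a number of containers."""
    return max_size_bytes * max_files * containers


per_container = log_budget_bytes(50 * MI, 3)          # 150Mi per container
per_node_40 = log_budget_bytes(50 * MI, 3, 40)        # 40 containers on one node
```

With 50Mi and 3 files, each container retains roughly 150Mi of logs, so the ephemeral-storage request should leave room for that on top of any scratch data.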
Step 4: Tune kubelet eviction thresholds (if you manage nodes directly)
 # /var/lib/kubelet/config.yaml
 evictionHard:
-  nodefs.available: "10%"
+  nodefs.available: "15%"
+  nodefs.inodesFree: "5%"
 evictionSoft:
+  nodefs.available: "20%"
 evictionSoftGracePeriod:
+  nodefs.available: "1m30s"
 evictionMaxPodGracePeriod: 90
The soft eviction threshold gives pods a grace period to flush and terminate cleanly before hard eviction fires.
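To translate the percentages into absolute numbers for a given node, the thresholds can be worked out as below. This is a simplified sketch: the 100Gi disk size is an assumed example, the helper names are hypothetical, and the soft threshold only leads to eviction after its grace period (1m30s above) has elapsed.

```python
GI = 2**30


def threshold_bytes(capacity_bytes: int, percent_available: int) -> int:
    """Bytes of free space at which a percentage-based threshold is crossed."""
    return capacity_bytes * percent_available // 100


def classify(capacity_bytes: int, available_bytes: int,
             soft_pct: int = 20, hard_pct: int = 15) -> str:
    """Return 'hard', 'soft', or 'ok' for the current free space."""
    if available_bytes < threshold_bytes(capacity_bytes, hard_pct):
        return "hard"   # evict immediately
    if available_bytes < threshold_bytes(capacity_bytes, soft_pct):
        return "soft"   # grace period starts counting
    return "ok"


capacity = 100 * GI  # assumed node disk size for illustration
soft_at = threshold_bytes(capacity, 20)
hard_at = threshold_bytes(capacity, 15)
```

On this assumed 100Gi disk, the soft threshold is crossed below 20Gi free and the hard threshold below 15Gi free, leaving a 5Gi window in which pods can still shut down cleanly.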
Prevention in CI/CD
1. Block zero-limit deployments with OPA/Gatekeeper
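Deploy a ConstraintTemplate that rejects any pod spec missing ephemeral-storage limits. The template only takes effect once you also create a Constraint of kind RequireEphemeralStorageLimits that matches Pods: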
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: requireephemeralstoragelimits
spec:
  crd:
    spec:
      names:
        kind: RequireEphemeralStorageLimits
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package requireephemeralstoragelimits
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.limits["ephemeral-storage"]
          msg := sprintf("Container '%v' must have ephemeral-storage limits set.", [container.name])
        }
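For a quick local check before the policy is deployed, the same rule can be mirrored in a few lines of Python. This is a sketch of the rule's logic only, not a replacement for Gatekeeper; the violations function is hypothetical and, like the Rego above, it inspects spec.containers of a Pod object and ignores initContainers.

```python
def violations(pod: dict) -> list:
    """Return one message per container that lacks an ephemeral-storage limit."""
    messages = []
    for container in (pod.get("spec") or {}).get("containers") or []:
        limits = (container.get("resources") or {}).get("limits") or {}
        if not limits.get("ephemeral-storage"):
            messages.append(
                "Container '%s' must have ephemeral-storage limits set."
                % container.get("name")
            )
    return messages


pod = {
    "kind": "Pod",
    "spec": {
        "containers": [
            {"name": "good", "resources": {"limits": {"ephemeral-storage": "1Gi"}}},
            {"name": "bad", "resources": {"limits": {"memory": "512Mi"}}},
        ]
    },
}
found = violations(pod)
```

Only the container named bad is reported, which matches what the admission webhook would reject.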
2. Shift-left with Checkov in your pipeline
# .github/workflows/k8s-lint.yaml
- name: Checkov Kubernetes Scan
  uses: bridgecrewio/checkov-action@master
  with:
    directory: ./k8s/
    # CPU requests/limits and memory requests/limits
    check: CKV_K8S_10,CKV_K8S_11,CKV_K8S_12,CKV_K8S_13
    framework: kubernetes
    soft_fail: false
The built-in checks listed here cover CPU and memory requests and limits only. Add a custom Checkov policy for ephemeral storage:
# checkov/custom_checks/ephemeral_storage_limit.py
# Note: the base class and method name below follow recent Checkov releases;
# verify them against the version you have installed.
from checkov.common.models.enums import CheckCategories, CheckResult
from checkov.kubernetes.checks.resource.base_container_check import BaseK8sContainerCheck


class EphemeralStorageLimitCheck(BaseK8sContainerCheck):
    def __init__(self):
        name = "Ensure ephemeral-storage limit is set"
        id = "CKV2_K8S_CUSTOM_1"
        categories = [CheckCategories.GENERAL_SECURITY]
        super().__init__(name=name, id=id, categories=categories)

    def scan_container_conf(self, metadata, conf):
        # 'resources' or 'limits' may be present but null in a manifest
        limits = (conf.get("resources") or {}).get("limits") or {}
        if limits.get("ephemeral-storage"):
            return CheckResult.PASSED
        return CheckResult.FAILED


# Checkov registers the check when the module-level instance is created
check = EphemeralStorageLimitCheck()
3. Monitor with Prometheus alerts before eviction fires
# prometheus-rules.yaml
groups:
  - name: ephemeral-storage
    rules:
      - alert: NodeEphemeralStoragePressureWarning
        # Assumes node_exporter series carry a 'node' label (via relabeling)
        # and that ephemeral storage lives on the root filesystem.
        expr: |
          kubelet_node_name * on(node) group_left()
          (1 - (node_filesystem_avail_bytes{mountpoint="/"}
            / node_filesystem_size_bytes{mountpoint="/"})) > 0.75
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.node }} ephemeral storage > 75% full"
      - alert: PodEphemeralStorageNearLimit
        # The numerator covers container log usage only, not the writable
        # layer or emptyDir volumes. kube-state-metrics v2 exposes the
        # resource label as "ephemeral_storage".
        expr: |
          (kubelet_container_log_filesystem_used_bytes
            / on(pod, container, namespace)
          kube_pod_container_resource_limits{resource="ephemeral_storage"}) > 0.8
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.container }} in {{ $labels.namespace }}: log usage at 80% of ephemeral-storage limit"
These alerts fire before the kubelet eviction threshold is hit, giving your on-call engineer time to act rather than react.