What exactly counts as ephemeral storage in Kubernetes?

Ephemeral storage in Kubernetes includes three things the kubelet tracks against a pod's allocation: (1) the container's writable overlay layer — any files written inside the container filesystem that aren't on a mounted PersistentVolume; (2) container log files written to /var/log/pods/ on the host node via stdout/stderr; and (3) emptyDir volumes, unless backed by memory (medium: Memory). PersistentVolumeClaims (PVCs) are explicitly excluded and do NOT count toward ephemeral storage limits.

Will setting ephemeral-storage limits prevent my pod from being evicted?

Setting limits does not prevent eviction; it changes which pod gets evicted and why. With no limits set, the kubelet acts only once the node crosses its disk pressure thresholds, and it chooses victims from every pod on the node. With limits set, the kubelet also evicts any pod in which a container exceeds its own ephemeral-storage limit (or whose total usage exceeds the sum of its containers' limits), so a runaway workload is removed before it can fill the node. Eviction always applies to the whole pod, not to a single container. If the node itself still hits a hard eviction threshold (e.g., nodefs.available < 10%), pods can be evicted regardless of per-container limits, which is why you also need to tune kubelet eviction thresholds and set up Prometheus alerts to catch saturation before it reaches the hard threshold.

How do I find which container or process is consuming all the ephemeral storage?

Run 'kubectl describe node ' to see the node's ephemeral-storage capacity, allocatable amount, and the total requests and limits allocated against it (these are reservations, not live usage). Then run 'kubectl exec -it -- df -h' to see disk usage inside the container, and 'du -sh /var/log/pods/ /' on the node itself to see log file sizes. 'kubectl top pods' reports only CPU and memory, so it cannot sort by ephemeral storage. For a cluster-wide view, query Prometheus with 'kubelet_container_log_filesystem_used_bytes' grouped by pod and namespace to identify the top log producers; that metric covers log files only, so check the writable layer and emptyDir volumes separately.
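
One way to get live per-pod numbers is the kubelet Summary API, which can be fetched with kubectl get --raw /api/v1/nodes/<node>/proxy/stats/summary. The sketch below assumes that JSON has already been loaded into a dict and that each pod entry carries an ephemeral-storage.usedBytes field. The function name, the sample payload, and the "quiet-worker" pod are illustrative, not taken from a real cluster.

```python
def rank_pods_by_ephemeral_storage(summary: dict) -> list:
    """Return (namespace, pod, used_bytes) tuples, largest consumer first."""
    ranked = []
    for pod in summary.get("pods", []):
        ref = pod.get("podRef", {})
        # The field may be missing for pods that have not been sampled yet.
        used = (pod.get("ephemeral-storage") or {}).get("usedBytes", 0)
        ranked.append((ref.get("namespace", ""), ref.get("name", ""), used))
    return sorted(ranked, key=lambda row: row[2], reverse=True)


# Hypothetical, heavily trimmed Summary API payload for illustration.
sample_summary = {
    "pods": [
        {"podRef": {"namespace": "production", "name": "quiet-worker"},
         "ephemeral-storage": {"usedBytes": 20 * 1024**2}},
        {"podRef": {"namespace": "production", "name": "my-app-7d9f8b-xk2p1"},
         "ephemeral-storage": {"usedBytes": 4 * 1024**3}},
    ]
}

for namespace, name, used in rank_pods_by_ephemeral_storage(sample_summary):
    print(f"{namespace}/{name}: {used / 1024**2:.0f} MiB")
```

Run it once per node and merge the results if you need a cluster-wide ranking.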

How to Fix Kubernetes Pod Eviction Due to Low Ephemeral Storage (With Limits & LimitRange)

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

What broke: Kubernetes evicted your pod because the node's ephemeral storage (local disk used for container logs, writable layers, and emptyDir volumes) hit the eviction threshold — your containers had no ephemeral-storage limits set, so the kubelet had no guardrails and killed the pod to protect the node.
How to fix it: Set resources.requests.ephemeral-storage and resources.limits.ephemeral-storage on every container in the pod spec, cap any emptyDir volumes with sizeLimit, and tune kubelet eviction thresholds at the node level.

The Incident (What Does the Error Mean?)

The raw event from kubectl describe pod <pod-name>:

Status:    Failed
Reason:    Evicted
Message:   The node was low on resource: ephemeral-storage.
           Threshold quantity: 10%, available: 8%.
           Container <container-name> was using 4Gi, request is 0.

And from kubectl get events --field-selector reason=Evicted:

WARNING   Evicted   pod/my-app-7d9f8b-xk2p1   
The node was low on resource: ephemeral-storage. 
Threshold quantity: 10%, available: 8%.

Immediate consequence: The pod is terminated with reason: Evicted. While the node reports disk pressure, the scheduler avoids placing new pods on it, but once the condition clears nothing stops the replacement from landing there again. Node-pressure eviction by the kubelet does not honor PodDisruptionBudgets, so a PDB will not protect you here; if your deployment has replicas: 1, this is a full service outage. The evicted pod is not restarted in place. The ReplicaSet controller creates a new pod, and if the underlying disk consumer is still running, that pod can be evicted again in a loop.

The Attack Vector / Blast Radius

Ephemeral storage is consumed from three sources the kubelet tracks:

Container overlay (writable layer): Every docker/containerd write that isn't to a mounted volume hits the node's local disk.
Container logs: stdout/stderr from your container are written to /var/log/pods/ on the node. A chatty logger with no log rotation will silently eat gigabytes.
emptyDir volumes: Scratch space mounted into the pod. Unless sizeLimit is set, this is unbounded and counts toward the pod's ephemeral storage usage.

Cascading failure scenario:

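Node disk fills → the soft eviction threshold is crossed and its grace period starts → if usage keeps climbing past the hard threshold, eviction fires immediately with no grace period → the kubelet picks victims starting with pods whose ephemeral-storage usage exceeds their request, then by pod priority, then by how far usage exceeds the request.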
A single log-spewing pod can evict unrelated, well-behaved pods on the same node. This is a noisy neighbor problem at the infrastructure layer.
If the node is part of a small cluster (2–3 nodes), cascading evictions across nodes under load can collapse your entire deployment.
DaemonSets are not immune. A DaemonSet pod evicted for storage pressure will be recreated immediately — and evicted again — creating a tight eviction loop that hammers the kubelet.
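
The ranking behind the noisy neighbor problem can be sketched as a three-part sort key. This is a simplified model, not the kubelet's code: it assumes the order is "usage exceeds request" first, then pod priority, then the size of the overage, and the PodDiskStats type and pod names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PodDiskStats:
    name: str
    priority: int         # pod priority value; higher means more important
    used_bytes: int       # logs + writable layers + disk-backed emptyDir usage
    requested_bytes: int  # sum of ephemeral-storage requests (0 if unset)


def eviction_order(pods: list) -> list:
    """Order pod names from first-evicted to last-evicted under disk pressure."""
    def sort_key(pod: PodDiskStats):
        exceeds_request = pod.used_bytes > pod.requested_bytes
        overage = pod.used_bytes - pod.requested_bytes
        # False sorts before True, so pods over their request come first;
        # within that group, lower priority first, then larger overage first.
        return (not exceeds_request, pod.priority, -overage)
    return [pod.name for pod in sorted(pods, key=sort_key)]


MI = 1024**2
pods = [
    PodDiskStats("well-behaved", priority=0, used_bytes=100 * MI, requested_bytes=500 * MI),
    PodDiskStats("log-spewer", priority=0, used_bytes=4096 * MI, requested_bytes=0),
    PodDiskStats("no-request-small", priority=0, used_bytes=50 * MI, requested_bytes=0),
]
print(eviction_order(pods))
```

In this model the pod using only 50 MiB is evicted before the one using 100 MiB, purely because it declared no request. That is the practical argument for setting requests on every container.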

How to Fix It (The Solution)

Basic Fix: Set Ephemeral Storage Requests and Limits

The single most impactful change. Without limits.ephemeral-storage, the kubelet has no per-container guardrail and must resort to node-level eviction.

 apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: my-app
 spec:
   replicas: 3
   template:
     spec:
       containers:
       - name: my-app
         image: my-app:1.4.2
         resources:
           requests:
             cpu: "250m"
             memory: "256Mi"
+            ephemeral-storage: "500Mi"
           limits:
             cpu: "500m"
             memory: "512Mi"
+            ephemeral-storage: "1Gi"

For emptyDir volumes used as scratch space:

 volumes:
 - name: scratch
   emptyDir:
-    {}
+    sizeLimit: "512Mi"

⚠️ When limits.ephemeral-storage is set, the kubelet evicts the specific pod whose container exceeded its limit instead of waiting for the whole node to reach its threshold. Eviction is still per pod, not per container, but it is surgical rather than collateral damage.
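
The checks that limits and sizeLimit enable can be sketched roughly as follows. This is an illustrative model under stated assumptions: container usage means writable layer plus logs, emptydir_usage covers only disk-backed volumes, and the pod-level check is applied only when every container declares a limit. The function name and arguments are hypothetical.

```python
from typing import Optional

MI = 1024**2


def eviction_reason(container_usage: dict, container_limits: dict,
                    emptydir_usage: dict, emptydir_size_limits: dict) -> Optional[str]:
    """Return why the pod would be evicted, or None if it is within bounds.

    Every argument maps a container or volume name to a byte count.
    """
    # 1. An emptyDir volume over its sizeLimit.
    for volume, used in emptydir_usage.items():
        limit = emptydir_size_limits.get(volume)
        if limit is not None and used > limit:
            return f"emptyDir '{volume}' exceeds sizeLimit"
    # 2. A single container over its own limit.
    for name, used in container_usage.items():
        limit = container_limits.get(name)
        if limit is not None and used > limit:
            return f"container '{name}' exceeds its limit"
    # 3. Whole pod over the sum of container limits (assumption: only
    #    evaluated when every container has a limit).
    if container_limits and set(container_limits) == set(container_usage):
        pod_limit = sum(container_limits.values())
        pod_used = sum(container_usage.values()) + sum(emptydir_usage.values())
        if pod_used > pod_limit:
            return "pod total exceeds the sum of container limits"
    return None


reason = eviction_reason(
    container_usage={"my-app": 700 * MI},
    container_limits={"my-app": 1024 * MI},
    emptydir_usage={"scratch": 400 * MI},
    emptydir_size_limits={"scratch": 512 * MI},
)
print(reason)
```

With the values from the spec above, neither the container nor the scratch volume is over its own cap, yet the pod is still flagged because 700 MiB + 400 MiB exceeds the 1 GiB pod total. Size the container limit with emptyDir usage in mind.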

Enterprise Best Practice: LimitRange + ResourceQuota + Structured Logging

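Step 1: Enforce defaults per namespace with a LimitRange

This ensures any container deployed in the namespace without explicit ephemeral-storage requests or limits still gets a sane default. No more zero-request containers. A LimitRange is namespace-scoped, so create one in every namespace you want covered.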

+apiVersion: v1
+kind: LimitRange
+metadata:
+  name: ephemeral-storage-defaults
+  namespace: production
+spec:
+  limits:
+  - type: Container
+    default:
+      ephemeral-storage: "1Gi"
+    defaultRequest:
+      ephemeral-storage: "256Mi"
+    max:
+      ephemeral-storage: "4Gi"
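
How those defaults combine with what a container already declares is easy to get wrong, so here is a small model of it. It is a sketch under assumptions, not the admission plugin itself: values are plain MiB integers copied from the LimitRange above, and apply_limit_range is a hypothetical helper.

```python
from typing import Optional

# Values in MiB, taken from the LimitRange above.
DEFAULT_LIMIT = 1024
DEFAULT_REQUEST = 256
MAX_LIMIT = 4096


def apply_limit_range(request: Optional[int], limit: Optional[int]) -> tuple:
    """Return the effective (request, limit) in MiB, or raise if rejected."""
    if limit is None:
        limit = DEFAULT_LIMIT
        if request is None:
            request = DEFAULT_REQUEST
    elif request is None:
        # A limit without a request: the request is set equal to the limit,
        # so defaultRequest is not used in this case.
        request = limit
    if limit > MAX_LIMIT:
        raise ValueError(f"limit {limit}Mi exceeds LimitRange max {MAX_LIMIT}Mi")
    if request > limit:
        raise ValueError(f"request {request}Mi exceeds limit {limit}Mi")
    return request, limit


for request, limit in [(None, None), (None, 2048), (512, None)]:
    print(request, limit, "->", apply_limit_range(request, limit))
```

Note the last failure mode: a container that requests more than the default limit but sets no limit of its own ends up with request > limit and is rejected, so keep the default limit at or above the largest request you expect to see without an explicit limit.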

Step 2: Cap namespace-level total ephemeral storage with ResourceQuota

+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: ephemeral-storage-quota
+  namespace: production
+spec:
+  hard:
+    requests.ephemeral-storage: "20Gi"
+    limits.ephemeral-storage: "40Gi"
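
To sanity-check a quota against your per-pod settings, the arithmetic can be sketched like this. The parser handles only simple quantities with the suffixes listed, and max_replicas is a hypothetical helper that assumes identical single-container pods.

```python
import re

_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
             "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4, "": 1}


def parse_quantity(quantity: str) -> int:
    """Convert a Kubernetes quantity such as '500Mi' or '20Gi' to bytes.

    Handles only plain integers/decimals with the suffixes listed above.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity.strip())
    if not match or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(float(match.group(1)) * _SUFFIXES[match.group(2)])


def max_replicas(per_pod_request: str, per_pod_limit: str,
                 quota_requests: str, quota_limits: str) -> int:
    """How many identical single-container pods fit under both quota lines."""
    by_requests = parse_quantity(quota_requests) // parse_quantity(per_pod_request)
    by_limits = parse_quantity(quota_limits) // parse_quantity(per_pod_limit)
    return min(by_requests, by_limits)


# Per-pod values from the Basic Fix, quota values from the ResourceQuota above.
print(max_replicas("500Mi", "1Gi", "20Gi", "40Gi"))
```

With a 500Mi request and 1Gi limit per pod, both quota lines allow 40 pods in the namespace, so the two caps are balanced for this workload.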

Step 3: Fix the root cause — eliminate unbounded log writes

If your containers log to stdout at high volume, cap the on-disk log size and ship logs off the node. With a CRI runtime such as containerd, the kubelet (not the runtime) rotates container logs, so the settings that matter are containerLogMaxSize and containerLogMaxFiles in the kubelet config. Pair them with structured JSON logging and a log shipper (Fluentd, Vector) that reads from the node and forwards off-disk:

 # /etc/containerd/config.toml (node-level)
 # No log rotation settings are needed here; rotation is handled via the kubelet config below.
 # /var/lib/kubelet/config.yaml
+containerLogMaxSize: "50Mi"
+containerLogMaxFiles: 3

Step 4: Tune kubelet eviction thresholds (if you manage nodes directly)

 # /var/lib/kubelet/config.yaml
 evictionHard:
-  nodefs.available: "10%"
+  nodefs.available: "15%"
+  nodefs.inodesFree: "5%"
 evictionSoft:
+  nodefs.available: "20%"
 evictionSoftGracePeriod:
+  nodefs.available: "1m30s"
 evictionMaxPodGracePeriod: 90

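The soft eviction threshold gives pods a grace period to flush and terminate cleanly before hard eviction fires. Be aware that, unless your kubelet is configured to merge defaults, listing any signal under evictionHard drops the built-in defaults for the signals you omit (memory.available, imagefs.available, and so on), so restate the ones you rely on.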
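
How a nodefs.available reading maps onto those two thresholds can be sketched as follows. The function is a hypothetical illustration using the 15% and 20% values from the config above; it models only the comparison, not the grace-period timer.

```python
def classify_disk_pressure(capacity_bytes: int, available_bytes: int,
                           hard_pct: float = 15.0, soft_pct: float = 20.0) -> str:
    """Return 'hard', 'soft', or 'ok' for a nodefs.available reading."""
    available_pct = 100.0 * available_bytes / capacity_bytes
    if available_pct < hard_pct:
        return "hard"   # eviction starts immediately, no grace period
    if available_pct < soft_pct:
        return "soft"   # eviction only if this persists past evictionSoftGracePeriod
    return "ok"


GI = 1024**3
for available in (30 * GI, 18 * GI, 8 * GI):
    print(available // GI, "GiB free:", classify_disk_pressure(100 * GI, available))
```

On a 100 GiB node this leaves a 5 GiB band (between 20% and 15% free) in which pods get the 1m30s grace period before anything is evicted.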

Prevention in CI/CD

1. Block zero-limit deployments with OPA/Gatekeeper

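Deploy a ConstraintTemplate, plus a matching constraint of kind RequireEphemeralStorageLimits that targets Pods, to reject any pod spec missing ephemeral-storage limits. The template on its own only defines the rule; nothing is enforced until a constraint references it: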

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: requireephemeralstoragelimits
spec:
  crd:
    spec:
      names:
        kind: RequireEphemeralStorageLimits
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package requireephemeralstoragelimits
      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.resources.limits["ephemeral-storage"]
        msg := sprintf("Container '%v' must have ephemeral-storage limits set.", [container.name])
      }
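
Before rolling the policy out, it can help to run the same rule against your manifests locally. The sketch below is a plain-Python mirror of the Rego rule above, assuming the manifest has already been parsed into a dict; the violations function and the sample pod are hypothetical. Like the Rego, it checks only spec.containers and ignores initContainers.

```python
def violations(pod: dict) -> list:
    """Return one message per container that lacks an ephemeral-storage limit."""
    messages = []
    for container in pod.get("spec", {}).get("containers", []):
        # resources or limits may be absent or explicitly null in the manifest
        limits = (container.get("resources") or {}).get("limits") or {}
        if not limits.get("ephemeral-storage"):
            messages.append(
                f"Container '{container.get('name')}' must have "
                "ephemeral-storage limits set."
            )
    return messages


# Hypothetical pod with one compliant and one non-compliant container.
sample_pod = {
    "spec": {
        "containers": [
            {"name": "my-app",
             "resources": {"limits": {"ephemeral-storage": "1Gi"}}},
            {"name": "sidecar", "resources": {"limits": {"memory": "128Mi"}}},
        ]
    }
}

for message in violations(sample_pod):
    print(message)
```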

2. Shift-left with Checkov in your pipeline

# .github/workflows/k8s-lint.yaml
- name: Checkov Kubernetes Scan
  uses: bridgecrewio/checkov-action@master
  with:
    directory: ./k8s/
    check: CKV_K8S_10,CKV_K8S_11,CKV_K8S_12,CKV_K8S_13  # CPU/memory requests + limits checks
    framework: kubernetes
    soft_fail: false

CKV_K8S_10 through CKV_K8S_13 cover only CPU and memory requests and limits. Add a custom Checkov policy for ephemeral storage and load it with Checkov's --external-checks-dir option:

# checkov/custom_checks/ephemeral_storage_limit.py
# Assumes a recent Checkov release that provides BaseK8sContainerCheck;
# verify the base class and method name against your installed version.
from checkov.common.models.enums import CheckResult
from checkov.kubernetes.checks.resource.base_container_check import BaseK8sContainerCheck


class EphemeralStorageLimitCheck(BaseK8sContainerCheck):
    def __init__(self):
        name = "Ensure ephemeral-storage limit is set"
        id = "CKV2_K8S_CUSTOM_1"
        super().__init__(name=name, id=id)

    def scan_container_conf(self, metadata, conf):
        # resources or limits may be absent or explicitly null in the manifest
        limits = (conf.get("resources") or {}).get("limits") or {}
        if limits.get("ephemeral-storage"):
            return CheckResult.PASSED
        return CheckResult.FAILED


# Instantiating the check registers it with Checkov.
scanner = EphemeralStorageLimitCheck()

3. Monitor with Prometheus alerts before eviction fires

# prometheus-rules.yaml
groups:
- name: ephemeral-storage
  rules:
  - alert: NodeEphemeralStoragePressureWarning
    expr: |
      kubelet_node_name * on(node) group_left()
      (1 - (node_filesystem_avail_bytes{mountpoint="/"} 
            / node_filesystem_size_bytes{mountpoint="/"})) > 0.75
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Node {{ $labels.node }} ephemeral storage > 75% full"
  - alert: PodEphemeralStorageNearLimit
    expr: |
      (kubelet_container_log_filesystem_used_bytes
       / on(pod, container, namespace)
       kube_pod_container_resource_limits{resource="ephemeral_storage"}) > 0.8
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Container {{ $labels.container }} in {{ $labels.namespace }} logs at 80% of ephemeral-storage limit"

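These alerts fire before the kubelet eviction threshold is hit, giving your on-call engineer time to act rather than react. Note that kubelet_container_log_filesystem_used_bytes covers log files only, so the second alert will not catch growth in the writable layer or in emptyDir volumes.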