Why does 'df -h' show free space but Docker still says 'no space left on device'?

`df -h` reports block (byte) usage. When Docker's overlay2 driver fails with `no space left on device` while `df -h` still shows free space, the usual cause is inode exhaustion, which is a separate resource. Run `df -i /var/lib/docker` to see inode usage. If IUse% is 100%, you've hit the inode limit regardless of how many gigabytes are free. ext4 pre-allocates a fixed number of inodes at format time; once they're gone, no new files or directories can be created on that filesystem.
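To make that check repeatable, here is a minimal sketch of a shell function that reads the output of GNU coreutils `df --output=ipcent,pcent` and reports which resource is exhausted. The function name `classify_enospc` and the 95% default threshold are illustrative choices, not Docker conventions.

```bash
#!/usr/bin/env bash
# classify_enospc: hypothetical helper. Reads the two-line output of
#   df --output=ipcent,pcent <path>      (GNU coreutils df)
# on stdin and prints: inodes | blocks | both | neither
# $1 = percent at which a resource counts as exhausted (default 95, an assumption).
classify_enospc() {
  awk -v t="${1:-95}" '
    BEGIN { t += 0 }                 # force numeric comparison
    NR == 2 {
      gsub(/%/, "")                  # " 100%  60%" -> " 100  60"; awk re-splits fields
      i = $1 + 0; b = $2 + 0         # a "-" (no inode accounting) becomes 0
      if (i >= t && b >= t) print "both"
      else if (i >= t)      print "inodes"
      else if (b >= t)      print "blocks"
      else                  print "neither"
    }'
}
```

Usage: `df --output=ipcent,pcent /var/lib/docker | classify_enospc` prints `inodes` in exactly the situation this question describes.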

How do I find which containers or images are consuming the most inodes?

Run `find /var/lib/docker/overlay2 -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head -30`. It prints the parent directory of every entry, so the counts show which directories under overlay2 hold the most files; the path component right after `overlay2/` is the layer ID. Match that ID against the `GraphDriver.Data` paths (`UpperDir`, `LowerDir`) reported by `docker inspect` for containers or `docker image inspect` for images to identify the offending container or image. Images built with many separate `RUN` instructions are a common culprit.
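The per-directory counts can be rolled up to one total per layer and then mapped back to an owner. The sketch below assumes the classic overlay2 graph driver (with the containerd image store enabled, `docker inspect` may not report these paths); the function names are hypothetical.

```bash
#!/usr/bin/env bash
# count_inodes_per_layer: hypothetical helper. Reads paths (one per line) on
# stdin, e.g. from `find /var/lib/docker/overlay2 -xdev`, and prints
# "<entries> <layer-id>" per overlay2 layer, largest first.
# $1 = overlay2 root (default /var/lib/docker/overlay2).
count_inodes_per_layer() {
  awk -v root="${1:-/var/lib/docker/overlay2}/" '
    index($0, root) == 1 {
      split(substr($0, length(root) + 1), parts, "/")
      # Skip "l", which holds short symlink names rather than a layer.
      if (parts[1] != "" && parts[1] != "l") counts[parts[1]]++
    }
    END { for (id in counts) print counts[id], id }
  ' | sort -rn
}

# layer_owner: hypothetical helper. Prints containers and images whose
# overlay2 GraphDriver paths mention the given layer ID.
layer_owner() {
  local id="$1"
  {
    docker ps -aq | xargs -r docker inspect \
      --format '{{.Name}} {{.GraphDriver.Data.UpperDir}}'
    docker image ls -q | xargs -r docker image inspect \
      --format '{{.RepoTags}} {{.GraphDriver.Data.UpperDir}} {{.GraphDriver.Data.LowerDir}}'
  } | grep -F -- "$id"
}
```

Usage: `find /var/lib/docker/overlay2 -xdev | count_inodes_per_layer | head`, then pass the top layer ID to `layer_owner`.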

Should I use XFS or ext4 for Docker hosts in production?

XFS is generally the better choice for Docker hosts with high container churn. XFS allocates inodes dynamically from free space rather than pre-allocating a fixed table at format time, so inode exhaustion before block exhaustion is far less likely; the main ceiling is `imaxpct`, the share of space allowed to hold inodes (25% by default on filesystems under 1 TB). XFS is the default filesystem on Red Hat Enterprise Linux, and Docker's overlay2 documentation requires `ftype=1` (d_type support) when the backing filesystem is XFS. Format with `mkfs.xfs -n ftype=1 /dev/xvdb`.
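To confirm an existing host meets the d_type requirement, a minimal sketch can grep the output of `xfs_info` (which prints `ftype=1` on its naming line) and of `docker info` (which prints `Supports d_type: true` under the overlay2 driver). The function names are hypothetical.

```bash
#!/usr/bin/env bash
# xfs_has_ftype: succeeds if `xfs_info <mountpoint>` output on stdin shows ftype=1.
xfs_has_ftype() {
  grep -Eq 'ftype=1([^0-9]|$)'
}

# docker_dtype_ok: succeeds if `docker info` output on stdin reports
# d_type support for the storage driver.
docker_dtype_ok() {
  grep -Eq 'Supports d_type:[[:space:]]*true'
}
```

Usage: `xfs_info /var/lib/docker | xfs_has_ftype || echo "reformat with -n ftype=1"` and `docker info | docker_dtype_ok || echo "overlay2 lacks d_type"`.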

How to Fix Docker Overlay2 Inode Exhaustion: No Space Left on Device

Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

What broke: The filesystem hosting /var/lib/docker/overlay2 has zero free inodes. Docker cannot create new layer directories, so all docker run, docker build, and container restarts fail with no space left on device even though df -h shows available disk space.
How to fix it: Prune dead overlay2 layers, dangling images, and stopped containers immediately; long-term, reformat the partition with a higher inode density or migrate to xfs with dynamic inode allocation.

The Incident (What Does the Error Mean?)

Raw error surface: you'll see this in dockerd logs, journalctl -u docker, or directly in your CI runner output:

Error response from daemon:
  failed to create rootfs:
  mkdir /var/lib/docker/overlay2/c7a1.../ :
  no space left on device

[containerd] failed to reserve snapshot:
  write /var/lib/docker/overlay2/...:
  no space left on device

If you also see `error creating overlay mount to /var/lib/docker/overlay2/b3f9.../merged: too many open files in system`, that is a different limit: it is ENFILE, meaning the kernel's system-wide open-file table (fs.file-max) is full, and freeing inodes will not fix it.

The trap: df -h returns 40% free. Engineers waste 20 minutes looking at disk blocks. The real culprit is inodes:

$ df -i /var/lib/docker
Filesystem      Inodes   IUsed  IFree IUse% Mounted on
/dev/xvda1     1310720 1310720      0  100% /

IFree: 0. Every overlay2 layer directory, every file inside a container layer, every whiteout file consumes one inode. On a default ext4 partition formatted with mkfs.ext4 defaults (~1 inode per 16KB), a host running hundreds of containers with thousands of small files will hit inode exhaustion long before block exhaustion.
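The arithmetic behind those inode totals can be sketched with two small shell functions. The names are hypothetical; 16384 is the default `inode_ratio` in mke2fs's standard profile, and real counts are rounded to whole block groups, so treat the results as estimates.

```bash
#!/usr/bin/env bash
# ext4_inode_estimate SIZE_BYTES [BYTES_PER_INODE]
# Approximate inode count mkfs.ext4 creates: capacity / inode ratio.
ext4_inode_estimate() {
  local size_bytes="$1" ratio="${2:-16384}"
  echo $(( size_bytes / ratio ))
}

# ext4_ratio_for_count SIZE_BYTES INODES
# Inverse: the bytes-per-inode density that an absolute -N target implies.
ext4_ratio_for_count() {
  local size_bytes="$1" inodes="$2"
  echo $(( size_bytes / inodes ))
}
```

A 20 GiB filesystem gives 1310720 inodes, the same total as the df -i output above, and 100 GiB gives 6553600. `ext4_ratio_for_count` on 100 GiB with 10000000 inodes gives 10737, so `-N 10000000` works out to roughly one inode per 10.5 KiB, about 1.5x the default density.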

Immediate consequence: No new containers start. Running containers that need to write new files to their writable layer may also begin failing. docker build aborts mid-layer. On Kubernetes, pod sandboxes cannot be created, and the node may be marked NotReady if the container runtime stops reporting healthy.

The Attack Vector / Blast Radius

This is a cascading infrastructure failure, not an isolated Docker issue.

Blast radius chain:

dockerd cannot mkdir new overlay2 merge directories → all docker run calls return non-zero exit codes.
Kubernetes kubelet CRI calls to containerd fail → pods stuck in ContainerCreating indefinitely.
Liveness/readiness probes on existing pods begin failing if those containers attempt filesystem writes (logging, temp files, pid files).
CI/CD pipelines (GitHub Actions self-hosted, GitLab Runner, Jenkins agents) fail every job — your entire deployment pipeline is down.
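Logging fails to create new files: logrotate cannot rotate, and collectors such as fluentd cannot write buffer or position files, so logs are lost or processes crash.
If this host is a Kubernetes node, the kubelet's default hard eviction threshold (nodefs.inodesFree<5%) sets the DiskPressure condition, the node receives the node.kubernetes.io/disk-pressure taint, and the scheduler stops placing new pods on it; existing pods may also be evicted depending on eviction thresholds.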

Why it's insidious: Standard disk alerts watch block usage, so they stay green. Only an explicit df -i check or an inode-specific alert catches this, and most teams don't have one.

How to Fix It (The Solution)

Step 0: Confirm It's Inodes, Not Blocks

# Check block usage
df -h /var/lib/docker

# Check inode usage — this is the one that's 100%
df -i /var/lib/docker

# Find the top inode consumers under overlay2
find /var/lib/docker/overlay2 -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head -20

Basic Fix: Emergency Inode Reclamation

# 1. Remove all stopped containers (each container = multiple overlay2 dirs)
docker container prune -f

# 2. Remove dangling images (untagged layers)
docker image prune -f

# 3. Nuclear option if you can afford downtime — removes ALL unused objects
docker system prune -a -f --volumes

# 4. Verify inodes freed
df -i /var/lib/docker

# 5. If dockerd is still wedged, restart it
systemctl restart docker

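⚠️ docker system prune -a removes ALL images not referenced by a running container, and --volumes also deletes unused volumes together with their data. Only run this if you can re-pull images and can afford to lose that volume data. On a CI runner, this is usually safe.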
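If you would rather not jump straight to the nuclear option, the steps above can be escalated automatically. This is a sketch under stated assumptions: `staged_prune` and `inode_pct` are hypothetical names, it relies on GNU `df --output=ipcent`, and each step is a trusted command string run with `eval`.

```bash
#!/usr/bin/env bash
# Hypothetical emergency helper: run prune steps from least to most
# destructive and stop once inode usage drops below a target percentage.
DOCKER_ROOT="${DOCKER_ROOT:-/var/lib/docker}"

# Print the current IUse% (number only) of the filesystem holding DOCKER_ROOT.
inode_pct() {
  df --output=ipcent "$DOCKER_ROOT" | awk 'NR == 2 { gsub(/[ %]/, ""); print }'
}

# staged_prune TARGET STEP...
# Each STEP is a trusted command string executed with eval, in order.
# Returns 0 if inode usage ends below TARGET, 1 otherwise.
staged_prune() {
  local target="$1" step pct
  shift
  for step in "$@"; do
    pct="$(inode_pct)" || return 1
    if [ "$pct" -lt "$target" ]; then
      echo "inode usage ${pct}% is below ${target}%, stopping"
      return 0
    fi
    echo "running: $step"
    eval "$step"
  done
  pct="$(inode_pct)" || return 1
  echo "inode usage after all steps: ${pct}%"
  [ "$pct" -lt "$target" ]
}
```

Usage: `staged_prune 80 "docker container prune -f" "docker image prune -f" "docker builder prune -f" "docker system prune -a -f"`. Ordering the steps this way means the destructive `-a` prune only runs if the cheaper ones did not free enough inodes.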

Enterprise Best Practice: Prevent Recurrence

Option A — Reformat with higher inode density (ext4)

This requires migrating /var/lib/docker to a dedicated partition:

- mkfs.ext4 /dev/xvdb
+ mkfs.ext4 -N 10000000 /dev/xvdb
# -N sets the absolute inode count. At the default ratio (one inode per 16 KiB), a 100 GiB volume gets ~6.5M.
# Set 10M+ for Docker hosts with high container churn.

Option B — Use XFS (recommended for Docker hosts)

XFS allocates inodes dynamically from free space. It does not pre-allocate a fixed inode table.

- mkfs.ext4 /dev/xvdb
+ mkfs.xfs -n ftype=1 /dev/xvdb
# XFS allocates inodes on demand, up to the imaxpct space limit, so inode
# exhaustion before block exhaustion is very unlikely under normal Docker workloads.

Move your Docker data onto the new volume and mount it at /var/lib/docker:

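mkfs.xfs -n ftype=1 /dev/xvdb
mkdir -p /mnt/docker-new
mount /dev/xvdb /mnt/docker-new
# Stop the socket too, or socket activation can restart dockerd mid-copy
systemctl stop docker.socket docker
rsync -aHAX --numeric-ids /var/lib/docker/ /mnt/docker-new/
umount /mnt/docker-new
mv /var/lib/docker /var/lib/docker.old
mkdir /var/lib/docker
# Add an /etc/fstab entry (UUID from: blkid /dev/xvdb), for example:
#   UUID=<uuid-of-xvdb>  /var/lib/docker  xfs  defaults  0 0
mount /var/lib/docker
systemctl start docker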

Option C — Tune daemon.json to limit layer accumulation

 {
-  "storage-driver": "overlay2"
+  "storage-driver": "overlay2",
+  "storage-opts": [
+    "overlay2.size=20G"
+  ],
+  "log-driver": "json-file",
+  "log-opts": {
+    "max-size": "10m",
+    "max-file": "3"
+  }
 }

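Note: overlay2.size only works when the backing filesystem is XFS mounted with the pquota (project quota) option; on ext4, or on XFS without pquota, dockerd rejects the option at startup. It caps the bytes each container can write to its writable layer, which limits runaway writes but does not set an inode limit. The log-opts keep json-file logs to at most three 10 MB files per container.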
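Before adding overlay2.size, it is worth checking the mount. This sketch assumes the output format of util-linux `findmnt -n -o FSTYPE,OPTIONS --target /var/lib/docker`, where XFS project quotas typically appear as `prjquota`; the function name is hypothetical, and Docker performs its own, more thorough quota probe at startup.

```bash
#!/usr/bin/env bash
# supports_overlay2_size: hypothetical pre-flight check. Reads one line of
# "FSTYPE OPTIONS" on stdin and succeeds only for XFS with project quotas.
supports_overlay2_size() {
  awk '
    NR == 1 {
      n = split($2, opts, ",")
      for (k = 1; k <= n; k++)
        if (opts[k] == "prjquota" || opts[k] == "pquota") quota = 1
      ok = ($1 == "xfs" && quota)
    }
    END { exit (ok ? 0 : 1) }
  '
}
```

Usage: `findmnt -n -o FSTYPE,OPTIONS --target /var/lib/docker | supports_overlay2_size || echo "remove overlay2.size from daemon.json"`. If /var/lib/docker lives on an XFS root filesystem, enabling project quotas there typically requires the `rootflags=pquota` kernel parameter rather than a remount.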


Prevention in CI/CD

1. Add Inode Monitoring to Your Alerting Stack

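Prometheus + node_exporter (this is the one alert most teams are missing). The mountpoint matcher assumes /var/lib/docker is a separate mount; if it sits on the root filesystem, as in the df -i output earlier, use mountpoint="/" instead: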

# alerts/docker-inodes.yaml
groups:
  - name: inode_exhaustion
    rules:
      - alert: FilesystemInodesExhaustionWarning
        expr: node_filesystem_files_free{mountpoint="/var/lib/docker"} / node_filesystem_files{mountpoint="/var/lib/docker"} < 0.15
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Docker overlay2 inode usage above 85% on {{ $labels.instance }}"

      - alert: FilesystemInodesExhaustionCritical
        expr: node_filesystem_files_free{mountpoint="/var/lib/docker"} / node_filesystem_files{mountpoint="/var/lib/docker"} < 0.05
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Docker overlay2 inode exhaustion imminent on {{ $labels.instance }}"
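For hosts that are not scraped by Prometheus, roughly the same thresholds can be checked from cron. This is a hypothetical sketch: it uses GNU `df --output=itotal,iavail`, whose free count can differ slightly from node_exporter's `node_filesystem_files_free`, and it has no equivalent of the `for:` durations.

```bash
#!/usr/bin/env bash
# inode_alert_level TOTAL FREE -> ok | warning | critical | unknown
# Mirrors the rules above: free/total < 0.05 critical, < 0.15 warning.
inode_alert_level() {
  local total="$1" free="$2"
  if [ "$total" -eq 0 ]; then
    echo "unknown"
    return
  fi
  # Integer form of the ratios: free*100 < total*5 and free*100 < total*15.
  if [ $((free * 100)) -lt $((total * 5)) ]; then
    echo "critical"
  elif [ $((free * 100)) -lt $((total * 15)) ]; then
    echo "warning"
  else
    echo "ok"
  fi
}

# check_docker_inodes [PATH]: read total and available inodes from GNU df.
check_docker_inodes() {
  df --output=itotal,iavail "${1:-/var/lib/docker}" |
    awk 'NR == 2 { print $1, $2 }' |
    { read -r total free; inode_alert_level "$total" "$free"; }
}
```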

2. Scheduled Prune in CI/CD Runners

For self-hosted GitHub Actions runners or GitLab Runner hosts, add a cron job:

# /etc/cron.d/docker-prune
0 2 * * * root docker system prune -f --filter "until=24h" >> /var/log/docker-prune.log 2>&1

3. Terraform / Packer: Enforce XFS at Provisioning Time

-data_device_type = "ext4"
+data_device_type = "xfs"
 # data_device_type stands for a variable in your own module, not a built-in
 # aws_instance argument. Have your Packer AMI build or Terraform user_data
 # script always format the Docker data volume as XFS.
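A user_data script that formats the volume should be idempotent, so a reboot or re-run never wipes existing data. The sketch below is hypothetical: `/dev/xvdb` is carried over from the examples above (on Nitro-based EC2 instances the device usually appears as an NVMe name instead), and it relies on `blkid` returning an empty TYPE for a blank device.

```bash
#!/usr/bin/env bash
# fs_action FSTYPE -> format | keep | refuse
fs_action() {
  case "$1" in
    "")  echo "format" ;;  # blkid found no filesystem signature
    xfs) echo "keep" ;;    # already XFS, reuse it
    *)   echo "refuse" ;;  # another filesystem: do not overwrite
  esac
}

# prepare_docker_volume [DEVICE]: run before dockerd starts.
prepare_docker_volume() {
  local dev="${1:-/dev/xvdb}" fstype
  # blkid exits non-zero when the device has no filesystem signature.
  fstype="$(blkid -o value -s TYPE "$dev" || true)"
  case "$(fs_action "$fstype")" in
    format) mkfs.xfs -n ftype=1 "$dev" ;;
    keep)   ;;
    refuse) echo "refusing to reformat $dev ($fstype)" >&2; return 1 ;;
  esac
  mkdir -p /var/lib/docker
  mount "$dev" /var/lib/docker
}
```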

4. Checkov / OPA Policy (Kubernetes)

Enforce that node pools use XFS-formatted data disks with a custom OPA Gatekeeper constraint. The Rego below assumes a NodePool custom resource exposing a spec.config.diskType field; adapt the kind and field to whatever your cluster actually exposes:

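# opa/policies/docker_node_filesystem.rego
# Rego for an OPA Gatekeeper ConstraintTemplate. "pd-ssd-xfs" is a placeholder
# label for your XFS-formatted disk class, not a real GKE disk type.
package dockernodefilesystem

violation[{"msg": msg}] {
  input.review.object.kind == "NodePool"
  not input.review.object.spec.config.diskType == "pd-ssd-xfs"
  msg := "NodePool must use an XFS-formatted disk to prevent overlay2 inode exhaustion"
}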

5. Dockerfile Best Practices to Reduce Inode Pressure

-RUN apt-get update
-RUN apt-get install -y curl
-RUN apt-get clean
+# Collapse RUN commands — each RUN creates a new layer with its own inode tree
+RUN apt-get update && \
+    apt-get install -y curl && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*

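Each RUN instruction in a Dockerfile creates a separate overlay2 layer directory. Collapsing them matters most for cleanup: files deleted in a later RUN (such as /var/lib/apt/lists/*) still exist in the earlier layer that created them, plus a whiteout entry, so they keep consuming inodes on every host that pulls the image. Deleting them in the same RUN that creates them means they never land in any layer.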
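That split-cleanup pattern is easy to catch in review with a small lint. This is a hypothetical sketch, not a replacement for a real Dockerfile linter: it only joins backslash-continued lines and flags a RUN that removes apt lists without running `apt-get update` in the same instruction.

```bash
#!/usr/bin/env bash
# lint_split_cleanup: reads a Dockerfile on stdin and reports RUN
# instructions whose apt-list cleanup lands in a separate layer.
lint_split_cleanup() {
  awk '
    # Join backslash-continued lines into one logical instruction.
    { line = (buf == "" ? $0 : buf " " $0) }
    /\\[ \t]*$/ { sub(/\\[ \t]*$/, "", line); buf = line; next }
    { buf = "" }
    toupper(line) ~ /^[ \t]*RUN[ \t]/ {
      n++
      if (line ~ /rm -rf \/var\/lib\/apt\/lists/ && line !~ /apt-get update/)
        printf "RUN #%d removes apt lists in a separate layer\n", n
    }
  '
}
```

Usage: `lint_split_cleanup < Dockerfile` prints nothing for the collapsed form in the diff above and flags the third RUN of a split update/install/cleanup sequence.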