How to Fix 'No Space Left on Device' During Docker Build on overlay2 (90% Disk Usage)
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 10–30 mins
TL;DR
- What broke: docker build --no-cache exhausted the host filesystem. Docker's overlay2 storage keeps unreferenced layer directories, dangling images, and BuildKit's content-addressable cache under /var/lib/docker, and --no-cache makes it worse by bypassing layer reuse and writing fresh layers on every run.
- How to fix it: run docker system prune -af --volumes (this also deletes unused volumes and the data in them) and purge the BuildKit cache with docker builder prune -af. Only if space is still missing, reclaim orphaned directories under /var/lib/docker/overlay2 by hand, with dockerd stopped, because deleting a layer Docker still references corrupts every image that uses it.
The Incident (What Does the Error Mean?)
Raw error during docker build --no-cache -t myapp:latest .:
Step 14/22 : RUN npm ci
no space left on device
error building image: error building stage: failed to solve with frontend dockerfile.v0:
failed to read dockerfile: failed to create temp dir: no space left on device
or from BuildKit:
=> ERROR [build 7/9] RUN apt-get update && apt-get install -y build-essential
------
> [build 7/9] RUN apt-get update && apt-get install -y build-essential:
------
executor failed running [/bin/sh -c apt-get update ...]: exit code: 1
Error: error from sender: open /proc/self/fd/...: no space left on device
Immediate consequence: the build daemon cannot allocate new overlay2 upper directories for build steps. The filesystem holding /var/lib/docker (often the host's root filesystem) is at 100% capacity. Every subsequent docker command that writes to disk, including docker pull, will fail. If this is a CI runner, the agent itself may crash.
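ENOSPC ("no space left on device") is also returned when a filesystem runs out of inodes, which can happen on overlay2 hosts holding many small files (node_modules trees, for example) while df -h still shows free bytes. The sketch below checks both conditions, assuming GNU coreutils df and a 95% threshold chosen only for illustration; pct_to_int, df_pct, and enospc_cause are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: tell whether ENOSPC on the Docker mount comes from exhausted bytes,
# exhausted inodes, or both. The 95% default threshold is an assumption.

# Normalize a df percentage field ("87%", "-") to an integer.
pct_to_int() {
  local v="${1%\%}"
  case "$v" in
    ''|*[!0-9]*) echo 0 ;;   # "-" or empty: the filesystem does not report it
    *) echo "$v" ;;
  esac
}

# Fifth column of POSIX-format df output: Capacity (bytes) or IUse% (with -i).
df_pct() {
  df "$@" | awk 'NR==2 { print $5 }'
}

# enospc_cause [PATH] [THRESHOLD] -> prints bytes | inodes | both | neither
enospc_cause() {
  local path="${1:-/var/lib/docker}" limit="${2:-95}" b i
  b=$(pct_to_int "$(df_pct -P "$path")")
  i=$(pct_to_int "$(df_pct -Pi "$path")")
  if [ "$b" -ge "$limit" ] && [ "$i" -ge "$limit" ]; then echo both
  elif [ "$b" -ge "$limit" ]; then echo bytes
  elif [ "$i" -ge "$limit" ]; then echo inodes
  else echo neither
  fi
}
```

Run enospc_cause /var/lib/docker during the incident; an "inodes" result explains a failing build on a disk that df -h reports as only partly full.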
The Attack Vector / Blast Radius
This is not a one-time spike. It is a compounding failure mode:
- --no-cache forces a full layer rewrite on every build. Each run writes a complete new set of overlay2 layers; without cache reuse, the previous run's intermediate layers become unreferenced but are not garbage-collected immediately.
- BuildKit's content-addressable store (/var/lib/docker/buildkit/) is trimmed only according to the builder GC policy in daemon.json. If that policy is missing or too generous, the store accumulates blobs, snapshots, and metadata across every build invocation.
- Dangling images (<none>:<none>) from earlier failed or replaced builds consume gigabytes silently; docker images -f dangling=true can account for tens of gigabytes on busy CI hosts.
- Container log files under /var/lib/docker/containers/<id>/ are unbounded by default with the json-file log driver. A single chatty container can consume the remaining free space.
- Cascading failure: once the disk hits 100%, containerd and dockerd themselves can crash, taking down all running containers on the host, including unrelated production workloads if this is a shared node.
On Kubernetes nodes using overlay2, this sets the node's DiskPressure condition (and the node.kubernetes.io/disk-pressure taint), and the kubelet starts evicting pods from that node.
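Because DiskPressure is a per-node condition, the quickest way to size the damage on Kubernetes is to list which nodes currently report it. A minimal sketch, assuming kubectl is already pointed at the affected cluster; nodes_under_disk_pressure is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch: print the names of nodes whose DiskPressure condition is "True".
# Assumes kubectl is configured for the target cluster.

nodes_under_disk_pressure() {
  # One "name<TAB>status" line per node, using kubectl's JSONPath filter syntax.
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}' \
    | awk -F'\t' '$2 == "True" { print $1 }'
}
```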
How to Fix It
Step 1 — Diagnose First (30 seconds)
# Where is space going?
df -h /var/lib/docker
docker system df -v
# Find the biggest offenders (run as root; /var/lib/docker is not world-readable)
sudo du -sh /var/lib/docker/overlay2
sudo du -sh /var/lib/docker/buildkit
sudo du -sh /var/lib/docker/containers
# Dangling images, largest first
docker images -f dangling=true --format '{{.Size}} {{.ID}}' | sort -rh | head
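docker images prints sizes as human-readable decimal strings, so totalling them takes a small conversion step. The sketch below assumes the CLI's SI units (kB, MB, GB, TB); size_to_bytes and dangling_total_bytes are hypothetical names. Since an image's Size includes layers shared with other images, treat the sum as an upper bound on what pruning will actually free.

```shell
#!/usr/bin/env bash
# Sketch: total the size of dangling images in bytes.
# Assumes Docker prints decimal (SI) sizes such as "512kB", "1.2MB", "3.4GB".

# Convert one Docker size string to bytes.
size_to_bytes() {
  awk -v s="$1" 'BEGIN {
    n = s + 0                        # leading numeric part, e.g. 1.2
    u = s; sub(/^[0-9.]+/, "", u)    # unit suffix, e.g. "GB"
    m = 1
    if (u == "kB" || u == "KB") m = 1e3
    else if (u == "MB") m = 1e6
    else if (u == "GB") m = 1e9
    else if (u == "TB") m = 1e12
    printf "%.0f\n", n * m
  }'
}

# Sum the Size column over all dangling images (shared layers are counted repeatedly).
dangling_total_bytes() {
  local total=0 size
  while read -r size; do
    [ -n "$size" ] || continue
    total=$(( total + $(size_to_bytes "$size") ))
  done < <(docker images -f dangling=true --format '{{.Size}}')
  echo "$total"
}
```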
Basic Fix — Emergency Reclaim
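# Nuclear option: removes all stopped containers, unused networks, all unused images,
# all build cache and, with --volumes, unused volumes (their data is deleted permanently;
# recent Docker releases limit this to anonymous volumes)
docker system prune -af --volumes
# Purge any remaining BuildKit cache (often 5–20 GB)
docker builder prune -af
# Truncate every container's json-file log; run as root so the glob can expand
sudo sh -c 'truncate -s 0 /var/lib/docker/containers/*/*-json.log'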
# Verify
df -h /var/lib/docker
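Truncating every log also wipes output you may still need for the incident review. A narrower variant empties only logs above a size threshold. A sketch, assuming Docker's default data root and an arbitrary 100 MiB cut-off; truncate_big_logs is a hypothetical name, and it needs root to read /var/lib/docker.

```shell
#!/usr/bin/env bash
# Sketch: truncate only container json-file logs larger than a threshold.
# ROOT defaults to Docker's standard data root; the 100 MiB default is an assumption.

truncate_big_logs() {
  local root="${1:-/var/lib/docker/containers}" max_bytes="${2:-104857600}"
  local log
  # Layout is ROOT/<id>/<id>-json.log; "-size +Nc" matches files strictly larger than N bytes.
  find "$root" -mindepth 2 -maxdepth 2 -type f -name '*-json.log' -size +"${max_bytes}"c -print |
    while IFS= read -r log; do
      truncate -s 0 "$log"
      echo "truncated: $log"
    done
}
```

For example, sudo bash -c '. ./truncate_big_logs.sh; truncate_big_logs' if the function is saved to that (hypothetical) file.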
Enterprise Best Practice — Dockerfile Refactor + Daemon Config
Problem: Fat single-stage Dockerfile bloating layer count
- FROM node:20
- COPY . .
- RUN npm install
- RUN npm run build
- RUN apt-get update && apt-get install -y curl git vim
+ FROM node:20-alpine AS builder
+ WORKDIR /app
+ COPY package*.json ./
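+ # Full install: the build step usually needs devDependencies
+ RUN npm ci
+ COPY . .
+ # Build, then drop devDependencies before node_modules is copied to the runtime stage
+ RUN npm run build && npm prune --omit=dev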
+
+ FROM node:20-alpine AS runtime
+ WORKDIR /app
+ COPY --from=builder /app/dist ./dist
+ COPY --from=builder /app/node_modules ./node_modules
+ # No dev tools, no source, no cache debris in final image
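To confirm the refactor actually shrank the final image, compare the sizes docker image inspect reports for the old and new builds. A sketch; image_size_bytes and compare_image_sizes are hypothetical names, and the tags passed in are placeholders for images you have built locally.

```shell
#!/usr/bin/env bash
# Sketch: compare two local images' sizes (bytes from docker image inspect).

image_size_bytes() {
  docker image inspect --format '{{.Size}}' "$1"
}

# compare_image_sizes OLD NEW -> "old=...MB new=...MB saved=...%"
compare_image_sizes() {
  local old new
  old=$(image_size_bytes "$1") || return 1
  new=$(image_size_bytes "$2") || return 1
  awk -v o="$old" -v n="$new" 'BEGIN {
    printf "old=%.1fMB new=%.1fMB saved=%.1f%%\n", o / 1e6, n / 1e6, (o > 0 ? (o - n) * 100 / o : 0)
  }'
}
```

For example, compare_image_sizes myapp:single-stage myapp:multi-stage, assuming both tags exist locally.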
Problem: No log rotation or BuildKit GC limits configured in daemon.json
# /etc/docker/daemon.json
{
- "storage-driver": "overlay2"
+ "storage-driver": "overlay2",
+ "log-driver": "json-file",
+ "log-opts": {
+ "max-size": "10m",
+ "max-file": "3"
+ },
+ "builder": {
+ "gc": {
+ "enabled": true,
+ "defaultKeepStorage": "10GB",
+ "policy": [
+ { "keepStorage": "10GB", "filter": ["unused-for=48h"] },
+ { "keepStorage": "20GB", "all": true }
+ ]
+ }
+ }
}
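Apply with: sudo systemctl restart docker. The restart stops running containers unless "live-restore" is enabled, and the new log-opts apply only to containers created afterwards; existing containers keep their old log settings until they are recreated.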
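Because the new log-opts only reach containers created after the restart, it helps to list the running containers that still have no max-size. A sketch that reads the container's HostConfig.LogConfig.Config through a docker inspect template; containers_without_log_rotation is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: list running containers whose log config has no "max-size" option,
# i.e. containers that still need to be recreated to pick up daemon.json log rotation.

containers_without_log_rotation() {
  local id
  for id in $(docker ps -q); do
    docker inspect --format '{{.Name}} {{index .HostConfig.LogConfig.Config "max-size"}}' "$id"
  done | awk 'NF < 2 || $2 == "<no" { sub(/^\//, "", $1); print $1 }'   # missing key: name only
}
```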
Problem: CI pipeline runs --no-cache unconditionally
- docker build --no-cache -t myapp:$CI_COMMIT_SHA .
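+ # myapp:latest stands for a full registry path the runner can pull;
+ # push it after each build so the next run has inline cache to reuse
+ docker build \
+ --cache-from myapp:latest \
+ --build-arg BUILDKIT_INLINE_CACHE=1 \
+ -t myapp:$CI_COMMIT_SHA \
+ -t myapp:latest .
+ docker push myapp:latest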
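Some pipelines genuinely need a clean rebuild now and then (a scheduled nightly job, say). Keeping --no-cache as an explicit opt-in rather than the default preserves that option without paying the disk cost on every commit. A sketch in which ci_build, FORCE_NO_CACHE, and CACHE_IMAGE are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: cache-friendly CI build with --no-cache only on explicit request.
# FORCE_NO_CACHE and CACHE_IMAGE are hypothetical variables. CACHE_IMAGE must be a
# registry reference the runner can pull, built with BUILDKIT_INLINE_CACHE=1 and pushed.

ci_build() {
  local tag="$1" cache_image="${CACHE_IMAGE:-myapp:latest}"
  local -a args=(build -t "$tag" -t "$cache_image")
  if [ "${FORCE_NO_CACHE:-0}" = "1" ]; then
    args+=(--no-cache)                     # deliberate clean rebuild
  else
    args+=(--cache-from "$cache_image" --build-arg BUILDKIT_INLINE_CACHE=1)
  fi
  docker "${args[@]}" .
}
```

In a GitLab job this could be called as ci_build "myapp:$CI_COMMIT_SHA", with FORCE_NO_CACHE=1 set only on the scheduled pipeline.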
Prevention in CI/CD
1. Add a pre-build disk check to your pipeline
# .gitlab-ci.yml (in GitHub Actions, run the same script in a step before the build)
before_script:
  - |
    # -P forces one line per filesystem, so NR==2 is always the data row
    USAGE=$(df -P /var/lib/docker | awk 'NR==2 {print $5}' | tr -d '%')
    if [ "$USAGE" -gt 80 ]; then
      echo "[WARN] Docker disk at ${USAGE}%. Running prune."
      docker system prune -f
      docker builder prune -f --keep-storage 5GB
    fi
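Outside GitLab, the same guard can live in a shell function called from any CI step (a GitHub Actions run: step, for instance). This sketch keeps the 80% threshold and the 5GB keep-storage value from the YAML example; docker_disk_pct and prebuild_disk_guard are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: pre-build disk guard as a reusable function (mirrors the GitLab before_script).

# Used percentage of the filesystem holding PATH, as a bare integer.
docker_disk_pct() {
  df -P "${1:-/var/lib/docker}" | awk 'NR==2 { gsub(/%/, "", $5); print $5 }'
}

# prebuild_disk_guard [LIMIT]: prune when usage is above LIMIT percent (default 80).
prebuild_disk_guard() {
  local limit="${1:-80}" usage
  usage=$(docker_disk_pct /var/lib/docker)
  if [ "${usage:-0}" -gt "$limit" ]; then
    echo "[WARN] Docker disk at ${usage}%. Running prune."
    docker system prune -f
    docker builder prune -f --keep-storage 5GB
  else
    echo "[OK] Docker disk at ${usage}%."
  fi
}
```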
2. Lint Dockerfiles and images for layer-bloating anti-patterns
# Hadolint — catches layer-bloating anti-patterns before they hit CI
docker run --rm -i hadolint/hadolint < Dockerfile
# Dockle: CIS benchmark and best-practice checks on the built image, such as leftover apt caches
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
goodwithtech/dockle myapp:latest
3. OPA/Conftest policy — block builds without multi-stage
# policy/dockerfile.rego
# Run with: conftest test --namespace dockerfile Dockerfile
package dockerfile

import rego.v1

deny contains msg if {
  # Exactly one FROM instruction means a single-stage build
  count([i | input[i].Cmd == "from"]) == 1
  some j
  input[j].Cmd == "run"
  contains(input[j].Value[0], "apt-get install")
  msg := "Single-stage build with apt installs detected. Use multi-stage to reduce layer bloat."
}
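Where conftest is not installed on the runner, a rough grep-based approximation of the same rule can serve as a stopgap. It is not equivalent: unlike conftest's Dockerfile parser, it does not join backslash-continued lines, so an apt-get install on a continuation line slips through. check_single_stage_apt is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: approximate the Rego rule with grep. Fails (exit 1) when the Dockerfile
# has exactly one FROM and a RUN line containing "apt-get install".
# Limitation: backslash continuations are not joined.

check_single_stage_apt() {
  local dockerfile="${1:-Dockerfile}" stages
  [ -f "$dockerfile" ] || { echo "no such file: $dockerfile" >&2; return 2; }
  stages=$(grep -ciE '^[[:space:]]*FROM[[:space:]]' "$dockerfile")
  if [ "$stages" -eq 1 ] && grep -qiE '^[[:space:]]*RUN[[:space:]].*apt-get install' "$dockerfile"; then
    echo "Single-stage build with apt installs detected. Use multi-stage to reduce layer bloat." >&2
    return 1
  fi
  return 0
}
```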
4. Mount /var/lib/docker on a dedicated volume
On EC2/GCP/Azure, never let Docker share the root disk. Mount a dedicated volume of 100 GB or more (for example, a gp3 EBS volume on AWS) at /var/lib/docker. Set CloudWatch/Prometheus alerts at 70% and 85% utilization, not 95%.
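# Prometheus alerting rule on node_exporter metrics: less than 30% free is roughly 70% used.
# Duplicate the rule with "< 0.15" and severity: critical for the 85% alert.
groups:
  - name: docker-disk
    rules:
      - alert: DockerDiskPressure
        expr: (node_filesystem_avail_bytes{mountpoint="/var/lib/docker"} / node_filesystem_size_bytes{mountpoint="/var/lib/docker"}) < 0.30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Docker storage above 70% used on {{ $labels.instance }}"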
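The alert above only fires if /var/lib/docker is its own mountpoint; while it still sits on the root filesystem, node_exporter exports no series with that mountpoint label. A quick check, assuming POSIX-format df output; docker_mountpoint and check_dedicated_docker_mount are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: verify /var/lib/docker is a dedicated mount rather than part of "/".

# Mount point (last column of POSIX df output) of the filesystem holding PATH.
docker_mountpoint() {
  df -P "${1:-/var/lib/docker}" | awk 'NR==2 { print $NF }'
}

check_dedicated_docker_mount() {
  local mnt
  mnt=$(docker_mountpoint /var/lib/docker)
  if [ "$mnt" = "/var/lib/docker" ]; then
    echo "OK: /var/lib/docker is a dedicated mount"
  else
    echo "WARN: /var/lib/docker lives on the filesystem mounted at $mnt" >&2
    return 1
  fi
}
```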