
How to Find and Remove Dangling Docker Volumes Before They Kill Your Disk

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 5 mins

TL;DR

  • What broke: Anonymous Docker volumes left behind by removed containers are not garbage-collected automatically. They accumulate indefinitely on the host filesystem, silently eating disk until /var/lib/docker fills the root partition.
  • How to fix it: Run docker volume prune for immediate relief; implement named volumes with explicit lifecycle management for the permanent fix.

The Incident (What does the error mean?)

You hit one of these:

No space left on device
Error response from daemon: mkdir /var/lib/docker/volumes/a3f9c.../data: no space left on device
Write failed: No space left on device

Or you ran df -h and saw:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   50G     0 100% /

Then you ran:

docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          34        12        18.4GB    11.2GB (60%)
Containers      8         3         142MB     98MB
Local Volumes   217       4         61.3GB    58.9GB (96%)
Build Cache     0         0         0B        0B

217 volumes. 4 active. 58.9 GB of orphaned data. This is the blast. The immediate consequence: any container write — logs, uploads, DB WAL files — fails with ENOSPC. Your running services start crashing.
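You can make that 96% figure fail a job before the disk fills. The Local Volumes row can be parsed directly. This is a minimal sketch that assumes the default docker system df table layout shown above. volume_reclaimable_pct and check_volume_bloat are hypothetical helper names, and the 50% default limit is an arbitrary placeholder.

```shell
# Hypothetical helpers (bash). Assumes the default table layout of
# `docker system df`, where RECLAIMABLE reads e.g. "58.9GB (96%)".

# Read `docker system df` output on stdin and print the integer inside
# "(NN%)" on the Local Volumes row, or 0 if the row has no percentage.
volume_reclaimable_pct() {
  awk '/^Local Volumes/ {
         if (match($0, /\([0-9]+%\)/)) pct = substr($0, RSTART + 1, RLENGTH - 3)
       }
       END { print (pct == "" ? 0 : pct) }'
}

# Return non-zero when more than LIMIT percent of volume space is reclaimable.
check_volume_bloat() {
  local limit=${1:-50} pct
  pct=$(docker system df | volume_reclaimable_pct)
  if [ "$pct" -gt "$limit" ]; then
    echo "local volumes ${pct}% reclaimable (limit ${limit}%)" >&2
    return 1
  fi
}
```

In a nightly job, something like check_volume_bloat 80 || docker volume prune -f turns the report into an action.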


The Attack Vector / Blast Radius

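Docker's volume lifecycle is decoupled from the container lifecycle by design. Anonymous volumes come from -v /data or from a VOLUME instruction in the image (the official postgres and redis images both declare one). When you docker rm a container, they are NOT removed unless you explicitly pass -v: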

# This leaves the volume orphaned:
docker rm my-postgres

# This cleans it up:
docker rm -v my-postgres
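docker container prune does not help here either. It removes stopped containers but leaves their volumes behind. The sketch below removes exited containers together with their anonymous volumes in one pass. rm_exited_with_volumes is a hypothetical name, and it uses only standard docker ps and docker rm flags.

```shell
# Hypothetical helper (bash): remove every exited container *and* its
# anonymous volumes. Named volumes are never touched by `docker rm -v`.
rm_exited_with_volumes() {
  local ids
  ids=$(docker ps -a -q --filter status=exited)
  [ -n "$ids" ] || return 0   # nothing exited: don't call rm with no args
  # Unquoted on purpose: one container ID per word.
  # shellcheck disable=SC2086
  docker rm -v $ids
}
```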

The cascading failure path:

  1. Dev/CI pipelines spin up containers with anonymous volumes (postgres, redis, build artifacts) and docker rm without -v.
  2. Each run leaks a new volume. On a busy CI runner, that can mean dozens per hour.
  3. Root partition fills. Docker daemon starts failing ALL write operations — not just the leaking service.
  4. Collateral damage: Every container on the host dies. Logging stops. Health checks fail. If this is a shared build node, your entire pipeline is down.
  5. Recovery requires manual intervention on the host — you cannot fix this from inside a container.
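To put numbers on step 2, the runway is free space divided by the leak rate. This is a back-of-the-envelope sketch, and the inputs are yours to measure: for example, free space from df -P and growth between two docker system df samples. hours_until_full is a hypothetical name. With, say, 20 GB free and 2.5 GB leaked per hour, it prints 8.0, which is one working day before writes start failing.

```shell
# Hypothetical helper (bash + awk for floating-point division).
# Usage: hours_until_full FREE_GB LEAK_GB_PER_HOUR
hours_until_full() {
  awk -v free="$1" -v rate="$2" 'BEGIN {
    if (rate <= 0) { print "inf"; exit }   # not leaking: no deadline
    printf "%.1f\n", free / rate
  }'
}
```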

On Kubernetes with hostPath or the local-path provisioner, the same pattern fills the node's disk. The kubelet then sets the node's DiskPressure condition and starts evicting pods, which can escalate into eviction storms.
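kubectl's JSONPath output can pull each node's DiskPressure condition, so you can spot affected nodes from outside. This is a minimal sketch. nodes_under_disk_pressure is a hypothetical name, and it assumes kubectl is already pointed at the right cluster.

```shell
# Hypothetical helper (bash): print the names of nodes whose DiskPressure
# condition is True. The JSONPath emits "name<TAB>status" per node.
nodes_under_disk_pressure() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}' \
    | awk -F'\t' '$2 == "True" { print $1 }'
}
```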


How to Fix It (The Solution)

Step 1: Immediate Triage

# See exactly what's eating space
docker system df -v

# List all dangling (unreferenced) volumes
docker volume ls -f dangling=true

# Find the biggest offenders (run as root: du must read /var/lib/docker/volumes)
docker volume ls -q -f dangling=true | xargs -r docker volume inspect \
  --format '{{.Name}} {{.Mountpoint}}' | \
  while read -r name mp; do echo "$(du -sh "$mp" 2>/dev/null | cut -f1) $name"; done | sort -rh | head -20
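du is not always an option: you may lack root, or you may be on Docker Desktop, where the mountpoint lives inside a VM. docker system df -v already reports a size per volume. The sketch below ranks the unreferenced ones from that output. unused_volumes_by_size is a hypothetical name. It assumes the "Local Volumes space usage:" section title and the VOLUME NAME / LINKS / SIZE columns that current Docker CLIs print, so check it against your own output first.

```shell
# Hypothetical helper (bash + awk): rank unreferenced volumes (LINKS = 0)
# from `docker system df -v` on stdin. Sizes use Docker's decimal units.
unused_volumes_by_size() {
  awk '
    function to_bytes(s,    n, u) {
      n = s + 0                        # numeric prefix, e.g. 58.9
      u = s; sub(/^[0-9.]+/, "", u)    # unit suffix, e.g. GB
      if (u == "kB") return n * 1e3
      if (u == "MB") return n * 1e6
      if (u == "GB") return n * 1e9
      if (u == "TB") return n * 1e12
      return n                         # plain "B"
    }
    # Every section header contains "usage:"; only track the volumes one.
    /usage:/ { in_vols = ($0 ~ /^Local Volumes space usage:/); next }
    # Data rows are NAME LINKS SIZE; the 4-field header row is skipped.
    in_vols && NF == 3 && $2 == 0 { printf "%.0f %s %s\n", to_bytes($3), $3, $1 }
  ' | sort -rn | cut -d' ' -f2-
}
```

Usage: docker system df -v | unused_volumes_by_size | head -20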

Basic Fix — Nuclear Prune

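# Remove unused volumes. Irreversible. Verify the list above first.
# Docker Engine 23.0+ removes only unused anonymous volumes by default;
# add --all (-a) to include unused named volumes as well.
docker volume prune -f

# Broader cleanup: stopped containers, unused networks, dangling images,
# build cache and unused volumes
docker system prune --volumes -f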

Confirm the space was reclaimed:

docker system df  # Verify reclaimable drops to near 0
df -h /          # Verify host disk freed
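The default prune behavior changed in Docker Engine 23.0. A script that must reclaim every unused volume, named ones included, across a mixed fleet therefore has to check the engine version first. This is a minimal sketch, and prune_all_unused_volumes is a hypothetical name. Be careful with it: on 23.0+ it passes --all, which also deletes named volumes that no container currently references, such as the database volume of a stack you just took down.

```shell
# Hypothetical helper (bash). DESTRUCTIVE: removes unused named volumes too.
prune_all_unused_volumes() {
  local ver major
  ver=$(docker version --format '{{.Server.Version}}')
  major=${ver%%.*}                       # "24.0.7" -> "24"
  case "$major" in
    ''|*[!0-9]*) echo "cannot parse engine version: '$ver'" >&2; return 1 ;;
  esac
  if [ "$major" -ge 23 ]; then
    docker volume prune -f --all         # 23.0+: --all includes named volumes
  else
    docker volume prune -f               # older engines prune all unused volumes
  fi
}
```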

Enterprise Best Practice — Fix the Root Cause

The real fix is to use named volumes with an explicit lifecycle in your Compose files and to never use anonymous volumes in production.

# docker-compose.yml

services:
  postgres:
    image: postgres:16
-   volumes:
-     - /var/lib/postgresql/data
+   volumes:
+     - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
-   volumes:
-     - /data
+   volumes:
+     - redis_data:/data

+ volumes:
+   postgres_data:
+     name: myapp_postgres_data
+   redis_data:
+     name: myapp_redis_data

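Why this matters: Named volumes are identifiable. docker volume ls shows them with meaningful names, so your runbook can target them explicitly, and on Docker Engine 23.0+ a plain docker volume prune leaves them alone. Anonymous volumes (the - lines above) get a random 64-character hex name. Once their container is gone, nothing ties them back to a service, and they are the ones that end up orphaned.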
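That naming difference is enough to separate the two kinds in a listing. This is a rough sketch, assuming anonymous volumes keep Docker's 64-character lowercase hex names. anonymous_volumes is a hypothetical name. A named volume could in principle match the pattern, so review the output before deleting anything.

```shell
# Hypothetical filter (bash + grep -E): keep only names that look like
# anonymous volumes, i.e. exactly 64 lowercase hex characters.
anonymous_volumes() {
  grep -E '^[0-9a-f]{64}$'
}
```

Usage: docker volume ls -q -f dangling=true | anonymous_volumes | xargs -r docker volume rm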

For CI/CD runners where you want ephemeral volumes, use --rm on every docker run:

# .gitlab-ci.yml / GitHub Actions step

- docker run -v /data myapp:latest ./run-tests.sh
+ docker run --rm -v /data myapp:latest ./run-tests.sh

--rm removes the container and its anonymous volumes as soon as it exits, the same cleanup as docker rm -v. Named volumes (-v name:/path) are left in place.
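If the CI job drives Compose rather than plain docker run, the equivalent teardown is docker compose down -v. It removes the named volumes declared in the file and the anonymous volumes attached to the project's containers. The minimal sketch below always tears down, even when the tests fail. compose_test_run, the project name and the app service are hypothetical.

```shell
# Hypothetical helper (bash). Usage: compose_test_run PROJECT SERVICE CMD...
# A unique project name per job keeps parallel runs from sharing volumes.
compose_test_run() {
  local project=$1 status=0
  shift
  docker compose -p "$project" up -d || status=$?
  if [ "$status" -eq 0 ]; then
    docker compose -p "$project" run --rm "$@" || status=$?
  fi
  # Teardown runs whatever happened above; -v drops this project's volumes.
  docker compose -p "$project" down -v --remove-orphans
  return "$status"
}
```

For example, compose_test_run "ci-$CI_JOB_ID" app ./run-tests.sh on GitLab, where CI_JOB_ID is unique per job.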




Prevention in CI/CD

1. Scheduled Cron Prune on Build Nodes

# /etc/cron.d/docker-prune
# Runs at 2AM daily. Removes only volumes no container references
# (on Docker Engine 23.0+, only the anonymous ones unless you add --all).
0 2 * * * root docker volume prune -f >> /var/log/docker-prune.log 2>&1
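A fixed nightly schedule can still be too late on a busy runner. Another option is to check usage more often and prune only when the partition holding Docker's data crosses a threshold. This is a minimal sketch. prune_if_disk_above is a hypothetical name, and it assumes Docker's data root is the default /var/lib/docker.

```shell
# Hypothetical helper (bash). Usage: prune_if_disk_above PERCENT
prune_if_disk_above() {
  local limit=$1 used
  # df -P keeps each filesystem on one line; column 5 is "Capacity", e.g. "91%".
  used=$(df -P /var/lib/docker | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [ "$used" -gt "$limit" ]; then
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) disk at ${used}%, pruning unused volumes"
    docker volume prune -f
  fi
}
```

Call it from a small script rather than inline in the crontab, because cron treats % as a newline.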

2. Disk Usage Alerting Before It's Critical

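# Prometheus node_exporter covers this, but for a quick shell alert.
# Crontab lines cannot be continued with \ and treat % as a newline,
# so keep the logic in a script (chmod +x it):
cat /usr/local/bin/disk-alert.sh
#!/bin/sh
USED=$(df -P / | awk 'NR==2{print $5}' | tr -d '%')
[ "$USED" -gt 80 ] && curl -s -X POST -H 'Content-type: application/json' \
  -d "{\"text\":\"HOST $(hostname): root disk at ${USED}%\"}" "$SLACK_WEBHOOK"

cat /etc/cron.d/disk-alert
SLACK_WEBHOOK=<your-incoming-webhook-url>
*/15 * * * * root /usr/local/bin/disk-alert.sh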

3. Hadolint + Checkov in Your Pipeline

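Hadolint lints Dockerfiles. Its rules target instruction and shell best practices rather than volume lifecycle, so treat it as a general gate and still review any new VOLUME instruction by hand. Add it as a pre-commit hook: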

# .pre-commit-config.yaml
- repo: https://github.com/hadolint/hadolint
  rev: v2.12.0
  hooks:
    - id: hadolint
      args: ['--failure-threshold', 'warning']

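Checkov is a weaker fit here. Its CKV_DOCKER_* policies scan Dockerfiles rather than Compose files, and CKV_DOCKER_6 checks for the deprecated MAINTAINER instruction, not volume management.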
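A crude but workable CI guard for Compose files is to look for short-syntax volume entries that are a bare container path with no colon. That is exactly the anonymous form in the Compose diff above. This is a heuristic sketch. find_anonymous_compose_volumes is a hypothetical name, and it does not understand long-syntax volume entries or other path lists such as env_file, so expect the occasional false positive.

```shell
# Hypothetical helper (bash + grep -E). Usage: find_anonymous_compose_volumes FILE...
# Prints file:line:text for list items like "- /data" (absolute path, no colon)
# and exits 0 if any were found, 1 if none, like grep itself.
find_anonymous_compose_volumes() {
  grep -HnE '^[[:space:]]*-[[:space:]]+/[^[:space:]:]*[[:space:]]*$' "$@"
}
```

In a pipeline step: if find_anonymous_compose_volumes docker-compose.yml; then exit 1; fi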

4. Docker Daemon Log Driver Limits

Container log files in /var/lib/docker/containers/ aren't volumes, but they are another silent disk killer. Lock them down in /etc/docker/daemon.json:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

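Restart the daemon: systemctl restart docker. The new defaults apply only to containers created afterwards. Existing containers keep their old log settings until they are recreated.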
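To find containers whose logs have already grown large, search the json-file log paths. This is a minimal sketch. container_logs_over is a hypothetical name. It assumes the json-file driver's default layout of /var/lib/docker/containers/<id>/<id>-json.log, which usually needs root to read.

```shell
# Hypothetical helper (bash + find). Usage: container_logs_over [DIR] [MB]
# Lists json-file logs larger than MB mebibytes (find rounds sizes up to MiB).
container_logs_over() {
  local root=${1:-/var/lib/docker/containers} mb=${2:-100}
  find "$root" -type f -name '*-json.log' -size +"${mb}"M -print
}
```

Pipe the result to xargs -r du -h to see the actual sizes.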

5. Kubernetes: Set StorageClass reclaimPolicy: Delete

If you're on K8s with dynamic provisioning, make sure PVs are not left in the Released state after their PVCs are deleted:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/aws-ebs
- reclaimPolicy: Retain
+ reclaimPolicy: Delete

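Retain is the safer choice for stateful production data. For ephemeral workloads, Delete removes the PV and its backing storage when the PVC is deleted. The StorageClass policy is applied only to newly provisioned PVs. Existing PVs keep the policy they were created with.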
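PVs that already ended up Released under Retain need a manual decision. The minimal sketch below lists them. released_pvs is a hypothetical name, and it assumes kubectl points at the right cluster. Once you have confirmed a PV's data is disposable, kubectl delete pv <name> removes the object. With Retain, though, the backing disk itself still has to be deleted separately.

```shell
# Hypothetical helper (bash): print PVs in the Released phase with their
# reclaim policy, e.g. "pvc-1234 Retain".
released_pvs() {
  kubectl get pv --no-headers \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,POLICY:.spec.persistentVolumeReclaimPolicy \
    | awk '$2 == "Released" { print $1, $3 }'
}
```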
