How to Find and Remove Dangling Docker Volumes Before They Kill Your Disk
Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 5 mins
TL;DR
- What broke: Docker volumes created by stopped/removed containers are not garbage-collected automatically. They accumulate indefinitely on the host filesystem, silently eating disk until /var/lib/docker fills the root partition.
- How to fix it: Run docker volume prune for immediate relief; implement named volumes with explicit lifecycle management for the permanent fix.
- Use our Client-Side Sandbox above to paste your docker system df -v output and auto-generate a scoped cleanup script and corrected Compose config.
The Incident (What does the error mean?)
You hit one of these:
No space left on device
Error response from daemon: mkdir /var/lib/docker/volumes/a3f9c.../data: no space left on device
Write failed: No space left on device
Or you ran df -h and saw:
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 50G 0 100% /
Then you ran:
docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 34 12 18.4GB 11.2GB (60%)
Containers 8 3 142MB 98MB
Local Volumes 217 4 61.3GB 58.9GB (96%)
Build Cache 0 0 0B 0B
217 volumes. 4 active. 58.9 GB of orphaned data. This is the blast. The immediate consequence: any container write — logs, uploads, DB WAL files — fails with ENOSPC. Your running services start crashing.
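To track that reclaimable share over time instead of eyeballing the table, the Local Volumes row can be parsed directly. This is a minimal sketch that assumes the default table layout shown above, with the percentage in the last field; `volumes_reclaimable_pct` is a hypothetical helper name, not part of Docker.

```shell
# Hypothetical helper: print the reclaimable percentage from the
# "Local Volumes" row of `docker system df` output read on stdin.
# Assumes the last field of that row looks like "(96%)".
volumes_reclaimable_pct() {
  awk '/^Local Volumes/ {
    pct = $NF                 # last whitespace-separated field
    gsub(/[()%]/, "", pct)    # strip the parentheses and percent sign
    print pct
  }'
}

# Usage on a real host:
#   docker system df | volumes_reclaimable_pct
```

Reading from stdin keeps the parsing separate from the Docker call, so the same function works on saved output from an incident.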
The Attack Vector / Blast Radius
Docker's volume lifecycle is decoupled from the container lifecycle by design. When you docker rm a container, its anonymous volumes (-v /data) are NOT removed unless you explicitly pass -v:
# This leaves the volume orphaned:
docker rm my-postgres
# This cleans it up:
docker rm -v my-postgres
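Before removing a container, it can help to see which volumes a plain docker rm would leave behind. The sketch below is an assumption-laden heuristic: it treats any volume whose name is 64 lowercase hex characters as anonymous, and both function names are hypothetical.

```shell
# Heuristic (assumption): anonymous volumes are named with 64 lowercase hex
# characters, while named volumes carry the name you chose.
is_anonymous_volume_name() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{64}$'
}

# List the volumes of one container that look anonymous.
# Takes a container name or ID as its only argument.
anonymous_volumes_of() {
  docker inspect \
    --format '{{range .Mounts}}{{if eq .Type "volume"}}{{println .Name}}{{end}}{{end}}' "$1" |
    while read -r name; do
      if is_anonymous_volume_name "$name"; then echo "$name"; fi
    done
}

# Usage: anonymous_volumes_of my-postgres
```

A volume you deliberately named with 64 hex characters would be misclassified, so treat the output as a prompt to check, not as a deletion list.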
The cascading failure path:
- Dev/CI pipelines spin up containers with anonymous volumes (postgres, redis, build artifacts) and docker rm without -v.
- Each run leaks a new volume. On a busy CI runner, this is dozens per hour.
- Root partition fills. Docker daemon starts failing ALL write operations — not just the leaking service.
- Collateral damage: Every container on the host dies. Logging stops. Health checks fail. If this is a shared build node, your entire pipeline is down.
- Recovery requires manual intervention on the host — you cannot fix this from inside a container.
On Kubernetes with hostPath volumes or the local-path provisioner, the same pattern fills the node's disk, puts the node under DiskPressure, and triggers pod eviction storms.
How to Fix It (The Solution)
Step 1: Immediate Triage
# See exactly what's eating space
docker system df -v
# List all dangling (unreferenced) volumes
docker volume ls -f dangling=true
# Find the biggest offenders
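# Run as root: du needs read access to /var/lib/docker/volumes.
# xargs -r skips docker volume inspect when there are no dangling volumes.
docker volume ls -q -f dangling=true | xargs -r docker volume inspect \
  --format '{{.Name}} {{.Mountpoint}}' | \
  while read -r name mp; do echo "$(du -sh "$mp" 2>/dev/null | cut -f1) $name"; done | sort -rh | head -20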
Basic Fix — Nuclear Prune
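# Remove dangling volumes. Irreversible. Verify the list above first.
# On Docker 23.0+ this removes only anonymous volumes; add --all to include unused named volumes.
docker volume prune -f
# Broader cleanup: stopped containers, unused networks, dangling images, build cache, plus unused volumes
docker system prune --volumes -f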
Reclaim confirmed:
docker system df # Verify reclaimable drops to near 0
df -h / # Verify host disk freed
Enterprise Best Practice — Fix the Root Cause
The real fix is named volumes with explicit lifecycle in your Compose files, and never using anonymous volumes in production.
# docker-compose.yml
  services:
    postgres:
      image: postgres:16
-     volumes:
-       - /var/lib/postgresql/data
+     volumes:
+       - postgres_data:/var/lib/postgresql/data
    redis:
      image: redis:7-alpine
-     volumes:
-       - /data
+     volumes:
+       - redis_data:/data
+ volumes:
+   postgres_data:
+     name: myapp_postgres_data
+   redis_data:
+     name: myapp_redis_data
Why this matters: Named volumes are tracked. docker volume ls shows them with meaningful names, so your runbook can target them explicitly. Anonymous volumes (the - lines above) get a random 64-character hex name, which looks like a SHA256 digest, and are orphaned as soon as their container is removed without -v.
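To catch anonymous mounts before they reach production, a Compose file can be scanned for short-syntax volume entries that are a bare container path with no source. This is a rough sketch, not a YAML parser: `find_anonymous_mounts` is a hypothetical name, it only understands the short list syntax used above, and it assumes the entries directly follow their volumes: key.

```shell
# Hypothetical check: print "LINE: PATH" for every short-syntax volume entry
# that is a bare absolute path (no "source:" part), i.e. an anonymous volume.
# Reads the file(s) given as arguments, or stdin when none are given.
find_anonymous_mounts() {
  awk '
    /^[ \t]*volumes:[ \t]*$/ { in_volumes = 1; next }   # entering a volumes: list
    in_volumes && /^[ \t]*-[ \t]/ {
      entry = $0
      sub(/^[ \t]*-[ \t]*/, "", entry)                  # drop the list marker
      gsub(/"/, "", entry)                              # drop double quotes
      if (entry ~ /^\// && entry !~ /:/) print NR ": " entry
      next
    }
    { in_volumes = 0 }                                  # any other line ends the list
  ' "$@"
}

# Usage: find_anonymous_mounts docker-compose.yml
```

Bind mounts such as /host/path:/container/path contain a colon and are skipped; long-syntax entries (type:, target:) are not inspected at all.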
For CI/CD runners where you want ephemeral volumes, use --rm on every docker run:
# .gitlab-ci.yml / GitHub Actions step
- docker run -v /data myapp:latest ./run-tests.sh
+ docker run --rm -v /data myapp:latest ./run-tests.sh
--rm triggers docker rm -v automatically on container exit, cleaning the anonymous volume immediately.
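Sometimes --rm removes the container too early, for example when a CI step still needs to copy artifacts out after the tests finish. One possible pattern for that case is to create, run, and remove the container explicitly while preserving its exit code. `run_and_clean` is a hypothetical wrapper name; the docker create, docker start -a, and docker rm -v subcommands are standard CLI.

```shell
# Hypothetical wrapper: run a container to completion, keep its exit code,
# and always remove the container together with its anonymous volumes.
# All arguments are passed straight to `docker create`.
run_and_clean() {
  local cid status
  cid="$(docker create "$@")" || return 1
  docker start -a "$cid"        # attach and wait for the container to exit
  status=$?
  # Collect anything you still need here, e.g. with `docker cp`.
  docker rm -v "$cid" >/dev/null
  return "$status"
}

# Usage: run_and_clean -v /data myapp:latest ./run-tests.sh
```

The cleanup runs whether the tests pass or fail, which is the case that leaks volumes most often in pipelines that stop at the first error.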
Prevention in CI/CD
1. Scheduled Cron Prune on Build Nodes
# /etc/cron.d/docker-prune
# Runs at 2AM daily. Removes only volumes no container references; confirm nothing on the node keeps needed data in a detached volume.
0 2 * * * root docker volume prune -f >> /var/log/docker-prune.log 2>&1
2. Disk Usage Alerting Before It's Critical
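# Prometheus node_exporter covers this, but for a quick shell alert:
# /etc/cron.d/disk-alert
# Cron does not support backslash line continuations and treats an unescaped % as a newline,
# so the job stays on one line and every % is written as \%.
# SLACK_WEBHOOK must be defined in this file (cron does not read your shell profile).
*/15 * * * * root USED=$(df -P / | awk 'NR==2{print $5}' | tr -d '\%'); [ "$USED" -gt 80 ] && curl -s -X POST "$SLACK_WEBHOOK" -d "{\"text\":\"HOST $(hostname): root disk at ${USED}\%\"}"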
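Cron entries get awkward once they need quoting and escaping, so the same threshold check can also live in a small script that cron simply calls. The sketch below shows only the testable parts; the script path, the function names, and the THRESHOLD variable are assumptions for illustration, with 80 taken from the cron example.

```shell
# Hypothetical script body, e.g. saved as /usr/local/bin/disk-alert.sh
# and invoked from a one-line cron entry.
THRESHOLD="${THRESHOLD:-80}"   # assumed default, matching the cron example

# Print the Use% (without the percent sign) from `df -P <path>` output on stdin.
disk_used_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when the given percentage is strictly above THRESHOLD.
over_threshold() {
  [ "$1" -gt "$THRESHOLD" ]
}

# Usage:
#   used="$(df -P / | disk_used_pct)"
#   if over_threshold "$used"; then
#     curl -s -X POST "$SLACK_WEBHOOK" \
#       -d "{\"text\":\"HOST $(hostname): root disk at ${used}%\"}"
#   fi
```

Inside a script file the percent sign needs no escaping, and df -P keeps each filesystem on a single line even when the device name is long.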
3. Hadolint + Checkov in Your Pipeline
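Hadolint lints the Dockerfiles where VOLUME instructions, the source of many anonymous volumes, are declared. It has no dedicated VOLUME rule that we are aware of, so treat it as a general Dockerfile quality gate. Add it as a pre-commit hook: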
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/hadolint/hadolint
    rev: v2.12.0
    hooks:
      - id: hadolint
        args: ['--failure-threshold', 'warning']
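Checkov's built-in Dockerfile checks do not cover volume management: CKV_DOCKER_6, for instance, flags the deprecated MAINTAINER instruction. Catching anonymous volumes in Compose files needs a custom policy or a dedicated CI check.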
4. Docker Daemon Log Driver Limits
While not volumes, log files in /var/lib/docker/containers/ are another silent disk killer. Lock them down in /etc/docker/daemon.json:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
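Restart the daemon: systemctl restart docker. The new limits apply only to containers created after the restart; existing containers keep their old log settings until they are recreated.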
5. Kubernetes: Set StorageClass reclaimPolicy: Delete
If you're on K8s with dynamic provisioning, ensure PVs are not left in the Released phase after their PVC is deleted:
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: fast-ssd
  provisioner: kubernetes.io/aws-ebs
- reclaimPolicy: Retain
+ reclaimPolicy: Delete
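Retain is safe for stateful production data. For ephemeral workloads, Delete ensures PVs are destroyed with the PVC. The policy is copied onto each PV when it is provisioned, so existing PVs keep whatever policy they were created with.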
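To see whether a cluster already has volumes stranded by a Retain policy, the PV list can be filtered by phase. A minimal sketch, assuming kubectl access and the three custom columns shown in the usage comment; `released_pvs` is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of PersistentVolumes in the Released
# phase. Expects rows of "NAME PHASE POLICY" on stdin, without a header.
released_pvs() {
  awk '$2 == "Released" { print $1 }'
}

# Usage on a cluster:
#   kubectl get pv --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,POLICY:.spec.persistentVolumeReclaimPolicy \
#     | released_pvs
# Review each result before deleting it: a Released PV with Retain still holds data.
```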