How to Fix Docker Devicemapper Thin Pool Full: No Space Left on Device
Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–45 mins
TL;DR
- What broke: Docker's devicemapper thin pool, the block-level storage backend for all image and container layers, hit 100% capacity. Every container attempting a write now fails with no space left on device. Running containers are crashing or hung.
- How to fix it: Immediately reclaim space by removing dead containers and images, then either extend the thin pool LV or migrate to overlay2. Long-term, never run devicemapper in loopback mode in production.
The Incident (What Does the Error Mean?)
Raw error surface — what you see in journalctl -u docker or container stderr:
devicemapper: Error running deviceCreate (CreateSnapDeviceRaw) dm_task_run failed
Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount option
write /var/lib/docker/devicemapper/devicemapper/data: no space left on device
thin pool full: no space left on device
And from dmsetup status:
docker-thinpool: 0 209715200 thin-pool 127 56798/524288 1638380/1638400 - rw discard_passdown queue_if_no_space
Fields after thin-pool: transaction ID (127), metadata blocks used/total (56798/524288), data blocks used/total (1638380/1638400). Data is at ~100%.
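Counting columns by hand under pressure is error-prone, so here is a minimal awk sketch that turns one thin-pool status line into percentages. It assumes the thin-pool status field order from the kernel's device-mapper documentation (transaction ID, metadata used/total, data used/total) and accepts lines with or without the "name:" prefix that dmsetup status adds when listing all devices. The function name thinpool_usage is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print data and metadata usage of a thin pool from `dmsetup status`.
# Assumed field order after the optional "name:" prefix:
#   <start> <length> thin-pool <transaction id> <meta used>/<meta total> <data used>/<data total> ...
thinpool_usage() {
  awk '{
    # Drop the optional "name:" prefix, then re-split so field positions are fixed.
    if ($1 ~ /:$/) { $1 = ""; $0 = $0 }
    if ($3 != "thin-pool") next
    split($5, m, "/")   # metadata used/total
    split($6, d, "/")   # data used/total
    printf "data=%.1f%% metadata=%.1f%%\n", 100 * d[1] / d[2], 100 * m[1] / m[2]
  }'
}

# Usage:
#   dmsetup status docker-thinpool | thinpool_usage
```

Treat anything above roughly 90% on either number as an active incident: data and metadata fill independently.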
Immediate consequence: Docker daemon cannot allocate new blocks for any container write operation. Containers mid-write get ENOSPC. New container starts fail. Image pulls fail. If your orchestrator (Swarm, ECS) is health-checking, it will start cycling containers into a crash loop, compounding the problem.
The Attack Vector / Blast Radius
This is a cascading storage starvation failure. Here's the kill chain:
- Thin pool data space exhausted → all container layer writes blocked.
- Docker daemon begins logging errors → if log driver writes to the same volume, the daemon itself can hang.
- Orchestrator detects unhealthy containers → triggers restarts → new container filesystem allocation attempts → all fail → restart loop burns CPU.
- Metadata exhaustion (separate but equally fatal): if the pool's metadata device is undersized (dm.loopmetadatasize in loopback mode, or the metadata LV in direct-lvm mode), metadata blocks fill independently. You can have free data space and still be completely locked.
- Host-level impact: if you're running devicemapper in loopback mode (the default when neither dm.thinpooldev nor dm.directlvm_device is set), the sparse loop files live under /var/lib/docker, which is usually on your root partition. Root partition full = SSH login issues, cron failures, and services that can no longer write logs or state.
The loopback anti-pattern is the most common root cause of this incident in production. Docker's documentation has only ever described loop-lvm as suitable for testing, never for production. If docker info | grep 'Data loop file' returns a path, you are in this failure mode.
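The same check can be scripted for fleet audits. This is a sketch that assumes, as described above, that loopback mode is recognizable by the "Data loop file:" line docker info prints for devicemapper; the function name is_loopback_devicemapper is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: exit status 0 if `docker info` text on stdin shows devicemapper
# running in loopback mode (a "Data loop file:" line with a path).
is_loopback_devicemapper() {
  local info
  info=$(cat)
  grep -q 'Storage Driver: devicemapper' <<<"$info" &&
    grep -Eq 'Data loop file: *[^ ]' <<<"$info"
}

# Usage on a live host:
#   if docker info 2>/dev/null | is_loopback_devicemapper; then
#     echo "devicemapper loopback detected: plan the overlay2 migration"
#   fi
```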
How to Fix It (The Solution)
Step 0: Triage — Identify What's Full
# Check thin pool data AND metadata usage
dmsetup status | grep thin
# Check host-level disk
df -h /var/lib/docker
# Full Docker storage report
docker info 2>/dev/null | grep -A 20 'Storage Driver'
Output fields to read: Data Space Used / Data Space Total and Metadata Space Used / Metadata Space Total.
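To pull just those fields out of the long docker info report, a one-line sed filter is enough. This is a sketch that assumes the "Name: value" layout docker info uses for its devicemapper section; storage_fields is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: keep only the Data/Metadata Space Used/Total/Available lines from
# `docker info` on stdin and print them as Name=value.
storage_fields() {
  sed -nE 's/^[[:space:]]*((Data|Metadata) Space (Used|Total|Available)):[[:space:]]*(.*)$/\1=\4/p'
}

# Usage:
#   docker info 2>/dev/null | storage_fields
```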
Basic Fix: Emergency Space Reclamation
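# 1. Remove all stopped containers
docker container prune -f
# 2. Remove dangling images (untagged, unreferenced layers)
docker image prune -f
# 3. Nuclear option if this is a worker node you can drain
docker system prune -af --volumes
# 4. Hand blocks freed inside running containers back to the pool. Thin pools
#    have no "trim" message; trim the mounted container filesystems instead
#    (these mounts are only visible here if dockerd shares the host mount namespace).
findmnt -rn -o TARGET | grep '^/var/lib/docker/devicemapper/mnt/' | xargs -r -n1 fstrim -v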
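The non-destructive steps of the sequence above can be wrapped so each run records pool usage before and after. In this sketch, run() and the DRY_RUN variable are hypothetical helpers, not Docker features: with DRY_RUN=1 the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: echo each command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then
    "$@"
  fi
}

reclaim_space() {
  run dmsetup status --target thin-pool   # usage before
  run docker container prune -f           # stopped containers
  run docker image prune -f               # dangling images
  run dmsetup status --target thin-pool   # usage after
  # `docker system prune -af --volumes` is deliberately left out: it belongs
  # to drained worker nodes and should stay a manual decision.
}

# Preview, then run for real as root:
#   DRY_RUN=1 reclaim_space
#   reclaim_space
```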
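If you're on an LVM-backed thin pool (the correct setup), extend the pool immediately:
# Identify your VG and confirm it has free extents (VFree column)
vgs
# If VFree is too small, add a disk first (/dev/sdc is a placeholder): vgextend docker /dev/sdc
# Add 20G to the thin pool data LV (VG "docker", LV "thinpool", as provisioned below)
lvextend -L +20G docker/thinpool
# If metadata is the bottleneck, grow the pool's metadata LV as well
lvextend --poolmetadatasize +1G docker/thinpool
# Verify: Docker picks this up without a restart
dmsetup status docker-thinpool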
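Rather than guessing +20G, you can size the extension from the status line. This sketch computes how many GiB would bring data usage down to a target percentage; it assumes the pool length (field 2 of dmsetup status, in 512-byte sectors) is an exact multiple of the data block size, which holds for LVM-created pools, so bytes per block = length × 512 / total blocks. extend_needed_gib is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: GiB to add so that used/total data blocks drops to target_pct.
# Args: used_blocks total_blocks pool_length_sectors target_pct
extend_needed_gib() {
  awk -v u="$1" -v t="$2" -v s="$3" -v p="$4" 'BEGIN {
    block_bytes = s * 512 / t          # bytes per data block
    want_total = u * 100 / p           # total blocks needed for used/total == p%
    extra = want_total - t
    if (extra < 0) extra = 0           # already below target
    gib = extra * block_bytes / (1024 ^ 3)
    n = (gib == int(gib)) ? gib : int(gib) + 1   # round up to whole GiB
    printf "%d\n", n
  }'
}

# Usage for a nearly full 100 GiB pool (209715200 sectors, 1638400 blocks), target 70%:
#   lvextend -L +"$(extend_needed_gib 1638380 1638400 209715200 70)"G docker/thinpool
```

LVM rounds the request up to whole extents, so the result is a lower bound.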
Enterprise Best Practice: Migrate Off Devicemapper to overlay2
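The devicemapper storage driver was deprecated in Docker Engine 18.09 and removed in Docker Engine 25.0, so the correct fix is migration. Changing the storage driver makes existing images and containers inaccessible to Docker: save any images you cannot re-pull (docker save) and plan to recreate containers after the switch.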
Before (broken loopback devicemapper config):
- {
- "storage-driver": "devicemapper"
- }
After (overlay2 on XFS with project quotas, production grade; /var/lib/docker must sit on XFS formatted with ftype=1 and mounted with pquota, otherwise Docker rejects the overlay2.size option):
+ {
+ "storage-driver": "overlay2",
+ "storage-opts": [
+ "overlay2.size=20G"
+ ],
+ "log-driver": "json-file",
+ "log-opts": {
+ "max-size": "50m",
+ "max-file": "3"
+ },
+ "data-root": "/var/lib/docker"
+ }
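Before switching, it is worth checking the XFS prerequisites for this config. This sketch takes the text of xfs_info and the mount options string as arguments so it can be tested offline; it assumes XFS accepts either pquota or prjquota as the mount option (the kernel typically reports prjquota). overlay2_xfs_preflight is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: report missing overlay2-on-XFS prerequisites; exit 1 if any are missing.
# Args: "<xfs_info output>" "<comma-separated mount options>"
overlay2_xfs_preflight() {
  local xfs_info_text=$1 mount_opts=$2 rc=0
  if ! grep -q 'ftype=1' <<<"$xfs_info_text"; then
    echo "missing: XFS must be formatted with ftype=1 (d_type support)"
    rc=1
  fi
  if ! grep -Eq '(^|,)(pquota|prjquota)(,|$)' <<<"$mount_opts"; then
    echo "missing: mount with pquota so overlay2.size can be enforced"
    rc=1
  fi
  return "$rc"
}

# Usage on a live host:
#   target=$(findmnt -no TARGET -T /var/lib/docker)
#   overlay2_xfs_preflight "$(xfs_info "$target")" "$(findmnt -no OPTIONS -T /var/lib/docker)"
```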
If you must stay on devicemapper (for example, an older RHEL/CentOS 7 kernel without overlay2 support, on a Docker Engine older than 25.0), use direct-lvm, not loopback:
- {
- "storage-driver": "devicemapper"
- }
+ {
+ "storage-driver": "devicemapper",
+ "storage-opts": [
+ "dm.thinpooldev=/dev/mapper/docker-thinpool",
+ "dm.use_deferred_removal=true",
+ "dm.use_deferred_deletion=true",
+ "dm.fs=xfs",
+ "dm.basesize=20G"
+ ]
+ }
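A devicemapper daemon.json that forgets the pool option silently falls back to loopback, so a quick lint before restarting Docker is cheap insurance. This grep-based sketch assumes a conventionally formatted file (no JSON parser) and treats dm.thinpooldev or dm.directlvm_device as the options that keep Docker off loopback; check_dm_config is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: fail if a daemon.json selects devicemapper without a direct-lvm pool option.
check_dm_config() {
  local file=$1
  if grep -Eq '"storage-driver"[[:space:]]*:[[:space:]]*"devicemapper"' "$file" &&
     ! grep -Eq 'dm\.(thinpooldev|directlvm_device)=' "$file"; then
    echo "FAIL: $file selects devicemapper without dm.thinpooldev (loopback fallback)"
    return 1
  fi
  echo "OK: $file"
}

# Usage:
#   check_dm_config /etc/docker/daemon.json && systemctl restart docker
```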
Provision the LVM thin pool before starting Docker:
# Assumes /dev/sdb is a dedicated disk for Docker
pvcreate /dev/sdb
vgcreate docker /dev/sdb
# 95% data, 1% metadata, leave 4% for auto-extend headroom
lvcreate --wipesignatures y -n thinpool docker -l 95%VG
lvcreate --wipesignatures y -n thinpoolmeta docker -l 1%VG
lvconvert -y \
--zero n \
-c 512K \
--thinpool docker/thinpool \
--poolmetadata docker/thinpoolmeta
# Configure LVM auto-extend — THIS IS THE CRITICAL PREVENTION STEP
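cat > /etc/lvm/profile/docker-thinpool.profile <<EOF
activation {
  thin_pool_autoextend_threshold=80
  thin_pool_autoextend_percent=20
}
EOF
lvchange --metadataprofile docker-thinpool docker/thinpool
# Auto-extend only fires while dmeventd monitors the pool: the monitor column should read "monitored"
lvs -o+seg_monitor docker/thinpool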
Prevention in CI/CD
1. LVM Auto-Extend (Already Shown Above — Do This First)
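The profile tells LVM to grow the pool by 20% of its current size (thin_pool_autoextend_percent) whenever usage crosses 80% (thin_pool_autoextend_threshold). It only works while dmeventd is monitoring the LV and the VG still has free extents; with both in place, this alone prevents most thin-pool-full incidents.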
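The extension step and the free space in the VG have to agree. With the split from the provisioning step (data 95% of the VG, metadata 1%, about 4% free), one 20% step would need 19% of the VG. This sketch does that arithmetic so you can check headroom before relying on auto-extend; it makes no LVM calls and does not model how your LVM version handles a step it cannot fully satisfy. autoextend_headroom is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: does the VG have room for one auto-extend step?
# Args (all as % of the VG): pool_pct free_pct extend_pct
autoextend_headroom() {
  awk -v p="$1" -v f="$2" -v e="$3" 'BEGIN {
    need = p * e / 100   # one step grows the pool by e% of its current size
    printf "one step needs %.1f%% of VG, %.1f%% free: %s\n", need, f, (need <= f ? "fits" : "vgextend first")
  }'
}

# Usage with the provisioning split above:
#   autoextend_headroom 95 4 20
```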
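2. Prometheus + Alertmanager: Alert Before It's Full
node_exporter does not ship thin pool usage metrics, so the node_device_mapper_* names below are custom metrics you must export yourself (for example, through node_exporter's textfile collector).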
# prometheus alert rule
groups:
  - name: docker_storage
    rules:
      - alert: DockerThinPoolDataHigh
        expr: |
          (node_device_mapper_data_used_bytes / node_device_mapper_data_total_bytes) > 0.75
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Docker thin pool data at {{ $value | humanizePercentage }}"
      - alert: DockerThinPoolMetadataHigh
        expr: |
          (node_device_mapper_metadata_used_bytes / node_device_mapper_metadata_total_bytes) > 0.60
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Docker thin pool METADATA critical: pool lock imminent"
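The rules above expect four custom gauges. One way to publish them, sketched here under the assumption that node_exporter runs with --collector.textfile.directory enabled, is a script that converts dmsetup status into those metric names. It assumes the thin-pool field order (transaction ID, metadata used/total, data used/total), 4 KiB dm-thin metadata blocks, and a pool length that is a whole number of data blocks. thinpool_metrics and the textfile path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: emit Prometheus text-format gauges from `dmsetup status <pool>`
# (input without the "name:" prefix).
thinpool_metrics() {
  awk '$3 == "thin-pool" {
    split($5, m, "/")                   # metadata used/total blocks
    split($6, d, "/")                   # data used/total blocks
    data_block = $2 * 512 / d[2]        # bytes per data block = pool bytes / total blocks
    printf "node_device_mapper_data_used_bytes %.0f\n", d[1] * data_block
    printf "node_device_mapper_data_total_bytes %.0f\n", $2 * 512
    printf "node_device_mapper_metadata_used_bytes %.0f\n", m[1] * 4096
    printf "node_device_mapper_metadata_total_bytes %.0f\n", m[2] * 4096
  }'
}

# Cron usage; write then rename so node_exporter never reads a partial file:
#   dir=/var/lib/node_exporter/textfile   # placeholder: your textfile directory
#   dmsetup status docker-thinpool | thinpool_metrics > "$dir/thinpool.prom.tmp"
#   mv "$dir/thinpool.prom.tmp" "$dir/thinpool.prom"
```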
3. Ansible Assert: Enforce overlay2 in IaC
If you're provisioning Docker hosts via Ansible:
# tasks/docker.yml: enforce overlay2, fail fast on devicemapper loopback
- name: Assert overlay2 storage driver
  assert:
    that:
      - docker_storage_driver == 'overlay2'
    fail_msg: "FAIL: devicemapper loopback is not permitted in production. Set docker_storage_driver=overlay2."
For Terraform-provisioned EC2 instances with user-data, add a startup validation:
#!/bin/bash
# In user-data (bash, because of [[ ]]): fail loudly if someone shipped the wrong AMI/config
DRIVER=$(docker info --format '{{.Driver}}')
if [[ "$DRIVER" == "devicemapper" ]]; then
echo "FATAL: devicemapper detected. Halting node registration." | systemd-cat -p err
systemctl stop docker
exit 1
fi
4. Node-Level Cron: Scheduled Prune on Worker Nodes
# /etc/cron.d/docker-prune
0 3 * * * root docker system prune -f --filter "until=24h" >> /var/log/docker-prune.log 2>&1
Do not use --volumes in automated prune. You will delete persistent data.