What is the difference between thin pool data exhaustion and metadata exhaustion in devicemapper?

Data exhaustion means the block storage that holds container layers is full, so writes fail with ENOSPC (or stall first if the pool is set to queue_if_no_space). Metadata exhaustion means the B-tree structures that track thin pool allocations are full. The metadata device is far smaller than the data device (typically about 1% of the pool) and fills independently, so you can have gigabytes of free data space and still be locked out by metadata exhaustion alone. Always monitor both. The Prometheus alert rules in the prevention section cover both cases.

Can I extend a devicemapper thin pool without restarting Docker?

Yes, if you're using LVM direct-lvm. Run `lvextend -L +XG /dev/VG/thinpool` and Docker picks up the new size dynamically via dmsetup — no daemon restart required. If you're on loopback mode, you cannot extend without stopping Docker, resizing the loop file, and restarting. This is one of many reasons loopback is not production-viable.

Is devicemapper still supported in modern Docker and should I migrate to overlay2?

Devicemapper is deprecated: Docker Engine deprecated it in 18.09 and removed it in 25.0. overlay2 is the default and recommended storage driver on modern Linux; it needs kernel 4.0 or later, or the 3.10.0-514+ kernel shipped with RHEL/CentOS 7. When the backing filesystem is XFS, it must have been formatted with ftype=1 (a mkfs.xfs option, not a mount option). If you are on any current OS (RHEL 8+, Ubuntu 20.04+, Amazon Linux 2), migrate to overlay2 immediately. Switching drivers hides existing images and containers from the daemon, so save or push the images you need first, then change the driver and restart the daemon. Plan for a maintenance window on stateful nodes.

How to Fix Docker Devicemapper Thin Pool Full: No Space Left on Device

Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–45 mins

TL;DR

What broke: Docker's devicemapper thin pool — the block-level storage backend for all container layers and volumes — hit 100% capacity. Every container attempting a write is now returning no space left on device. Running containers are crashing or hung.
How to fix it: Immediately reclaim space by removing dead containers/images, then either extend the thin pool LV or migrate to overlay2. Long-term, never run devicemapper in loopback mode in production.

The Incident (What Does the Error Mean?)

Raw error surface — what you see in journalctl -u docker or container stderr:

devicemapper: Error running deviceCreate (CreateSnapDeviceRaw) dm_task_run failed
Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount option
write /var/lib/docker/devicemapper/devicemapper/data: no space left on device
thin pool full: no space left on device

And from dmsetup status:

docker-thinpool: 0 209715200 thin-pool 1 56798/524288 1638400/1638400 - rw discard_passdown queue_if_no_space
                                                      ^^^^^^^^^^^^^^^
                                                      data blocks used/total: this pool is 100% full (the pair before it is metadata used/total)
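
For scripting, the two used/total pairs can be pulled out of that line and turned into percentages. The following is a minimal sketch, assuming the kernel's thin-pool status layout in which the first used/total pair is metadata and the second is data; the function name thinpool_usage is illustrative, not part of any tool.

```shell
# Read `dmsetup status` output on stdin and print usage percentages
# for every thin-pool line found.
thinpool_usage() {
  awk '
    /thin-pool/ {
      n = 0
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^[0-9]+\/[0-9]+$/) {       # a used/total pair
          n++
          split($i, p, "/")
          if (n == 1) { mu = p[1] + 0; mt = p[2] + 0 }   # metadata blocks
          if (n == 2) { du = p[1] + 0; dt = p[2] + 0 }   # data blocks
        }
      }
      if (n >= 2 && mt > 0 && dt > 0)
        printf "metadata=%.1f%% data=%.1f%%\n", 100 * mu / mt, 100 * du / dt
    }'
}

# Usage on a live host:
#   dmsetup status | thinpool_usage
```

Matching on the used/total pattern rather than fixed field positions keeps the sketch working whether or not the pool name prefix is present in the output.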

Immediate consequence: Docker daemon cannot allocate new blocks for any container write operation. Containers mid-write get ENOSPC. New container starts fail. Image pulls fail. If your orchestrator (Swarm, ECS) is health-checking, it will start cycling containers into a crash loop, compounding the problem.

The Attack Vector / Blast Radius

This is a cascading storage starvation failure. Here's the kill chain:

1. Thin pool data space exhausted → all container layer writes blocked.
2. Docker daemon begins logging errors → if the log driver writes to the same volume, the daemon itself can hang.
3. Orchestrator detects unhealthy containers → triggers restarts → new container filesystem allocation attempts → all fail → restart loop burns CPU.
4. Metadata exhaustion (separate but equally fatal): if the metadata device is undersized (dm.loopmetadatasize in loopback mode, or the metadata LV under direct-lvm), metadata blocks fill independently. You can have free data space and still be completely locked.
5. Host-level impact: If you're running devicemapper in loopback mode (the default when no explicit dm.thinpooldev is set), the loop files live under /var/lib/docker on your root partition. Root partition full = SSH login issues, cron failures, and any service that needs to write logs or temp files starting to fail.

The loopback anti-pattern is the most common root cause of these incidents. It was never supported for production use; Docker's own documentation describes loop-lvm as suitable for testing only. If docker info | grep 'Data loop file' returns a path, you are in this failure mode.
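
That check is easy to wrap for use in a health script. A small sketch, assuming docker info prints a "Data loop file" line only in loopback mode as described above; the function name uses_loopback is illustrative.

```shell
# Succeeds (exit 0) when `docker info` output on stdin reports a loop-backed data file.
uses_loopback() {
  grep -q 'Data loop file'
}

# Usage on a live host:
#   docker info 2>/dev/null | uses_loopback && echo "loopback devicemapper in use"
```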

How to Fix It (The Solution)

Step 0: Triage — Identify What's Full

# Check thin pool data AND metadata usage
dmsetup status | grep thin

# Check host-level disk
df -h /var/lib/docker

# Full Docker storage report
docker info 2>/dev/null | grep -A 20 'Storage Driver'

Output fields to read: Data Space Used / Data Space Total and Metadata Space Used / Metadata Space Total.
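
To cut the docker info report down to just those fields, a filter along these lines can help. It is a sketch that assumes the field labels shown above; storage_fields is an illustrative name.

```shell
# Keep only the data/metadata space lines from `docker info` output on stdin.
storage_fields() {
  grep -E '^[[:space:]]*(Data|Metadata) Space (Used|Total|Available):'
}

# Usage on a live host:
#   docker info 2>/dev/null | storage_fields
```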

Basic Fix: Emergency Space Reclamation

# 1. Remove all stopped containers (running containers are not touched)
docker container prune -f

# 2. Remove dangling images (untagged, unreferenced layers)
docker image prune -f

# 3. Nuclear option if this is a worker node you can drain
docker system prune -af --volumes

# 4. Return freed blocks to the pool by trimming each running container's filesystem
for c in $(docker ps -q); do fstrim -v "/proc/$(docker inspect -f '{{.State.Pid}}' "$c")/root"; done

If you're on an LVM-backed thin pool (the correct setup), extend the pool immediately:

# Identify your VG and its free space
vgs

# Add 20G to the thin pool data LV (VG "docker", LV "thinpool"; device-mapper name docker-thinpool)
lvextend -L +20G docker/thinpool

# Verify — Docker picks this up without restart
dmsetup status docker-thinpool
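
Metadata can be grown in the same way as data. The sketch below only prints the two lvextend commands for review rather than running them; the pool name and sizes are assumptions to replace with your own, and plan_extend is an illustrative helper.

```shell
# Print (do not run) the commands that grow a thin pool's data and metadata.
# $1 = VG/LV of the pool, $2 = data growth, $3 = metadata growth (LVM size syntax, e.g. 20G)
plan_extend() {
  printf 'lvextend -L +%s %s\n' "$2" "$1"
  printf 'lvextend --poolmetadatasize +%s %s\n' "$3" "$1"
}

plan_extend docker/thinpool 20G 1G
```

Printing first gives you a chance to compare the requested growth against the free space reported by vgs before committing.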

Enterprise Best Practice: Migrate Off Devicemapper to overlay2

Devicemapper is EOL for Docker production workloads. The correct fix is migration.
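
Because images stored under devicemapper are not visible to the daemon once the driver changes, it is worth exporting the ones you need first. A sketch of that step, assuming the images are tagged; the /backup path and the image_refs helper are hypothetical.

```shell
# Keep only fully tagged image references (drop <none> repositories or tags).
image_refs() {
  grep -v '<none>'
}

# Usage on a live host:
#   docker images --format '{{.Repository}}:{{.Tag}}' | image_refs | xargs docker save -o /backup/images.tar
#   ...switch storage-driver in daemon.json, restart Docker, then:
#   docker load -i /backup/images.tar
```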

Before (broken loopback devicemapper config):

- {
-   "storage-driver": "devicemapper"
- }

After (overlay2 on XFS formatted with ftype=1, with bounded container logs):

+ {
+   "storage-driver": "overlay2",
+   "log-driver": "json-file",
+   "log-opts": {
+     "max-size": "50m",
+     "max-file": "3"
+   },
+   "data-root": "/var/lib/docker"
+ }
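
Before switching, it helps to confirm that the XFS filesystem backing the data root was created with ftype=1, since overlay2 depends on it. A minimal sketch that inspects xfs_info output; has_ftype1 is an illustrative name and the mount point is assumed to be /var/lib/docker.

```shell
# Succeeds when xfs_info output on stdin shows the filesystem was created with ftype=1.
has_ftype1() {
  grep -q 'ftype=1'
}

# Usage on a live host:
#   xfs_info /var/lib/docker | has_ftype1 || echo "recreate the filesystem with: mkfs.xfs -n ftype=1 <device>"
```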

If you must stay on devicemapper (RHEL 7 kernel requirement), use direct-lvm, not loopback:

- {
-   "storage-driver": "devicemapper"
- }
+ {
+   "storage-driver": "devicemapper",
+   "storage-opts": [
+     "dm.thinpooldev=/dev/mapper/docker-thinpool",
+     "dm.use_deferred_removal=true",
+     "dm.use_deferred_deletion=true",
+     "dm.fs=xfs",
+     "dm.basesize=20G"
+   ]
+ }

Provision the LVM thin pool before starting Docker:

# Assumes /dev/sdb is a dedicated disk for Docker
pvcreate /dev/sdb
vgcreate docker /dev/sdb

# 95% data, 1% metadata, leave 4% for auto-extend headroom
lvcreate --wipesignatures y -n thinpool docker -l 95%VG
lvcreate --wipesignatures y -n thinpoolmeta docker -l 1%VG

lvconvert -y \
  --zero n \
  -c 512K \
  --thinpool docker/thinpool \
  --poolmetadata docker/thinpoolmeta

# Configure LVM auto-extend — THIS IS THE CRITICAL PREVENTION STEP
cat > /etc/lvm/profile/docker-thinpool.profile <<EOF
activation {
  thin_pool_autoextend_threshold=80
  thin_pool_autoextend_percent=20
}
EOF

lvchange --metadataprofile docker-thinpool docker/thinpool
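
Auto-extend only fires if LVM is monitoring the pool, so it is worth checking after applying the profile. A sketch, assuming the lvs seg_monitor field reports "monitored" or "not monitored"; pool_is_monitored is an illustrative helper.

```shell
# Succeeds when the seg_monitor value on stdin is exactly "monitored".
pool_is_monitored() {
  state=$(tr -d '[:space:]')     # "not monitored" collapses to "notmonitored"
  [ "$state" = "monitored" ]
}

# Usage on a live host:
#   lvs --noheadings -o seg_monitor docker/thinpool | pool_is_monitored \
#     || lvchange --monitor y docker/thinpool
```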

Prevention in CI/CD

1. LVM Auto-Extend (Already Shown Above — Do This First)

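The thin_pool_autoextend_threshold=80 profile means LVM automatically grows the pool by 20% once it reaches 80% capacity. This alone prevents most production incidents, but only while the volume group still has free extents to grow into, so add physical volumes before that headroom runs out.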
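
LVM can only grow the pool out of free space in the volume group, so it is useful to compare the size of the next auto-extend step with what the VG has left. A rough sketch of that arithmetic; the sizes are assumed values for a 100 GiB volume group laid out with the 95%/1% split, and autoextend_need_mib is an illustrative helper.

```shell
# MiB the pool will request at its next auto-extend: size * percent / 100.
autoextend_need_mib() {
  echo $(( $1 * $2 / 100 ))
}

pool_mib=97280      # 95 GiB pool (assumed)
vg_free_mib=4096    # 4 GiB left unallocated in the VG (assumed)
need=$(autoextend_need_mib "$pool_mib" 20)
if [ "$need" -gt "$vg_free_mib" ]; then
  echo "next auto-extend wants ${need} MiB but the VG has only ${vg_free_mib} MiB free"
fi
```

On a real host the two inputs would come from lvs and vgs; a result like this is a cue to add a physical volume to the VG or lower thin_pool_autoextend_percent.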

2. Prometheus + Alertmanager: Alert Before It's Full

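# prometheus alert rule (the node_device_mapper_* series are not built into node_exporter; they assume a custom exporter such as a textfile collector script)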
groups:
- name: docker_storage
  rules:
  - alert: DockerThinPoolDataHigh
    expr: |
      (node_device_mapper_data_used_bytes / node_device_mapper_data_total_bytes) > 0.75
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Docker thin pool data at {{ $value | humanizePercentage }}"
  - alert: DockerThinPoolMetadataHigh
    expr: |
      (node_device_mapper_metadata_used_bytes / node_device_mapper_metadata_total_bytes) > 0.60
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Docker thin pool METADATA critical — pool lock imminent"
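
The node_device_mapper_* series used in these rules have to come from somewhere. One option is a small script feeding node_exporter's textfile collector; the sketch below converts a dmsetup status line into those four series. The metric names simply mirror the rules above, the 4096-byte metadata block size is an assumption, and thinpool_metrics and the output path are hypothetical.

```shell
# Convert `dmsetup status <pool>` output on stdin into Prometheus text-format metrics.
# $1 = pool name for the label
# $2 = data block size in 512-byte sectors (from `dmsetup table <pool>`)
# $3 = metadata block size in bytes (assumed 4096 if omitted)
thinpool_metrics() {
  awk -v pool="$1" -v dbs="$2" -v mbs="${3:-4096}" '
    /thin-pool/ {
      n = 0
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^[0-9]+\/[0-9]+$/) {
          n++
          split($i, p, "/")
          if (n == 1) { mu = p[1] * mbs; mt = p[2] * mbs }               # metadata pair
          if (n == 2) { du = p[1] * dbs * 512; dt = p[2] * dbs * 512 }   # data pair
        }
      }
      if (n < 2) next
      fmt = "node_device_mapper_%s_bytes{pool=\"%s\"} %.0f\n"
      printf fmt, "metadata_used", pool, mu
      printf fmt, "metadata_total", pool, mt
      printf fmt, "data_used", pool, du
      printf fmt, "data_total", pool, dt
    }'
}

# Usage on a live host (directory must match --collector.textfile.directory):
#   dmsetup status docker-thinpool | thinpool_metrics docker-thinpool 128 \
#     > /var/lib/node_exporter/textfile/thinpool.prom
```

Since the alert expressions are ratios, an error in the assumed block sizes shifts the absolute byte values but not the percentages that trigger the alerts.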

3. Checkov / Ansible Lint: Enforce overlay2 in IaC

If you're provisioning Docker hosts via Ansible:

# tasks/docker.yml — enforce overlay2, fail fast on devicemapper loopback
- name: Assert overlay2 storage driver
  assert:
    that:
      - docker_storage_driver == 'overlay2'
    fail_msg: "FAIL: devicemapper loopback is not permitted in production. Set docker_storage_driver=overlay2."

For Terraform-provisioned EC2 instances with user-data, add a startup validation:

# In user-data, fail loud if someone shipped the wrong AMI/config
DRIVER=$(docker info --format '{{.Driver}}')
if [[ "$DRIVER" == "devicemapper" ]]; then
  echo "FATAL: devicemapper detected. Halting node registration." | systemd-cat -p err
  systemctl stop docker
  exit 1
fi

4. Node-Level Cron: Scheduled Prune on Worker Nodes

# /etc/cron.d/docker-prune
0 3 * * * root docker system prune -f --filter "until=24h" >> /var/log/docker-prune.log 2>&1

Do not use --volumes in automated prune. You will delete persistent data.