How to Fix docker inspect Hanging Indefinitely on Windows with Hyper-V Isolation
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 10–30 mins
TL;DR
- docker inspect <container> blocks forever because the Docker daemon is stuck waiting on a response from the Windows Host Compute Service (HCS): the Hyper-V VM worker process (vmwp.exe) is either deadlocked, crashed, or orphaned.
- Kill the orphaned vmwp.exe process tied to the container's VM ID, restart the HCS service, and optionally force-remove the container from daemon state.
The Incident (What Does the Error Mean?)
You run:
docker inspect 3f9a1c2b8d04
The shell hangs. No output. No timeout. Ctrl+C returns you to the prompt but the daemon is still blocked internally. Meanwhile:
docker ps # also hangs
docker stop # also hangs
docker rm -f # also hangs
Checking the Docker daemon log, either C:\ProgramData\Docker\panic.log or the Application event log via Get-EventLog -LogName Application -Source docker, reveals:
time="2024-11-12T03:17:42Z" level=error msg="hcsshim::Container Wait"
error="the handle is invalid." id="3f9a1c2b8d04"
time="2024-11-12T03:17:42Z" level=warning msg="failed to wait for container"
container=3f9a1c2b8d04 error="hcsshim: the handle is invalid"
ERROR: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Immediate consequence: The Docker daemon's goroutine pool is blocked on a synchronous HCS named pipe call that will never return. Every subsequent docker CLI command that touches container state queues behind this deadlock. Your entire Docker host is effectively frozen for container operations.
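To confirm which container is involved before touching any service, the log excerpt above can be scanned for the invalid-handle signature. The Python sketch below is illustrative only: the function name find_stuck_containers is hypothetical, and it assumes log lines carry id= or container= fields in the same shape as the excerpt, which may differ between engine versions.

```python
import re
from typing import List

# Signature and field names are taken from the log excerpt above (assumption:
# other engine versions may format these fields differently).
INVALID_HANDLE = re.compile(r"handle is invalid", re.IGNORECASE)
CONTAINER_ID = re.compile(r'\b(?:id|container)="?([0-9a-f]{12,64})"?')


def find_stuck_containers(log_text: str) -> List[str]:
    """Return container IDs found on lines that report an invalid HCS handle."""
    seen: List[str] = []
    for line in log_text.splitlines():
        if not INVALID_HANDLE.search(line):
            continue  # unrelated log line
        match = CONTAINER_ID.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))  # keep first-seen order, no duplicates
    return seen
```

Feeding it the exported daemon log text gives a short list of candidate IDs to cross-check against HCS in the fix steps.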
The Attack Vector / Blast Radius
This is not a security exploit — it is a cascading availability failure with the following blast radius:
Why HCS deadlocks: Each Hyper-V isolated container runs inside a lightweight VM. The Docker daemon manages these VMs through hcsshim, which calls into the Host Compute Service (the vmcompute service); the named pipe \\.\pipe\docker_engine is only the channel between the docker CLI and the daemon. When the VM worker process (vmwp.exe) crashes mid-operation (during a snapshot, a checkpoint, or a network namespace teardown), hcsshim is left holding an open handle to a now-invalid VM compute system. The daemon's container.Wait() call blocks on that handle forever. There is no default timeout in older Docker Engine versions on Windows.
Cascading failure chain:
- One orphaned vmwp.exe → daemon goroutine blocked
- Daemon goroutine pool exhaustion → ALL docker CLI commands time out
- Orchestrator (Kubernetes kubelet, Swarm agent) health checks fail → node marked NotReady
- Workloads rescheduled to other nodes → potential overload on remaining nodes
- If this is a CI/CD runner node, all pipelines queue indefinitely
In Kubernetes on Windows nodes, this manifests as the kubelet unable to garbage-collect terminated pods, leading to disk pressure from unremoved container layers on the C:\ volume.
How to Fix It
Step 1 — Identify the Orphaned VM Worker Process
Get the container's VM ID from HCS directly:
# List all HCS compute systems (requires elevated PowerShell)
Get-ComputeProcess | Select-Object Id, Type, Isolation, Owner
If hcsdiag is available (Windows Server 2019+):
hcsdiag list
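Cross-reference the container ID prefix with the VM GUID. Then find the vmwp.exe PID:
# The vmwp.exe command line normally includes the GUID of the VM it serves
Get-CimInstance Win32_Process -Filter "Name = 'vmwp.exe'" | Select-Object ProcessId, CreationDate, CommandLine
# Kill only the orphaned one: match by GUID, or by start time if no GUID is shown
Stop-Process -Id <PID> -Force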
Step 2 — Basic Fix: Restart HCS and Docker
Stop-Service docker -Force
Stop-Service vmcompute -Force # HCS
Start-Service vmcompute
Start-Service docker
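If containers are still listed as running after restart, force-remove them. Note that this command removes every container on the host, including stopped ones you may want to keep:
docker rm -f $(docker ps -aq)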
Step 3 — Enterprise Best Practice: Daemon Timeout + Named Pipe Hardening
The longer-term fix is to bound how long the daemon waits. Set a container shutdown timeout in daemon.json, then set a separate timeout for the HCS calls made through hcsshim:
# C:\ProgramData\Docker\config\daemon.json
{
- "hosts": ["npipe://"]
+ "hosts": ["npipe://"],
+ "shutdown-timeout": 15,
+ "exec-opts": ["isolation=hyperv"],
+ "log-level": "warn",
+ "debug": false
}
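On hosts that already carry a customised daemon.json, the settings in the diff above are safer to merge than to paste over the file. The following Python sketch shows one way to do that. The helper names merge_daemon_config and rewrite_daemon_json are hypothetical, and it assumes the file is plain JSON (with or without a UTF-8 byte order mark).

```python
import json
from pathlib import Path

# Keys and values copied from the diff above.
HARDENED_SETTINGS = {
    "shutdown-timeout": 15,
    "exec-opts": ["isolation=hyperv"],
    "log-level": "warn",
    "debug": False,
}


def merge_daemon_config(existing: dict) -> dict:
    """Return a copy of `existing` with the settings above applied on top.

    Note: list values such as "exec-opts" are replaced, not concatenated.
    """
    merged = dict(existing)
    merged.update(HARDENED_SETTINGS)
    return merged


def rewrite_daemon_json(path: Path) -> dict:
    """Load daemon.json (or start from empty), merge, and write it back."""
    if path.exists():
        # utf-8-sig tolerates the BOM that some Windows editors add
        current = json.loads(path.read_text(encoding="utf-8-sig"))
    else:
        current = {}
    merged = merge_daemon_config(current)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged
```

Because the merge replaces list values, any exec-opts already present would need to be carried over by hand. The Docker service must be restarted before the new settings take effect.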
For the hcsshim timeout, set the environment variable on the Docker service:
# In the Docker service environment (via sc.exe or registry)
- # No timeout set — blocks forever
+ HCSSHIM_TIMEOUT_SECONDS=120
Apply via PowerShell service config:
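# Add the variable to the Docker service environment (a REG_MULTI_SZ value)
$regPath = "HKLM:\SYSTEM\CurrentControlSet\Services\docker"
# Keep existing entries, dropping empties and any earlier copy of this variable
$existing = @((Get-ItemProperty $regPath).Environment | Where-Object { $_ -and $_ -notlike "HCSSHIM_TIMEOUT_SECONDS=*" })
# Append the timeout as its own entry and keep the value type as MultiString
Set-ItemProperty $regPath -Name Environment -Type MultiString -Value ($existing + "HCSSHIM_TIMEOUT_SECONDS=120")
Restart-Service docker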
Step 4 — Force-Remove Stuck Container from Daemon State
If the container persists in docker ps after HCS restart:
# Stop daemon
Stop-Service docker -Force
# Manually remove container state directory
# Directories are named with the full 64-character ID, so match on the short-ID prefix
$containerId = "3f9a1c2b8d04"
Remove-Item "C:\ProgramData\Docker\containers\$containerId*" -Recurse -Force
# Restart
Start-Service docker
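Because state directories are named with the full container ID while operators usually hold only the 12-character short ID, it is worth checking that a prefix resolves to exactly one directory before deleting anything. A minimal Python sketch of that check follows. The function name resolve_container_dirs is hypothetical, and the containers root is passed in rather than assumed.

```python
from pathlib import Path
from typing import List


def resolve_container_dirs(containers_root: Path, id_prefix: str) -> List[Path]:
    """Return state directories under `containers_root` whose name starts with `id_prefix`."""
    if len(id_prefix) < 12:
        # Shorter prefixes risk matching unrelated containers.
        raise ValueError("use at least the 12-character short ID")
    return sorted(
        entry
        for entry in containers_root.iterdir()
        if entry.is_dir() and entry.name.startswith(id_prefix)
    )
```

A cautious caller would proceed with removal only when the returned list has exactly one entry, and would stop for manual review otherwise.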
Prevention in CI/CD
1. Watchdog Script on Windows Nodes
Deploy a scheduled task that detects and recovers from HCS deadlocks:
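# watchdog.ps1: run every 2 minutes via Task Scheduler (elevated)
$timeout = 10 # seconds
# Write-EventLog fails unless the event source has been registered once
if (-not [System.Diagnostics.EventLog]::SourceExists("DockerWatchdog")) {
    New-EventLog -LogName Application -Source "DockerWatchdog"
}
$job = Start-Job { docker ps --no-trunc }
# Wait-Job returns the job only if it finished within the timeout
$finished = Wait-Job $job -Timeout $timeout
Remove-Job $job -Force
if (-not $finished) {
    Write-EventLog -LogName Application -Source "DockerWatchdog" -EntryType Warning -EventId 1001 -Message "Docker daemon hung. Recycling HCS."
    Stop-Service docker -Force
    Stop-Service vmcompute -Force
    Start-Service vmcompute
    Start-Service docker
}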
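The core of the watchdog is a bounded probe: run a cheap CLI command and treat a missed deadline as a hang. The same idea is sketched below in Python for monitoring agents that are not written in PowerShell. The function name daemon_responds is hypothetical. Unlike the script above, this sketch also reports a non-zero exit code or a missing CLI as "not responding", which is an assumption about what the caller wants.

```python
import subprocess
from typing import Sequence


def daemon_responds(
    command: Sequence[str] = ("docker", "ps", "--no-trunc"),
    timeout_seconds: float = 10.0,
) -> bool:
    """Return True if `command` exits with status 0 within the timeout."""
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising
        return False
    except FileNotFoundError:
        # CLI not installed or not on PATH
        return False
    return result.returncode == 0
```

The probe only reports state. Recycling the vmcompute and docker services is left to the caller, so the recovery action stays explicit and auditable.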
2. Kubernetes Node Problem Detector
If running Windows nodes in AKS or self-managed Kubernetes, deploy node-problem-detector with a custom rule:
# npd-config.json rule
{
  "type": "permanent",
  "condition": "DockerHung",
  "reason": "HCSDeadlock",
  "message": "docker inspect hang detected - HCS named pipe unresponsive",
  "pattern": "hcsshim.*handle is invalid"
}
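node-problem-detector only reports this as a condition on the Node object; it does not taint or drain anything itself. To move workloads off before the node fully freezes, pair it with a remediation controller or automation that taints or cordons nodes carrying the DockerHung condition.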
3. CI/CD Pipeline Gate (GitHub Actions / Azure DevOps)
Add a pre-flight check before any Windows container job:
- name: Verify Docker daemon responsive
  shell: pwsh
  run: |
    $job = Start-Job { docker version }
    if (-not (Wait-Job $job -Timeout 15)) {
      Remove-Job $job -Force
      Write-Error "Docker daemon unresponsive on Windows runner. Failing fast."
      exit 1
    }
    Receive-Job $job
4. Upgrade Path
- Docker Engine 24.0+ on Windows includes improved HCS handle cleanup on daemon restart.
- containerd 1.7+ as the Windows runtime (replacing the legacy docker-containerd) has significantly better Hyper-V VM lifecycle management and does not exhibit this named pipe deadlock pattern.
- Migrate to containerd + nerdctl on Windows nodes where possible; this eliminates the hcsshim v1 code path entirely.