Why does docker inspect hang only with Hyper-V isolation and not process isolation on Windows?

Process-isolated containers share the host kernel directly, so there is no intermediate VM worker process. Hyper-V isolation starts a lightweight utility VM per container, backed by a vmwp.exe worker process, and every lifecycle call goes through the Host Compute Service (HCS). When vmwp.exe crashes or the VM's compute system handle becomes invalid, hcsshim blocks waiting for a response that never comes. Process isolation has no such intermediary, so docker inspect does not depend on a VM worker that can disappear.

Is it safe to kill vmwp.exe directly with Stop-Process -Force?

Yes, for an already-crashed or orphaned vmwp.exe it is safe and is the correct remediation. The VM is already in an invalid state — that is the root cause of the hang. Force-killing it releases the HCS handle, unblocks the daemon goroutine, and allows Docker to clean up the container state. You will lose any unsaved in-memory state in that container, but if docker inspect is hanging, the container is already non-functional.

How do I prevent this from taking down an entire Kubernetes Windows node?

Deploy node-problem-detector with a custom HCS deadlock rule that matches 'hcsshim.*handle is invalid' in the Docker daemon logs. node-problem-detector reports the problem as a node condition; it does not taint or drain nodes by itself, so pair it with a remediation controller that applies a custom taint when the condition fires (node.kubernetes.io/unreachable is managed by the Kubernetes node controller and should not be set by hand). Add a DaemonSet watchdog that restarts the Docker/vmcompute service stack. Moving Windows nodes to containerd 1.7+ as the CRI runtime avoids the hcsshim v1 code path where this hang occurs.

How to Fix docker inspect Hanging Indefinitely on Windows with Hyper-V Isolation

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 10–30 mins

TL;DR

docker inspect <container> blocks forever because the Docker daemon is stuck waiting on a response from the Windows Host Compute Service (HCS) — the Hyper-V VM worker process (vmwp.exe) is either deadlocked, crashed, or orphaned.
Kill the orphaned vmwp.exe process tied to the container's VM ID, restart the HCS service, and optionally force-remove the container from daemon state.

The Incident (What Does the Error Mean?)

You run:

docker inspect 3f9a1c2b8d04

The shell hangs. No output. No timeout. Ctrl+C returns you to the prompt but the daemon is still blocked internally. Meanwhile:

docker ps       # also hangs
docker stop     # also hangs
docker rm -f    # also hangs

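Checking the Docker daemon log reveals the cause. When dockerd runs as a Windows service it logs to the Application event log under the source docker (query it with Get-EventLog -LogName Application -Source docker); C:\ProgramData\Docker\panic.log only captures daemon panics. The relevant entries are: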

time="2024-11-12T03:17:42Z" level=error msg="hcsshim::Container Wait" error="the handle is invalid." id="3f9a1c2b8d04"

time="2024-11-12T03:17:42Z" level=warning msg="failed to wait for container" container=3f9a1c2b8d04 error="hcsshim: the handle is invalid"

ERROR: context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Immediate consequence: A Docker daemon goroutine is blocked in a synchronous HCS call that will never return, and it is likely still holding the lock on that container's state. Every subsequent docker CLI command that needs that state queues behind the blocked call. Your Docker host is effectively frozen for container operations.
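
To find out which containers are affected, the log entries above can be scanned for hcsshim errors that mention an invalid handle. The following is a minimal Python sketch, assuming the daemon log has been exported as one logfmt-style entry per line; the function names parse_logfmt and stuck_container_ids are made up for this illustration and are not part of any Docker tooling.

```python
import re

# Matches key=value or key="quoted value" pairs in a logfmt-style line.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Return the key/value pairs of one logfmt-style line as a dict."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def stuck_container_ids(lines):
    """Collect container IDs from hcsshim entries that report an invalid handle."""
    ids = []
    for line in lines:
        fields = parse_logfmt(line)
        text = fields.get("msg", "") + " " + fields.get("error", "")
        if "hcsshim" in text and "handle is invalid" in text:
            # The sample entries carry the ID under either "id" or "container".
            cid = fields.get("id") or fields.get("container")
            if cid and cid not in ids:
                ids.append(cid)
    return ids


# The two entries from the incident log, one entry per line.
SAMPLE = [
    'time="2024-11-12T03:17:42Z" level=error msg="hcsshim::Container Wait" '
    'error="the handle is invalid." id="3f9a1c2b8d04"',
    'time="2024-11-12T03:17:42Z" level=warning msg="failed to wait for container" '
    'container=3f9a1c2b8d04 error="hcsshim: the handle is invalid"',
]

if __name__ == "__main__":
    print(stuck_container_ids(SAMPLE))
```

Both sample entries refer to the same container, so the sketch reports a single ID. Real daemon logs may use other field names, so treat the lookup keys as a starting point.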

The Attack Vector / Blast Radius

This is not a security exploit — it is a cascading availability failure with the following blast radius:

Why HCS deadlocks: Each Hyper-V isolated container runs inside a lightweight VM. The docker CLI talks to the daemon over a Windows named pipe (\\.\pipe\docker_engine), and the daemon in turn manages these VMs by calling the Host Compute Service through hcsshim. When the VM worker process (vmwp.exe) crashes mid-operation (during a snapshot, a checkpoint, or a network namespace teardown), hcsshim is left holding an open handle to a now-invalid VM compute system. The daemon's container.Wait() call blocks on that handle forever. Older Docker Engine versions on Windows apply no default timeout to this wait.
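
The difference between an unbounded wait and a bounded one can be shown with a toy model. This Python sketch is not hcsshim code: a threading.Event stands in for the exit notification that the VM worker would normally deliver, and wait_for_exit is a hypothetical helper written for this illustration.

```python
import threading


def wait_for_exit(exited, timeout=None):
    """Wait for a simulated compute-system exit notification.

    timeout=None models a wait with no deadline: if the notification
    is lost, the caller never gets control back.
    """
    if exited.wait(timeout):
        return "exited"
    return "timed out"


if __name__ == "__main__":
    # The worker died before notifying, so this event is never set.
    lost = threading.Event()
    print(wait_for_exit(lost, timeout=0.2))      # bounded wait returns control

    # A healthy worker delivers the notification shortly after the wait starts.
    healthy = threading.Event()
    threading.Timer(0.05, healthy.set).start()
    print(wait_for_exit(healthy, timeout=1.0))
```

Calling wait_for_exit(lost) with no timeout would block forever, which is the situation the daemon ends up in when the handle behind container.Wait() becomes invalid.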

Cascading failure chain:

One orphaned vmwp.exe → daemon goroutine blocked
Requests pile up behind the blocked call → docker CLI commands that touch container state time out
Orchestrator (Kubernetes kubelet, Swarm agent) health checks fail → node marked NotReady
Workloads rescheduled to other nodes → potential overload on remaining nodes
If this is a CI/CD runner node, all pipelines queue indefinitely

In Kubernetes on Windows nodes, this manifests as the kubelet unable to garbage-collect terminated pods, leading to disk pressure from unremoved container layers on the C:\ volume.

How to Fix It

Step 1 — Identify the Orphaned VM Worker Process

Get the container's VM ID from HCS directly:

# List all HCS compute systems with every property (requires elevated PowerShell)
Get-ComputeProcess | Format-List *

If hcsdiag is available (Windows Server 2019+):

hcsdiag list

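Cross-reference the container ID prefix with the VM GUID. Then find the matching vmwp.exe PID. The worker's command line normally includes the GUID of the VM it hosts, which is a more reliable match than start time alone:

Get-CimInstance Win32_Process -Filter "Name = 'vmwp.exe'" | Select-Object ProcessId, CreationDate, CommandLine
# Kill only the orphaned worker whose command line contains the stuck container's VM GUID
Stop-Process -Id <PID> -Force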
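
When a host runs many Hyper-V containers, picking the right worker by eye is error-prone. The sketch below shows one way to select the PID from an exported process list. It assumes each vmwp.exe command line contains the GUID of its VM (verify this on your host), and both find_vmwp_pid and the sample GUIDs are invented for illustration.

```python
def find_vmwp_pid(processes, vm_id):
    """Return the PID of the worker whose command line mentions vm_id, else None.

    processes: iterable of (pid, command_line) tuples, for example
    exported from a process listing on the affected host.
    """
    needle = vm_id.strip("{}").lower()
    matches = [pid for pid, cmd in processes if needle in (cmd or "").lower()]
    # Refuse to guess when zero or several workers match.
    return matches[0] if len(matches) == 1 else None


# Hypothetical process listing; the GUIDs are made up.
PROCESSES = [
    (4312, r"C:\Windows\System32\vmwp.exe 11111111-2222-3333-4444-555555555555"),
    (5120, r"C:\Windows\System32\vmwp.exe AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE"),
]

if __name__ == "__main__":
    print(find_vmwp_pid(PROCESSES, "{aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee}"))
```

Returning None on an ambiguous match is deliberate: force-killing the wrong vmwp.exe takes down a healthy container.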

Step 2 — Basic Fix: Restart HCS and Docker

Stop-Service docker -Force
Stop-Service vmcompute -Force   # HCS
Start-Service vmcompute
Start-Service docker

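If the stuck container is still listed as running after the restart, force-remove it by ID. Avoid docker rm -f $(docker ps -aq) unless you intend to delete every container on the host:

docker rm -f 3f9a1c2b8d04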

Step 3 — Enterprise Best Practice: Daemon Timeout + Named Pipe Hardening

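A partial mitigation is to bound the daemon's own waits. In daemon.json, shutdown-timeout caps how long the daemon waits for containers to stop when it shuts down, so a service restart is not held up by a dead VM. It does not put a timeout on individual HCS calls; that is configured separately through the service environment: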

# C:\ProgramData\Docker\config\daemon.json
{
-  "hosts": ["npipe://"]
+  "hosts": ["npipe://"],
+  "shutdown-timeout": 15,
+  "exec-opts": ["isolation=hyperv"],
+  "log-level": "warn",
+  "debug": false
}
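
Editing daemon.json by hand across many nodes invites typos, and a malformed file stops the daemon from starting. The following Python sketch merges the keys from the diff above into an existing file while keeping unrelated settings and writing a backup first. The helper name merge_daemon_config and the .bak naming are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path

# The keys added in the diff above.
DESIRED = {
    "shutdown-timeout": 15,
    "exec-opts": ["isolation=hyperv"],
    "log-level": "warn",
    "debug": False,
}


def merge_daemon_config(path, desired=None):
    """Merge desired keys into daemon.json, keeping unrelated settings."""
    path = Path(path)
    desired = DESIRED if desired is None else desired
    # utf-8-sig tolerates the byte-order mark some Windows editors add.
    current = json.loads(path.read_text(encoding="utf-8-sig")) if path.exists() else {}
    merged = {**current, **desired}
    if path.exists():
        # Keep a copy of the previous file next to it before overwriting.
        backup = path.with_name(path.name + ".bak")
        backup.write_bytes(path.read_bytes())
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged


if __name__ == "__main__":
    # Demonstrate on a throwaway file instead of the real config.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "daemon.json"
        cfg.write_text('{"hosts": ["npipe://"]}', encoding="utf-8")
        print(json.dumps(merge_daemon_config(cfg), indent=2))
```

On a real node the path would be C:\ProgramData\Docker\config\daemon.json, and the Docker service must be restarted for the change to take effect.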

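For the hcsshim timeout, set the HCSSHIM_TIMEOUT_SECONDS environment variable on the Docker service. hcsshim reads it to override its default timeout for HCS operations; whether the specific call that is hanging honors it depends on the hcsshim version vendored into your Docker Engine build, so confirm the effect on a test host first: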

# In the Docker service environment (via sc.exe or registry)
- # No timeout set — blocks forever
+ HCSSHIM_TIMEOUT_SECONDS=120

Apply via PowerShell service config:

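# Add HCSSHIM_TIMEOUT_SECONDS to the Docker service environment (a REG_MULTI_SZ value)
$regPath  = "HKLM:\SYSTEM\CurrentControlSet\Services\docker"
$existing = @((Get-ItemProperty $regPath -Name Environment -ErrorAction SilentlyContinue).Environment)
# Drop empty entries and any previous timeout, then append the new value
$updated  = @($existing | Where-Object { $_ -and $_ -notlike "HCSSHIM_TIMEOUT_SECONDS=*" }) + "HCSSHIM_TIMEOUT_SECONDS=120"
Set-ItemProperty $regPath -Name Environment -Value ([string[]]$updated) -Type MultiString
Restart-Service docker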
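
A Windows service's Environment registry value is a list of NAME=value strings, so the update should be idempotent: running it twice must not leave two entries for the same variable. The pure-Python sketch below shows that list logic on its own, without touching the registry; upsert_env is a hypothetical helper name.

```python
def upsert_env(entries, name, value):
    """Return a new NAME=value list containing exactly one entry for name.

    Windows environment variable names are case-insensitive, so an
    existing entry is matched without regard to case.
    """
    prefix = name.upper() + "="
    kept = [e for e in (entries or []) if e and not e.upper().startswith(prefix)]
    return kept + [f"{name}={value}"]


if __name__ == "__main__":
    # No Environment value yet.
    print(upsert_env(None, "HCSSHIM_TIMEOUT_SECONDS", 120))
    # An older timeout and an unrelated variable already exist.
    existing = ["HTTP_PROXY=http://proxy.example:3128", "hcsshim_timeout_seconds=30"]
    print(upsert_env(existing, "HCSSHIM_TIMEOUT_SECONDS", 120))
```

The proxy entry in the example is made up; it is there to show that unrelated variables survive the update.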

Step 4 — Force-Remove Stuck Container from Daemon State

If the container persists in docker ps after HCS restart:

# Stop daemon
Stop-Service docker -Force

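# Manually remove the container state directory.
# Directories are named by the full 64-character ID, so match on the short prefix.
$containerId = "3f9a1c2b8d04"
Get-ChildItem "C:\ProgramData\Docker\containers" -Directory -Filter "$containerId*" | Remove-Item -Recurse -Force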

# Restart
Start-Service docker
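
Because state directories are named by the full 64-character container ID, a short prefix has to be resolved before anything is deleted, and a prefix that matches more than one directory should be rejected. A minimal Python sketch of that check follows; resolve_container_dir is a name invented for this example, and the demo runs against a temporary directory rather than C:\ProgramData\Docker\containers.

```python
import tempfile
from pathlib import Path


def resolve_container_dir(containers_root, id_prefix):
    """Return the single state directory whose name starts with id_prefix.

    Raises ValueError when no directory or more than one directory matches,
    so a deletion never proceeds on an ambiguous prefix.
    """
    root = Path(containers_root)
    matches = sorted(
        p for p in root.iterdir() if p.is_dir() and p.name.startswith(id_prefix)
    )
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one match for {id_prefix!r}, found {len(matches)}"
        )
    return matches[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        full_id = "3f9a1c2b8d04" + "0" * 52   # made-up 64-character ID
        (Path(tmp) / full_id).mkdir()
        print(resolve_container_dir(tmp, "3f9a1c2b8d04").name == full_id)
```

Only after the prefix resolves to exactly one directory is it reasonable to remove it, and only while the Docker service is stopped.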

Prevention in CI/CD

1. Watchdog Script on Windows Nodes

Deploy a scheduled task that detects and recovers from HCS deadlocks:

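# watchdog.ps1: run every 2 minutes via Task Scheduler (Windows PowerShell 5.1, elevated)
# Note: recycling vmcompute and docker stops every container on the node.
$timeout = 10  # seconds to wait for the docker CLI
$source  = "DockerWatchdog"
# Register the event source once; Write-EventLog fails if it does not exist
if (-not [System.Diagnostics.EventLog]::SourceExists($source)) {
    New-EventLog -LogName Application -Source $source
}
$job = Start-Job { docker ps --no-trunc }
if (-not (Wait-Job $job -Timeout $timeout)) {
    Remove-Job $job -Force
    Write-EventLog -LogName Application -Source $source -EntryType Warning -EventId 1001 -Message "Docker daemon hung. Recycling HCS."
    Stop-Service docker -Force
    Stop-Service vmcompute -Force
    Start-Service vmcompute
    Start-Service docker
} else {
    Remove-Job $job   # clean up the finished probe
}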
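
The same probe can be expressed in Python where a node agent is already written in it. This sketch treats the daemon as unhealthy if the command either exceeds the deadline or exits non-zero. The helper name responds_within is an assumption, and the demo uses short Python child processes as stand-ins for docker ps so that it runs on a machine without Docker.

```python
import subprocess
import sys


def responds_within(cmd, timeout_s):
    """Run cmd and report whether it exited with status 0 before the deadline."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False   # subprocess.run kills the child when the timeout expires
    except FileNotFoundError:
        return False   # a missing CLI also counts as unhealthy
    return result.returncode == 0


if __name__ == "__main__":
    # On a real node the probe would be ["docker", "ps", "--no-trunc"].
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    quick = [sys.executable, "-c", "pass"]
    print(responds_within(hung, 1.0), responds_within(quick, 10.0))
```

Checking the exit code as well as the deadline catches the case where the CLI returns quickly with a connection error instead of hanging.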

2. Kubernetes Node Problem Detector

If running Windows nodes in AKS or self-managed Kubernetes, deploy node-problem-detector with a custom rule:

# npd-config.json rule
{
  "type": "permanent",
  "condition": "DockerHung",
  "reason": "HCSDeadlock",
  "message": "docker inspect hang detected — HCS named pipe unresponsive",
  "pattern": "hcsshim.*handle is invalid"
}

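This sets the DockerHung condition on the node. node-problem-detector only reports conditions; to taint and drain the node before it fully freezes, pair the rule with a remediation controller that watches for that condition.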
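
Before rolling the rule out, it is worth checking the pattern against real log entries. The sketch below does this in Python, assuming the pattern behaves the same under Python's re module as under the Go regular expressions used by node-problem-detector, which holds for a simple expression like this one. The rule is copied from the fragment above with the message field left out for brevity, and rule_matches is a hypothetical helper.

```python
import json
import re

# Rule fragment from above, without the optional message field.
RULE = json.loads("""
{
  "type": "permanent",
  "condition": "DockerHung",
  "reason": "HCSDeadlock",
  "pattern": "hcsshim.*handle is invalid"
}
""")


def rule_matches(rule, entry):
    """Return True when the rule's pattern is found in one log entry."""
    return re.search(rule["pattern"], entry) is not None


ENTRIES = [
    'level=error msg="hcsshim::Container Wait" error="the handle is invalid."',
    'level=warning msg="failed to wait for container" error="hcsshim: the handle is invalid"',
    'level=info msg="API listen on //./pipe/docker_engine"',
    # An entry split over two lines: "." does not match the newline.
    'msg="hcsshim::Container Wait"\nerror="the handle is invalid."',
]

if __name__ == "__main__":
    for entry in ENTRIES:
        print(rule_matches(RULE, entry), repr(entry[:40]))
```

The last entry shows a practical caveat: if the log source wraps a single entry over several lines, the pattern will not fire, so the monitor must see each entry as one line.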

3. CI/CD Pipeline Gate (GitHub Actions / Azure DevOps)

Add a pre-flight check before any Windows container job:

- name: Verify Docker daemon responsive
  shell: pwsh
  run: |
    $job = Start-Job { docker version }
    if (-not (Wait-Job $job -Timeout 15)) {
      Remove-Job $job -Force
      Write-Error "Docker daemon unresponsive on Windows runner. Failing fast."
      exit 1
    }
    Receive-Job $job

4. Upgrade Path

Docker Engine 24.0+ on Windows includes improved HCS handle cleanup on daemon restart.
containerd 1.7+ as the Windows runtime (replacing the legacy docker-containerd) drives Hyper-V isolated containers through its own runhcs shim (containerd-shim-runhcs-v1) instead of the hcsshim v1 code path in the Docker daemon, and has significantly better Hyper-V VM lifecycle management.
Migrate to containerd + nerdctl on Windows nodes where possible, after testing your workloads against it; this removes the hcsshim v1 code path from the node.