Fixing 'Pod Sandbox Changed, It Will Be Killed and Re-Created' in Flannel CNI: Pod IP Collision Troubleshooting Guide
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–45 mins depending on cluster size
TL;DR
- What broke: The node's CNI state (the local IP reservation files, or the Flannel subnet lease in etcd) still lists addresses as allocated after their pods are gone. Sandbox network setup fails or collides, and kubelet kills and re-creates the pod sandbox in a loop.
- How to fix it: Flush stale CNI state from /var/lib/cni/networks/cbr0 and /run/flannel/subnet.env on the affected node, force-delete the ghost pod, and reconcile the Flannel etcd lease prefix.
The Incident (What Does the Error Mean?)
Raw error from journalctl -u kubelet --since "10 min ago":
E0612 03:14:22.847291 1423 pod_workers.go:951] Error syncing pod "app-7d9f8b-xkqpl" (uid: "a1b2c3d4-..."), skipping:
failed to "CreatePodSandbox" for "app-7d9f8b-xkqpl" with CreatePodSandboxError:
"Failed to create sandbox for pod \"app-7d9f8b-xkqpl\": rpc error: code = Unknown
desc = failed to set up sandbox container \"f3a9..\" network for pod \"app-7d9f8b-xkqpl\":
networkPlugin cni failed to set up pod \"app-7d9f8b-xkqpl\" network:
failed to allocate for range 0: no IP addresses available in range set: 10.244.3.0/24"
W0612 03:14:23.112004 1423 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
pod sandbox changed, it will be killed and re-created
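Immediate consequence: The pod never gets a working network and stays in ContainerCreating while kubelet repeatedly tears down and re-creates its sandbox. Because the reservations are tracked per node, every new pod scheduled to that node is potentially affected, including DaemonSet pods and other critical workloads. The node's entire /24 subnet from Flannel's allocation may be reported as exhausted even though the addresses are held only by reservations left behind by dead containers.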
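To see how close a node is to the "no IP addresses available" condition, the reservation files can be counted against the size of the node's subnet. The following is a minimal sketch, not part of Flannel or the CNI plugins: `pool_usage` is a hypothetical helper, and it assumes one file per reserved IP in the state directory and that the network, broadcast, and gateway addresses are never handed to pods.

```python
from __future__ import annotations

import ipaddress
from pathlib import Path


def pool_usage(state_dir: str, subnet: str) -> tuple[int, int]:
    """Return (reserved, usable) address counts for one node subnet."""
    net = ipaddress.ip_network(subnet)
    # Assumption: network, broadcast, and gateway addresses are not allocatable.
    usable = net.num_addresses - 3
    reserved = 0
    for entry in Path(state_dir).iterdir():
        try:
            ip = ipaddress.ip_address(entry.name)
        except ValueError:
            continue  # not an IP-named file (bookkeeping or lock files)
        if ip in net:
            reserved += 1
    return reserved, usable
```

Called as `pool_usage("/var/lib/cni/networks/cbr0", "10.244.3.0/24")`, a reserved count at or near the usable count on a node that runs only a handful of pods points to phantom reservations rather than real exhaustion.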
The Attack Vector / Blast Radius
This is not a security exploit — it is a distributed state desynchronization failure with a wide blast radius:
- Stale CNI lease files: When a node reboots ungracefully (OOM kill, EC2 spot interruption, bare-metal power loss), /var/lib/cni/networks/cbr0/ retains flat-file IP reservations. Flannel's IPAM reads these files on pod creation and believes those IPs are still allocated. The pod that previously held 10.244.3.47 is gone, but the lock file remains.
- etcd subnet lease desync: Flannel stores node subnet leases in etcd under /coreos.com/network/subnets/. If a node's lease TTL expires but the node comes back online before etcd garbage-collects the entry, two nodes can be assigned overlapping subnets.
- Cascading failure path: Node A holds stale lease → new pod gets duplicate IP → ARP conflict on the overlay network → all pods on that subnet lose connectivity, not just the colliding pod → liveness probes fail cluster-wide on that node → kubelet starts killing healthy pods → PodDisruptionBudget thresholds breached → service goes down.
- Scale multiplier: In autoscaling clusters with rapid node churn (spot fleets, Karpenter), this condition can affect dozens of nodes simultaneously during a scale-in/scale-out event.
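The overlapping-subnet case can be checked mechanically once you have a node-to-subnet mapping. The sketch below is illustrative only: `find_overlapping_subnets` is a hypothetical helper, and the mapping is assumed to be collected separately (for example from each node's /run/flannel/subnet.env or from the lease listing).

```python
import ipaddress
from itertools import combinations


def find_overlapping_subnets(node_subnets):
    """Return (node_a, node_b) pairs whose assigned subnets overlap."""
    parsed = {
        node: ipaddress.ip_network(cidr) for node, cidr in node_subnets.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(parsed), 2)
        if parsed[a].overlaps(parsed[b])
    ]
```

An empty result means no two nodes share address space; any returned pair is a candidate for the lease reconciliation described later in this guide.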
How to Fix It (The Solution)
Step 1: Identify the Affected Node and Subnet
# Find which node the crashing pod was scheduled on
kubectl get pod app-7d9f8b-xkqpl -o wide
# SSH to that node, check Flannel's assigned subnet
cat /run/flannel/subnet.env
# Expected output:
# FLANNEL_NETWORK=10.244.0.0/16
# FLANNEL_SUBNET=10.244.3.1/24
# FLANNEL_MTU=1450
# FLANNEL_IPMASQ=true
# Check stale CNI IP lock files
ls -la /var/lib/cni/networks/cbr0/
# You'll see files named after IPs, e.g. 10.244.3.47; each holds the ID of the sandbox that owns that IP
# Cross-reference: if that ID is not listed by `sudo crictl pods -q`, it's a stale lock
cat /var/lib/cni/networks/cbr0/10.244.3.47
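The cross-reference can be scripted instead of checked file by file. This is a minimal sketch under stated assumptions: each IP-named file is assumed to store the ID of the owning sandbox on its first line, `find_stale_reservations` is a hypothetical helper, and the set of live sandbox IDs is gathered separately (for example from `sudo crictl pods -q`).

```python
import ipaddress
from pathlib import Path


def find_stale_reservations(state_dir, live_sandbox_ids):
    """Return IPs whose reservation file names an owner that is no longer running."""
    stale = []
    for entry in sorted(Path(state_dir).iterdir()):
        try:
            ipaddress.ip_address(entry.name)
        except ValueError:
            continue  # skip files that are not named after an IP address
        lines = entry.read_text().splitlines()
        owner = lines[0].strip() if lines else ""
        if owner not in live_sandbox_ids:
            stale.append(entry.name)
    return stale
```

The function only reports candidates; it deletes nothing, so it is safe to run on a node that is still serving traffic.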
Step 2: Basic Fix — Flush Stale CNI State on the Node
# Cordon the node first — do NOT skip this
kubectl cordon <node-name>
# Drain non-daemonset pods
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force
# On the node itself (via SSH or kubectl debug):
sudo systemctl stop kubelet
sudo systemctl stop containerd # or docker
# Nuke stale CNI allocations
sudo rm -rf /var/lib/cni/networks/cbr0/*
sudo rm -f /run/flannel/subnet.env
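# Restart in order
sudo systemctl start containerd
sudo systemctl start kubelet
# Then, from a machine with kubectl access, restart the Flannel pod on this node so that
# flanneld rewrites /run/flannel/subnet.env (namespace and label assume the upstream manifest)
kubectl -n kube-flannel delete pod -l app=flannel --field-selector spec.nodeName=<node-name>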
# Uncordon after kubelet reports Ready
kubectl uncordon <node-name>
Step 3: Enterprise Best Practice — Reconcile etcd Flannel Leases
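If the issue is etcd-level subnet overlap (multi-node), reconcile the leases directly. This applies only when Flannel uses its own etcd datastore; with the Kubernetes subnet manager (--kube-subnet-mgr), each node's subnet comes from node.spec.podCIDR and there are no /coreos.com/network keys to clean up.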
# List all Flannel subnet leases in etcd
ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
get /coreos.com/network/subnets/ --prefix --keys-only
# Delete a stale/orphaned lease for a decommissioned node
ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
del /coreos.com/network/subnets/10.244.3.0-24
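Before deleting anything, it helps to compare the lease keys against the subnets of nodes that still exist. The sketch below assumes the key format used in the delete command above (address and prefix length joined by a hyphen); `lease_key_to_subnet` and `orphaned_leases` are hypothetical helpers, and the list of active subnets is assumed to be collected separately.

```python
import ipaddress


def lease_key_to_subnet(key: str) -> ipaddress.IPv4Network:
    """Convert '.../subnets/10.244.3.0-24' into IPv4Network('10.244.3.0/24')."""
    address, _, prefix = key.rsplit("/", 1)[-1].rpartition("-")
    return ipaddress.IPv4Network(f"{address}/{prefix}")


def orphaned_leases(lease_keys, active_subnets):
    """Return lease keys whose subnet is not used by any active node."""
    active = {ipaddress.IPv4Network(s) for s in active_subnets}
    return [k for k in lease_keys if lease_key_to_subnet(k) not in active]
```

Only keys returned by `orphaned_leases` are candidates for `etcdctl del`; deleting the lease of a live node would take that node's pods off the overlay.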
Step 4: Flannel ConfigMap — Fix Subnet Size to Reduce Collision Probability
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-flannel-cfg
  namespace: kube-flannel
data:
  net-conf.json: |
    {
-     "Network": "10.244.0.0/16",
-     "SubnetLen": 24,
-     "SubnetMin": "10.244.1.0",
-     "SubnetMax": "10.244.99.0",
+     "Network": "10.244.0.0/14",
+     "SubnetLen": 24,
+     "SubnetMin": "10.244.1.0",
+     "SubnetMax": "10.247.255.0",
      "Backend": {
-       "Type": "udp"
+       "Type": "vxlan",
+       "VNI": 1,
+       "Port": 8472,
+       "GBP": false,
+       "DirectRouting": false
      }
    }
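Why: The udp backend is intended only for debugging; vxlan with an explicit VNI is the production-grade backend. Expanding from /16 to /14 gives 4x more subnet space (1,024 possible /24 node subnets instead of 256), which reduces the risk of subnet exhaustion under aggressive autoscaling. When Flannel runs with the Kubernetes subnet manager, the new range must also match the cluster's pod CIDR.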
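The 4x figure refers to the raw CIDR size. The number of node subnets Flannel can actually hand out is also bounded by SubnetMin and SubnetMax, which the arithmetic below makes concrete. This is a sketch that assumes subnets are allocated on SubnetLen boundaries and that both bounds are inclusive; `allocatable_subnets` is a hypothetical helper.

```python
import ipaddress


def allocatable_subnets(subnet_min: str, subnet_max: str, subnet_len: int) -> int:
    """Count node subnets of size /subnet_len between the two bounds, inclusive."""
    step = 1 << (32 - subnet_len)  # addresses per node subnet
    first = int(ipaddress.IPv4Address(subnet_min))
    last = int(ipaddress.IPv4Address(subnet_max))
    return (last - first) // step + 1


print(allocatable_subnets("10.244.1.0", "10.244.99.0", 24))        # 99
print(allocatable_subnets("10.244.1.0", "10.247.255.0", 24))       # 1023
print(ipaddress.ip_network("10.244.0.0/14").num_addresses // 256)  # 1024
```

With the values from the ConfigMap, the old window allows 99 node subnets and the new one 1,023, so the practical headroom grows by roughly 10x, more than the CIDR change alone suggests.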
Step 5: Force-Delete the Ghost Pod
# Standard delete often hangs — use force
kubectl delete pod app-7d9f8b-xkqpl --grace-period=0 --force
# Verify the sandbox is actually gone on the node
sudo crictl pods | grep app-7d9f8b-xkqpl
# If still listed:
sudo crictl stopp <pod-id>
sudo crictl rmp <pod-id>
Prevention in CI/CD
1. OPA/Gatekeeper Policy — Enforce vxlan Backend
package flannel.backend

deny[msg] {
    input.kind == "ConfigMap"
    input.metadata.name == "kube-flannel-cfg"
    config := json.unmarshal(input.data["net-conf.json"])
    config.Backend.Type != "vxlan"
    msg := sprintf("Flannel backend must be vxlan, got: %v", [config.Backend.Type])
}
2. Node Startup Script — Pre-flight CNI Cleanup
Add to your node bootstrap userdata (cloud-init / Launch Template):
# /etc/rc.local or systemd oneshot unit before kubelet starts
rm -rf /var/lib/cni/networks/cbr0/*
rm -f /run/flannel/subnet.env
echo "CNI state pre-flight cleanup complete" | logger -t flannel-preflight
3. Alerting — Detect Sandbox Churn Before Full Outage
# Prometheus alerting rule
groups:
  - name: flannel-cni
    rules:
      - alert: FlannelSandboxRecreationSpike
        # Metric and label names depend on your kubelet version and scrape config; verify before deploying
        expr: |
          increase(kubelet_runtime_operations_errors_total{
            operation_type="run_podsandbox"
          }[5m]) > 5
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.node }} has >5 sandbox creation failures in 5 min"
          runbook: "https://your-wiki/flannel-ip-collision-runbook"
4. Checkov — Scan Flannel Manifests in GitOps Pipeline
# Add to your CI pipeline (GitHub Actions / GitLab CI)
checkov -d ./manifests/networking/ \
--check CKV_K8S_28 \
--compact \
--output cli
# Custom check for Flannel network size
# Fail if Network CIDR is smaller than /14
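The custom check mentioned in the comments above could look like the following. It is a plain Python sketch and is not wired into Checkov's custom-check API: `check_net_conf` is a hypothetical helper that takes the contents of net-conf.json as a string, and the /14 threshold comes from the recommendation in Step 4.

```python
from __future__ import annotations

import ipaddress
import json


def check_net_conf(net_conf_json: str, max_prefix: int = 14) -> list[str]:
    """Return a list of policy violations for a Flannel net-conf.json string."""
    conf = json.loads(net_conf_json)
    problems = []
    network = ipaddress.ip_network(conf["Network"])
    # A larger prefix length means a smaller network.
    if network.prefixlen > max_prefix:
        problems.append(f"Network {network} is smaller than /{max_prefix}")
    backend = conf.get("Backend", {}).get("Type")
    if backend != "vxlan":
        problems.append(f"Flannel backend must be vxlan, got: {backend}")
    return problems
```

In a pipeline, a non-empty return value would fail the job; the messages can be printed as-is for the reviewer.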
5. Karpenter / Cluster Autoscaler — Graceful Node Termination
# Karpenter NodePool: ensure graceful drain before termination
# (field names follow the karpenter.sh/v1 API; check them against your Karpenter version)
apiVersion: karpenter.sh/v1
kind: NodePool
spec:
  template:
    spec:
      terminationGracePeriod: 120s # Give kubelet time to clean CNI state
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
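If a node is terminated before its pods have been drained, kubelet never gets to tear down their sandboxes and release the CNI reservations. Abrupt spot interruptions are a common trigger for this error in cloud environments, so give nodes enough time to drain before they are removed.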