How do I check if Flannel has a subnet lease conflict in etcd?

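This only applies if Flannel runs in etcd mode. With `--kube-subnet-mgr` (the default in the upstream kube-flannel manifest), node subnets come from each Node's `spec.podCIDR`, and no `/coreos.com/network` keys exist. In etcd mode, use etcdctl with API v3: `ETCDCTL_API=3 etcdctl get /coreos.com/network/subnets/ --prefix`. Each key names a subnet (for example `10.244.3.0-24`). Its JSON value carries the `PublicIP` of the node holding the lease, which is why `--keys-only` is not enough here. Compare those IPs against your live nodes (`kubectl get nodes -o wide`). A lease whose `PublicIP` does not belong to a Ready node is an orphaned lease and can be deleted with `etcdctl del`. If your Flannel version still uses the etcd v2 API, these v3 commands will not see its keys.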

Is it safe to delete files in /var/lib/cni/networks/cbr0/?

Yes, but only after you cordon and drain the node and stop the kubelet and container runtime. These files are the reservation records of the host-local IPAM plugin, which Flannel's CNI plugin delegates to. Each file is named after an IP and holds the ID of the sandbox that owns it. If you delete them while pods are still running, the allocator treats those IPs as free and can hand them to new pods. That produces exactly the duplicate-IP sandbox errors you are trying to fix. Always follow the sequence: cordon → drain → stop kubelet → stop containerd → delete CNI state → restart containerd → restart kubelet → uncordon.

Fixing 'Pod Sandbox Changed, It Will Be Killed and Re-Created' in Flannel CNI: Pod IP Collision Troubleshooting Guide

Q: What causes 'pod sandbox changed, it will be killed and re-created' in Flannel?

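The kubelet kills and re-creates a pod's sandbox when the sandbox it finds is no longer usable: it is not ready, it has exited, or it has no IP address. In Flannel clusters the usual cause is stale IP reservation files in /var/lib/cni/networks/cbr0/ that survive a node restart. The host-local IPAM plugin (which Flannel's CNI plugin delegates to) still treats those IPs as allocated. Network setup for new sandboxes therefore fails, often with "no IP addresses available in range set", and the kubelet keeps tearing the sandbox down and building it again. The reverse failure also happens: if reservation files are removed while their pods are still running, the allocator can hand an in-use IP to a second pod.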

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–45 mins depending on cluster size

TL;DR

What broke: The node's host-local IPAM store under /var/lib/cni/networks/cbr0 holds stale reservations (or, when Flannel uses etcd, node subnet leases overlap). Sandbox network setup fails, and kubelet kills and re-creates the pod sandbox in a loop.
How to fix it: Flush stale CNI state from /var/lib/cni/networks/cbr0 and /run/flannel/subnet.env on the affected node and force-delete the ghost pod. If Flannel stores its leases in etcd, also reconcile the entries under /coreos.com/network/subnets/.

The Incident (What Does the Error Mean?)

Raw error from journalctl -u kubelet --since "10 min ago":

E0612 03:14:22.847291    1423 pod_workers.go:951] Error syncing pod "app-7d9f8b-xkqpl" (uid: "a1b2c3d4-..."), skipping:
  failed to "CreatePodSandbox" for "app-7d9f8b-xkqpl" with CreatePodSandboxError:
  "Failed to create sandbox for pod \"app-7d9f8b-xkqpl\": rpc error: code = Unknown
  desc = failed to set up sandbox container \"f3a9..\" network for pod \"app-7d9f8b-xkqpl\":
  networkPlugin cni failed to set up pod \"app-7d9f8b-xkqpl\" network:
  failed to allocate for range 0: no IP addresses available in range set: 10.244.3.0/24"

W0612 03:14:23.112004    1423 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
pod sandbox changed, it will be killed and re-created

Immediate consequence: The pod is stuck in ContainerCreating with repeated FailedCreatePodSandBox events. It never reaches CrashLoopBackOff, because no container ever starts. Once the range is exhausted, every new non-hostNetwork pod scheduled to that node fails the same way, including DaemonSet pods and critical workloads. The node's /24 from Flannel's allocation can be reported as exhausted even though most of its IPs are held only by the reservation files of dead containers.
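To see how close a node is to that exhaustion, you can count the reservation files directly. This is a minimal sketch. It assumes the host-local layout of one file per reserved IPv4 address, named after the address. `ipam_usage` is a hypothetical helper name. The capacity formula is also an assumption: it supposes host-local excludes the network, broadcast, and .1 gateway addresses of the range.

```shell
#!/usr/bin/env bash
# Sketch: report how many addresses of a node's host-local range are reserved.
# Assumptions: one reservation file per IPv4 address, named after the address;
# capacity excludes network, broadcast, and the .1 gateway (hence the "- 3").

# ipam_usage <ipam_dir> <subnet_prefix_len>   (hypothetical helper)
ipam_usage() {
  local dir="$1" plen="$2" used=0 f cap
  for f in "$dir"/*; do
    # Only files named like a.b.c.d are reservations; "lock" and
    # "last_reserved_ip.0" are allocator bookkeeping.
    case "${f##*/}" in
      *.*.*.*) [ -f "$f" ] && used=$((used + 1)) ;;
    esac
  done
  cap=$(( (1 << (32 - plen)) - 3 ))
  echo "$used/$cap reserved"
}
```

For example, run `ipam_usage /var/lib/cni/networks/cbr0 24` from a root shell on the node. A count near capacity while only a handful of pods run there points at leaked reservations rather than real demand.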

The Attack Vector / Blast Radius

This is not a security exploit — it is a distributed state desynchronization failure with a wide blast radius:

Stale CNI lease files: When a node reboots ungracefully (OOM kill, EC2 spot interruption, bare-metal power loss), /var/lib/cni/networks/cbr0/ retains flat-file IP reservations, because CNI DEL never ran for the pods that died. The host-local IPAM plugin that Flannel delegates to reads these files when it allocates an address, and it believes those IPs are still in use. The pod that previously held 10.244.3.47 is gone, but its reservation file remains.
etcd subnet lease desync (etcd mode only): Flannel stores node subnet leases in etcd under /coreos.com/network/subnets/. A node can lose contact with etcd long enough for its lease to expire, and the subnet can then be leased to another node. Pods already running on the original node keep addresses from that subnet, so two nodes end up serving overlapping ranges. With --kube-subnet-mgr, node subnets come from Node.spec.podCIDR and this path does not apply.
Cascading failure path:
- Node A holds stale lease → new pod gets duplicate IP → ARP conflict on the overlay network → pods on that subnet, not just the colliding pod, can lose connectivity → liveness probes fail for pods on that node → kubelet restarts healthy containers → available replicas drop below what the service needs → service goes down.
Scale multiplier: In autoscaling clusters with rapid node churn (spot fleets, Karpenter), this condition can affect dozens of nodes simultaneously during a scale-in/scale-out event.
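To confirm an actual collision, as opposed to plain exhaustion, look for pod IPs claimed by more than one pod. This is a minimal sketch. It assumes tab-separated input produced by the kubectl query in the usage note, and `dup_pod_ips` is a hypothetical helper name. hostNetwork pods are skipped because they share their node's IP by design.

```shell
#!/usr/bin/env bash
# Sketch: flag pod IPs claimed by more than one pod (hypothetical helper).
# Input on stdin, tab-separated: <namespace>/<pod>  <podIP>  <hostNetwork>
# hostNetwork pods are skipped because they share their node's IP by design,
# and pods without an IP yet (empty second field) are ignored.
dup_pod_ips() {
  awk -F'\t' '
    $3 != "true" && $2 != "" { n[$2]++; pods[$2] = pods[$2] " " $1 }
    END { for (ip in n) if (n[ip] > 1) print ip ":" pods[ip] }
  '
}
```

Usage: `kubectl get pods -A --field-selector=status.phase=Running -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.status.podIP}{"\t"}{.spec.hostNetwork}{"\n"}{end}' | dup_pod_ips`. The filter restricts the check to Running pods, so completed pods whose old IPs have since been reused do not show up as false positives.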

How to Fix It (The Solution)

Step 1: Identify the Affected Node and Subnet

# Find which node the crashing pod was scheduled on
kubectl get pod app-7d9f8b-xkqpl -o wide

# SSH to that node, check Flannel's assigned subnet
cat /run/flannel/subnet.env
# Expected output:
# FLANNEL_NETWORK=10.244.0.0/16
# FLANNEL_SUBNET=10.244.3.1/24
# FLANNEL_MTU=1450
# FLANNEL_IPMASQ=true

# Check stale CNI IP lock files
ls -la /var/lib/cni/networks/cbr0/
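# Each file is named after an IP and holds the ID of the pod sandbox that reserved it
# (newer plugin versions add the interface name on a second line). "lock" and
# "last_reserved_ip.0" are allocator bookkeeping, not reservations.
# Cross-reference with: sudo crictl pods -q --no-trunc
# An ID that no longer exists there marks a stale reservation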
cat /var/lib/cni/networks/cbr0/10.244.3.47
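The cross-reference above can be automated for a whole directory. This is a sketch under one assumption: the first line of each reservation file is the owning sandbox ID, as the host-local plugin writes it. `find_stale_ips` is a hypothetical helper. It reads live sandbox IDs from a file so it can be run against a saved `crictl` listing.

```shell
#!/usr/bin/env bash
# Sketch: list host-local reservations whose owning sandbox no longer exists.
# Assumption: the first line of each reservation file is the sandbox ID, as
# written by the host-local IPAM plugin. find_stale_ips is a hypothetical helper.

# find_stale_ips <ipam_dir> <file_with_live_sandbox_ids>
find_stale_ips() {
  local dir="$1" live="$2" f name id
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    name="${f##*/}"
    case "$name" in
      lock|last_reserved_ip.*) continue ;;  # allocator bookkeeping, not reservations
    esac
    # Strip whitespace, including a trailing \r some plugin versions write
    id="$(head -n 1 "$f" | tr -d '[:space:]')"
    # -x: whole-line match, -F: fixed string, so partial IDs never count as live
    if ! grep -qxF "$id" "$live"; then
      printf '%s %s\n' "$name" "$id"
    fi
  done
}
```

From a root shell on the node, run `crictl pods -q --no-trunc > /tmp/live-sandboxes`, then `find_stale_ips /var/lib/cni/networks/cbr0 /tmp/live-sandboxes`. Each printed line is an IP followed by the sandbox ID that no longer exists.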

Step 2: Basic Fix — Flush Stale CNI State on the Node

# Cordon the node first — do NOT skip this
kubectl cordon <node-name>

# Drain non-daemonset pods
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force

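# On the node itself, over SSH (a kubectl debug session is streamed through the
# kubelet and drops as soon as kubelet stops):
sudo systemctl stop kubelet
sudo systemctl stop containerd  # or your runtime's unit, e.g. crio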

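# Nuke stale CNI allocations
sudo rm -rf /var/lib/cni/networks/cbr0/*
sudo rm -f /run/flannel/subnet.env

# Restart in order
sudo systemctl start containerd
sudo systemctl start kubelet

# flanneld writes /run/flannel/subnet.env when it starts, and the Flannel CNI plugin
# reads it on every pod network setup, so restart the Flannel pod on this node
# (namespace and label follow the upstream kube-flannel manifest; adjust if yours differ)
kubectl -n kube-flannel delete pod -l app=flannel --field-selector spec.nodeName=<node-name>
# Non-hostNetwork DaemonSet pods left on the node also lost their reservations;
# delete them too so they return with freshly reserved IPs

# Uncordon after kubelet reports Ready
kubectl wait --for=condition=Ready node/<node-name> --timeout=5m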
kubectl uncordon <node-name>
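The node-side part of this sequence can be wrapped in a script so the order cannot be fumbled during an incident. This is a sketch that assumes containerd and the cbr0 network name. `flush_cni_state`, `run`, and the `DRY_RUN` switch are hypothetical. The kubectl steps (cordon, drain, uncordon) still run from a machine with cluster access. The commands are chained with `&&`, so the script stops at the first failure instead of deleting state while kubelet is still up.

```shell
#!/usr/bin/env bash
# Sketch: node-side part of Step 2 in a fixed order (hypothetical helpers).
# Assumptions: containerd runtime, CNI network name "cbr0", run as root.
# DRY_RUN=1 prints each command instead of executing it.
# find -mindepth 1 -delete empties the directory but keeps it, and the
# dry-run output shows the literal command instead of an expanded glob.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# flush_cni_state [ipam_dir]
flush_cni_state() {
  local cni_dir="${1:-/var/lib/cni/networks/cbr0}"
  run systemctl stop kubelet &&
  run systemctl stop containerd &&
  run find "$cni_dir" -mindepth 1 -delete &&
  run rm -f /run/flannel/subnet.env &&
  run systemctl start containerd &&
  run systemctl start kubelet
}
```

Preview with `DRY_RUN=1 flush_cni_state`, then run it for real from a root shell once the node is drained.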

Step 3: Enterprise Best Practice — Reconcile etcd Flannel Leases

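If the issue is etcd-level subnet overlap across nodes (this applies only when Flannel stores its leases in etcd rather than running with --kube-subnet-mgr; the certificate paths below are kubeadm's and assume Flannel shares the cluster's etcd):

# List all Flannel subnet leases with their values; the JSON value of each
# lease carries the PublicIP of the node that holds it
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  get /coreos.com/network/subnets/ --prefix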

# Delete a stale/orphaned lease for a decommissioned node
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  del /coreos.com/network/subnets/10.244.3.0-24
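Picking orphaned leases out of that listing by eye gets error-prone on large clusters. This is a sketch that assumes two inputs. The first is raw `etcdctl get ... --prefix` output: a key line followed by a one-line compact JSON value, the way Go's encoder writes it. The second holds one node IP per line. `find_orphan_leases` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: list Flannel etcd leases whose PublicIP is not a live node address.
# Assumptions: input 1 is raw etcdctl get --prefix output (key line, then a
# one-line compact JSON value); input 2 holds one node IP per line.
# find_orphan_leases is a hypothetical helper.

# find_orphan_leases <etcdctl_output_file> <live_node_ips_file>
find_orphan_leases() {
  awk '
    FILENAME == ARGV[1] { if (NF) live[$1] = 1; next }   # live node IPs
    FNR % 2 == 1 { key = $0; next }                      # odd lines: lease keys
    {
      ip = ""
      # "PublicIP":" is 12 characters; the closing quote is 1 more
      if (match($0, /"PublicIP":"[^"]*"/)) ip = substr($0, RSTART + 12, RLENGTH - 13)
      if (!(ip in live)) print key, ip
    }
  ' "$2" "$1"
}
```

For example, save the listing from the command above to /tmp/leases. Write node addresses with `kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}' > /tmp/node-ips`, then run `find_orphan_leases /tmp/leases /tmp/node-ips`. Flannel's PublicIP is the address of the interface it picked (or `--public-ip`), which usually, but not always, matches the node's InternalIP. Review each hit before running `etcdctl del`.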

Step 4: Flannel ConfigMap — Fix Subnet Size to Reduce Collision Probability

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-flannel-cfg
  namespace: kube-flannel
data:
  net-conf.json: |
    {
-     "Network": "10.244.0.0/16",
-     "SubnetLen": 24,
-     "SubnetMin": "10.244.1.0",
-     "SubnetMax": "10.244.99.0",
+     "Network": "10.244.0.0/14",
+     "SubnetLen": 24,
+     "SubnetMin": "10.244.1.0",
+     "SubnetMax": "10.247.255.0",
      "Backend": {
-       "Type": "udp"
+       "Type": "vxlan",
+       "VNI": 1,
+       "Port": 8472,
+       "GBP": false,
+       "DirectRouting": false
      }
    }

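Why: The udp backend is deprecated and recommended only for debugging, because it pushes every packet through flanneld in user space. vxlan is the default, recommended backend. The explicit VNI and Port values only pin its Linux defaults (1 and 8472). Widening Network from /16 to /14 gives 4x the address space. The new SubnetMin/SubnetMax also raise the ceiling from 99 node subnets (10.244.1.0 through 10.244.99.0) to 1,023, which makes range exhaustion far less likely under aggressive autoscaling. Before applying the change, make sure the /14 does not overlap your VPC or Service CIDRs. If Flannel runs with --kube-subnet-mgr, node subnets come from Node.spec.podCIDR. In that case Network must match kube-controller-manager's --cluster-cidr, and nodes that already have a podCIDR keep it after the change.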
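The subnet counts above are plain arithmetic over SubnetMin, SubnetMax, and SubnetLen. This sketch reproduces them so you can check your own values before rolling out a change. `ip2int` and `subnet_capacity` are hypothetical helper names. The formula assumes SubnetMin and SubnetMax are both aligned to SubnetLen, as they are here.

```shell
#!/usr/bin/env bash
# Sketch: count the node subnets Flannel can hand out between SubnetMin and
# SubnetMax for a given SubnetLen (hypothetical helpers).

# ip2int <dotted-quad>: IPv4 address as a 32-bit integer
ip2int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo "$(( (a << 24) | (b << 16) | (c << 8) | d ))"
}

# subnet_capacity <SubnetMin> <SubnetMax> <SubnetLen>
subnet_capacity() {
  local min max step
  min=$(ip2int "$1")
  max=$(ip2int "$2")
  step=$(( 1 << (32 - $3) ))   # addresses per node subnet
  echo "$(( (max - min) / step + 1 ))"
}
```

`subnet_capacity 10.244.1.0 10.244.99.0 24` prints 99, and `subnet_capacity 10.244.1.0 10.247.255.0 24` prints 1023.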

Step 5: Force-Delete the Ghost Pod

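# Standard delete often hangs while the sandbox is broken, so force it.
# Avoid this for StatefulSet pods: a replacement can start before the old pod is really gone.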
kubectl delete pod app-7d9f8b-xkqpl --grace-period=0 --force

# Verify the sandbox is actually gone on the node
sudo crictl pods | grep app-7d9f8b-xkqpl
# If still listed:
sudo crictl stopp <pod-id>
sudo crictl rmp <pod-id>
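When a name matches more than one leftover sandbox, stopping and removing them one at a time gets tedious. This minimal sketch loops over every matching sandbox. `crictl pods --name` filters by pod name, and `-q` prints only sandbox IDs. `remove_sandboxes` is a hypothetical helper. The `CRICTL` variable exists only so the function can be exercised with a stub.

```shell
#!/usr/bin/env bash
# Sketch: stop and remove every sandbox whose pod name matches a pattern.
# remove_sandboxes and the CRICTL override are hypothetical; run as root.

# remove_sandboxes <pod-name-pattern>
remove_sandboxes() {
  local crictl="${CRICTL:-crictl}" id
  for id in $("$crictl" pods --name "$1" -q); do
    "$crictl" stopp "$id"
    "$crictl" rmp "$id"
  done
}
```

From a root shell on the node: `remove_sandboxes app-7d9f8b-xkqpl`.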


Prevention in CI/CD

1. OPA/Gatekeeper Policy — Enforce vxlan Backend

package flannel.backend

deny[msg] {
  input.kind == "ConfigMap"
  input.metadata.name == "kube-flannel-cfg"
  config := json.unmarshal(input.data["net-conf.json"])
  config.Backend.Type != "vxlan"
  msg := sprintf("Flannel backend must be vxlan, got: %v", [config.Backend.Type])
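}

This rule is written in Conftest style (deny[msg], with the manifest itself as input). For Gatekeeper, put the same checks in a ConstraintTemplate, name the rule violation[{"msg": msg}], and read the manifest from input.review.object.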

2. Node Startup Script — Pre-flight CNI Cleanup

Add to your node bootstrap userdata (cloud-init / Launch Template):

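# Run from a systemd oneshot unit ordered Before=containerd.service kubelet.service;
# /etc/rc.local offers no ordering guarantee relative to kubelet
rm -rf /var/lib/cni/networks/cbr0/*
# /run is a tmpfs on most distributions, so this is usually already gone after a reboot
rm -f /run/flannel/subnet.env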
echo "CNI state pre-flight cleanup complete" | logger -t flannel-preflight
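A oneshot unit gives the ordering guarantee the cleanup needs. This is a sketch of userdata that installs one so it runs once per boot before containerd and kubelet. The unit name `cni-preflight.service` and the `write_preflight_unit` helper are hypothetical, and the binary paths assume a typical Linux layout.

```shell
#!/usr/bin/env bash
# Sketch: install the pre-flight cleanup as a systemd oneshot that runs once per
# boot, ordered before containerd and kubelet. The unit name
# "cni-preflight.service" and the write_preflight_unit helper are hypothetical.

# write_preflight_unit [unit_dir]
write_preflight_unit() {
  local dest="${1:-/etc/systemd/system}"
  cat > "$dest/cni-preflight.service" <<'EOF'
[Unit]
Description=Clear stale host-local CNI reservations before the container runtime starts
Before=containerd.service kubelet.service

[Service]
Type=oneshot
RemainAfterExit=yes
# systemd does not expand globs itself, so hand the wildcard to a shell
ExecStart=/bin/sh -c 'rm -rf /var/lib/cni/networks/cbr0/*'
ExecStart=/bin/rm -f /run/flannel/subnet.env
ExecStart=/usr/bin/logger -t flannel-preflight "CNI state pre-flight cleanup complete"

[Install]
WantedBy=multi-user.target
EOF
}
```

In cloud-init, run `write_preflight_unit && systemctl daemon-reload && systemctl enable cni-preflight.service`. Before= only orders units that are started together, so the unit must be enabled for the ordering to take effect.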

3. Alerting — Detect Sandbox Churn Before Full Outage

# Prometheus alerting rule
groups:
- name: flannel-cni
  rules:
  - alert: FlannelSandboxRecreationSpike
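    # Assumption: kubelet_run_podsandbox_errors_total counts every failed RunPodSandbox
    # call, not only CNI/IPAM failures; the "instance" label depends on your scrape config
    expr: |
      sum by (instance) (
        increase(kubelet_run_podsandbox_errors_total[5m])
      ) > 5
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Kubelet {{ $labels.instance }} has >5 sandbox creation failures in 5 min"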
      runbook: "https://your-wiki/flannel-ip-collision-runbook"

4. Checkov — Scan Flannel Manifests in GitOps Pipeline

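# Add to your CI pipeline (GitHub Actions / GitLab CI)
# CKV_K8S_28 is a generic Kubernetes workload check, not a Flannel network check;
# confirm what it enforces in the Checkov policy index before relying on it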
checkov -d ./manifests/networking/ \
  --check CKV_K8S_28 \
  --compact \
  --output cli

# Custom check for Flannel network size
# Fail if Network CIDR is smaller than /14
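The network-size rule above does not have to be a Checkov plugin. It can also be a small shell gate in the same pipeline. This sketch is a stand-in built on assumptions. `check_flannel_network_size` is a hypothetical name. It extracts the first `"Network": "x.x.x.x/n"` value from the manifest with sed instead of properly parsing the embedded JSON, so it expects the one-key-per-line layout used in Step 4.

```shell
#!/usr/bin/env bash
# Sketch: fail CI when Flannel's Network CIDR is smaller than a given prefix.
# check_flannel_network_size is hypothetical; it greps the first "Network"
# key instead of parsing net-conf.json, so it assumes one key per line.

# check_flannel_network_size <manifest> [max_prefix]
check_flannel_network_size() {
  local file="$1" max="${2:-14}" cidr prefix
  cidr=$(sed -n 's/.*"Network"[[:space:]]*:[[:space:]]*"\([0-9.]*\/[0-9]*\)".*/\1/p' "$file" | head -n 1)
  if [ -z "$cidr" ]; then
    echo "no Flannel Network CIDR found in $file" >&2
    return 2
  fi
  prefix=${cidr#*/}
  # A longer prefix means fewer addresses, i.e. a smaller network
  if [ "$prefix" -gt "$max" ]; then
    echo "FAIL: $file Network $cidr is smaller than /$max" >&2
    return 1
  fi
  echo "OK: $file Network $cidr"
}
```

Usage: `check_flannel_network_size ./manifests/networking/kube-flannel.yml 14`. The manifest path is illustrative. A /16 Network makes the step exit non-zero.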

5. Karpenter / Cluster Autoscaler — Graceful Node Termination

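# Karpenter NodePool (karpenter.sh/v1): ensure graceful drain before termination
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      # nodeClassRef and requirements are mandatory in a real NodePool; omitted here
      terminationGracePeriod: 120s  # Give kubelet time to clean CNI state
  disruption:
    # v1 renamed WhenUnderutilized; consolidateAfter now works with either policy
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s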

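terminationGracePeriod caps how long Karpenter waits for a node to drain before it forcibly terminates the instance. A completed drain lets the kubelet tear down each pod sandbox, which runs CNI DEL and releases the host-local reservation. The setting cannot stretch a Spot interruption past AWS's two-minute notice, so pair it with Karpenter's interruption handling so draining starts as soon as the notice arrives. Leftover reservation files do the most damage on nodes that come back on the same disk (reboots, bare metal). A terminated instance normally takes its disk, and the stale files, with it.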