Fixing Calico CrashLoopBackOff: 'failed to get IP from pool' on IPv6 Dual-Stack Kubernetes Clusters
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–45 mins
TL;DR
- What broke: calico-node cannot allocate IPv6 addresses because the IPv6 IPPool resource is absent, disabled, or misconfigured (IPIP enabled, or an unusable blockSize), so all new pods on affected nodes are stuck in ContainerCreating.
- How to fix it: Create or patch the IPv6 IPPool with a CIDR inside your IPv6 --cluster-cidr range and a sane blockSize (122, Calico's IPv6 default, gives 64 addresses per block), ensure FelixConfiguration has ipv6Support: true, and verify that calicoctl ipam check returns clean.
The Incident (What does the error mean?)
Raw log output from a crashing calico-node pod:
E1105 03:12:44.882341 1 ipam.go:312] Failed to get IP address from pool: no IPv6 pools found
E1105 03:12:44.882411 1 main.go:198] Error: failed to get IP from pool: no IPv6 pools found for node calico-node-xk9p2
FATAL calico/node: failed to start: failed to get IP from pool
Immediate consequence: Every calico-node pod on nodes that were assigned an IPv6 address during kubelet registration enters CrashLoopBackOff. The node never becomes Ready. Any workload pod scheduled to that node stays in ContainerCreating indefinitely. In a rolling upgrade or a fresh node-pool scale-out, this silently bricks your entire new node capacity.
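If you are triaging many nodes at once, a small script can flag this specific failure in collected calico-node logs (for example, the output of kubectl logs --previous). Below is a minimal Python sketch. Both regex patterns are copied from the excerpt above and are assumptions, because other Calico versions may word these messages differently. The function names are hypothetical.

```python
import re

# Patterns copied from the calico-node log excerpt in this guide; message
# wording may differ between Calico versions.
NO_IPV6_POOL = re.compile(r"no IPv6 pools found")
FATAL_START = re.compile(r"FATAL calico/node: failed to start")


def triage(log_text: str) -> dict:
    """Count the lines that identify the missing-IPv6-pool failure."""
    lines = log_text.splitlines()
    return {
        "no_ipv6_pool": sum(1 for line in lines if NO_IPV6_POOL.search(line)),
        "fatal_start": sum(1 for line in lines if FATAL_START.search(line)),
    }


def is_missing_ipv6_pool(log_text: str) -> bool:
    """True if the log contains at least one 'no IPv6 pools found' line."""
    return triage(log_text)["no_ipv6_pool"] > 0
```

Fed the three log lines above, triage reports two "no IPv6 pools found" hits and one fatal start.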
The Attack Vector / Blast Radius
This is a scheduling dead-zone failure. The blast radius scales with your node count:
- Node-level: the calico-node DaemonSet pod crashes → node CNI is non-functional → kubelet reports the network plugin as not ready → the node lifecycle controller taints the node node.kubernetes.io/not-ready.
- Workload-level: All new pods scheduled to the affected node are blocked. Existing pods are NOT evicted immediately (the default toleration for the not-ready taint is 300 seconds), but any new pod on that node (rolling deploy, HPA scale-up, rescheduled pod) will fail to attach networking.
- Control-plane ripple: If the node running coredns or another critical system pod is affected, DNS resolution degrades cluster-wide.
- Dual-stack-specific trap: You enabled dual-stack (--cluster-cidr=10.244.0.0/16,fd00::/48) in kube-controller-manager and kube-proxy, but Calico's IPAM is independent of Kubernetes IPAM. Calico requires its own IPPool resource for each address family. Kubernetes knowing about your IPv6 CIDR means nothing to calicoctl.
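The dual-stack trap boils down to a set difference: every address family in --cluster-cidr needs at least one Calico IPPool. Below is a minimal sketch using Python's standard ipaddress module. The function names are hypothetical. The containment check reflects the convention used in this guide (pool CIDRs sit inside the cluster CIDR) and is not a hard Calico requirement.

```python
import ipaddress


def _networks(csv: str) -> list:
    """Parse a comma-separated CIDR list such as the --cluster-cidr value."""
    return [ipaddress.ip_network(part.strip()) for part in csv.split(",") if part.strip()]


def missing_families(cluster_cidr: str, pool_cidrs: list) -> list:
    """IP versions present in --cluster-cidr that no Calico IPPool covers."""
    wanted = {net.version for net in _networks(cluster_cidr)}
    have = {ipaddress.ip_network(cidr).version for cidr in pool_cidrs}
    return sorted(wanted - have)


def pools_outside_cluster_cidr(cluster_cidr: str, pool_cidrs: list) -> list:
    """Pool CIDRs that do not sit inside a same-family --cluster-cidr range."""
    clusters = _networks(cluster_cidr)
    outside = []
    for cidr in pool_cidrs:
        pool = ipaddress.ip_network(cidr)
        # Compare versions first: subnet_of() raises TypeError across families.
        if not any(pool.version == c.version and pool.subnet_of(c) for c in clusters):
            outside.append(cidr)
    return outside
```

With the example flag above and only the default IPv4 pool present, missing_families returns [6]. That is exactly the "no IPv6 pools found" situation.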
The most common triggers in production:
- Dual-stack was enabled on the cluster after the initial Calico install, so the IPv6 IPPool was never created.
- blockSize set to 128 (a single address per block), so every pod claims a whole block and its own route, bloating IPAM state and routing tables.
- ipipMode: Always set on the IPv6 pool. IPIP is IPv4-only, so Calico's validation rejects the pool and it never gets created.
- disabled: true left in the IPPool manifest from a staging copy-paste.
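Three of these four triggers live in the manifest itself, so they can be linted before anything is applied. Below is a Python sketch. It assumes the IPPool manifest has already been parsed into a dict (for example with PyYAML). The 116 to 128 range and the 122 default mirror the blockSize rule in this guide. lint_ipv6_pool is a hypothetical helper and is not part of calicoctl.

```python
# IPv6 blockSize bounds and default, as stated in this guide.
IPV6_BLOCK_MIN, IPV6_BLOCK_MAX, IPV6_BLOCK_DEFAULT = 116, 128, 122


def lint_ipv6_pool(pool: dict) -> list:
    """Return human-readable problems for an IPPool manifest parsed into a dict."""
    spec = pool.get("spec", {})
    name = pool.get("metadata", {}).get("name", "<unnamed>")
    if ":" not in spec.get("cidr", ""):
        return []  # IPv4 pool: out of scope for these checks
    problems = []
    block = spec.get("blockSize", IPV6_BLOCK_DEFAULT)
    if not IPV6_BLOCK_MIN <= block <= IPV6_BLOCK_MAX:
        problems.append(f"{name}: blockSize {block} is outside {IPV6_BLOCK_MIN}-{IPV6_BLOCK_MAX}")
    elif block == IPV6_BLOCK_MAX:
        problems.append(f"{name}: blockSize 128 puts a single address in each block")
    # An absent ipipMode means Never, Calico's default.
    if spec.get("ipipMode", "Never") != "Never":
        problems.append(f"{name}: ipipMode must be Never on an IPv6 pool")
    if spec.get("disabled", False):
        problems.append(f"{name}: pool is disabled")
    return problems
```

Given the broken IPv6 pool from the Enterprise example, this reports three problems. The corrected pool reports none.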
How to Fix It
Step 1 — Confirm the diagnosis
# Check what pools Calico actually knows about
calicoctl get ippools -o wide
# If you see only an IPv4 pool, that's your problem.
# Also check Felix config
calicoctl get felixconfiguration default -o yaml | grep -i ipv6
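-o wide is fine for eyeballing, but for scripting the same data is available as JSON. Below is a minimal Python sketch. It assumes that calicoctl get ippools -o json returns a list object with an items array, which is the same shape the pre-flight check later in this guide relies on. Disabled pools are skipped because Calico IPAM will not allocate from them. The function name is hypothetical.

```python
import ipaddress
import json


def enabled_pools_by_family(calicoctl_json: str) -> dict:
    """Map IP version (4/6) to enabled pool names from `calicoctl get ippools -o json`."""
    doc = json.loads(calicoctl_json)
    if "items" in doc:
        items = doc["items"] or []  # an empty list may be serialized as null
    else:
        items = [doc]  # a single IPPool object
    by_family = {4: [], 6: []}
    for pool in items:
        spec = pool.get("spec", {})
        if spec.get("disabled", False):
            continue  # a disabled pool cannot serve pods
        version = ipaddress.ip_network(spec["cidr"]).version
        by_family[version].append(pool["metadata"]["name"])
    return by_family
```

An empty list under key 6 confirms the diagnosis, whether the IPv6 pool is missing or merely disabled.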
Step 2 — Basic fix: create the missing IPv6 IPPool
- # No IPv6 IPPool exists — output of 'calicoctl get ippools':
- # NAME CIDR SELECTOR
- # default-ipv4-ippool 10.244.0.0/16 all()
+ # Apply with: calicoctl apply -f ipv6-ippool.yaml (kubectl apply only works if the Calico API server is installed)
+ apiVersion: projectcalico.org/v3
+ kind: IPPool
+ metadata:
+ name: default-ipv6-ippool
+ spec:
+ cidr: fd00::/112
+ blockSize: 122
+ ipipMode: Never
+ vxlanMode: Never
+ natOutgoing: false
+ disabled: false
+ nodeSelector: all()
⚠️ blockSize rule: For IPv6, blockSize must be between 116 and 128. A /122 gives 64 addresses per block, which is sane for most clusters. Never use /128 in production.
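The arithmetic behind the rule works like this. A block of size b holds 2^(128 - b) addresses, and a pool with prefix length p splits into 2^(b - p) blocks. For the fd00::/112 pool above with blockSize 122, that gives 2^6 = 64 addresses per block and 2^10 = 1,024 blocks, or 65,536 pod addresses in total. Below is a small Python sketch of the same calculation. block_math is a hypothetical helper.

```python
import ipaddress


def block_math(pool_cidr: str, block_size: int) -> tuple:
    """Return (addresses_per_block, blocks_in_pool) for a Calico IPv6 IPPool."""
    pool = ipaddress.ip_network(pool_cidr)
    if pool.version != 6:
        raise ValueError(f"{pool_cidr} is not an IPv6 CIDR")
    if not 116 <= block_size <= 128:
        raise ValueError("IPv6 blockSize must be between 116 and 128")
    if block_size < pool.prefixlen:
        raise ValueError("pool is smaller than a single block")
    return 2 ** (128 - block_size), 2 ** (block_size - pool.prefixlen)
```

Note the /128 case: the same /112 pool becomes 65,536 single-address blocks, one per pod.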
Enterprise Best Practice — Full dual-stack IPPool + FelixConfiguration patch
- apiVersion: projectcalico.org/v3
- kind: FelixConfiguration
- metadata:
- name: default
- spec:
- logSeverityScreen: Info
- # ipv6Support not set; also check the calico-node env, where manifests often set FELIX_IPV6SUPPORT=false
+ apiVersion: projectcalico.org/v3
+ kind: FelixConfiguration
+ metadata:
+ name: default
+ spec:
+ logSeverityScreen: Info
+ ipv6Support: true
+ bpfEnabled: false # set true only if you are on eBPF dataplane
+ routeTableRange:
+ min: 1
+ max: 250
- apiVersion: projectcalico.org/v3
- kind: IPPool
- metadata:
- name: default-ipv6-ippool
- spec:
- cidr: fd00::/48 # a valid size on its own; blockSize is not derived from the CIDR
- blockSize: 128 # single IP per block: one block and one route per pod
- ipipMode: Always # FATAL: IPIP is IPv4-only; Calico's validation rejects the pool
- disabled: true # copy-paste artifact from staging
+ apiVersion: projectcalico.org/v3
+ kind: IPPool
+ metadata:
+ name: default-ipv6-ippool
+ spec:
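+ cidr: fd00::/112 # must sit inside the IPv6 range in --cluster-cidr (fd00::/48 in the example above)
+ blockSize: 122
+ ipipMode: Never # mandatory for IPv6
+ vxlanMode: Never # or 'CrossSubnet' for an IPv6 overlay, if your Calico release supports IPv6 VXLAN
+ natOutgoing: false # fd00::/8 (ULA) is not internet-routable; set true if pods need IPv6 egress beyond the cluster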
+ disabled: false
+ nodeSelector: all()
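If you template pools per cluster, generating the manifest from code takes the IPv6 invariants (ipipMode: Never, disabled: false) out of human hands. Below is a sketch. It assumes you pipe the result to calicoctl apply -f -, which accepts JSON as well as YAML. ipv6_pool_manifest and the script name in the usage line are hypothetical.

```python
import ipaddress
import json


def ipv6_pool_manifest(name: str, cidr: str, block_size: int = 122) -> str:
    """Render an IPv6 IPPool with the IPv6-safe settings from this guide."""
    net = ipaddress.ip_network(cidr)  # ValueError on malformed input or host bits set
    if net.version != 6:
        raise ValueError(f"{cidr} is not an IPv6 CIDR")
    if not 116 <= block_size <= 128:
        raise ValueError("IPv6 blockSize must be between 116 and 128")
    manifest = {
        "apiVersion": "projectcalico.org/v3",
        "kind": "IPPool",
        "metadata": {"name": name},
        "spec": {
            "cidr": str(net),
            "blockSize": block_size,
            "ipipMode": "Never",  # IPIP is IPv4-only
            "vxlanMode": "Never",
            "natOutgoing": False,
            "disabled": False,
            "nodeSelector": "all()",
        },
    }
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    print(ipv6_pool_manifest("default-ipv6-ippool", "fd00::/112"))
```

Usage: python render_ipv6_pool.py | calicoctl apply -f -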
Step 3 — Force calico-node to re-evaluate after fix
# Restart the DaemonSet; do NOT delete individual pods manually in production.
# Operator-based installs run calico-node in calico-system rather than kube-system.
kubectl rollout restart daemonset/calico-node -n kube-system
# Watch recovery
kubectl rollout status daemonset/calico-node -n kube-system
# Verify IPAM health
calicoctl ipam check
calicoctl ipam show --show-blocks
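If your pipeline polls instead of blocking on kubectl rollout status, you can check the recovery condition from the DaemonSet's status fields. Below is a Python sketch that roughly approximates what rollout status waits for, though it is not an exact reimplementation. It assumes the input is the output of kubectl get daemonset calico-node -n <namespace> -o json. daemonset_recovered is a hypothetical helper.

```python
import json


def daemonset_recovered(ds_json: str) -> bool:
    """True once every scheduled calico-node pod is on the latest spec and Ready."""
    ds = json.loads(ds_json)
    status = ds.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    # The controller must have observed the latest spec (e.g. the restart annotation).
    observed = status.get("observedGeneration", 0) >= ds.get("metadata", {}).get("generation", 0)
    return (
        observed
        and desired > 0
        and status.get("updatedNumberScheduled", 0) >= desired
        and status.get("numberReady", 0) >= desired
    )
```

A DaemonSet still crash-looping on some nodes shows numberReady below desiredNumberScheduled and keeps this check False.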
Prevention in CI/CD
1. OPA/Gatekeeper policy — block IPPool with ipipMode on IPv6
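package calico.ippool

import rego.v1

# Conftest-style rules (run: conftest test --namespace calico.ippool <manifest>).
# For Gatekeeper, port them into a ConstraintTemplate: read input.review.object
# instead of input, and report through a violation rule instead of deny.

deny contains msg if {
    input.kind == "IPPool"
    contains(input.spec.cidr, ":")  # IPv6 CIDR
    input.spec.ipipMode != "Never"  # absent ipipMode is undefined, so it does not fire
    msg := sprintf("IPPool %v: ipipMode must be 'Never' for IPv6 CIDRs", [input.metadata.name])
}

deny contains msg if {
    input.kind == "IPPool"
    contains(input.spec.cidr, ":")
    input.spec.disabled == true
    msg := sprintf("IPPool %v: IPv6 pool must not be disabled in production", [input.metadata.name])
}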
2. Checkov custom check (Python)
# checkov/custom/check_calico_ipv6_pool.py
# Run: checkov -d <manifests-dir> --external-checks-dir checkov/custom
from checkov.common.models.enums import CheckCategories, CheckResult
from checkov.kubernetes.checks.resource.base_resource_check import BaseK8Check


class CalicoIPv6PoolCheck(BaseK8Check):
    def __init__(self):
        name = "Ensure Calico IPv6 IPPool has ipipMode=Never and is not disabled"
        id = "CKV_K8S_CALICO_001"
        super().__init__(name=name, id=id,
                         categories=[CheckCategories.NETWORKING],
                         supported_entities=["IPPool"])

    def scan_resource_conf(self, conf):
        spec = conf.get("spec", {})
        # Only IPv6 pools are in scope; IPv4 pools pass unconditionally.
        if ":" not in spec.get("cidr", ""):
            return CheckResult.PASSED
        # An absent ipipMode means "Never", Calico's default.
        if spec.get("ipipMode", "Never") != "Never":
            return CheckResult.FAILED
        if spec.get("disabled", False):
            return CheckResult.FAILED
        return CheckResult.PASSED


# Instantiating the class at import time registers it with Checkov.
check = CalicoIPv6PoolCheck()
3. Pre-flight in your Helm/Terraform pipeline
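# Add to your cluster bootstrap script BEFORE the calico-node DaemonSet is applied.
# Count IPv6 pools that are not disabled; a disabled pool is as good as no pool.
IPV6_POOL=$(calicoctl get ippools -o json | jq '[.items[] | select((.spec.cidr | contains(":")) and (.spec.disabled != true))] | length')
# Treat an empty result (calicoctl or jq failed) as zero so the gate fails closed.
if [ "${IPV6_POOL:-0}" -eq 0 ]; then
  echo "FATAL: No enabled IPv6 IPPool found. Apply ipv6-ippool.yaml before proceeding."
  exit 1
fi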
This pre-flight gate catches the missing pool before calico-node is rolled out, eliminating the CrashLoop entirely in automated provisioning pipelines.