
Fixing Calico CrashLoopBackOff: 'failed to get IP from pool' on IPv6 Dual-Stack Kubernetes Clusters

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–45 mins

TL;DR

  • What broke: calico-node cannot allocate IPv6 addresses because the IPv6 IPPool CRD is absent, disabled, or has a blockSize incompatible with the cluster CIDR — all new pods on affected nodes are stuck in ContainerCreating.
  • How to fix it: Create or patch the IPv6 IPPool with a /122 blockSize (the /112 used in this guide is the pool CIDR, not the block size), ensure FelixConfiguration has ipv6Support: true, and verify calicoctl ipam check returns clean.

The Incident (What does the error mean?)

Raw log output from a crashing calico-node pod:

E1105 03:12:44.882341       1 ipam.go:312] Failed to get IP address from pool: no IPv6 pools found
E1105 03:12:44.882411       1 main.go:198] Error: failed to get IP from pool: no IPv6 pools found for node calico-node-xk9p2
FATAL calico/node: failed to start: failed to get IP from pool

Immediate consequence: Every calico-node pod on nodes that were assigned an IPv6 address during kubelet registration enters CrashLoopBackOff. The node never becomes Ready. Any workload pod scheduled to that node stays in ContainerCreating indefinitely. In a rolling upgrade or a fresh node-pool scale-out, this silently bricks your entire new node capacity.


The Attack Vector / Blast Radius

This is a scheduling dead-zone failure. The blast radius scales with your node count:

  • Node-level: calico-node DaemonSet pod crashes → node CNI is non-functional → kubelet reports NetworkPlugin not ready → the node stays NotReady and carries the node.kubernetes.io/not-ready taint.
  • Workload-level: All pods scheduled to the affected node are blocked. Existing pods are NOT evicted immediately, but any restart (OOMKill, rolling deploy, HPA scale-up) on that node will fail to re-attach networking.
  • Control-plane ripple: If the node running coredns or a critical system pod is affected, DNS resolution cluster-wide degrades.
  • Dual-stack-specific trap: You enabled dual-stack (--cluster-cidr=10.244.0.0/16,fd00::/48) in kube-controller-manager and kubelet, but Calico's IPAM is independent of Kubernetes IPAM. Calico requires its own IPPool resource for each address family. Kubernetes knowing about your IPv6 CIDR means nothing to Calico's IPAM.
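Because the two IPAM layers are configured separately, it can help to check that they at least agree on the address range. The sketch below uses only Python's standard ipaddress module to test whether a Calico pool CIDR sits inside one of the ranges passed to --cluster-cidr. The function name pool_within_cluster_cidr is hypothetical, and the values are the example ranges from this article, not output from a real cluster.

```python
import ipaddress


def pool_within_cluster_cidr(pool_cidr: str, cluster_cidrs: str) -> bool:
    """Return True if pool_cidr is contained in a same-family cluster CIDR.

    cluster_cidrs is the comma-separated value given to --cluster-cidr.
    """
    pool = ipaddress.ip_network(pool_cidr, strict=False)
    for raw in cluster_cidrs.split(","):
        cluster = ipaddress.ip_network(raw.strip(), strict=False)
        # subnet_of() raises TypeError across address families,
        # so compare IP versions before calling it.
        if cluster.version == pool.version and pool.subnet_of(cluster):
            return True
    return False


if __name__ == "__main__":
    cluster = "10.244.0.0/16,fd00::/48"
    print(pool_within_cluster_cidr("fd00::/112", cluster))
    print(pool_within_cluster_cidr("fd01::/112", cluster))
```

A pool that passes this check can still be missing from Calico entirely; the check only confirms that a pool you intend to create lines up with what Kubernetes was told.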

The most common triggers in production:

  1. Dual-stack enabled on the cluster after initial Calico install — the IPv6 IPPool was never created.
  2. blockSize set to /128 (single address per block) causing immediate exhaustion.
  3. ipipMode: Always set on the IPv6 pool. IPIP is IPv4-only, so the pool is either rejected at validation or left unusable for IPv6 allocation.
  4. disabled: true left in the IPPool manifest from a staging copy-paste.
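The four triggers above can be checked mechanically. The following is a minimal sketch, assuming the pools are supplied as the items list from calicoctl get ippools -o json; the function name diagnose_ipv6_pools and the wording of the findings are illustrative, not part of Calico.

```python
import ipaddress
from typing import Any


def diagnose_ipv6_pools(pools: list[dict[str, Any]]) -> list[str]:
    """Return human-readable findings for the IPv6 pools in a list of IPPool objects."""
    v6_pools = [
        p for p in pools
        if ipaddress.ip_network(p["spec"]["cidr"], strict=False).version == 6
    ]
    # Trigger 1: dual-stack was enabled but no IPv6 pool was ever created.
    if not v6_pools:
        return ["no IPv6 IPPool exists"]

    findings = []
    for pool in v6_pools:
        name = pool["metadata"]["name"]
        spec = pool["spec"]
        # Trigger 2: a /128 block holds a single address.
        if spec.get("blockSize") == 128:
            findings.append(f"{name}: blockSize 128 gives one address per block")
        # Trigger 3: IPIP encapsulation is IPv4-only.
        if spec.get("ipipMode", "Never") != "Never":
            findings.append(f"{name}: ipipMode must be Never on an IPv6 pool")
        # Trigger 4: the pool exists but is switched off.
        if spec.get("disabled", False):
            findings.append(f"{name}: pool is disabled")
    return findings
```

An empty list means none of the four triggers matched; it does not prove the pool is otherwise healthy.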

How to Fix It

Step 1 — Confirm the diagnosis

# Check what pools Calico actually knows about
calicoctl get ippools -o wide

# If you see only an IPv4 pool, that's your problem.
# Also check Felix config
calicoctl get felixconfiguration default -o yaml | grep -i ipv6

Basic Fix — Create the missing IPv6 IPPool

- # No IPv6 IPPool exists — output of 'calicoctl get ippools':
- # NAME                  CIDR            SELECTOR
- # default-ipv4-ippool   10.244.0.0/16   all()

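+ # Apply this manifest: calicoctl apply -f ipv6-ippool.yaml
+ # (kubectl apply works for projectcalico.org/v3 resources only if the Calico API server is installed)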
+ apiVersion: projectcalico.org/v3
+ kind: IPPool
+ metadata:
+   name: default-ipv6-ippool
+ spec:
+   cidr: fd00::/112
+   blockSize: 122
+   ipipMode: Never
+   vxlanMode: Never
+   natOutgoing: false
+   disabled: false
+   nodeSelector: all()

⚠️ blockSize rule: For IPv6, blockSize must be between 116 and 128. A /122 gives 64 addresses per block — sane for most clusters. Never use /128 in production.
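The arithmetic behind that rule is easy to reproduce. This sketch computes how many blocks a pool yields and how many addresses each block holds, using the standard ipaddress module. The helper name block_capacity is hypothetical, and it only does the prefix arithmetic; it does not enforce Calico's own 116 to 128 limit.

```python
import ipaddress


def block_capacity(pool_cidr: str, block_size: int) -> tuple[int, int]:
    """Return (number_of_blocks, addresses_per_block) for a pool and block size."""
    pool = ipaddress.ip_network(pool_cidr, strict=False)
    if block_size < pool.prefixlen or block_size > pool.max_prefixlen:
        raise ValueError("blockSize must lie between the pool prefix and the address length")
    # Each extra prefix bit between pool and block doubles the block count.
    blocks = 2 ** (block_size - pool.prefixlen)
    # The bits left after the block prefix are the addresses inside one block.
    per_block = 2 ** (pool.max_prefixlen - block_size)
    return blocks, per_block


if __name__ == "__main__":
    for size in (122, 128):
        blocks, per_block = block_capacity("fd00::/112", size)
        print(f"/{size}: {blocks} blocks of {per_block} address(es)")
```

For the fd00::/112 pool used in this guide, a /122 blockSize yields 1024 blocks of 64 addresses, while /128 yields 65536 blocks of one address each.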

Enterprise Best Practice — Full dual-stack IPPool + FelixConfiguration patch

- apiVersion: projectcalico.org/v3
- kind: FelixConfiguration
- metadata:
-   name: default
- spec:
-   logSeverityScreen: Info
-   # ipv6Support field absent — defaults to false in Calico < 3.23

+ apiVersion: projectcalico.org/v3
+ kind: FelixConfiguration
+ metadata:
+   name: default
+ spec:
+   logSeverityScreen: Info
+   ipv6Support: true
+   bpfEnabled: false          # set true only if you are on eBPF dataplane
+   routeTableRange:
+     min: 1
+     max: 250

- apiVersion: projectcalico.org/v3
- kind: IPPool
- metadata:
-   name: default-ipv6-ippool
- spec:
-   cidr: fd00::/48             # spans the whole cluster range; the fix below narrows it to /112
-   blockSize: 128              # single IP per block — exhausts immediately
-   ipipMode: Always            # FATAL: IPIP is IPv4-only
-   disabled: true              # copy-paste artifact from staging

+ apiVersion: projectcalico.org/v3
+ kind: IPPool
+ metadata:
+   name: default-ipv6-ippool
+ spec:
+   cidr: fd00::/112            # must fall within the --cluster-cidr IPv6 range
+   blockSize: 122
+   ipipMode: Never             # mandatory for IPv6
+   vxlanMode: Never            # or 'CrossSubnet' if you need overlay on IPv6
+   natOutgoing: false          # set true if pods on ULA (fd00::/8) addresses must reach destinations outside the cluster
+   disabled: false
+   nodeSelector: all()

Step 3 — Force calico-node to re-evaluate after fix

# Restart the DaemonSet rather than deleting individual pods by hand in production.
# Operator-based installs run calico-node in calico-system instead of kube-system.
kubectl rollout restart daemonset/calico-node -n kube-system

# Watch recovery
kubectl rollout status daemonset/calico-node -n kube-system

# Verify IPAM health
calicoctl ipam check
calicoctl ipam show --show-blocks
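If recovery needs to be confirmed from a script rather than by eye, the DaemonSet status can be read directly. The sketch below assumes the JSON comes from kubectl get daemonset calico-node -n kube-system -o json and compares the standard status fields desiredNumberScheduled and numberReady; the function name daemonset_ready is illustrative.

```python
import json


def daemonset_ready(ds_json: str) -> bool:
    """Return True when every scheduled calico-node pod reports Ready."""
    status = json.loads(ds_json).get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    ready = status.get("numberReady", 0)
    # A DaemonSet with zero desired pods is treated as not ready,
    # since that usually means the selector matched no nodes.
    return desired > 0 and ready == desired
```

This only confirms pod readiness; calicoctl ipam check is still the place to verify that allocations themselves are consistent.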



Prevention in CI/CD

1. OPA/Gatekeeper policy — block IPPool with ipipMode on IPv6

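# Written for conftest / plain OPA, where input is the manifest itself.
# Gatekeeper needs violation[{"msg": msg}] rules and reads the object from input.review.object.
package calico.ippool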

deny[msg] {
  input.kind == "IPPool"
  contains(input.spec.cidr, ":")
  input.spec.ipipMode != "Never"
  msg := sprintf("IPPool %v: ipipMode must be 'Never' for IPv6 CIDRs", [input.metadata.name])
}

deny[msg] {
  input.kind == "IPPool"
  contains(input.spec.cidr, ":")
  input.spec.disabled == true
  msg := sprintf("IPPool %v: IPv6 pool must not be disabled in production", [input.metadata.name])
}

2. Checkov custom check (Python)

# checkov/custom/check_calico_ipv6_pool.py
from checkov.common.models.enums import CheckCategories, CheckResult
from checkov.kubernetes.checks.resource.base_spec_check import BaseK8Check


class CalicoIPv6PoolCheck(BaseK8Check):
    def __init__(self):
        name = "Ensure Calico IPv6 IPPool has ipipMode=Never and is not disabled"
        id = "CKV_K8S_CALICO_001"
        super().__init__(name=name, id=id,
                         categories=[CheckCategories.NETWORKING],
                         supported_entities=["IPPool"])

    def scan_resource_conf(self, conf):
        # Guard against a manifest whose spec key is present but empty.
        spec = conf.get("spec") or {}
        # IPv4 pools are out of scope for this check.
        if ":" not in spec.get("cidr", ""):
            return CheckResult.PASSED
        # IPIP is IPv4-only, so anything other than Never fails.
        if spec.get("ipipMode", "Never") != "Never":
            return CheckResult.FAILED
        if spec.get("disabled", False):
            return CheckResult.FAILED
        return CheckResult.PASSED


# Checkov picks up a custom check when its module instantiates it.
check = CalicoIPv6PoolCheck()

3. Pre-flight in your Helm/Terraform pipeline

# Add to your cluster bootstrap script BEFORE calico-node DaemonSet is applied
IPV6_POOL=$(calicoctl get ippools -o json | jq '[.items[] | select(.spec.cidr | contains(":"))] | length')
# Treat an empty result (calicoctl or jq failed) as zero pools so the gate fails closed.
if [ "${IPV6_POOL:-0}" -eq 0 ]; then
  echo "FATAL: No IPv6 IPPool found. Apply ipv6-ippool.yaml before proceeding."
  exit 1
fi

This pre-flight gate catches the missing pool before calico-node is rolled out, eliminating the CrashLoop entirely in automated provisioning pipelines.
