Fixing Cilium 'BPF Map Creation Failed' Error: Kernel Version Compatibility Guide
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–45 mins
TL;DR
- What broke: Cilium's agent cannot create eBPF maps (LRU_HASH, SOCKMAP, or PERCPU_ARRAY) because the host kernel predates the syscall support for those map types. Pod networking is completely down on affected nodes.
- How to fix it: Upgrade the node kernel to ≥ 5.10 LTS (4.19 is the absolute minimum for basic Cilium, 5.10+ for full feature parity), or disable the specific Cilium features that require newer map types via Helm values.
The Incident (What does the error mean?)
Raw error from kubectl logs -n kube-system cilium-<pod> -c cilium-agent:
level=fatal msg="Failed to create BPF map" error="map create: operation not supported" mapName="cilium_lb4_services_v2" type=LRU_HASH
level=fatal msg="BPF map creation failed, cannot continue" subsys=maps
Or the kernel-level variant:
FATAL: kernel too old to run Cilium (4.14.0 < 4.19.57)
Immediate consequence: The cilium-agent DaemonSet pod on that node enters CrashLoopBackOff. The CNI binary is present but non-functional. Every pod scheduled to that node gets stuck in ContainerCreating with networkPlugin cni failed to set up pod network. The node is effectively a networking black hole.
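The two fatal messages quoted above point to different remedies (an unsupported map type versus the agent's own kernel floor), so it can help to triage logs mechanically. The following is a minimal sketch, assuming the agent log lines look like the samples in this section; classify_cilium_fatal is a hypothetical helper name, and real log formats may differ between Cilium releases.

```sh
# Hypothetical helper: reads cilium-agent log lines on stdin and prints one
# summary line per recognized fatal error. The patterns are derived only from
# the sample messages quoted in this guide, not from a Cilium log specification.
classify_cilium_fatal() {
    sed -n \
        -e 's/.*error="map create: operation not supported".*mapName="\([^"]*\)".*type=\([A-Z_]*\).*/unsupported-map-type \1 \2/p' \
        -e 's/.*kernel too old to run Cilium (\([0-9.]*\) < \([0-9.]*\)).*/kernel-too-old \1 \2/p'
}
```

One possible use is to pipe the output of the kubectl logs command shown above (optionally with --previous, to read the crashed container) into classify_cilium_fatal. Lines that match neither pattern are dropped.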
The Attack Vector / Blast Radius
This is not a gradual degradation. Each affected node fails hard, and the damage spreads in several ways:
- Node-level CNI death: All pods on the affected node lose network setup. Existing pods with established network namespaces may retain connectivity temporarily, but any restart kills them permanently.
- Cluster autoscaler amplification: If your node pool is autoscaling and the base AMI/image has an old kernel, every new node the autoscaler provisions will immediately become a dead zone. You scale into the outage.
- Silent kube-proxy bypass failure: Cilium in kube-proxy replacement mode owns the iptables bypass via eBPF. If cilium-agent never initializes, kube-proxy is also absent (it was disabled). Service VIP resolution fails completely, not just for new pods but for the entire node's routing table.
- Hubble observability blackout: No agent means no flow logs, no policy audit trail. You are blind during the incident.
- Cascading pod eviction: The node condition NetworkUnavailable=True is set. The scheduler stops placing pods. If this hits multiple nodes simultaneously (e.g., a rolling node pool upgrade to a bad AMI), you lose quorum on stateful workloads.
The kernel version matrix that kills you:
| Cilium Feature | Minimum Kernel |
|---|---|
| Basic pod networking | 4.19.57 |
| LRU_HASH maps | 4.10 |
| SOCKMAP / socket-level LB | 5.7 |
| Host-reachable services | 5.10 |
| BPF NodePort (full) | 5.10 |
| WireGuard encryption | 5.6 |
| Bandwidth Manager (EDT) | 5.1 |
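To turn the matrix into something you can run against a node, the sketch below lists the features a given kernel cannot support. It is only an illustration: unsupported_features is a hypothetical helper, the minimum versions are copied from the table in this guide (not queried from Cilium), and only the numeric major.minor.patch prefix of the kernel string is compared.

```sh
# Hypothetical helper: prints the features from this guide's table that the
# given kernel version (for example the output of `uname -r`) cannot run.
unsupported_features() {
    awk -F'|' -v cur="$1" '
        # Returns 1 when version a is older than version b (major.minor.patch).
        function older(a, b,    x, y, i) {
            sub(/[^0-9.].*$/, "", a)   # drop suffixes such as "-150-generic"
            split(a, x, "."); split(b, y, ".")
            for (i = 1; i <= 3; i++) {
                if (x[i] + 0 < y[i] + 0) return 1
                if (x[i] + 0 > y[i] + 0) return 0
            }
            return 0
        }
        older(cur, $2) { print $1 }
    ' <<'EOF'
Basic pod networking|4.19.57
LRU_HASH maps|4.10
SOCKMAP / socket-level LB|5.7
Host-reachable services|5.10
BPF NodePort (full)|5.10
WireGuard encryption|5.6
Bandwidth Manager (EDT)|5.1
EOF
}
```

For example, unsupported_features "$(uname -r)" run on a node prints nothing when every row of the table is satisfied.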
How to Fix It
Step 0: Confirm the kernel version on the failing node
# Get kernel version per node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'
# Or exec into the cilium pod before it crashes
kubectl exec -n kube-system cilium-<pod> -- uname -r
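On larger clusters, reading the version list by eye is error-prone. As a rough sketch, the tab-separated output of the jsonpath query above can be filtered down to the nodes below a chosen minimum; nodes_below_kernel is a hypothetical helper, and it compares only the numeric major.minor.patch prefix of each kernel string.

```sh
# Hypothetical helper: reads "node<TAB>kernelVersion" lines on stdin and prints
# only the nodes whose kernel is older than the minimum given as $1.
nodes_below_kernel() {
    awk -F'\t' -v min="$1" '
        # Returns 1 when version cur is older than version floor.
        function older(cur, floor,    c, m, i) {
            sub(/[^0-9.].*$/, "", cur)   # drop suffixes such as "-1031-aws"
            split(cur, c, "."); split(floor, m, ".")
            for (i = 1; i <= 3; i++) {
                if (c[i] + 0 < m[i] + 0) return 1
                if (c[i] + 0 > m[i] + 0) return 0
            }
            return 0
        }
        older($2, min) { print $1 "\t" $2 }
    '
}
```

Piping the kubectl get nodes command from this step into nodes_below_kernel 5.10 would list only the nodes that need a kernel upgrade or the degraded configuration.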
Basic Fix — Disable Features Requiring Newer Kernels
If you cannot immediately upgrade the kernel, surgically disable the offending features in your Cilium Helm values:
# values-degraded.yaml (shown as a diff against your current cilium/values.yaml)
- kubeProxyReplacement: strict
+ kubeProxyReplacement: disabled
- hostServices:
-   enabled: true
+ hostServices:
+   enabled: false
- nodePort:
-   enabled: true
+ nodePort:
+   enabled: false
- bandwidthManager:
-   enabled: true
+ bandwidthManager:
+   enabled: false
- hubble:
-   enabled: true
+ hubble:
+   enabled: false
Then roll the DaemonSet:
helm upgrade cilium cilium/cilium \
--namespace kube-system \
--reuse-values \
-f values-degraded.yaml
kubectl rollout restart daemonset/cilium -n kube-system
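After the rollout, it is worth confirming that the agents actually recovered and did not fall back into CrashLoopBackOff. The following is a minimal sketch, assuming the agent pods carry the k8s-app=cilium label and that kubectl prints its default NAME, READY, STATUS columns; unhealthy_cilium_pods is a hypothetical helper.

```sh
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints every pod that is not Running with all containers ready.
unhealthy_cilium_pods() {
    awk '{
        split($2, ready, "/")            # READY column, for example "0/1"
        if ($3 != "Running" || ready[1] != ready[2])
            print $1 "\t" $3
    }'
}
```

A possible invocation is kubectl get pods -n kube-system -l k8s-app=cilium --no-headers | unhealthy_cilium_pods. Empty output means every agent pod reports Running and ready.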
Enterprise Best Practice — Node Affinity + Kernel Enforcement
Do not let Cilium schedule on nodes that cannot support it. Enforce at the DaemonSet level and at the node pool level.
1. Label compliant nodes during bootstrap (cloud-init / userdata):
kubectl label node <node-name> cilium.io/kernel-compatible=true
2. Patch the Cilium DaemonSet with node affinity:
# cilium DaemonSet spec.template.spec
+ affinity:
+   nodeAffinity:
+     requiredDuringSchedulingIgnoredDuringExecution:
+       nodeSelectorTerms:
+         - matchExpressions:
+             - key: cilium.io/kernel-compatible
+               operator: In
+               values:
+                 - "true"
3. Set the same kernel-label affinity in Helm values so the chart renders it (Cilium 1.13+):
# values.yaml
+ k8s:
+   requireIPv4PodCIDR: true
+ affinity:
+   nodeAffinity:
+     requiredDuringSchedulingIgnoredDuringExecution:
+       nodeSelectorTerms:
+         - matchExpressions:
+             - key: kubernetes.io/os
+               operator: In
+               values:
+                 - linux
+             - key: cilium.io/kernel-compatible
+               operator: In
+               values:
+                 - "true"
4. For EKS/GKE/AKS — pin the node pool image to a kernel-guaranteed AMI:
# Terraform EKS node group example
resource "aws_eks_node_group" "workers" {
  ...
-   ami_type = "AL2_x86_64"
+   ami_type = "AL2_x86_64" # Ensure AMI >= al2-kernel-5.10
+   # Pin to a specific AMI ID validated at ≥5.10:
+   launch_template {
+     id      = aws_launch_template.cilium_compatible.id
+     version = "$Latest"
+   }
}
Prevention in CI/CD
1. Pre-flight kernel check in node bootstrap script
#!/bin/bash
# Add to EC2 userdata / cloud-init before kubelet starts
MIN_KERNEL="5.10"
CURRENT=$(uname -r | cut -d'-' -f1)
if ! printf '%s\n%s' "$MIN_KERNEL" "$CURRENT" | sort -V -C; then
  echo "FATAL: Kernel $CURRENT < required $MIN_KERNEL for Cilium. Halting."
  systemctl stop kubelet
  exit 1
fi
2. OPA/Gatekeeper policy — block Cilium DaemonSet on unlabeled nodes
# Assumes a ConstraintTemplate defining the K8sRequiredNodeAffinity kind is already installed (not shown here)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredNodeAffinity
metadata:
  name: cilium-kernel-affinity
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["DaemonSet"]
    namespaces: ["kube-system"]
  parameters:
    requiredAffinityKey: "cilium.io/kernel-compatible"
    requiredAffinityValue: "true"
3. Checkov / Trivy in your Helm pipeline
# .github/workflows/cilium-lint.yaml
- name: Render Cilium Helm chart
  run: helm template cilium cilium/cilium -f values.yaml > rendered.yaml
- name: Run Checkov on rendered manifest
  uses: bridgecrewio/checkov-action@master
  with:
    file: rendered.yaml
    framework: kubernetes
- name: Validate kernel label affinity exists
  run: |
    grep -q 'cilium.io/kernel-compatible' rendered.yaml || \
      (echo 'Missing kernel affinity in Cilium DaemonSet' && exit 1)
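A plain grep over the rendered file passes if the label key appears anywhere, including in an unrelated object. A slightly stricter check, sketched below under stated assumptions, requires the key to appear inside a YAML document whose kind is DaemonSet. daemonset_has_key is a hypothetical helper; it assumes documents are separated by lines starting with --- and that kind sits at the top level, as helm template normally emits. It does not distinguish the cilium DaemonSet from any other DaemonSet the chart renders.

```sh
# Hypothetical helper: succeeds (exit 0) when the string $2 appears inside at
# least one "kind: DaemonSet" document of the multi-document YAML file $1.
daemonset_has_key() {
    awk -v key="$2" '
        # Called at each document boundary: record a hit, then reset state.
        function end_doc() {
            if (is_ds && has_key) found = 1
            is_ds = 0; has_key = 0
        }
        /^---/                   { end_doc(); next }
        /^kind: *DaemonSet *$/   { is_ds = 1 }
        index($0, key) > 0       { has_key = 1 }
        END { end_doc(); exit(found ? 0 : 1) }
    ' "$1"
}
```

In the workflow step above, the grep could then be swapped for daemonset_has_key rendered.yaml cilium.io/kernel-compatible, keeping the same failure message.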
4. Cilium CLI pre-flight (run before every cluster upgrade)
cilium install --dry-run 2>&1 | grep -i "kernel"
cilium preflight install
cilium status --wait
Run the pre-flight check in your upgrade pipeline as a blocking gate, so that kernel incompatibilities surface before any change is committed. Subcommands and flags vary between Cilium CLI releases, so confirm them with cilium --help for the version you pin.