Fixing Multus 'Failed to Create Network Attachment' for Secondary Interfaces in Kubernetes
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–30 mins
TL;DR
- What broke: Multus CNI cannot attach a secondary interface to your pod. The pod enters ContainerCreating indefinitely or crashes. Network-dependent workloads (SR-IOV, DPDK, storage replication) are dead.
- How to fix it: Validate the NetworkAttachmentDefinition IPAM config, confirm the master NIC exists on the node, verify the Multus pod annotation syntax, and check RBAC permissions for the net-attach-def resource.
- Shortcut: Use our Client-Side Sandbox above to auto-refactor your failing NAD YAML or pod annotation without leaking your cluster config.
The Incident (What Does the Error Mean?)
You'll see this in kubectl describe pod <pod-name>:
Warning FailedCreatePodSandBox kubelet Failed to create pod sandbox:
rpc error: code = Unknown desc = failed to create pod network sandbox:
[failed to create network attachment for pod "default/my-app-7d4f9b-xkp2q"
in namespace "default": error adding container to network "sriov-net":
failed to create network attachment: no IP addresses available in range set:
192.168.1.0/24
Or alternatively:
failed to find network attachment definition "default/macvlan-conf"
failed to delegate add: failed to find plugin "macvlan" in path [/opt/cni/bin]
cannot find IPAM plugin "whereabouts" in /opt/cni/bin
Immediate consequence: The pod never reaches Running. Every restart loop burns node resources. If this is a DaemonSet, every node in the cluster may be affected simultaneously.
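Each of the messages above points at a different fix step, so it helps to triage the event text before changing anything. The following is a minimal Python sketch of that triage. The marker substrings are copied from the samples above; the cause labels and the `triage` function name are this sketch's own and are not part of Multus.

```python
# Map substrings of the kubelet event message to the likely cause.
# Markers come from the error samples in this guide; the cause labels are
# illustrative wording, not Multus output.
TRIAGE_RULES = [
    ("no IP addresses available",
     "IPAM range exhausted or misconfigured (Step 3)"),
    ("failed to find network attachment definition",
     "NAD missing or in the wrong namespace (Steps 1 and 2)"),
    ("failed to find plugin",
     "CNI plugin binary missing on the node (Step 4)"),
    ("cannot find IPAM plugin",
     "IPAM plugin binary missing on the node (Step 4)"),
]


def triage(event_message: str) -> list[str]:
    """Return every likely cause whose marker appears in the message."""
    return [cause for marker, cause in TRIAGE_RULES if marker in event_message]
```

Feed it the message column from kubectl describe pod; an empty list means the error is not one of the four patterns covered here.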
The Attack Vector / Blast Radius
This is not a soft degradation. It is a hard pod startup failure: the pod is scheduled to a node, but its network sandbox is never created.
Cascade path:
- Pod stuck in ContainerCreating → liveness/readiness probes never run → Service endpoints never populate → upstream load balancer marks the backend unhealthy.
- If the failing pod is a storage or CNF workload (Ceph OSD, vRAN DU, Whereabouts IPAM itself), the blast radius expands to data loss or a full network plane outage.
- In SR-IOV environments, a bad resourceName in the NAD leaves Virtual Functions allocated but unbound, leaking VFs until node reboot.
- RBAC misconfiguration on net-attach-def means the Multus DaemonSet cannot read NADs in foreign namespaces, silently breaking cross-namespace network attachments with no obvious error surface.
The non-obvious danger: Multus failures are often silent at the cluster level. Kubernetes marks the pod as ContainerCreating, not Error. Alerts tuned for CrashLoopBackOff miss this entirely. You find out from an application team, not your monitoring stack.
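One way to close that monitoring gap is to look for pods that have been waiting in ContainerCreating for too long. The sketch below assumes its input is the parsed output of kubectl get pods -A -o json; the function name and the age threshold are illustrative. Age is measured from the pod's creationTimestamp, which slightly overstates the time spent in ContainerCreating.

```python
from datetime import datetime, timedelta, timezone


def stuck_creating(pod_list: dict, now: datetime, max_age: timedelta) -> list[str]:
    """Return namespace/name of pods waiting in ContainerCreating longer than max_age."""
    stuck = []
    for pod in pod_list.get("items", []):
        meta, status = pod["metadata"], pod.get("status", {})
        # Collect the waiting reason of every container (None if not waiting).
        reasons = [
            cs.get("state", {}).get("waiting", {}).get("reason")
            for cs in status.get("containerStatuses", [])
        ]
        if "ContainerCreating" not in reasons:
            continue
        # Kubernetes serializes timestamps as RFC 3339 in UTC, e.g. 2024-01-01T00:00:00Z.
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - created > max_age:
            stuck.append(f'{meta["namespace"]}/{meta["name"]}')
    return stuck
```

Called with now=datetime.now(timezone.utc) and a threshold such as timedelta(minutes=5), the result can feed whatever alerting channel you already use.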
How to Fix It (The Solution)
Step 1 — Confirm the NAD exists in the correct namespace
kubectl get network-attachment-definitions -n <pod-namespace>
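A bare NAD name in the annotation is resolved in the pod's own namespace. To attach a NAD from another namespace, reference it as namespace/name and confirm that Multus namespace isolation, if enabled, permits it. A NAD in the wrong namespace is the #1 mistake.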
Step 2 — Validate the annotation syntax on the pod spec
# Pod metadata annotations
  annotations:
-   k8s.v1.cni.cncf.io/networks: macvlan-conf          # FRAGILE: bare name, resolved only in the pod's own namespace
+   k8s.v1.cni.cncf.io/networks: default/macvlan-conf  # EXPLICIT: namespace/name
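To audit existing manifests for unqualified references, the annotation value can be parsed directly. This sketch handles the two forms Multus accepts: a comma-separated list of [namespace/]name[@interface] entries, and a JSON list of objects with a name and an optional namespace. The function name is hypothetical.

```python
import json


def unqualified_networks(annotation: str) -> list[str]:
    """Return the network names in the annotation that carry no namespace."""
    value = annotation.strip()
    if value.startswith("["):
        # JSON form: a list of objects with "name" and optional "namespace".
        return [n["name"] for n in json.loads(value) if not n.get("namespace")]
    bare = []
    for entry in value.split(","):
        # Drop the optional @interface suffix before looking for a namespace.
        ref = entry.strip().split("@", 1)[0]
        if ref and "/" not in ref:
            bare.append(ref)
    return bare
```

An empty result means every referenced network is namespace-qualified.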
Step 3 — Fix the NetworkAttachmentDefinition IPAM block
Most "no IP addresses available" errors trace back to an exhausted, misconfigured, or missing IPAM range. A "cannot find IPAM plugin" error means Whereabouts is not installed on the node.
  apiVersion: k8s.cni.cncf.io/v1
  kind: NetworkAttachmentDefinition
  metadata:
    name: macvlan-conf
    namespace: default
  spec:
    config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
-     "master": "eth0",
+     "master": "ens3f0",
      "mode": "bridge",
      "ipam": {
-       "type": "host-local",
-       "subnet": "192.168.1.0/24"
+       "type": "whereabouts",
+       "range": "192.168.100.0/24",
+       "exclude": [
+         "192.168.100.0/32",
+         "192.168.100.255/32"
+       ]
      }
    }'
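Why whereabouts over host-local: host-local keeps its allocation state on each node's local disk, so it cannot deduplicate IPs across nodes. If several nodes serve the same range, two pods on different nodes can be handed the same IP. Whereabouts records allocations cluster-wide through the Kubernetes API (older releases could also use etcd), so every node draws from the same pool.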
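The difference between per-node and cluster-wide allocation state can be shown with a toy model. This is not how host-local or Whereabouts is implemented; the Allocator class is a deliberately simplified, hypothetical stand-in that hands out the first free host address in a subnet.

```python
import ipaddress
from typing import Optional


class Allocator:
    """Toy IPAM: hands out the first free host address in a subnet."""

    def __init__(self, subnet: str, used: Optional[set] = None):
        self.hosts = list(ipaddress.ip_network(subnet).hosts())
        # Passing a shared set models a cluster-wide store;
        # omitting it models state kept privately on one node.
        self.used = used if used is not None else set()

    def allocate(self) -> str:
        for host in self.hosts:
            if host not in self.used:
                self.used.add(host)
                return str(host)
        raise RuntimeError("no IP addresses available")


# Per-node state: both nodes hand out the same first address.
node_a, node_b = Allocator("192.168.100.0/24"), Allocator("192.168.100.0/24")
collision = node_a.allocate() == node_b.allocate()

# Shared state: the second allocation sees the first and moves on.
shared = set()
node_c = Allocator("192.168.100.0/24", shared)
node_d = Allocator("192.168.100.0/24", shared)
distinct = node_c.allocate() != node_d.allocate()
```

Here collision ends up true and distinct ends up true, which is the whole argument in miniature: without shared state, nothing stops two nodes from choosing the same address.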
Step 4 — Confirm the CNI plugin binary exists on the node
# SSH to the affected node
ls /opt/cni/bin/ | grep -E 'macvlan|sriov|whereabouts'
If the binary is missing, the fix is to redeploy the correct CNI plugin DaemonSet — not to patch the NAD.
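The same check can be scripted for a node agent or a debug container. A minimal sketch, assuming the default /opt/cni/bin directory from the error messages above; the function name and the list of required plugins are illustrative.

```python
import os


def missing_plugins(required: list[str], cni_dir: str = "/opt/cni/bin") -> list[str]:
    """Return the required plugin names that are absent or not executable in cni_dir."""
    missing = []
    for name in required:
        path = os.path.join(cni_dir, name)
        # A plugin counts as present only if it is a regular file we can execute.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing
```

For example, missing_plugins(["macvlan", "whereabouts"]) returns the subset that still needs to be deployed to the node it runs on.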
Step 5 — Fix RBAC for Multus to read NADs
  # ClusterRole for multus
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: multus
  rules:
    - apiGroups: ["k8s.cni.cncf.io"]
      resources: ["network-attachment-definitions"]
-     verbs: ["get", "list", "watch"]
+     verbs: ["get", "list", "watch", "create", "update"] # only if using auto-provisioning
For read-only Multus deployments, get/list/watch is sufficient and correct. If your Multus pods are running as a non-default service account, verify the ClusterRoleBinding maps to the actual service account in the kube-system namespace.
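The binding check can be automated against the parsed output of kubectl get clusterrolebinding <name> -o json. In this sketch the function name is hypothetical, and the role and service account names passed in are whatever your deployment uses.

```python
def binds_service_account(binding: dict, role: str, sa_name: str, sa_namespace: str) -> bool:
    """True if the ClusterRoleBinding grants `role` to the given ServiceAccount."""
    ref = binding.get("roleRef", {})
    if ref.get("kind") != "ClusterRole" or ref.get("name") != role:
        return False
    # subjects may be absent or null on a binding with no subjects.
    return any(
        s.get("kind") == "ServiceAccount"
        and s.get("name") == sa_name
        and s.get("namespace") == sa_namespace
        for s in binding.get("subjects") or []
    )
```

A False result for the service account your Multus pods actually run as explains RBAC-related lookup failures even when the ClusterRole itself is correct.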
Enterprise Best Practice
- Pin NAD configs to a dedicated namespace (network-infra) and use RBAC to restrict who can create or modify NADs (NetworkPolicy governs pod traffic, not API access). Treat NAD write access as equivalent to node network access.
- Use Whereabouts with a dedicated etcd backend in high-churn environments (>500 pods/min), if your Whereabouts release still supports one. The default Kubernetes API store for Whereabouts IP allocations can become a bottleneck under heavy pod scheduling.
- Label nodes with their available master interfaces and use nodeSelector or nodeAffinity on pods that require specific secondary NICs. Scheduling a pod that requires ens3f0 onto a node that only has ens4f0 is a guaranteed attach failure.
Prevention in CI/CD
1. OPA/Gatekeeper policy — enforce NAD namespace match:
package multus.nadnamespace

violation[{"msg": msg}] {
  input.review.object.kind == "Pod"
  annotation := input.review.object.metadata.annotations["k8s.v1.cni.cncf.io/networks"]
  not contains(annotation, "/")
  msg := sprintf("Multus network annotation must be namespace-qualified: got '%v'", [annotation])
}
2. Checkov custom check — validate NAD has IPAM type set:
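checkov -f nad.yaml --external-checks-dir <custom-checks-dir> --check CKV_K8S_MULTUS_IPAM
CKV_K8S_MULTUS_IPAM is not a built-in check. Write it as a custom Checkov check, load it with --external-checks-dir, and have it parse the embedded JSON in spec.config and fail if ipam.type is absent or set to host-local in a multi-node cluster context.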
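The core logic of such a check is small. The sketch below is deliberately framework-free: it is not a Checkov check class, only the validation you would wrap in one. It assumes the NAD manifest has already been loaded into a dict and that spec.config holds a single plugin config rather than a plugin list. The function name and failure messages are illustrative.

```python
import json


def check_nad_ipam(nad: dict, multi_node: bool = True) -> list[str]:
    """Return a list of failure messages; an empty list means the NAD passes."""
    try:
        config = json.loads(nad["spec"]["config"])
    except (KeyError, TypeError, json.JSONDecodeError) as exc:
        return [f"spec.config is missing or not valid JSON: {exc!r}"]
    if not isinstance(config, dict):
        return ["spec.config is not a JSON object"]
    ipam_type = (config.get("ipam") or {}).get("type")
    if not ipam_type:
        return ["ipam.type is absent"]
    if ipam_type == "host-local" and multi_node:
        return ["ipam.type is host-local in a multi-node cluster"]
    return []
```

Keeping the rule in a plain function makes it easy to unit test in CI before wiring it into Checkov or any other policy runner.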
3. Pre-deployment node readiness gate:
Add a Job to your Helm chart or ArgoCD sync wave that runs before your workload:
#!/bin/bash
# Verify master interface and CNI binary exist before workload deploy
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  # Use -i without -t: a Job or CI runner has no TTY to allocate
  kubectl debug "node/${node}" -i --image=busybox -- \
    sh -c "ls /host/opt/cni/bin/whereabouts && ip link show ens3f0" 2>&1
done
4. Whereabouts IP exhaustion alert:
# Prometheus alerting rule
- alert: WhereaboutsIPPoolExhausted
  expr: |
    (whereabouts_ip_allocations_total / whereabouts_ip_pool_size) > 0.85
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Whereabouts IP pool >85% utilized, pod attach failures imminent"
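Catch exhaustion before it becomes an outage. At 100% pool utilization, every new pod in that range fails with the exact error at the top of this guide. Treat the two metric names in the rule as placeholders and confirm what your Whereabouts deployment or exporter actually exposes before relying on it.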
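If no such metrics are available, the same ratio can be computed from the NAD's range and exclude list plus a count of current allocations. The sketch below assumes every address in the range that is not excluded is allocatable, which may not match Whereabouts' own accounting exactly. The function name is hypothetical, and the allocated count is something you would obtain separately.

```python
import ipaddress


def pool_utilization(range_cidr: str, exclude: list[str], allocated: int) -> float:
    """Fraction of allocatable addresses in use."""
    pool = ipaddress.ip_network(range_cidr)
    excluded = set()
    for cidr in exclude:
        net = ipaddress.ip_network(cidr)
        # Only exclusions that lie fully inside the pool shrink it.
        if net.subnet_of(pool):
            excluded.update(net)  # iterating a network yields each address in it
    size = pool.num_addresses - len(excluded)
    if size <= 0:
        raise ValueError("pool has no allocatable addresses")
    return allocated / size
```

With the range and exclusions from Step 3, the pool holds 254 allocatable addresses, so the 0.85 threshold is crossed at the 216th allocation.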