
Fixing Multus 'Failed to Create Network Attachment' for Secondary Interfaces in Kubernetes

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–30 mins


TL;DR

  • What broke: Multus CNI cannot attach a secondary interface to your pod. The pod enters ContainerCreating indefinitely or crashes. Network-dependent workloads (SR-IOV, DPDK, storage replication) are dead.
  • How to fix it: Validate the NetworkAttachmentDefinition IPAM config, confirm the master NIC exists on the node, verify the Multus pod annotation syntax, and check RBAC permissions for the net-attach-def resource.

The Incident (What Does the Error Mean?)

You'll see this in kubectl describe pod <pod-name>:

Warning  FailedCreatePodSandBox  kubelet  Failed to create pod sandbox:
  rpc error: code = Unknown desc = failed to create pod network sandbox:
  [failed to create network attachment for pod "default/my-app-7d4f9b-xkp2q"
  in namespace "default": error adding container to network "sriov-net":
  failed to create network attachment: no IP addresses available in range set:
  192.168.1.0/24

Or alternatively:

failed to find network attachment definition "default/macvlan-conf"
failed to delegate add: failed to find plugin "macvlan" in path [/opt/cni/bin]
cannot find IPAM plugin "whereabouts" in /opt/cni/bin

Immediate consequence: The pod never reaches Running. Every restart loop burns node resources. If this is a DaemonSet, every node in the cluster may be affected simultaneously.


The Attack Vector / Blast Radius

This is not a soft degradation. This is a hard pod startup failure: the pod is scheduled onto a node, but its sandbox is never created.

Cascade path:

  1. Pod stuck in ContainerCreating → liveness/readiness probes never run → Service endpoints never populate → upstream load balancer marks the backend unhealthy.
  2. If the failing pod is a storage or CNF workload (Ceph OSD, vRAN DU, Whereabouts IPAM itself), the blast radius expands to data loss or full network plane outage.
  3. In SR-IOV environments, a bad resourceName in the NAD can leave Virtual Functions allocated but unbound, leaking VFs until the node is rebooted.
  4. RBAC misconfiguration on net-attach-def means the Multus daemonset cannot read NADs in other namespaces, which breaks cross-namespace network attachments. The failure shows up only in pod events, not at the cluster level.

The non-obvious danger: Multus failures are often silent at the cluster level. Kubernetes marks the pod as ContainerCreating, not Error. Alerts tuned for CrashLoopBackOff miss this entirely. You find out from an application team, not your monitoring stack.
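
One way to close that monitoring gap is to scan pod status for containers that have been waiting in ContainerCreating for too long. The following is a minimal sketch, assuming the input is the parsed output of kubectl get pods -A -o json; the function name stuck_creating and the 5-minute default threshold are illustrative choices, not part of Multus or Kubernetes.

```python
from datetime import datetime, timedelta, timezone


def stuck_creating(pods: dict, now: datetime,
                   max_age: timedelta = timedelta(minutes=5)) -> list[str]:
    """Return namespace/name for pods waiting in ContainerCreating longer than max_age."""
    stuck = []
    for pod in pods.get("items", []):
        status = pod.get("status", {})
        # Collect the waiting reason of every container (None if not waiting).
        reasons = {
            cs.get("state", {}).get("waiting", {}).get("reason")
            for cs in status.get("containerStatuses", [])
        }
        if "ContainerCreating" not in reasons:
            continue
        started = status.get("startTime")
        if not started:
            continue
        # Kubernetes timestamps are RFC 3339 in UTC, e.g. 2024-01-01T11:00:00Z.
        started_at = datetime.strptime(started, "%Y-%m-%dT%H:%M:%SZ")
        started_at = started_at.replace(tzinfo=timezone.utc)
        if now - started_at > max_age:
            meta = pod["metadata"]
            stuck.append(f"{meta['namespace']}/{meta['name']}")
    return stuck
```

Calling stuck_creating(parsed_json, datetime.now(timezone.utc)) from a cron job or exporter would give you a list you can alert on, independent of CrashLoopBackOff rules.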


How to Fix It (The Solution)

Step 1 — Confirm the NAD exists in the correct namespace

kubectl get network-attachment-definitions -n <pod-namespace>

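NADs are namespaced resources. A bare NAD name in the pod annotation is resolved in the pod's own namespace; to reference a NAD in another namespace, use the namespace/name form, which Multus may reject when namespaceIsolation is enabled. A namespace mismatch is one of the most common causes of this error.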

Step 2 — Validate the annotation syntax on the pod spec

# Pod metadata annotations
- annotations:
-   k8s.v1.cni.cncf.io/networks: macvlan-conf   # AMBIGUOUS: bare name resolves in the pod's own namespace
+ annotations:
+   k8s.v1.cni.cncf.io/networks: default/macvlan-conf  # EXPLICIT: namespace/name
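
To check an annotation before it reaches the cluster, a small parser can list which entries carry no explicit namespace. This is a sketch under the assumption that the annotation uses one of the two documented Multus forms: a comma-separated list of [namespace/]name[@interface], or a JSON list of objects with name and optional namespace keys. The function name unqualified_networks is hypothetical.

```python
import json


def unqualified_networks(annotation: str) -> list[str]:
    """Return the network names in a Multus annotation that have no explicit namespace."""
    text = annotation.strip()
    missing = []
    if text.startswith("["):
        # JSON form: [{"name": "...", "namespace": "...", "interface": "..."}]
        for entry in json.loads(text):
            if not entry.get("namespace"):
                missing.append(entry["name"])
        return missing
    # Short form: comma-separated [namespace/]name[@interface]
    for item in text.split(","):
        ref = item.strip().split("@", 1)[0]
        if ref and "/" not in ref:
            missing.append(ref)
    return missing
```

For example, unqualified_networks("default/a@net1, b@net2") reports only b, because each entry in the list is checked on its own rather than the annotation as a whole.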

Step 3 — Fix the NetworkAttachmentDefinition IPAM block

Most no IP addresses available errors come from an exhausted, mis-sized, or misconfigured IPAM range. If Whereabouts is not installed on the node, you get the separate cannot find IPAM plugin error instead.

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
  namespace: default
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
-   "master": "eth0",
+   "master": "ens3f0",
    "mode": "bridge",
    "ipam": {
-     "type": "host-local",
-     "subnet": "192.168.1.0/24"
+     "type": "whereabouts",
+     "range": "192.168.100.0/24",
+     "exclude": [
+       "192.168.100.0/32",
+       "192.168.100.255/32"
+     ]
    }
  }'

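Why whereabouts over host-local: host-local keeps its lease state on each node's local disk, so it cannot deduplicate IPs across nodes. If several nodes serve the same range, you will get IP collisions. Whereabouts records allocations cluster-wide through the Kubernetes API (or etcd, in releases that support it), so every node draws from one shared pool.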
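
The difference can be illustrated with a toy model. The sketch below is not either plugin's real algorithm; the Allocator class is hypothetical and simply hands out the lowest free address. The only thing that varies is whether two nodes share one lease store or each keep their own.

```python
import ipaddress


class Allocator:
    """Toy IPAM: hands out the lowest free host address, tracking leases in `store`."""

    def __init__(self, cidr: str, store: set):
        self.hosts = list(ipaddress.ip_network(cidr).hosts())
        self.store = store

    def allocate(self) -> str:
        for ip in self.hosts:
            if ip not in self.store:
                self.store.add(ip)
                return str(ip)
        raise RuntimeError("no IP addresses available")


# host-local style: each node keeps a private lease store.
node_a = Allocator("192.168.100.0/24", set())
node_b = Allocator("192.168.100.0/24", set())
local_a, local_b = node_a.allocate(), node_b.allocate()

# Whereabouts style: both nodes consult one shared lease store.
shared_store = set()
node_c = Allocator("192.168.100.0/24", shared_store)
node_d = Allocator("192.168.100.0/24", shared_store)
shared_c, shared_d = node_c.allocate(), node_d.allocate()

print(local_a, local_b)    # same address handed out twice
print(shared_c, shared_d)  # two distinct addresses
```

With private stores, both nodes independently pick the first free address in the range, which is the collision described above. With a shared store, the second node sees the first node's lease and moves on.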

Step 4 — Confirm the CNI plugin binary exists on the node

# SSH to the affected node
ls /opt/cni/bin/ | grep -E 'macvlan|sriov|whereabouts'

If the binary is missing, the fix is to redeploy the correct CNI plugin DaemonSet — not to patch the NAD.

Step 5 — Fix RBAC for Multus to read NADs

# ClusterRole for multus
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: multus
rules:
- apiGroups: ["k8s.cni.cncf.io"]
  resources: ["network-attachment-definitions"]
- verbs: ["get", "list", "watch"]
+ verbs: ["get", "list", "watch", "create", "update"]  # only if using auto-provisioning

For read-only Multus deployments, get/list/watch is sufficient and correct. If your Multus pods are running as a non-default service account, verify the ClusterRoleBinding maps to the actual service account in the kube-system namespace.

Enterprise Best Practice

  • Pin NAD configs to a dedicated namespace (network-infra) and use RBAC to restrict who can create/modify NADs (NetworkPolicy governs pod traffic, not API access). Treat NAD write access as equivalent to node network access.
  • In high-churn environments (>500 pods/min), consider a dedicated etcd backend for Whereabouts if your release still supports one. The default Kubernetes API store for Whereabouts IP allocations can become a bottleneck under heavy pod scheduling.
  • Label nodes with available master interfaces and use nodeSelector or nodeAffinity on pods requiring specific secondary NICs. Scheduling a pod requiring ens3f0 onto a node that only has ens4f0 is a guaranteed attach failure.



Prevention in CI/CD

1. OPA/Gatekeeper policy — enforce NAD namespace match:

package multus.nadnamespace

violation[{"msg": msg}] {
  input.review.object.kind == "Pod"
  annotation := input.review.object.metadata.annotations["k8s.v1.cni.cncf.io/networks"]
  not contains(annotation, "/")
  msg := sprintf("Multus network annotation must be namespace-qualified: got '%v'", [annotation])
}

2. Checkov custom check — validate NAD has IPAM type set:

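checkov -f nad.yaml --external-checks-dir ./custom_checks --check CKV_K8S_MULTUS_IPAM

CKV_K8S_MULTUS_IPAM is not a built-in check. Write a custom Checkov check (loaded here from ./custom_checks, a placeholder path) that parses the embedded JSON in spec.config and fails if ipam.type is absent or set to host-local in a multi-node cluster context.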
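
The core logic of such a check might look like the sketch below. It is deliberately not wired into Checkov's check classes; it only shows the parsing and the two failure conditions. The function name check_nad_ipam and the multi_node flag are assumptions for illustration, and the input is assumed to be an already-parsed NAD manifest with a single plugin config (not a conflist with a plugins array).

```python
import json


def check_nad_ipam(nad: dict, multi_node: bool = True) -> list[str]:
    """Return a list of problems found in a parsed NetworkAttachmentDefinition."""
    raw = nad.get("spec", {}).get("config")
    if not raw:
        return ["spec.config is missing or empty"]
    try:
        # spec.config is a JSON document embedded as a string.
        config = json.loads(raw)
    except json.JSONDecodeError as err:
        return [f"spec.config is not valid JSON: {err}"]
    if not isinstance(config, dict):
        return ["spec.config is not a JSON object"]
    ipam = config.get("ipam")
    ipam_type = ipam.get("type") if isinstance(ipam, dict) else None
    if not ipam_type:
        return ["ipam.type is absent"]
    if ipam_type == "host-local" and multi_node:
        return ["ipam.type is host-local in a multi-node cluster"]
    return []
```

An empty list means the NAD passes. Returning messages rather than a boolean keeps the reason visible in CI logs, whichever policy tool ends up calling it.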

3. Pre-deployment node readiness gate:

Add a Job to your Helm chart or ArgoCD sync wave that runs before your workload:

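#!/bin/bash
# Verify master interface and CNI binary exist before workload deploy.
# Node debug pods mount the host filesystem at /host and share the host network namespace.
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  echo "== $node"
  # -i without -t: a Job has no TTY to attach
  kubectl debug "node/$node" -i --image=busybox -- \
    sh -c "ls /host/opt/cni/bin/whereabouts && ip link show ens3f0" 2>&1
done
# Each run leaves a completed debug pod behind; delete those pods afterwards.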

4. Whereabouts IP exhaustion alert:

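# Prometheus alerting rule (verify that your setup exports these two metrics; the names are not guaranteed by upstream Whereabouts)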
- alert: WhereaboutsIPPoolExhausted
  expr: |
    (whereabouts_ip_allocations_total / whereabouts_ip_pool_size) > 0.85
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Whereabouts IP pool >85% utilized — pod attach failures imminent"

Catch exhaustion before it becomes an outage. At 100% pool utilization, every new pod in that range fails with the exact error at the top of this guide.
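
To see where the 85% line falls for a concrete pool, the arithmetic can be sketched as follows. This assumes usable capacity is every address in the range minus the exclusions, which may differ from how your Whereabouts version counts; the function name pool_utilization is hypothetical.

```python
import ipaddress


def pool_utilization(cidr: str, exclude: list[str], allocated: int) -> float:
    """Fraction of allocatable addresses in `cidr` that are currently leased."""
    network = ipaddress.ip_network(cidr)
    excluded = set()
    for item in exclude:
        sub = ipaddress.ip_network(item)
        # Only count exclusions that actually fall inside the range.
        if sub.subnet_of(network):
            excluded.update(sub)
    usable = network.num_addresses - len(excluded)
    if usable <= 0:
        raise ValueError("exclusions leave no allocatable addresses")
    return allocated / usable


excludes = ["192.168.100.0/32", "192.168.100.255/32"]
ratio = pool_utilization("192.168.100.0/24", excludes, allocated=216)
print(f"{ratio:.3f}", ratio > 0.85)
```

For the 192.168.100.0/24 range from Step 3 with its two exclusions, 254 addresses are usable, so under this counting the alert threshold is crossed at the 216th allocation.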
