
How to Fix Kubernetes LoadBalancer Service Stuck in Pending External IP (Complete Debug Guide)

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

  • What broke: Kubernetes created the LoadBalancer Service object, but the cloud controller never assigned an external IP, so your app is unreachable from outside the cluster.
  • How to fix it: Identify whether the failure is a missing cloud-controller-manager, wrong subnet/annotation, exhausted EIP quota, or a bare-metal cluster with no L4 LB provider (MetalLB/Cilium LB IPAM), then apply the targeted fix below.

The Incident (What Does the Error Mean?)

$ kubectl get svc my-app-svc -n production
NAME          TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
my-app-svc    LoadBalancer   10.100.45.201   <pending>     443:31443/TCP  18m
$ kubectl describe svc my-app-svc -n production
...
Events:
  Warning  SyncLoadBalancerFailed  5m   service-controller  Error syncing load balancer: failed to ensure load balancer: could not find any suitable subnets for creating the ELB

Immediate consequence: No external IP = no ingress. Every user hitting your domain gets a timeout. If this is behind a DNS record that just propagated, you have a full production outage. The Service object exists; the cloud-side load balancer (on AWS, an NLB or Classic ELB) does not.
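To detect this state from a script rather than by eye, one option is to read the Service as JSON. The following is a minimal sketch, assuming the output of `kubectl get svc my-app-svc -n production -o json` has already been parsed into a dict; the helper name `external_endpoints` is hypothetical.

```python
import json


def external_endpoints(service: dict) -> list:
    """Return the external IPs/hostnames recorded in a Service's status.

    An empty list corresponds to EXTERNAL-IP showing <pending>.
    """
    status = service.get("status") or {}
    load_balancer = status.get("loadBalancer") or {}
    ingress = load_balancer.get("ingress") or []
    # Each entry carries either "ip" or "hostname", depending on the provider.
    return [entry.get("ip") or entry.get("hostname") for entry in ingress]


# Trimmed example of a Service that is still pending: status.loadBalancer is empty.
pending = json.loads('{"spec": {"type": "LoadBalancer"}, "status": {"loadBalancer": {}}}')
print(external_endpoints(pending))  # []
```

An empty result only says that nothing has been assigned yet; the Events section of `kubectl describe` is still where the reason shows up.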


The Attack Vector / Blast Radius

This is not a security exploit — it is a silent infrastructure failure with a wide blast radius:

  • Zero external connectivity. All pods behind this Service are healthy and running. The failure is purely at the L4 cloud provisioning layer. Kubernetes reports nothing wrong with your workload.
  • DNS TTL trap. If your DNS record still points to an IP that was assigned previously (e.g., before a Service delete/recreate), clients with cached records keep resolving to that stale address, while new deployments and fresh lookups have no working address to reach.
  • Cascading dependency failure. Any downstream service, health check, or CDN origin that polls this endpoint will start failing. In a microservices mesh, this can trigger circuit breakers cluster-wide.
  • Root causes by environment:
Environment              | Most Common Root Cause
-------------------------+-----------------------------------------------------------------
AWS EKS                  | Missing kubernetes.io/role/elb subnet tag or exhausted EIP quota
GKE                      | Firewall rule not created; wrong network annotation
AKS                      | Service principal lacks Network Contributor on the VNet
kubeadm / bare-metal     | No cloud-controller-manager installed at all
Kind / Minikube (local)  | LoadBalancer type unsupported without MetalLB or minikube tunnel
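Whichever environment applies, the controller's own warning event is often the quickest pointer to the cause. The sketch below filters a namespace's events down to the SyncLoadBalancerFailed warnings for one Service. It assumes the `items` list from `kubectl get events -n production -o json` is already loaded; the function name is hypothetical, and the sample event simply reuses the message shown in the incident output.

```python
def load_balancer_failures(events: list, service_name: str) -> list:
    """Collect the messages of SyncLoadBalancerFailed warnings for one Service."""
    return [
        event.get("message", "")
        for event in events
        if event.get("type") == "Warning"
        and event.get("reason") == "SyncLoadBalancerFailed"
        and (event.get("involvedObject") or {}).get("name") == service_name
    ]


# Sample data modelled on the event from the incident above.
sample_events = [
    {
        "type": "Warning",
        "reason": "SyncLoadBalancerFailed",
        "involvedObject": {"kind": "Service", "name": "my-app-svc"},
        "message": "Error syncing load balancer: failed to ensure load balancer: "
                   "could not find any suitable subnets for creating the ELB",
    },
    {
        "type": "Normal",
        "reason": "EnsuringLoadBalancer",
        "involvedObject": {"kind": "Service", "name": "my-app-svc"},
        "message": "Ensuring load balancer",
    },
]

for message in load_balancer_failures(sample_events, "my-app-svc"):
    print(message)
```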

How to Fix It (The Solution)

Step 1 — Confirm the CCM is running

kubectl get pods -n kube-system | grep cloud-controller
kubectl logs -n kube-system -l component=cloud-controller-manager --tail=50

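If no pod exists on a self-managed cluster, you have no cloud-controller-manager, and the cluster cannot talk to the cloud API to provision LBs. Install the provider-specific CCM or switch to MetalLB. On managed services such as EKS, GKE, and AKS, the CCM typically runs in the provider-managed control plane, so an empty result there is expected; go straight to the Service events instead.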
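The same check can be scripted for a preflight job. This is a rough sketch that mirrors the `grep cloud-controller` match above, assuming the `items` list from `kubectl get pods -n kube-system -o json`; the function name and the sample pod names are made up for illustration.

```python
def find_ccm_pods(pods: list) -> list:
    """Return (name, phase) for pods whose name contains 'cloud-controller'."""
    matches = []
    for pod in pods:
        name = (pod.get("metadata") or {}).get("name", "")
        if "cloud-controller" in name:
            phase = (pod.get("status") or {}).get("phase", "Unknown")
            matches.append((name, phase))
    return matches


# Hypothetical pod list; real names depend on the distribution and provider.
sample_pods = [
    {"metadata": {"name": "coredns-5d78c9869d-abcde"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "cloud-controller-manager-x7k2p"}, "status": {"phase": "Running"}},
]

ccm_pods = find_ccm_pods(sample_pods)
if not ccm_pods:
    print("No cloud-controller-manager pod found in kube-system")
for name, phase in ccm_pods:
    print(f"{name}: {phase}")
```

A pod that is present but not in the Running phase deserves the same attention as a missing one.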


Basic Fix — AWS EKS: Tag Your Subnets

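The most common EKS failure. Both the AWS Load Balancer Controller and the legacy in-tree controller discover subnets through tags, so untagged subnets produce the "could not find any suitable subnets" event shown above. Note that the aws-load-balancer-type: "external" annotation below is acted on only by the AWS Load Balancer Controller; if that controller is not installed, the Service stays in <pending>.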

# AWS Console / Terraform — Subnet Tags
- # No tags on private subnets
+ kubernetes.io/cluster/<YOUR_CLUSTER_NAME> = shared
+ kubernetes.io/role/internal-elb = 1   # for internal NLB
# OR
+ kubernetes.io/role/elb = 1            # for internet-facing NLB/ALB

# Service YAML
 apiVersion: v1
 kind: Service
 metadata:
   name: my-app-svc
   namespace: production
   annotations:
-    # missing annotations
+    service.beta.kubernetes.io/aws-load-balancer-type: "external"
+    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
+    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
 spec:
   type: LoadBalancer
   selector:
     app: my-app
   ports:
     - port: 443
       targetPort: 8443
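The tag requirements above are easy to verify before a Service is created. Here is a minimal sketch, assuming each subnet's tags are available as a plain dict (for example, extracted from `aws ec2 describe-subnets` output). The function name is hypothetical, and accepting "owned" alongside "shared" for the cluster tag is an assumption that goes beyond what this guide shows.

```python
def missing_elb_tags(tags: dict, cluster_name: str, internet_facing: bool) -> list:
    """Return the tag keys a subnet still needs for load balancer discovery."""
    role_key = (
        "kubernetes.io/role/elb" if internet_facing else "kubernetes.io/role/internal-elb"
    )
    required = {
        # Assumption: "owned" is accepted in addition to the "shared" value used above.
        f"kubernetes.io/cluster/{cluster_name}": {"shared", "owned"},
        # Value used in this guide.
        role_key: {"1"},
    }
    return [key for key, allowed in required.items() if tags.get(key) not in allowed]


# A private subnet tagged for internal load balancers only.
private_subnet_tags = {
    "kubernetes.io/cluster/production-eks-cluster": "shared",
    "kubernetes.io/role/internal-elb": "1",
}

print(missing_elb_tags(private_subnet_tags, "production-eks-cluster", internet_facing=False))
print(missing_elb_tags(private_subnet_tags, "production-eks-cluster", internet_facing=True))
```

The second call reports the missing kubernetes.io/role/elb tag, which is the situation behind the "could not find any suitable subnets" event when an internet-facing load balancer is requested.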

Basic Fix — Bare-Metal / Kind / kubeadm: Install MetalLB

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.5/config/manifests/metallb-native.yaml

# Once the MetalLB pods are ready, apply an address pool and an L2 advertisement.
# Pick a range of unused IPs on the node network that DHCP will not hand out.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: production-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.200-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-advert
  namespace: metallb-system
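Before committing to an address range, it can help to sanity-check its size and confirm that a given address falls inside it. The sketch below handles only the start-end form used in the pool above (MetalLB also accepts CIDR notation, which this helper does not parse); the function names are hypothetical.

```python
import ipaddress


def parse_pool_range(spec: str):
    """Parse a 'start-end' range into a pair of ip_address objects."""
    start_text, end_text = (part.strip() for part in spec.split("-", 1))
    start = ipaddress.ip_address(start_text)
    end = ipaddress.ip_address(end_text)
    if start.version != end.version or int(start) > int(end):
        raise ValueError(f"invalid address range: {spec}")
    return start, end


def pool_size(spec: str) -> int:
    """Number of addresses in the range, both ends included."""
    start, end = parse_pool_range(spec)
    return int(end) - int(start) + 1


def in_pool(ip: str, spec: str) -> bool:
    """True if the address lies within the range."""
    start, end = parse_pool_range(spec)
    address = ipaddress.ip_address(ip)
    return address.version == start.version and start <= address <= end


POOL = "192.168.1.200-192.168.1.250"
print(pool_size(POOL))                 # 51
print(in_pool("192.168.1.225", POOL))  # True
print(in_pool("192.168.1.10", POOL))   # False
```

Each LoadBalancer Service consumes one address from the pool, so the size is also the upper bound on how many such Services the pool can serve.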

Enterprise Best Practice — AWS: Use AWS Load Balancer Controller (Not In-Tree)

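The in-tree cloud provider for AWS is deprecated and has been removed from recent upstream Kubernetes releases. Use the out-of-tree AWS Load Balancer Controller, and grant it AWS API access through IAM Roles for Service Accounts (IRSA).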

 # Helm values for aws-load-balancer-controller
 clusterName: production-eks-cluster
- # No IAM role configured — controller cannot call EC2/ELB APIs
+ serviceAccount:
+   annotations:
+     eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/AWSLoadBalancerControllerRole
+ region: us-east-1
+ vpcId: vpc-0abc123def456

 # Service YAML: enforce NLB with specific subnets
 metadata:
   annotations:
+    service.beta.kubernetes.io/aws-load-balancer-type: "external"
+    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
+    service.beta.kubernetes.io/aws-load-balancer-subnets: "subnet-0aaa,subnet-0bbb"
+    # Cross-zone balancing is set through load balancer attributes in the AWS Load Balancer Controller
+    service.beta.kubernetes.io/aws-load-balancer-attributes: "load_balancing.cross_zone.enabled=true"

Verify provisioning after applying:

kubectl get svc my-app-svc -n production -w
# Should transition from <pending> to an actual IP/hostname within 60–90s

kubectl describe svc my-app-svc -n production | grep -A 20 Events



Prevention in CI/CD

1. OPA/Gatekeeper — Enforce Required Annotations on LoadBalancer Services

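# Plain OPA admission policy: it reads the AdmissionReview under input.request.
# In a Gatekeeper ConstraintTemplate, use a violation[{"msg": msg}] rule and input.review.object instead.
package kubernetes.admission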

deny[msg] {
  input.request.kind.kind == "Service"
  input.request.object.spec.type == "LoadBalancer"
  not input.request.object.metadata.annotations["service.beta.kubernetes.io/aws-load-balancer-type"]
  msg := "LoadBalancer Services must declare aws-load-balancer-type annotation"
}
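For a quick local check before the admission policy is deployed, the same rule can be mirrored in a few lines of script. This is an illustrative sketch, not a replacement for the policy: it works on a bare manifest dict (as loaded from a YAML file) instead of an AdmissionReview, and the function name `deny` is simply borrowed from the rule above.

```python
REQUIRED_ANNOTATION = "service.beta.kubernetes.io/aws-load-balancer-type"
MESSAGE = "LoadBalancer Services must declare aws-load-balancer-type annotation"


def deny(manifest: dict) -> list:
    """Return violation messages for one manifest; an empty list means it passes."""
    if manifest.get("kind") != "Service":
        return []
    if (manifest.get("spec") or {}).get("type") != "LoadBalancer":
        return []
    annotations = (manifest.get("metadata") or {}).get("annotations") or {}
    if REQUIRED_ANNOTATION in annotations:
        return []
    return [MESSAGE]


# A LoadBalancer Service without the annotation is rejected.
unannotated = {
    "kind": "Service",
    "metadata": {"name": "my-app-svc", "namespace": "production"},
    "spec": {"type": "LoadBalancer"},
}
print(deny(unannotated))
```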

2. Checkov — Scan Service YAMLs Pre-Merge

# Quote the pattern so the shell does not expand it as a filename glob
checkov -d ./k8s/manifests --framework kubernetes --check 'CKV_K8S_*'

Add a custom check to flag type: LoadBalancer without required cloud annotations.

3. Terraform — Tag Subnets at Provision Time (Never Manually)

 resource "aws_subnet" "private" {
   ...
+  tags = {
+    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
+    "kubernetes.io/role/internal-elb"           = "1"
+  }
 }

4. Readiness Gate in CD Pipeline

# Block deployment promotion until EXTERNAL-IP is assigned
kubectl wait --for=jsonpath='{.status.loadBalancer.ingress}' \
  service/my-app-svc -n production --timeout=120s || exit 1

If this times out, the step exits non-zero and the pipeline fails. Configure your CD tool to halt promotion or roll back on that failure, so you never silently ship a broken ingress to production.
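Where the pipeline runner makes a scripted gate more convenient than `kubectl wait`, the same idea can be written as a polling loop. The following is a sketch under stated assumptions: `fetch_service` is any callable returning the Service as a dict, `kubectl_fetch` is a hypothetical helper that shells out to `kubectl get svc ... -o json`, and the timeout and interval defaults are illustrative.

```python
import json
import subprocess
import time


def kubectl_fetch(name: str, namespace: str) -> dict:
    """Fetch a Service as a dict by shelling out to kubectl (must be on PATH)."""
    result = subprocess.run(
        ["kubectl", "get", "svc", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def wait_for_external_endpoint(fetch_service, timeout_s=120.0, interval_s=5.0,
                               sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll until status.loadBalancer.ingress is non-empty or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        service = fetch_service()
        status = service.get("status") or {}
        ingress = (status.get("loadBalancer") or {}).get("ingress") or []
        if ingress:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    ready = wait_for_external_endpoint(lambda: kubectl_fetch("my-app-svc", "production"))
    # A non-zero exit code fails the pipeline step, like the `|| exit 1` above.
    raise SystemExit(0 if ready else 1)
```

Passing the fetcher, sleep, and clock in as parameters keeps the loop testable without a cluster.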
