How to Fix Kubernetes LoadBalancer Service Stuck in Pending External IP (Complete Debug Guide)
Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins
TL;DR
- What broke: Kubernetes provisioned a
LoadBalancerService but the cloud controller never assigned an external IP — your app is unreachable from outside the cluster. - How to fix it: Identify whether the failure is a missing cloud-controller-manager, wrong subnet/annotation, exhausted EIP quota, or a bare-metal cluster with no L4 LB provider (MetalLB/Cilium LB IPAM), then apply the targeted fix below.
- Shortcut: Use our Client-Side Sandbox above to paste your Service YAML and get auto-refactored output with the correct annotations for your cloud provider.
The Incident (What Does the Error Mean?)
$ kubectl get svc my-app-svc -n production
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-app-svc LoadBalancer 10.100.45.201 <pending> 443:31443/TCP 18m
$ kubectl describe svc my-app-svc -n production
...
Events:
Warning SyncLoadBalancerFailed 5m service-controller Error syncing load balancer: failed to ensure load balancer: could not find any suitable subnets for creating the ELB
Immediate consequence: No external IP = no ingress. Every user hitting your domain gets a timeout. If this is behind a DNS record that just propagated, you have a full production outage. The Service object exists; the cloud-side NLB/ALB/ELB resource does not.
The Attack Vector / Blast Radius
This is not a security exploit — it is a silent infrastructure failure with a wide blast radius:
- Zero external connectivity. All pods behind this Service are healthy and running. The failure is purely at the L4 cloud provisioning layer. Kubernetes reports nothing wrong with your workload.
- DNS TTL trap. If your DNS already points to an IP that was previously assigned (e.g., after a Service delete/recreate), old clients may still resolve. New deployments or fresh DNS lookups get nothing.
- Cascading dependency failure. Any downstream service, health check, or CDN origin that polls this endpoint will start failing. In a microservices mesh, this can trigger circuit breakers cluster-wide.
- Root causes by environment:
| Environment | Most Common Root Cause |
|---|---|
| AWS EKS | Missing kubernetes.io/role/elb subnet tag or exhausted EIP quota |
| GKE | Firewall rule not created; wrong network annotation |
| AKS | Service principal lacks Network Contributor on the VNet |
| kubeadm / bare-metal | No cloud-controller-manager installed at all |
| Kind / Minikube local | LoadBalancer type unsupported without MetalLB or minikube tunnel |
How to Fix It (The Solution)
Step 1 — Confirm the CCM is running
kubectl get pods -n kube-system | grep cloud-controller
kubectl logs -n kube-system -l component=cloud-controller-manager --tail=50
If no pod exists on a self-managed cluster, you have no cloud-controller-manager — the cluster cannot talk to the cloud API to provision LBs. Install the provider-specific CCM or switch to MetalLB.
Basic Fix — AWS EKS: Tag Your Subnets
The most common EKS failure. The AWS Load Balancer Controller (or legacy in-tree controller) requires explicit subnet tags.
# AWS Console / Terraform — Subnet Tags
- # No tags on private subnets
+ kubernetes.io/cluster/<YOUR_CLUSTER_NAME> = shared
+ kubernetes.io/role/internal-elb = 1 # for internal NLB
# OR
+ kubernetes.io/role/elb = 1 # for internet-facing NLB/ALB
# Service YAML
apiVersion: v1
kind: Service
metadata:
name: my-app-svc
namespace: production
annotations:
- # missing annotations
+ service.beta.kubernetes.io/aws-load-balancer-type: "external"
+ service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
+ service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
type: LoadBalancer
selector:
app: my-app
ports:
- port: 443
targetPort: 8443
Basic Fix — Bare-Metal / Kind / kubeadm: Install MetalLB
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.5/config/manifests/metallb-native.yaml
+apiVersion: metallb.io/v1beta1
+kind: IPAddressPool
+metadata:
+ name: production-pool
+ namespace: metallb-system
+spec:
+ addresses:
+ - 192.168.1.200-192.168.1.250
+---
+apiVersion: metallb.io/v1beta1
+kind: L2Advertisement
+metadata:
+ name: l2-advert
+ namespace: metallb-system
Enterprise Best Practice — AWS: Use AWS Load Balancer Controller (Not In-Tree)
The in-tree cloud provider for AWS is deprecated. Use the out-of-tree AWS Load Balancer Controller with IRSA.
# Helm values for aws-load-balancer-controller
clusterName: production-eks-cluster
- # No IAM role configured — controller cannot call EC2/ELB APIs
+ serviceAccount:
+ annotations:
+ eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/AWSLoadBalancerControllerRole
+ region: us-east-1
+ vpcId: vpc-0abc123def456
# Service YAML — enforce NLB with specific subnets
metadata:
annotations:
+ service.beta.kubernetes.io/aws-load-balancer-type: "external"
+ service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
+ service.beta.kubernetes.io/aws-load-balancer-subnets: "subnet-0aaa,subnet-0bbb"
+ service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
Verify provisioning after applying:
kubectl get svc my-app-svc -n production -w
# Should transition from <pending> to an actual IP/hostname within 60–90s
kubectl describe svc my-app-svc -n production | grep -A 20 Events
💡 Tired of pasting proprietary configs into ChatGPT? Generic AI tools log your company's ARNs, DB strings, and private keys. StackEngine is a zero-backend, pure Client-Side WASM utility. Drop your failing config into the sandbox above. We redact your secrets locally in the browser and auto-generate the refactored code using your own API key.
Prevention in CI/CD
1. OPA/Gatekeeper — Enforce Required Annotations on LoadBalancer Services
package kubernetes.admission
deny[msg] {
input.request.kind.kind == "Service"
input.request.object.spec.type == "LoadBalancer"
not input.request.object.metadata.annotations["service.beta.kubernetes.io/aws-load-balancer-type"]
msg := "LoadBalancer Services must declare aws-load-balancer-type annotation"
}
2. Checkov — Scan Service YAMLs Pre-Merge
checkov -d ./k8s/manifests --framework kubernetes --check CKV_K8S_*
Add a custom check to flag type: LoadBalancer without required cloud annotations.
3. Terraform — Tag Subnets at Provision Time (Never Manually)
resource "aws_subnet" "private" {
...
+ tags = {
+ "kubernetes.io/cluster/${var.cluster_name}" = "shared"
+ "kubernetes.io/role/internal-elb" = "1"
+ }
}
4. Readiness Gate in CD Pipeline
# Block deployment promotion until EXTERNAL-IP is assigned
kubectl wait --for=jsonpath='{.status.loadBalancer.ingress}' \
service/my-app-svc -n production --timeout=120s || exit 1
If this times out in your pipeline, the deployment is rolled back automatically — you never silently ship a broken ingress to production.