Why does Karpenter show 'no instance types available' even when EC2 Spot capacity appears available in the AWS console?

The AWS console shows Spot pricing history and general availability, not real-time launchability for your specific account, VPC subnet, and security group combination. Karpenter's actual launch attempt goes through the EC2 Fleet API, which can reject requests due to subnet-level capacity constraints, AZ-specific exhaustion, or misconfigured EC2NodeClass selectors that silently filter out valid subnets. Check `kubectl describe nodeclaim <nodeclaim-name>` for the precise rejection reason and verify that your EC2NodeClass `subnetSelectorTerms` tags match the actual subnet tags in your VPC.

How many instance types should I configure in a Karpenter NodePool to reliably avoid Spot capacity errors?

AWS's Spot guidance is to stay flexible across at least 10 instance types; a practical target is 10–15 variants spanning at least 3 families (e.g., m, c, r) and several generations. Karpenter's `karpenter.k8s.aws/instance-category` and `karpenter.k8s.aws/instance-generation` requirement keys are more resilient than listing explicit instance types because they automatically include new instance types as AWS releases them. The Spot best practice is 'be flexible, be diverse': the more pools you're eligible for, the lower your interruption and unavailability probability.
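To make the "more pools" point concrete, here is a minimal sketch that counts eligible Spot pools as (instance type, AZ) pairs. The instance type and zone lists are hypothetical values chosen for illustration; they are not a recommendation and say nothing about real availability.

```python
from itertools import product


def spot_pools(instance_types, zones):
    """Return the set of (instance type, AZ) Spot pools a NodePool is eligible for."""
    return set(product(instance_types, zones))


# Hypothetical values for illustration only.
narrow = spot_pools(["m5.xlarge"], ["us-east-1b"])
broad = spot_pools(
    ["m5.xlarge", "m6i.xlarge", "c5.xlarge", "c6i.xlarge", "r5.xlarge", "r6i.xlarge"],
    ["us-east-1a", "us-east-1b", "us-east-1c"],
)

print(len(narrow), len(broad))  # 1 18
```

A NodePool pinned to one type in one AZ depends on a single pool, while six types across three AZs gives eighteen independent chances to find capacity.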

Will adding on-demand as a fallback in my Karpenter NodePool significantly increase my AWS bill?

Only if Spot capacity is genuinely exhausted. Karpenter's `weight` field on NodePools controls preference ordering: with a Spot NodePool at weight 100 and an on-demand fallback at weight 1, on-demand nodes are launched only when the Spot pool cannot satisfy the request. In practice, with sufficient instance diversity, on-demand fallback triggers rarely. The cost of occasional on-demand fallback is usually far lower than the revenue impact of dropped traffic during a Spot capacity crunch.

Fixing Karpenter 'No Instance Types Available' Spot Capacity Errors: A Production Debugging Guide

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

What broke: Karpenter cannot find any EC2 Spot capacity matching your NodePool's instance type constraints in the requested Availability Zone(s), leaving pods in Pending indefinitely.
How to fix it: Broaden instance family diversity, add multi-AZ spread, and configure an on-demand fallback weight so Karpenter has an escape hatch when Spot pools dry up.

The Incident (What Does the Error Mean?)

You'll see this in kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter:

ERROR   controller.provisioner   no instance types available   {"commit": "abc1234", "provisioner": "default"}
ERROR   controller.nodeclaim     failed to launch nodeclaim    {"error": "no capacity available for node request"}

And pods sit here forever:

kubectl get pods -A | grep Pending
my-namespace   worker-7d9f6   0/1   Pending   0   18m

Immediate consequence: Karpenter's reconciler loop keeps retrying, burning controller CPU, while your workload is completely unscheduled. If this is a job queue or autoscaling event triggered by real traffic, you are dropping requests right now.

The Attack Vector / Blast Radius

Spot capacity is divided into pools, each defined by one instance type in one AZ, and availability varies independently per pool. AWS can reclaim capacity from a pool, or have none to offer in it, at any moment. When your NodePool is too narrow (e.g., pinned to m5.xlarge only in us-east-1b), you've created a single point of failure against AWS's own capacity scheduler.

Cascading failure chain:

1. Spot pool for m5.xlarge in us-east-1b dries up (common during AZ-level demand spikes).
2. Karpenter finds zero valid instance types. Emits the error. Does nothing.
3. HPA has already scaled your Deployment replicas up. All new pods are Pending.
4. Your application's readiness probe starts failing on existing pods under increased load.
5. ALB target group health drops. 502s begin.
6. If you have a PodDisruptionBudget, Karpenter may also be blocked from consolidating existing nodes, compounding cost AND availability impact simultaneously.

The secondary blast: Engineers start manually launching On-Demand nodes or disabling Karpenter consolidation as a hotfix, leaving orphaned over-provisioned nodes that silently inflate your EC2 bill for weeks.

How to Fix It

Basic Fix — Broaden Instance Diversity

The minimum viable fix is to stop pinning to a single instance type and AZ: let Karpenter select by karpenter.k8s.aws/instance-category across multiple families, spread across several zones, and allow on-demand as a second capacity type.

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
-       - key: node.kubernetes.io/instance-type
-         operator: In
-         values: ["m5.xlarge"]
-       - key: topology.kubernetes.io/zone
-         operator: In
-         values: ["us-east-1b"]
-       - key: karpenter.sh/capacity-type
-         operator: In
-         values: ["spot"]
+       - key: karpenter.k8s.aws/instance-category
+         operator: In
+         values: ["m", "c", "r"]
+       - key: karpenter.k8s.aws/instance-generation
+         operator: Gt
+         values: ["4"]
+       - key: topology.kubernetes.io/zone
+         operator: In
+         values: ["us-east-1a", "us-east-1b", "us-east-1c"]
+       - key: karpenter.sh/capacity-type
+         operator: In
+         values: ["spot", "on-demand"]

Enterprise Best Practice — Weighted On-Demand Fallback + Disruption Budgets

For production, you need two NodePools: one Spot-optimized with a high weight, and one On-Demand fallback with a lower weight, so Karpenter prefers Spot but can always fall back. Combine with disruption budgets to cap how many nodes consolidation can disrupt at once.

# NodePool 1: Spot (preferred)
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-workers
spec:
+ weight: 100
  disruption:
+   consolidationPolicy: WhenUnderutilized  # consolidateAfter is omitted: v1beta1 rejects it with this policy
+   budgets:
+   - nodes: "20%"  # Never consolidate more than 20% of nodes at once
  template:
    spec:
      requirements:
+       - key: karpenter.sh/capacity-type
+         operator: In
+         values: ["spot"]
+       - key: karpenter.k8s.aws/instance-category
+         operator: In
+         values: ["m", "c", "r"]
+       - key: karpenter.k8s.aws/instance-generation
+         operator: Gt
+         values: ["4"]
+       - key: topology.kubernetes.io/zone
+         operator: In
+         values: ["us-east-1a", "us-east-1b", "us-east-1c"]
---
# NodePool 2: On-Demand fallback
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: ondemand-fallback
spec:
+ weight: 1  # Only used when spot-workers NodePool cannot schedule
  template:
    spec:
      requirements:
+       - key: karpenter.sh/capacity-type
+         operator: In
+         values: ["on-demand"]
+       - key: karpenter.k8s.aws/instance-category
+         operator: In
+         values: ["m", "c"]
+       - key: topology.kubernetes.io/zone
+         operator: In
+         values: ["us-east-1a", "us-east-1b", "us-east-1c"]
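The preference ordering between a weight-100 Spot NodePool and a weight-1 On-Demand NodePool can be modeled with a short sketch. This is a simplified illustration, not Karpenter's actual scheduler: the `NodePool` dataclass and the `has_capacity` callback are hypothetical stand-ins for the real objects and the real EC2 launch attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    name: str
    weight: int
    capacity_type: str  # "spot" or "on-demand"


def choose_nodepool(pools, has_capacity):
    """Try NodePools from highest weight to lowest; return the first that can launch.

    `has_capacity` is a hypothetical callback standing in for the EC2 launch attempt.
    """
    for pool in sorted(pools, key=lambda p: p.weight, reverse=True):
        if has_capacity(pool):
            return pool
    return None  # Nothing launchable: pods stay Pending.


pools = [
    NodePool("ondemand-fallback", weight=1, capacity_type="on-demand"),
    NodePool("spot-workers", weight=100, capacity_type="spot"),
]

# Normal case: Spot is launchable, so the weight-100 pool wins.
normal = choose_nodepool(pools, lambda p: True)
# Spot crunch: only On-Demand can launch, so the weight-1 pool is used.
crunch = choose_nodepool(pools, lambda p: p.capacity_type == "on-demand")

print(normal.name, crunch.name)  # spot-workers ondemand-fallback
```

The higher weight is tried first, which is why the fallback pool must carry the lower number.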

Also verify your EC2NodeClass has a valid AMI configuration (amiFamily or amiSelectorTerms) and correct subnetSelectorTerms. A misconfigured NodeClass silently eliminates valid subnets, and with them whole AZs, before Karpenter even evaluates capacity:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: "KarpenterNodeRole-my-cluster"  # Placeholder name; v1beta1 requires role or instanceProfile
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"  # Must match actual subnet tags
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "my-cluster"

Prevention in CI/CD

1. OPA/Gatekeeper Policy — Enforce Instance Diversity

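Block any NodePool that is restricted to a single AZ or pinned to fewer than 3 explicit instance types. The rules below use the generic deny form, as evaluated by plain OPA or conftest; under Gatekeeper they would need to be wrapped in a ConstraintTemplate that uses violation rules and reads the object from input.review.object: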

package karpenter.nodepool

deny[msg] {
  input.kind == "NodePool"
  reqs := input.spec.template.spec.requirements
  zone_req := reqs[_]
  zone_req.key == "topology.kubernetes.io/zone"
  zone_req.operator == "In"
  count(zone_req.values) < 2
  msg := "NodePool must span at least 2 Availability Zones to tolerate Spot capacity outages."
}

deny[msg] {
  input.kind == "NodePool"
  reqs := input.spec.template.spec.requirements
  type_req := reqs[_]
  type_req.key == "node.kubernetes.io/instance-type"
  type_req.operator == "In"
  count(type_req.values) < 3
  msg := "NodePool pinned to fewer than 3 instance types. Spot interruptions will cause scheduling failures."
}

2. Checkov Custom Check

Add to your Terraform/Helm pipeline:

# checkov custom check: CKV_KARPENTER_001
# Validates NodePool has capacity-type including on-demand fallback OR weight-based fallback pool exists

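Run in CI. Because CKV_KARPENTER_001 is a custom check, Checkov has to be pointed at the directory that contains it (the path below is a placeholder):

checkov -d ./helm/karpenter --framework kubernetes --external-checks-dir ./checkov-checks --check CKV_KARPENTER_001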
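The rule that CKV_KARPENTER_001 is described as enforcing could be sketched in plain Python as follows. This is only the decision logic operating on already-parsed NodePool manifests; it is not wired into Checkov's check classes, and the function names and the `pool` helper are hypothetical. It also assumes that a NodePool with no capacity-type requirement resolves to on-demand.

```python
CAPACITY_KEY = "karpenter.sh/capacity-type"


def capacity_types(nodepool):
    """Return the capacity types a parsed NodePool manifest allows."""
    reqs = (
        nodepool.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("requirements", [])
    )
    for req in reqs:
        if req.get("key") == CAPACITY_KEY and req.get("operator") == "In":
            return set(req.get("values", []))
    # Assumption: without an explicit requirement the pool resolves to on-demand.
    return {"on-demand"}


def has_on_demand_fallback(nodepools):
    """Pass if any NodePool (mixed or dedicated fallback) can launch on-demand."""
    return any("on-demand" in capacity_types(p) for p in nodepools)


def pool(values):
    """Hypothetical helper: build a minimal NodePool dict for the examples."""
    req = {"key": CAPACITY_KEY, "operator": "In", "values": values}
    return {"kind": "NodePool", "spec": {"template": {"spec": {"requirements": [req]}}}}


print(has_on_demand_fallback([pool(["spot"])]))                       # False
print(has_on_demand_fallback([pool(["spot"]), pool(["on-demand"])]))  # True
```

Evaluating all NodePools together, rather than one at a time, is what lets a dedicated low-weight fallback pool satisfy the rule for a Spot-only pool.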

3. CloudWatch Alarm on Pending Pod Duration

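Don't wait for user reports. Alert when pods have been Pending for more than 5 minutes. The command below assumes you already publish a pending-pod count under this metric name; substitute the metric name and dimensions your cluster actually emits: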

aws cloudwatch put-metric-alarm \
  --alarm-name "karpenter-pending-pods-breach" \
  --metric-name "pending_pods" \
  --namespace "ContainerInsights" \
  --statistic Average \
  --period 300 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:oncall-pagerduty
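If you need to produce that pending-pod signal yourself, the core test is small. The sketch below assumes pod objects shaped like the items in `kubectl get pods -o json` and uses time since creation as an approximation of time spent Pending; the function name and the sample pods are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def long_pending_pods(pods, now, threshold=timedelta(minutes=5)):
    """Return names of pods that are Pending and older than `threshold`."""
    stuck = []
    for pod in pods:
        if pod["status"]["phase"] != "Pending":
            continue
        # Kubernetes serializes creationTimestamp as RFC 3339 in UTC.
        created = datetime.strptime(
            pod["metadata"]["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - created > threshold:
            stuck.append(pod["metadata"]["name"])
    return stuck


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample_pods = [
    {"metadata": {"name": "worker-7d9f6", "creationTimestamp": "2024-01-01T11:42:00Z"},
     "status": {"phase": "Pending"}},
    {"metadata": {"name": "worker-new", "creationTimestamp": "2024-01-01T11:58:00Z"},
     "status": {"phase": "Pending"}},
    {"metadata": {"name": "api-1", "creationTimestamp": "2024-01-01T09:00:00Z"},
     "status": {"phase": "Running"}},
]

print(long_pending_pods(sample_pods, now))  # ['worker-7d9f6']
```

The count of returned names is the value you would publish as the alarm's metric.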

4. Karpenter Spot Interruption Queue

Ensure you have the SQS interruption queue configured. Without it, Karpenter has no advance warning of Spot reclamation and cannot proactively drain nodes before the 2-minute termination window:

# In your Karpenter Helm values (v1beta1-era charts nest this under settings)
settings:
  interruptionQueue: "my-cluster-karpenter-interruption-queue"

This alone won't prevent the no instance types available error, but it dramatically reduces the blast radius when Spot IS reclaimed by giving Karpenter time to reschedule before node death.