Fixing HPA Custom Metrics Scaling Loop: Debugging HorizontalPodAutoscaler When the Custom Metrics API Goes Dark
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–30 mins
TL;DR
- What broke: The HPA controller cannot retrieve custom metrics from the custom metrics API (served by Prometheus Adapter or KEDA). The metric shows up as <unknown> or is treated as 0, and the HPA thrashes between minReplicas and maxReplicas.
- How to fix it: Stabilize the HPA with behavior.scaleDown.stabilizationWindowSeconds, fix the broken Prometheus Adapter rules config or service endpoint, and raise minReplicas so a bad reading cannot collapse the workload to a single pod.
- Shortcut: Use our Client-Side Sandbox above to auto-refactor your HPA YAML and Prometheus Adapter ConfigMap without leaking your cluster config to third-party AI logs.
The Incident (What Does the Error Mean?)
Raw output from kubectl describe hpa my-app-hpa -n production:
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    SucceededGetScale    the HPA controller was able to get the target's current scale
  ScalingActive   False   FailedGetPodsMetric  the HPA was unable to compute the replica count: unable to get metric http_requests_per_second: unable to fetch metrics from custom metrics API: the server is currently unable to handle the request
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:
  Warning  FailedGetPodsMetric  2m   horizontal-pod-autoscaler  failed to get http_requests_per_second: the server is currently unable to handle the request
  Normal   SuccessfulRescale    90s  horizontal-pod-autoscaler  New size: 24; reason: custom metric http_requests_per_second above target
  Normal   SuccessfulRescale    45s  horizontal-pod-autoscaler  New size: 1; reason: custom metric http_requests_per_second below target
Immediate consequence: The HPA oscillates between minReplicas and maxReplicas every 30–90 seconds. During the scale-down phase, the few remaining pods are overloaded and live traffic gets dropped. During the scale-up phase, you burn node capacity and may trigger Cluster Autoscaler to provision nodes that are torn down a minute later. Your application is down or severely degraded while your cloud bill spikes.
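The swing in the events above follows from the replica formula in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to minReplicas and maxReplicas. The sketch below is illustrative only. The function name desired_replicas and the sample readings are assumptions, and it ignores the controller's tolerance band and stabilization logic.

```shell
# desired_replicas CURRENT METRIC TARGET MIN MAX
# Hypothetical helper: integer ceil(CURRENT * METRIC / TARGET), clamped to [MIN, MAX].
desired_replicas() {
  local current=$1 metric=$2 target=$3 min=$4 max=$5
  local want=$(( (current * metric + target - 1) / target ))
  if [ "$want" -lt "$min" ]; then want=$min; fi
  if [ "$want" -gt "$max" ]; then want=$max; fi
  echo "$want"
}

# Assumed reading: 6 pods averaging 400 req/s against a target of 100
desired_replicas 6 400 100 1 50    # prints 24
# A bogus near-zero reading one loop later, now with 24 pods
desired_replicas 24 1 100 1 50     # prints 1
```

With nothing damping the result, each bad reading translates directly into a new replica count.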
The Attack Vector / Blast Radius
This isn't a security exploit — it's a cascading infrastructure failure with a wide blast radius:
- Metrics API unavailability (Prometheus Adapter pod crash, OOMKill, or a degraded custom.metrics.k8s.io APIService) causes the HPA controller to receive HTTP 503s from the aggregated API layer.
- The HPA controller may treat a missing metric as 0 in some Kubernetes versions (pre-1.23 behavior) or report it as <unknown>. When the adapter flaps, partial or stale readings feed the scaling logic with wrong values.
- Scale-down thrash removes most running pods. The HPA never goes below minReplicas, but with minReplicas: 1 and no stabilization window a single low reading drops the workload to one pod. The next controller loop sees traffic queuing and fires a scale-up to maxReplicas.
- Rapid pod churn causes readiness probe failures, connection resets to downstream services, and Kubernetes API server load spikes from the volume of pod create/delete events.
- If Cluster Autoscaler is active, node groups expand and contract, adding 5–10 minute cold-start latency per cycle and real dollar cost per node-hour provisioned.
- Alert fatigue: PagerDuty fires on pod restarts, HPA events, and node scaling simultaneously, masking the single root cause.
The single point of failure: kubectl get apiservice v1beta1.custom.metrics.k8s.io returning False for Available.
How to Fix It (The Solution)
Step 0: Confirm the Root Cause
# Check if the custom metrics APIService is healthy
kubectl get apiservice v1beta1.custom.metrics.k8s.io -o yaml
# Check Prometheus Adapter logs
kubectl logs -n monitoring deploy/prometheus-adapter --tail=100
# Verify the adapter can reach Prometheus (quote the URL so the shell does not glob the "?")
# Note: this needs wget inside the adapter image; minimal images may not ship it
kubectl exec -n monitoring deploy/prometheus-adapter -- \
  wget -qO- 'http://prometheus-operated:9090/api/v1/query?query=up'
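If the APIService reports Available but the HPA still shows <unknown>, it can help to query the aggregated API directly for the metric the HPA asks for. The sketch below wraps kubectl get --raw in a small function. The function name custom_metric_items is hypothetical, and the production namespace and http_requests_per_second metric are taken from the example HPA in this post.

```shell
# custom_metric_items NAMESPACE METRIC
# Hypothetical helper: fetches a pods metric straight from the custom metrics
# API and fails loudly when the aggregated API cannot serve it.
custom_metric_items() {
  local ns=$1 metric=$2 out
  local path="/apis/custom.metrics.k8s.io/v1beta1/namespaces/${ns}/pods/*/${metric}"
  if ! out=$(kubectl get --raw "$path" 2>&1); then
    echo "custom metrics API call failed for ${path}: ${out}" >&2
    return 1
  fi
  printf '%s\n' "$out"
}

# Example (values taken from the HPA in this post):
#   custom_metric_items production http_requests_per_second
```

A 503 here points at the adapter or the APIService. A successful response with an empty items list points at the adapter rules or at Prometheus itself.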
Basic Fix: Add HPA Stabilization Windows and minReplicas Guard
  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: my-app-hpa
    namespace: production
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: my-app
-   minReplicas: 1
+   minReplicas: 3
    maxReplicas: 50
    metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
-         averageValue: 100
+         averageValue: "100"
+   behavior:
+     scaleDown:
+       stabilizationWindowSeconds: 300
+       policies:
+       - type: Percent
+         value: 10
+         periodSeconds: 60
+     scaleUp:
+       stabilizationWindowSeconds: 60
+       policies:
+       - type: Percent
+         value: 50
+         periodSeconds: 60
+       - type: Pods
+         value: 5
+         periodSeconds: 60
+       selectPolicy: Min
Why this works: stabilizationWindowSeconds: 300 on scaleDown means the HPA won't act on a single bad metric reading — it must see consistently low values over 5 minutes. The Percent: 10 policy rate-limits scale-down to 10% of current replicas per minute, preventing cliff-edge drops.
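To make the two mechanisms concrete, here is a rough shell model. It assumes the scale-down stabilization window acts on the highest recommendation seen during the window, and that a Percent policy lets at most that share of the current replicas go per period. The function names are hypothetical, and the real controller tracks more state (for example the replica count at the start of each period).

```shell
# stabilized_recommendation REC...
# Scale-down stabilization: act on the highest recommendation in the window.
stabilized_recommendation() {
  local max=$1 r
  for r in "$@"; do
    if [ "$r" -gt "$max" ]; then max=$r; fi
  done
  echo "$max"
}

# scale_down_step CURRENT DESIRED PERCENT MIN
# Percent policy: keep at least (100 - PERCENT)% of CURRENT this period,
# and never go below MIN (minReplicas).
scale_down_step() {
  local current=$1 desired=$2 percent=$3 min=$4
  local floor=$(( current * (100 - percent) / 100 ))
  local next=$desired
  if [ "$next" -lt "$floor" ]; then next=$floor; fi
  if [ "$next" -lt "$min" ]; then next=$min; fi
  echo "$next"
}

# One bad reading (1) among healthy ones (24) is ignored inside the window
stabilized_recommendation 24 24 1 24    # prints 24
# Even if low readings persist, 24 replicas shrink by about 10% per minute
scale_down_step 24 1 10 3               # prints 21
scale_down_step 21 1 10 3               # prints 18
```

Under these assumptions a sustained bad reading still takes many minutes to walk 24 replicas down to minReplicas, which leaves time for an alert to fire.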
Enterprise Best Practice: Fix the Prometheus Adapter and Add APIService Health Monitoring
Fix the Prometheus Adapter ConfigMap — the actual root cause:
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: prometheus-adapter-config
    namespace: monitoring
  data:
    config.yaml: |
      rules:
      - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)_total$"
          as: "${1}_per_second"
+       # Increased window reduces sensitivity to Prometheus scrape gaps
        metricsQuery: |
-         sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
+         sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
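A common follow-on failure is a name mismatch between what the adapter exposes and what the HPA requests. The name rule above rewrites series ending in _total to _per_second. The sketch below mirrors that substitution with sed so the expected metric name can be compared with the one in the HPA spec. The helper names are hypothetical, and sed's regex engine is only standing in for the adapter's Go regexp here.

```shell
# exposed_metric_name SERIES
# Mirrors the rule's name.matches / name.as rewrite for series ending in _total.
exposed_metric_name() {
  printf '%s\n' "$1" | sed -E 's/^(.*)_total$/\1_per_second/'
}

# hpa_metric_matches SERIES HPA_METRIC
# Succeeds when the rewritten series name equals the metric named in the HPA.
hpa_metric_matches() {
  [ "$(exposed_metric_name "$1")" = "$2" ]
}

exposed_metric_name http_requests_total    # prints http_requests_per_second
if hpa_metric_matches http_requests_total http_requests_per_second; then
  echo "HPA metric name matches the adapter rule"
fi
```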
Add resource requests/limits to prevent OOMKill on the adapter:
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: prometheus-adapter
    namespace: monitoring
  spec:
    template:
      spec:
        containers:
        - name: prometheus-adapter
          image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.2
          resources:
-           requests:
-             cpu: 100m
-             memory: 128Mi
+           requests:
+             cpu: 250m
+             memory: 512Mi
+           limits:
+             cpu: 500m
+             memory: 1Gi
+         livenessProbe:
+           httpGet:
+             path: /healthz
+             port: 6443
+             scheme: HTTPS
+           initialDelaySeconds: 30
+           periodSeconds: 10
+         readinessProbe:
+           httpGet:
+             path: /healthz
+             port: 6443
+             scheme: HTTPS
+           initialDelaySeconds: 10
+           periodSeconds: 5
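Add a PodDisruptionBudget so that voluntary disruptions such as node drains and upgrades cannot take down the last adapter pod. A PDB does not stop kubelet node-pressure evictions (the resource requests above help there), and it only adds value when the adapter runs at least 2 replicas; with a single replica, minAvailable: 1 blocks drains instead. Adjust the selector to match the labels on your adapter pods: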
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: prometheus-adapter-pdb
  namespace: monitoring
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: prometheus-adapter
Prevention in CI/CD
1. OPA/Gatekeeper Policy — Enforce HPA Stabilization Windows
package kubernetes.hpa

# Gatekeeper passes the admission request under input.review
violation[{"msg": msg}] {
  input.review.kind.kind == "HorizontalPodAutoscaler"
  not input.review.object.spec.behavior.scaleDown.stabilizationWindowSeconds
  msg := "HPA must define behavior.scaleDown.stabilizationWindowSeconds to prevent scaling thrash"
}

# minReplicas defaults to 1 when omitted, so treat a missing field as 1
violation[{"msg": msg}] {
  input.review.kind.kind == "HorizontalPodAutoscaler"
  object.get(input.review.object.spec, "minReplicas", 1) < 2
  msg := "HPA minReplicas must be >= 2 so a metrics API failure cannot collapse the workload to one pod"
}
2. Checkov Custom Check — Flag Unstabilized HPAs
# .checkov/hpa_stabilization.yaml
metadata:
  id: "CKV_K8S_HPA_STABILIZATION"
  name: "Ensure HPA has scaleDown stabilization window"
  category: "KUBERNETES"
definition:
  cond_type: "attribute"
  resource_types:
    - "HorizontalPodAutoscaler"
  attribute: "spec.behavior.scaleDown.stabilizationWindowSeconds"
  operator: "exists"
3. Prometheus Alert — Fire Before the Loop Starts
groups:
- name: hpa.rules
  rules:
  - alert: HPACustomMetricsAPIDown
    expr: |
      kube_horizontalpodautoscaler_status_condition{
        condition="ScalingActive",
        status="false"
      } == 1
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} cannot retrieve custom metrics"
      runbook_url: "https://your-wiki/runbooks/hpa-custom-metrics-loop"
  - alert: HPAReplicasThrashing
    expr: |
      changes(kube_horizontalpodautoscaler_status_current_replicas[10m]) > 4
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} replica count changed >4 times in 10 minutes"
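The HPAReplicasThrashing threshold is easier to reason about with numbers. PromQL's changes() counts how many times a series' value changed between consecutive samples in the range. The shell sketch below applies the same counting to a made-up list of replica samples; the function name and the sample values are assumptions, not real cluster data.

```shell
# count_changes SAMPLE...
# Counts how often consecutive samples differ, like PromQL changes() over a range.
count_changes() {
  local prev="" changes=0 v
  for v in "$@"; do
    if [ -n "$prev" ] && [ "$v" != "$prev" ]; then
      changes=$((changes + 1))
    fi
    prev=$v
  done
  echo "$changes"
}

# Thrashing between 3 and 24 replicas: 5 changes, above the "> 4" threshold
count_changes 3 24 3 24 3 24    # prints 5
# A normal ramp-up that settles: 3 changes, below the threshold
count_changes 3 4 5 6 6 6       # prints 3
```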
4. CI Pipeline Gate — Validate APIService Health Pre-Deploy
#!/bin/bash
# deploy-gate.sh: run in CI before applying HPA manifests
set -euo pipefail
# "|| true" keeps set -e from aborting before we can print a useful error
APISTATUS=$(kubectl get apiservice v1beta1.custom.metrics.k8s.io \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' || true)
if [ "$APISTATUS" != "True" ]; then
  echo "FATAL: custom.metrics.k8s.io APIService is not Available. Blocking HPA deployment." >&2
  echo "Run: kubectl describe apiservice v1beta1.custom.metrics.k8s.io" >&2
  exit 1
fi
echo "Custom metrics APIService healthy. Proceeding with HPA apply."
Add this as a pre-deploy step, for example in an Argo CD PreSync hook or a Tekton task, so it runs before any kubectl apply that targets HPA resources.