Fixing HPA Custom Metrics Scaling Loop: Debugging HorizontalPodAutoscaler When the Custom Metrics API Goes Dark

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

  • What broke: The HPA controller cannot reliably retrieve custom metrics from the metrics API (Prometheus Adapter or KEDA). While the API flaps, the metric shows as <unknown> or arrives with partial, misleading values, triggering repeated scale-to-min and scale-to-max thrash cycles.
  • How to fix it: Stabilize the HPA with behavior.scaleDown.stabilizationWindowSeconds, fix the broken Prometheus Adapter scrape config or service endpoint, and raise minReplicas so the floor can carry baseline traffic while metrics are unavailable.

The Incident (What Does the Error Mean?)

Raw output from kubectl describe hpa my-app-hpa -n production:

Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededGetScale   the HPA controller was able to get the target's current scale
  ScalingActive   False   FailedGetPodsMetric the HPA was unable to compute the replica count: unable to get metric http_requests_per_second:
                                              unable to fetch metrics from custom metrics API: the server is currently unable to handle the request
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range

Events:
  Warning  FailedGetPodsMetric  2m    horizontal-pod-autoscaler
    unable to get metric http_requests_per_second: unable to fetch metrics from custom metrics API: the server is currently unable to handle the request
  Normal   SuccessfulRescale 90s  horizontal-pod-autoscaler  New size: 24; reason: pods metric http_requests_per_second above target
  Normal   SuccessfulRescale 45s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

Immediate consequence: The HPA oscillates between minReplicas and maxReplicas every 30–90 seconds. During the scale-to-min phase, live traffic gets dropped. During scale-to-max, you're burning node capacity and potentially triggering Cluster Autoscaler to provision nodes that sit idle until they are torn down again. Your application is effectively down or severely degraded, and your cloud bill is spiking simultaneously.


The Attack Vector / Blast Radius

This isn't a security exploit — it's a cascading infrastructure failure with a wide blast radius:

  1. Metrics API unavailability (Prometheus Adapter pod crash, OOMKill, or custom.metrics.k8s.io APIService degraded) causes the HPA controller to receive HTTP 503s from the aggregated API layer.
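  2. On an outright fetch failure the HPA controller reports the metric as <unknown> and skips that scaling decision. The damage comes from the flapping phases in between, when the adapter answers again with partial or stale series and the controller computes a wildly wrong replica count from them.
  3. Scale-to-min thrash strips the workload down to minReplicas (1 in the events above; a true scale-to-zero is only possible when minReplicas: 0 is permitted through the HPAScaleToZero feature gate). With a floor that low and no effective stabilization window, the surviving pods are swamped, their per-pod metric spikes, and the next controller loop fires a scale-up toward maxReplicas.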
  4. Rapid pod churn causes readiness probe failures, connection resets to downstream services, and Kubernetes API server load spikes from the volume of pod create/delete events.
  5. If Cluster Autoscaler is active, node groups expand and contract, adding 5–10 minute cold-start latency per cycle and real dollar cost per node-hour provisioned.
  6. Alert fatigue: PagerDuty fires on pod restarts, HPA events, and node scaling simultaneously, masking the single root cause.
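
To see why a flapping metric swings the replica count so violently, it helps to model the controller's core calculation. The sketch below is a simplified Python rendering of the documented HPA formula (desired = ceil(current * metric / target), clamped to minReplicas and maxReplicas, with the default 10% tolerance). The function name and the sample readings of 2400 and 2 requests per second are illustrative assumptions, not values taken from this incident.

```python
import math


def desired_replicas(current_replicas: int, avg_metric: float, target: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA math for a Pods metric with an AverageValue target."""
    ratio = avg_metric / target
    # Inside the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    raw = math.ceil(current_replicas * ratio)
    # The result is always clamped to [minReplicas, maxReplicas].
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical reading: one pod absorbing all traffic reports a huge per-pod rate.
up = desired_replicas(1, avg_metric=2400, target=100, min_replicas=1, max_replicas=50)
# Hypothetical reading: 24 pods, but the adapter returns a near-empty rate.
down = desired_replicas(up, avg_metric=2, target=100, min_replicas=1, max_replicas=50)
print(up, down)  # 24 1
```

Rerunning the second call with min_replicas=3 returns 3 instead of 1, which is the whole point of raising the floor: a bad reading can still shrink the workload, but not below what baseline traffic needs.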

The single point of failure: kubectl get apiservice v1beta1.custom.metrics.k8s.io returning False for Available.


How to Fix It (The Solution)

Step 0: Confirm the Root Cause

# Check if the custom metrics APIService is healthy
kubectl get apiservice v1beta1.custom.metrics.k8s.io -o yaml

# Check Prometheus Adapter logs
kubectl logs -n monitoring deploy/prometheus-adapter --tail=100

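# Verify the adapter can reach Prometheus
# (quote the URL so the shell does not glob the "?"; if the adapter image ships
# without wget, send the same request from a debug pod in the namespace)
kubectl exec -n monitoring deploy/prometheus-adapter -- \
  wget -qO- 'http://prometheus-operated:9090/api/v1/query?query=up'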
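
If you would rather script the first check than eyeball the YAML, the Available condition can be pulled out of the JSON form of the same object. This is a minimal sketch: the helper name is made up for illustration, and the sample payload (including its reason and message text) is a hand-written stand-in for real kubectl output.

```python
import json


def apiservice_availability(apiservice_json: str) -> tuple[bool, str]:
    """Return (is_available, detail) from `kubectl get apiservice ... -o json`."""
    obj = json.loads(apiservice_json)
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            ok = cond.get("status") == "True"
            detail = f'{cond.get("reason", "")}: {cond.get("message", "")}'
            return ok, detail
    # No Available condition at all is treated as unhealthy.
    return False, "no Available condition reported"


# Hand-written sample payload, not captured from a real cluster.
sample = json.dumps({
    "status": {
        "conditions": [
            {
                "type": "Available",
                "status": "False",
                "reason": "MissingEndpoints",
                "message": "no ready endpoints for the adapter service",
            }
        ]
    }
})
print(apiservice_availability(sample))
```

In practice you would feed it the output of kubectl get apiservice v1beta1.custom.metrics.k8s.io -o json; the reason field usually tells you whether to look at the adapter pods or at the Service in front of them.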

Basic Fix: Add HPA Stabilization Windows and minReplicas Guard

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
- minReplicas: 1
+ minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
-       averageValue: 100
+       averageValue: "100"
+ behavior:
+   scaleDown:
+     stabilizationWindowSeconds: 300
+     policies:
+     - type: Percent
+       value: 10
+       periodSeconds: 60
+   scaleUp:
+     stabilizationWindowSeconds: 60
+     policies:
+     - type: Percent
+       value: 50
+       periodSeconds: 60
+     - type: Pods
+       value: 5
+       periodSeconds: 60
+     selectPolicy: Min

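Why this works: stabilizationWindowSeconds: 300 on scaleDown means the HPA won't act on a single bad metric reading. It scales down only to the highest replica recommendation computed over the last 5 minutes, so the metric has to stay low for the whole window. 300 seconds is also the controller-wide default (--horizontal-pod-autoscaler-downscale-stabilization), so declaring it in the manifest protects you on clusters where that flag has been lowered. The Percent: 10 policy rate-limits scale-down to 10% of current replicas per minute, preventing cliff-edge drops.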
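
The two guards are easier to reason about with numbers. The following is a rough Python model, not the controller's code: it assumes the scale-down stabilization simply takes the highest recommendation inside the window, and that the Percent policy truncates to a whole replica. The function names and the reading history are hypothetical.

```python
def stabilized_scale_down(recommendations: list[tuple[float, int]],
                          now: float, window_seconds: int) -> int:
    """Highest recommendation inside the window (input must be non-empty).

    recommendations holds (timestamp_seconds, desired_replicas) pairs.
    """
    recent = [replicas for ts, replicas in recommendations
              if now - ts <= window_seconds]
    return max(recent)


def percent_floor(period_start_replicas: int, percent: int) -> int:
    """Lowest replica count a Percent scale-down policy allows in one period.

    Integer truncation is an assumption of this sketch.
    """
    return (period_start_replicas * (100 - percent)) // 100


# Hypothetical history: two healthy readings, then two bogus near-zero ones.
history = [(0, 24), (15, 24), (30, 1), (45, 1)]

print(stabilized_scale_down(history, now=45, window_seconds=300))  # 24
print(stabilized_scale_down(history, now=45, window_seconds=15))   # 1
print(percent_floor(24, 10))                                       # 21
```

With a 300-second window the bogus readings are outvoted by the healthy ones still inside the window. With a 15-second window they win immediately. Even once the window expires, the 10% policy only lets 24 replicas fall to 21 in the first minute.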


Enterprise Best Practice: Fix the Prometheus Adapter and Add APIService Health Monitoring

Fix the Prometheus Adapter ConfigMap — the actual root cause:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: |
-       sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
+       sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
+       # Increased window reduces sensitivity to Prometheus scrape gaps
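
The name rule and the metricsQuery template are where typos usually hide, so it can be worth checking them offline. The sketch below imitates the two steps in Python under simplifying assumptions: the adapter uses Go regular expressions and Go templates, while this uses Python's re module and plain string replacement, and the label matcher string is an invented example of what the adapter might fill in.

```python
import re


def rename_series(series: str) -> str:
    """Apply the rule's name mapping: ^(.*)_total$ becomes <prefix>_per_second."""
    # Python spells the capture group \1 where the adapter config uses ${1}.
    return re.sub(r"^(.*)_total$", r"\1_per_second", series)


def render_query(template: str, series: str, label_matchers: str, group_by: str) -> str:
    """Fill the three placeholders used by the metricsQuery in the ConfigMap."""
    return (template.replace("<<.Series>>", series)
                    .replace("<<.LabelMatchers>>", label_matchers)
                    .replace("<<.GroupBy>>", group_by))


template = "sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)"

print(rename_series("http_requests_total"))
# http_requests_per_second

# Illustrative matchers only; the adapter builds these from the HPA's request.
print(render_query(template, "http_requests_total",
                   'namespace="production",pod=~"my-app-.*"', "pod"))
# sum(rate(http_requests_total{namespace="production",pod=~"my-app-.*"}[5m])) by (pod)
```

If the renamed series does not come out as http_requests_per_second, the HPA is asking for a metric the adapter never exposes, and no amount of stabilization will help.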

Add resource requests/limits to prevent OOMKill on the adapter:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-adapter
  namespace: monitoring
spec:
  template:
    spec:
      containers:
      - name: prometheus-adapter
        image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.2
        resources:
-         requests:
-           cpu: 100m
-           memory: 128Mi
+         requests:
+           cpu: 250m
+           memory: 512Mi
+         limits:
+           cpu: 500m
+           memory: 1Gi
+       livenessProbe:
+         httpGet:
+           path: /healthz
+           port: 6443
+           scheme: HTTPS
+         initialDelaySeconds: 30
+         periodSeconds: 10
+       readinessProbe:
+         httpGet:
+           path: /healthz
+           port: 6443
+           scheme: HTTPS
+         initialDelaySeconds: 10
+         periodSeconds: 5

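Add a PodDisruptionBudget so voluntary disruptions such as node drains and cluster upgrades cannot evict the last adapter pod. A PDB does not stop kubelet node-pressure evictions (the memory requests above are the defense there), and with minAvailable: 1 you need at least two adapter replicas or drains will block. Adjust the selector to the labels your adapter pods actually carry: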

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: prometheus-adapter-pdb
  namespace: monitoring
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: prometheus-adapter


Prevention in CI/CD

1. OPA/Gatekeeper Policy — Enforce HPA Stabilization Windows

package kubernetes.hpa

# Gatekeeper passes the admission request under input.review
# (a plain OPA admission webhook would use input.request instead).
violation[{"msg": msg}] {
  input.review.kind.kind == "HorizontalPodAutoscaler"
  not input.review.object.spec.behavior.scaleDown.stabilizationWindowSeconds
  msg := "HPA must define behavior.scaleDown.stabilizationWindowSeconds to prevent scaling thrash"
}

violation[{"msg": msg}] {
  input.review.kind.kind == "HorizontalPodAutoscaler"
  input.review.object.spec.minReplicas < 2
  msg := "HPA minReplicas must be >= 2 so the floor survives a metrics API failure"
}
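
The same two rules can be exercised locally before they ever reach an admission controller. Here is a small Python sketch of the equivalent check, assuming the manifest has already been parsed into a dict; the function name and messages are hypothetical. Unlike the Rego above, it treats an omitted minReplicas as 1, which is the API default.

```python
def hpa_violations(manifest: dict) -> list[str]:
    """Return policy violations for one parsed Kubernetes manifest."""
    if manifest.get("kind") != "HorizontalPodAutoscaler":
        return []
    spec = manifest.get("spec", {})
    problems = []
    scale_down = spec.get("behavior", {}).get("scaleDown", {})
    if "stabilizationWindowSeconds" not in scale_down:
        problems.append("missing behavior.scaleDown.stabilizationWindowSeconds")
    # minReplicas defaults to 1 when omitted, so a missing field also fails.
    if spec.get("minReplicas", 1) < 2:
        problems.append("minReplicas must be >= 2")
    return problems


bad = {"kind": "HorizontalPodAutoscaler", "spec": {"minReplicas": 1, "maxReplicas": 50}}
good = {
    "kind": "HorizontalPodAutoscaler",
    "spec": {
        "minReplicas": 3,
        "maxReplicas": 50,
        "behavior": {"scaleDown": {"stabilizationWindowSeconds": 300}},
    },
}
print(hpa_violations(bad))   # two violations
print(hpa_violations(good))  # []
```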

2. Checkov Custom Check — Flag Unstabilized HPAs

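# .checkov/hpa_stabilization.yaml
# Uses Checkov's custom YAML policy layout (metadata + definition);
# confirm Kubernetes resource support against the Checkov version you run.
metadata:
  id: "CKV_K8S_HPA_STABILIZATION"
  name: "Ensure HPA has scaleDown stabilization window"
  category: "KUBERNETES"
definition:
  cond_type: "attribute"
  resource_types:
    - "HorizontalPodAutoscaler"
  attribute: "spec.behavior.scaleDown.stabilizationWindowSeconds"
  operator: "exists"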

3. Prometheus Alert — Fire Before the Loop Starts

groups:
- name: hpa.rules
  rules:
  - alert: HPACustomMetricsAPIDown
    expr: |
      kube_horizontalpodautoscaler_status_condition{
        condition="ScalingActive",
        status="false"
      } == 1
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} cannot retrieve custom metrics"
      runbook_url: "https://your-wiki/runbooks/hpa-custom-metrics-loop"

  - alert: HPAReplicasThrashing
    expr: |
      changes(kube_horizontalpodautoscaler_status_current_replicas[10m]) > 4
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} replica count changed >4 times in 10 minutes"
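
The thrash alert hinges on what changes() counts, so here is a small Python approximation of it: the number of times consecutive samples in the range differ. The sample series are invented, and real PromQL evaluation also depends on scrape timing and staleness handling, which this sketch ignores.

```python
def changes(samples: list[int]) -> int:
    """Count how often the value differs from the previous sample."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


# Hypothetical current_replicas samples over a 10-minute range.
thrashing = [1, 24, 1, 24, 1, 24, 1]
steady = [3, 3, 4, 4, 5, 5, 5]

print(changes(thrashing), changes(thrashing) > 4)  # 6 True
print(changes(steady), changes(steady) > 4)        # 2 False
```

A healthy ramp-up changes the value a couple of times and stays under the threshold, while a min/max oscillation crosses it within a few controller loops.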

4. CI Pipeline Gate — Validate APIService Health Pre-Deploy

#!/bin/bash
# deploy-gate.sh: run in CI before applying HPA manifests
set -euo pipefail

# "|| true" stops set -e from aborting silently when kubectl itself fails,
# so the check below can still print a useful message.
APISTATUS=$(kubectl get apiservice v1beta1.custom.metrics.k8s.io \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' || true)

if [ "$APISTATUS" != "True" ]; then
  echo "FATAL: custom.metrics.k8s.io APIService is not Available. Blocking HPA deployment." >&2
  echo "Run: kubectl describe apiservice v1beta1.custom.metrics.k8s.io" >&2
  exit 1
fi

echo "Custom metrics APIService healthy. Proceeding with HPA apply."

Run this as an Argo CD PreSync hook or as an early Tekton pipeline task, before any kubectl apply that targets HPA resources.
