Why does KEDA scale my deployment to zero when Prometheus is down?

If no `fallback` block is defined in your ScaledObject, KEDA has no safe replica count to report when metrics are unavailable. Depending on the KEDA version and HPA behavior, the HPA may receive no metric, a stale value, or a zero value. Combined with the default `minReplicaCount` of 0, the zero case can drive replicas to zero. Always define `fallback.failureThreshold` (how many consecutive metric-fetch failures to tolerate) and `fallback.replicas` (the replica count to hold while metrics are failing). Together they keep a safe replica floor during metric outages.
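The fallback arithmetic is easier to see in a small model. This Python sketch makes two assumptions:

- The trigger uses an `AverageValue` metric target.
- Fallback follows the KEDA docs' description: once consecutive failures pass `failureThreshold`, the scaler reports `threshold * fallback.replicas`. The HPA's `ceil(metric / target)` then lands on `fallback.replicas`.

HPA tolerance and stabilization windows are ignored, and both function names are hypothetical.

```python
import math


def hpa_desired_replicas(metric_value: float, threshold: float,
                         min_replicas: int, max_replicas: int) -> int:
    """Simplified HPA math for an AverageValue external metric.

    desired = ceil(total metric / per-replica target), clamped to the
    ScaledObject's [minReplicaCount, maxReplicaCount]. HPA tolerance and
    stabilization windows are deliberately ignored.
    """
    desired = math.ceil(metric_value / threshold)
    return max(min_replicas, min(max_replicas, desired))


def replicas_during_outage(consecutive_failures: int, failure_threshold: int,
                           fallback_replicas: int, threshold: float,
                           current_replicas: int, min_replicas: int,
                           max_replicas: int) -> int:
    """Replica count while the metric source keeps failing (model only)."""
    if consecutive_failures <= failure_threshold:
        # Assumption: before fallback engages, no metric reaches the HPA,
        # so it keeps the current replica count (the "freeze" case).
        return current_replicas
    # Assumption (KEDA docs wording): once failures pass failureThreshold,
    # the scaler reports threshold * fallback.replicas.
    fallback_metric = threshold * fallback_replicas
    return hpa_desired_replicas(fallback_metric, threshold,
                                min_replicas, max_replicas)
```

Try it with the values from the Basic Fix example: `threshold: "100"`, `fallback.replicas: 5` and `maxReplicaCount: 20`. The model holds 5 replicas once fallback engages. A `fallback.replicas` above `maxReplicaCount` is still capped at the maximum.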

What is the correct serverAddress format for Prometheus in a multi-namespace KEDA setup?

Use the fully qualified in-cluster DNS name: `http://prometheus-server.<namespace>.svc.cluster.local:9090`. Short hostnames like `prometheus` or `prometheus-server` only resolve reliably within the same namespace. The KEDA operator pod typically runs in the `keda` namespace, so it needs the full FQDN to resolve services in other namespaces such as `monitoring`.
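A small Python helper can build and sanity-check that format, for example in a templating or linting step. It assumes the default cluster domain, `cluster.local`; clusters configured with a different domain must pass their own. Both function names are hypothetical.

```python
from urllib.parse import urlsplit


def prometheus_server_address(service: str, namespace: str, port: int = 9090,
                              scheme: str = "http",
                              cluster_domain: str = "cluster.local") -> str:
    """Build <scheme>://<service>.<namespace>.svc.<cluster_domain>:<port>."""
    return f"{scheme}://{service}.{namespace}.svc.{cluster_domain}:{port}"


def is_fully_qualified(server_address: str,
                       cluster_domain: str = "cluster.local") -> bool:
    """True if the URL host looks like <service>.<namespace>.svc.<cluster_domain>."""
    host = urlsplit(server_address).hostname or ""
    suffix = f".svc.{cluster_domain}"
    if not host.endswith(suffix):
        return False
    # What remains before the suffix must be exactly "<service>.<namespace>".
    return len(host[: -len(suffix)].split(".")) == 2
```

For example, `prometheus_server_address("prometheus-server", "monitoring")` yields `http://prometheus-server.monitoring.svc.cluster.local:9090`. `is_fully_qualified("http://prometheus:9090")` returns `False`.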

How do I verify that the KEDA operator can actually reach my Prometheus endpoint?

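Run: `kubectl exec -n keda deployment/keda-operator -- wget -qO- 'http://prometheus-server.monitoring.svc.cluster.local:9090/api/v1/query?query=up'`. A successful JSON response confirms connectivity. If the request fails, the error type points to the cause:

- A DNS error (`no such host` or `bad address`) usually means a wrong service name or namespace, or an egress policy blocking DNS.
- `connection refused` points to a wrong port or no ready Prometheus pod.
- A hang or timeout typically means a NetworkPolicy is dropping traffic from the `keda` namespace to Prometheus.

If the operator image ships without `wget`, run the same request from a temporary pod in the `keda` namespace.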

Fixing KEDA 'scaledobject failed to get metrics' When Prometheus Is Down or Unreachable

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

What broke: KEDA's Prometheus scaler cannot reach serverAddress, causing ScaledObject to enter a failed metrics state — replicas freeze at last known count or scale to zero depending on fallback config.
How to fix it: Verify Prometheus is reachable from the KEDA operator pod, fix the serverAddress URL, confirm the PromQL query returns data, and configure a fallback replica floor.

The Incident (What Does the Error Mean?)

You will see this in kubectl describe scaledobject <name> -n <namespace> or in KEDA operator logs:

```text
ERROR   scale_handler   failed to get metrics for scaledObject
{
  "scaledObject": "my-app/my-scaledobject",
  "error": "error getting metric source: unable to get external metric
            my-app-prometheus: unable to fetch metrics from prometheus:
            Get \"http://prometheus-server.monitoring.svc:9090/api/v1/query\":
            dial tcp: lookup prometheus-server.monitoring.svc: no such host"
}
```

Immediate consequence: KEDA cannot evaluate the trigger, and the HPA backing the ScaledObject stops receiving metric updates. Without a working fallback block, your deployment either freezes at its current replica count or scales to zero. Both are production-breaking outcomes.

The Attack Vector / Blast Radius

This is not a security exploit — it is a cascading availability failure:

Prometheus pod crash / OOMKilled / evicted → KEDA loses its only metric source.
ScaledObject enters Failed condition → Kubernetes HPA stops receiving external metrics.
Without a fallback block, some KEDA versions report 0 desired replicas. With `minReplicaCount: 0` (the KEDA default), your entire workload can scale to zero under load.
With a poorly set fallback, replicas freeze at a stale count — your service either under-provisions during a traffic spike or burns compute during idle.
Network policy misconfiguration is the second most common cause: the KEDA operator namespace cannot reach the Prometheus namespace on port 9090. Connections then silently time out, or DNS lookups fail if an egress policy also blocks kube-dns.

Blast radius: Full autoscaling blindness. Every ScaledObject using that Prometheus endpoint is affected simultaneously.
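To size the blast radius before an incident, you can group every ScaledObject by the Prometheus `serverAddress` it depends on. This Python sketch assumes its input is the JSON printed by `kubectl get scaledobjects -A -o json`; the function name is hypothetical.

```python
import json
from collections import defaultdict


def scaledobjects_by_server_address(kubectl_json: str) -> dict:
    """Map each Prometheus serverAddress to the ScaledObjects that use it.

    Input: output of `kubectl get scaledobjects -A -o json`.
    Output: {serverAddress: ["namespace/name", ...]}.
    """
    doc = json.loads(kubectl_json)
    groups = defaultdict(list)
    for item in doc.get("items", []):
        meta = item.get("metadata", {})
        ref = f"{meta.get('namespace', '')}/{meta.get('name', '')}"
        for trigger in item.get("spec", {}).get("triggers", []):
            if trigger.get("type") != "prometheus":
                continue  # only Prometheus triggers share this failure mode
            address = trigger.get("metadata", {}).get("serverAddress", "<unset>")
            if ref not in groups[address]:  # one entry per ScaledObject
                groups[address].append(ref)
    return dict(groups)
```

Any address with many entries is a shared single point of failure: every ScaledObject listed under it goes blind at once when that endpoint does.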

How to Fix It

Step 0 — Confirm the actual failure point

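```bash
# Check ScaledObject conditions
kubectl describe scaledobject my-scaledobject -n my-app

# Check KEDA operator logs directly
kubectl logs -n keda deployment/keda-operator --since=10m | grep -i "prometheus\|failed to get"

# Test reachability FROM the KEDA operator pod (quote the URL so the shell does not glob "?")
kubectl exec -n keda deployment/keda-operator -- \
  wget -qO- 'http://prometheus-server.monitoring.svc.cluster.local:9090/api/v1/query?query=up'

# If the operator image has no wget (e.g. a distroless build), probe from a throwaway pod in the keda namespace
kubectl run prom-probe -n keda --rm -i --restart=Never --image=curlimages/curl --command -- \
  curl -sS 'http://prometheus-server.monitoring.svc.cluster.local:9090/api/v1/query?query=up'
```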

If that request fails with a DNS error, a refused connection, or a timeout, the problem is network/DNS, not KEDA config.
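When you are triaging many failures, a rough classifier over the error text can route each one to a likely cause. This is a heuristic sketch. The substrings are common messages from three tools, but exact wording varies by tool and version:

- Go's resolver and dialer: `no such host`, `i/o timeout`
- BusyBox wget: `bad address`
- curl: `Could not resolve host`

The function name is hypothetical.

```python
def classify_probe_error(message: str) -> str:
    """Heuristically map a probe error message to a likely failure class."""
    text = message.lower()
    # Check DNS first: Go reports DNS timeouts as "lookup <host>: i/o timeout".
    if ("no such host" in text or "bad address" in text
            or "could not resolve host" in text or "lookup " in text):
        return "dns"      # wrong service/namespace name, or DNS egress blocked
    if "connection refused" in text:
        return "refused"  # wrong port, or no ready Prometheus endpoints
    if "timeout" in text or "timed out" in text:
        return "timeout"  # often a NetworkPolicy silently dropping packets
    return "unknown"
```

For instance, the `lookup prometheus-server.monitoring.svc: no such host` error from the incident log classifies as `"dns"`.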

Basic Fix — Correct the serverAddress and add fallback

```diff
 apiVersion: keda.sh/v1alpha1
 kind: ScaledObject
 metadata:
   name: my-scaledobject
   namespace: my-app
 spec:
   scaleTargetRef:
     name: my-deployment
   minReplicaCount: 1
   maxReplicaCount: 20
+  fallback:
+    failureThreshold: 3
+    replicas: 5
   triggers:
     - type: prometheus
       metadata:
-        serverAddress: http://prometheus:9090
+        serverAddress: http://prometheus-server.monitoring.svc.cluster.local:9090
-        query: http_requests_total
+        query: sum(rate(http_requests_total{namespace="my-app"}[2m]))
         threshold: "100"
+        ignoreNullValues: "false"
```

Key changes:

serverAddress must use the fully qualified in-cluster DNS name (<service>.<namespace>.svc.cluster.local). Short names fail across namespaces.
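query must be a valid PromQL expression that returns a scalar or single-value instant vector. A bare metric name without aggregation can return one series per label set, and the scaler rejects multi-element results; wrap counters in rate() and aggregate with sum().
fallback.replicas: 5 ensures your workload holds a safe floor when Prometheus is unreachable instead of collapsing to zero.
ignoreNullValues: "false" makes an empty query result (for example, from a lost scrape target) count as a scaler error. That error can trigger the fallback instead of being silently read as 0.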
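The query-shape rules can be checked offline against a raw response from Prometheus's `/api/v1/query` endpoint. This Python sketch approximates them and is not KEDA's code. It assumes the standard Prometheus HTTP API JSON shape. The `ignore_null_values` flag models KEDA's `ignoreNullValues` setting, which, per the KEDA docs, reads an empty result as 0 when true and raises an error when false. The names are hypothetical.

```python
import json


class QueryShapeError(ValueError):
    """Raised when an instant-query response cannot drive a single metric value."""


def single_value_from_response(body: str, ignore_null_values: bool = False) -> float:
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise QueryShapeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    result_type, result = data.get("resultType"), data.get("result")
    if result_type == "scalar":
        # Scalar results look like [<unix_time>, "<value>"].
        return float(result[1])
    if result_type != "vector":
        raise QueryShapeError(f"unsupported resultType: {result_type}")
    if not result:
        if ignore_null_values:
            return 0.0
        raise QueryShapeError("empty result: scrape target may be lost")
    if len(result) > 1:
        raise QueryShapeError(f"query returned {len(result)} series; aggregate it, e.g. sum(...)")
    # Vector samples look like {"metric": {...}, "value": [<unix_time>, "<value>"]}.
    return float(result[0]["value"][1])
```

To capture a response, URL-encode the query and pass the returned body to the function:

`curl -sG --data-urlencode 'query=sum(rate(http_requests_total{namespace="my-app"}[2m]))' http://prometheus-server.monitoring.svc.cluster.local:9090/api/v1/query`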

Enterprise Best Practice — TriggerAuthentication + NetworkPolicy + Fallback

```diff
# 1. If Prometheus has auth enabled, use TriggerAuthentication
+apiVersion: keda.sh/v1alpha1
+kind: TriggerAuthentication
+metadata:
+  name: prometheus-auth
+  namespace: my-app
+spec:
+  secretTargetRef:
+    - parameter: bearerToken
+      name: prometheus-bearer-secret
+      key: token

# 2. ScaledObject referencing auth + hardened query
 apiVersion: keda.sh/v1alpha1
 kind: ScaledObject
 metadata:
   name: my-scaledobject
   namespace: my-app
 spec:
   scaleTargetRef:
     name: my-deployment
   minReplicaCount: 2
   maxReplicaCount: 50
+  pollingInterval: 30
+  cooldownPeriod: 60
+  fallback:
+    failureThreshold: 3
+    replicas: 10
   triggers:
     - type: prometheus
       metadata:
-        serverAddress: http://prometheus:9090
+        # https assumes Prometheus serves TLS on 9090; keep http:// if it does not
+        serverAddress: https://prometheus-server.monitoring.svc.cluster.local:9090
+        query: sum(rate(http_requests_total{job="my-app",namespace="my-app"}[2m]))
         threshold: "100"
+        ignoreNullValues: "false"
+        authModes: "bearer"   # tells the scaler to use the bearerToken from TriggerAuthentication
+      authenticationRef:
+        name: prometheus-auth

# 3. NetworkPolicy: allow ingress to Prometheus from the KEDA namespace
# Note: once a policy selects the Prometheus pods for ingress, every other client
# (e.g. Grafana) needs its own allow rule. If the keda namespace enforces a
# default-deny egress policy, also allow egress from keda to port 9090 and to DNS.
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-keda-to-prometheus
+  namespace: monitoring
+spec:
+  podSelector:
+    matchLabels:
+      app: prometheus   # adjust to match your Prometheus pod labels
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: keda
+      ports:
+        - protocol: TCP
+          port: 9090
```


Prevention in CI/CD

1. Validate ScaledObject manifests pre-deploy with Conftest/OPA

```rego
# policy/keda_scaledobject.rego
# Rego v1 syntax (contains/if), which OPA 1.x-based Conftest releases require
package keda

import rego.v1

deny contains msg if {
  input.kind == "ScaledObject"
  not input.spec.fallback
  msg := "ScaledObject must define a fallback block to prevent scale-to-zero on metric failure"
}

deny contains msg if {
  input.kind == "ScaledObject"
  some trigger in input.spec.triggers
  trigger.type == "prometheus"
  not contains(trigger.metadata.serverAddress, ".svc.cluster.local")
  msg := "Prometheus serverAddress must use fully qualified cluster DNS to ensure cross-namespace resolution"
}
```

```bash
# Run in CI pipeline (Conftest only evaluates package "main" unless told otherwise)
conftest test scaledobject.yaml --policy policy/ --namespace keda
```

2. Checkov custom check for KEDA

Checkov does not ship KEDA-specific checks, but you can write a custom Python check (for example with the ID CKV_CUSTOM_KEDA_001) that fails any ScaledObject without spec.fallback. Load it with `--external-checks-dir` and run it with `--check CKV_CUSTOM_KEDA_001`.
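The rule logic itself is small. This plain-Python sketch runs the same two checks as the Rego policy above on an already-parsed manifest dict. A custom Checkov check could call a function like this, but the Checkov class wiring is omitted and the function name is hypothetical.

```python
def scaledobject_violations(manifest: dict) -> list:
    """Return policy violations for one parsed Kubernetes manifest."""
    if manifest.get("kind") != "ScaledObject":
        return []
    violations = []
    spec = manifest.get("spec") or {}
    # Mirrors Rego's `not input.spec.fallback`: a missing or false fallback fails.
    if spec.get("fallback") in (None, False):
        violations.append("ScaledObject must define a fallback block "
                          "to prevent scale-to-zero on metric failure")
    for trigger in spec.get("triggers") or []:
        if trigger.get("type") != "prometheus":
            continue
        # A missing serverAddress fails the check, as in the Rego rule.
        address = (trigger.get("metadata") or {}).get("serverAddress", "")
        if ".svc.cluster.local" not in address:
            violations.append("Prometheus serverAddress must use fully "
                              "qualified cluster DNS")
    return violations
```

Keeping the logic in a plain function lets you unit-test it without Checkov installed and reuse it in other CI scripts.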

3. Prometheus Alerting — alert before KEDA notices

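```yaml
# Fires when the scrape target behind the KEDA query is down.
# Prometheus itself must be running to evaluate this rule.
groups:
  - name: keda-metric-source
    rules:
      - alert: PrometheusTargetDown
        expr: up{job="keda-metrics-source"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus target used by KEDA is down: autoscaling blind"
```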

4. Helm/ArgoCD pre-sync hook

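Add a Helm `pre-install`/`pre-upgrade` hook or an ArgoCD `PreSync` hook that runs wget or curl against the serverAddress before the ScaledObject manifests are applied. Run it from inside the cluster, ideally from the `keda` namespace, where the operator's requests originate. Fail the sync if the endpoint is unreachable.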
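A hook container only needs an exit code. This minimal Python probe assumes the hook image has Python 3. It sends the same `query=up` request used earlier and exits non-zero unless Prometheus answers with `"status": "success"`. The script and function names are hypothetical.

```python
import json
import sys
import urllib.error
import urllib.parse
import urllib.request


def prometheus_reachable(server_address: str, timeout: float = 5.0) -> bool:
    """True if <server_address>/api/v1/query?query=up answers with status=success."""
    url = (server_address.rstrip("/") + "/api/v1/query?"
           + urllib.parse.urlencode({"query": "up"}))
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failures, refused/timed-out connections, HTTP errors,
        # malformed URLs and non-JSON bodies all count as "unreachable".
        return False
    return isinstance(body, dict) and body.get("status") == "success"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage inside the hook Job:
    #   python3 probe_prometheus.py http://prometheus-server.monitoring.svc.cluster.local:9090
    sys.exit(0 if prometheus_reachable(sys.argv[1]) else 1)
```

A non-zero exit fails the hook Job, which in turn fails the Helm release or ArgoCD sync before any ScaledObject lands.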

Bottom line: KEDA trusts that your metric backend is alive. It has no circuit breaker by default. You must build the safety net yourself via fallback, network policies, and CI-time policy enforcement.