Fixing cert-manager ACME 403 Authorization Error: HTTP-01 and DNS-01 Challenge Debugging Guide

Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

  • What broke: The ACME CA (Let's Encrypt or ZeroSSL) tried to validate an HTTP-01 or DNS-01 challenge for your domain and rejected it with 403 unauthorized, meaning the challenge token endpoint is blocked, the Ingress is misconfigured, or the DNS provider API key is invalid or lacks permissions.
  • How to fix it: Unblock /.well-known/acme-challenge/ on your Ingress/load balancer, verify solver selector labels match your Certificate, and confirm DNS API credentials have write access to the zone.

The Incident (What Does the Error Mean?)

Raw error from kubectl describe certificaterequest or cert-manager controller logs:

E0612 cert-manager/controller/certificaterequests "msg"="Error accepting authorization" 
  "error"="acme: authorization error for domain 'yourdomain.com': 
  403 urn:ietf:params:acme:error:unauthorized :: 
  The client lacks sufficient authorization"

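The ACME protocol works by the CA issuing a challenge token that your infrastructure must serve over HTTP (HTTP-01) or publish as a DNS TXT record (DNS-01). The 403 in this message is the status the ACME server attaches to the urn:ietf:params:acme:error:unauthorized problem: the CA's validation server reached your endpoint or zone and did not get the expected challenge response back, typically because the request was refused or bounced to a login page, or because the TXT record was never published. This is not a timeout or connection failure, which the CA reports under a different error type. Certificate issuance halts completely. If this is a renewal, your existing cert will expire. Ingress traffic will start throwing TLS handshake errors the moment it does.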
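
To make the two challenge types concrete, here is a minimal Python sketch of what the CA expects to find in each case, following the key authorization rules in RFC 8555. The token and account thumbprint are inputs you would copy from the Challenge object and your ACME account; the function names are illustrative and not part of cert-manager.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url without padding, the encoding ACME uses."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def key_authorization(token: str, account_thumbprint: str) -> str:
    """Key authorization = token + '.' + JWK thumbprint of the account key."""
    return f"{token}.{account_thumbprint}"


def http01_expectation(domain: str, token: str, account_thumbprint: str):
    """Return (URL the CA fetches, body it expects to read there)."""
    url = f"http://{domain}/.well-known/acme-challenge/{token}"
    return url, key_authorization(token, account_thumbprint)


def dns01_expectation(domain: str, token: str, account_thumbprint: str):
    """Return (TXT record name, TXT value the CA expects)."""
    # Wildcards validate against the base domain's _acme-challenge record.
    base = domain[2:] if domain.startswith("*.") else domain
    digest = hashlib.sha256(
        key_authorization(token, account_thumbprint).encode("ascii")
    ).digest()
    return f"_acme-challenge.{base}", b64url(digest)
```

Comparing these expected values with what your endpoint or zone actually returns shows whether the failure is a blocked path, a wrong body, or a record that was never written.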


The Attack Vector / Blast Radius

This is not a theoretical risk. The blast radius is immediate and operational:

  • TLS certificate expires or never issues. Every service behind this Ingress is now either serving a stale cert (browser warnings) or no cert (hard failure for HSTS-pinned clients).
  • HTTP-01 403 root causes cascade: A blanket deny-all auth middleware applied at the Ingress controller level (common with oauth2-proxy or basic-auth annotations) will block the ACME validation request just like any other unauthenticated request. The CA does not keep retrying: a failed validation marks the authorization invalid, cert-manager backs off exponentially before creating a new order, and repeated failures count against the CA's failed-validation rate limits.
  • DNS-01 403 root causes: The DNS-01 solver or webhook (e.g., cert-manager-webhook-cloudflare) is using an API token scoped to read-only, or the token has been rotated and the Secret in-cluster is stale. The DNS provider returns 403, the challenge record never gets created, and validation fails.
  • Wildcard certs exclusively use DNS-01. If you're issuing *.yourdomain.com, a broken DNS provider credential takes down wildcard cert issuance across every subdomain simultaneously.

How to Fix It

Diagnosis First — Pull the Actual Challenge State

# Get the Order and Authorization objects — not just the Certificate
kubectl get orders -n <namespace>
kubectl describe order <order-name> -n <namespace>

# Check the Challenge object for the exact solver being used
kubectl get challenges -n <namespace>
kubectl describe challenge <challenge-name> -n <namespace>

The describe challenge output will tell you whether it's HTTP-01 or DNS-01 and the exact URL or DNS record the CA tried to validate.
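
When several challenges are in flight, a short script can condense the same information. The sketch below assumes the JSON shape produced by kubectl get challenges -n <namespace> -o json and reads only spec.type, spec.dnsName, status.state, status.presented and status.reason; the function name is illustrative.

```python
def summarize_challenges(challenge_list: dict) -> list:
    """Reduce a `kubectl get challenges -o json` document to one row per Challenge."""
    rows = []
    for item in challenge_list.get("items", []):
        spec = item.get("spec", {})
        status = item.get("status", {})
        rows.append(
            {
                "name": item.get("metadata", {}).get("name", ""),
                "type": spec.get("type", ""),          # "HTTP-01" or "DNS-01"
                "dnsName": spec.get("dnsName", ""),
                "state": status.get("state", ""),      # e.g. pending, invalid
                "presented": status.get("presented", False),
                "reason": status.get("reason", ""),    # the CA or solver message
            }
        )
    return rows
```

Feed it the parsed output, for example summarize_challenges(json.load(sys.stdin)) in a small wrapper script. A challenge with presented false never made it to your Ingress or DNS zone at all, which points at the solver or credentials and not at the CA.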


Fix 1: HTTP-01 — Ingress Auth Middleware Blocking /.well-known/acme-challenge/

The most common cause. An auth annotation on your Ingress applies to all paths, including the ACME challenge path.

 apiVersion: networking.k8s.io/v1
 kind: Ingress
 metadata:
   name: my-app-ingress
   annotations:
+    # Auth annotations stay on this Ingress; the ACME challenge path moves
+    # to a dedicated Ingress resource with no auth annotations (see below).
     nginx.ingress.kubernetes.io/auth-url: "https://auth.yourdomain.com/oauth2/auth"
     nginx.ingress.kubernetes.io/auth-signin: "https://auth.yourdomain.com/oauth2/start"
 spec:
   rules:
   - host: yourdomain.com
     http:
       paths:
+      # This path must be in a SEPARATE Ingress with NO auth annotations
-      - path: /.well-known/acme-challenge/
-        pathType: Prefix
-        backend:
-          service:
-            name: cm-acme-http-solver-xxxxx
-            port:
-              number: 8089
       - path: /
         pathType: Prefix
         backend:
           service:
             name: my-app
             port:
               number: 80

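cert-manager normally creates a solver Ingress like this on its own for each challenge, and the solver Service name is generated per challenge. If you manage the challenge route by hand, or cert-manager is editing your existing Ingress in place, make sure the challenge path lives on a dedicated, auth-free Ingress: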

+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: cm-acme-challenge
+  namespace: <namespace>
+  # NO auth annotations here — intentionally bare
+spec:
+  ingressClassName: nginx
+  rules:
+  - host: yourdomain.com
+    http:
+      paths:
+      - path: /.well-known/acme-challenge/
+        pathType: Prefix
+        backend:
+          service:
+            name: cm-acme-http-solver-xxxxx  # get exact name from: kubectl get svc -n <ns>
+            port:
+              number: 8089
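
To confirm the challenge path is actually reachable without credentials, you can probe it the way the CA would, with an unauthenticated GET. The Python sketch below is a heuristic: the token is a made-up placeholder, so a 404 is the healthy outcome, and the interpretation strings are this guide's reading of the status codes, not output from cert-manager or the CA.

```python
import urllib.error
import urllib.request

ACME_PATH = "/.well-known/acme-challenge/"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses as HTTPError instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def probe(domain: str, token: str = "probe-not-a-real-token", timeout: float = 10.0):
    """Return (status, Location header) for an unauthenticated GET on the challenge path."""
    url = f"http://{domain}{ACME_PATH}{token}"
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, ""
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location", "") or ""


def interpret(status: int, location: str = "") -> str:
    """Heuristic reading of the probe result."""
    if status in (401, 403):
        return "blocked: auth or a deny rule is intercepting the challenge path"
    if status in (301, 302, 303, 307, 308):
        if ACME_PATH in location:
            return "redirect keeps the challenge path (often HTTP to HTTPS); usually harmless"
        return "redirected away: likely an auth sign-in redirect covering the challenge path"
    if status == 404:
        return "reachable: no auth in the way (404 is expected for a made-up token)"
    return "inconclusive: inspect the response manually"
```

Calling interpret(*probe("yourdomain.com")) from outside the cluster gives a quick verdict; connection errors are deliberately left to raise, since they indicate a different class of failure than a 403.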

Fix 2: HTTP-01 — Solver Selector Mismatch in ClusterIssuer

If your ClusterIssuer solver has a selector with matchLabels, it only handles Certificate resources that carry those labels (Certificates generated from an annotated Ingress inherit the Ingress's labels). A mismatch means no solver picks up the challenge.

 apiVersion: cert-manager.io/v1
 kind: ClusterIssuer
 metadata:
   name: letsencrypt-prod
 spec:
   acme:
     server: https://acme-v02.api.letsencrypt.org/directory
     email: <your-email>
     privateKeySecretRef:
       name: letsencrypt-prod-key
     solvers:
     - http01:
         ingress:
           class: nginx
-      selector:
-        matchLabels:
-          use-http01-solver: "true"   # Only Certificates carrying this label use the solver
+      # Remove selector entirely to apply this solver to ALL domains,
+      # or ensure the label exists on every Certificate resource
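
The matching rule can be sketched in a few lines of Python. This is a simplified model that only covers matchLabels (it ignores the dnsNames and dnsZones selector fields), and the function names are illustrative.

```python
def unmatched_labels(selector, certificate_labels) -> dict:
    """Return the matchLabels entries the Certificate is missing or has with a different value."""
    wanted = (selector or {}).get("matchLabels") or {}
    have = certificate_labels or {}
    return {key: value for key, value in wanted.items() if have.get(key) != value}


def solver_applies(selector, certificate_labels) -> bool:
    """A solver with no selector applies to everything; otherwise every label must match."""
    return not unmatched_labels(selector, certificate_labels)


selector = {"matchLabels": {"use-http01-solver": "true"}}
print(unmatched_labels(selector, {"app": "my-app"}))
```

An empty result means the solver would be considered for that Certificate; anything else lists exactly which labels to add or correct.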

Fix 3: DNS-01 — Stale or Insufficient API Token (Cloudflare Example)

 # Step 1: Verify the secret value is current
 kubectl get secret cloudflare-api-token -n cert-manager -o jsonpath='{.data.api-token}' | base64 -d

 # Step 2: Patch with the correct token
 kubectl create secret generic cloudflare-api-token \
   --from-literal=api-token=<NEW_TOKEN> \
   -n cert-manager \
   --dry-run=client -o yaml | kubectl apply -f -

The Cloudflare API token must have these permissions (read-only is not enough):

  • Zone > Zone > Read
  • Zone > DNS > Edit (this is what was missing)
  • Zone Resources: all zones, or the specific zone

Then reference the scoped token from the ClusterIssuer:

 apiVersion: cert-manager.io/v1
 kind: ClusterIssuer
 spec:
   acme:
     solvers:
     - dns01:
         cloudflare:
-          apiKeySecretRef:          # Legacy — Global API Key, avoid this
-            name: cloudflare-api-key
-            key: api-key
+          apiTokenSecretRef:        # Scoped API Token — use this
+            name: cloudflare-api-token
+            key: api-token
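
If you would rather not print the token to your terminal in Step 1, a fingerprint comparison does the same job. The sketch below assumes the Secret is supplied as parsed JSON from kubectl get secret cloudflare-api-token -n cert-manager -o json and that you have the expected token from your secret store; the function names and the 12-character fingerprint length are arbitrary choices for illustration.

```python
import base64
import hashlib


def fingerprint(token: str) -> str:
    """Short, non-reversible identifier that is safe to log or compare."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:12]


def check_secret(secret: dict, expected_token: str, key: str = "api-token") -> str:
    """Compare the in-cluster token with the expected one without echoing either."""
    data = secret.get("data") or {}
    if key not in data:
        return f"missing: Secret has no '{key}' key"
    # Kubernetes stores Secret data values base64-encoded.
    stored = base64.b64decode(data[key]).decode("utf-8")
    if stored == expected_token:
        return "current: in-cluster token matches the expected token"
    if stored.strip() == expected_token.strip():
        return "whitespace: tokens differ only by leading or trailing whitespace"
    return (
        "stale: in-cluster fingerprint "
        f"{fingerprint(stored)} != expected {fingerprint(expected_token)}"
    )
```

The whitespace case is worth checking separately because a trailing newline picked up when the Secret was created is enough for the provider to reject the token. A matching token still says nothing about its permissions, so the permission list above has to be checked in the provider's dashboard.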

Enterprise Best Practice

  • Use DNS-01 with scoped provider tokens stored in Vault or AWS Secrets Manager, synced via External Secrets Operator. Rotate on a 90-day schedule.
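  • Never use Global API Keys for Cloudflare; use API tokens scoped to the exact zone. For Route53, use an IAM policy limited to the exact hosted zone.
  • Deploy cert-manager with a dedicated ServiceAccount under RBAC least-privilege. The controller is what creates and deletes Challenge resources and the solver Pod, Service, and Ingress; the solver pods themselves only serve the challenge response and need no Kubernetes API access.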
  • Set renewBefore: 720h (30 days) on all Certificate objects to give yourself enough runway to catch renewal failures before expiry.
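
To see what renewBefore: 720h means in calendar terms, the arithmetic can be sketched as below. The 90-day lifetime in the example is an assumption about the issuing CA, not something this guide's configuration sets, and the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime,
                 renew_before: timedelta = timedelta(hours=720)) -> datetime:
    """Return the moment renewal should begin: notAfter minus renewBefore."""
    lifetime = not_after - not_before
    if renew_before >= lifetime:
        # A renewal window as long as the certificate itself leaves no valid period.
        raise ValueError("renewBefore must be shorter than the certificate lifetime")
    return not_after - renew_before


# Example with an assumed 90-day certificate issued on 2024-01-01 UTC.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
print(renewal_time(issued, expires))
```

With a 90-day certificate this leaves 30 days between the first renewal attempt and expiry, which is the runway the bullet above refers to. If your CA issues shorter-lived certificates, 720h may exceed the lifetime and needs to be reduced.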


Prevention in CI/CD

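1. Validate ClusterIssuer and Certificate manifests pre-deploy with Kubeconform, using the Datree CRDs catalog for the cert-manager schemas. Keep the default schema location as well so built-in kinds such as Ingress still validate:

kubeconform -schema-location default -schema-location 'https://raw.githubusercontent.com/datreeio/CRDs-catalog/main/{{.Group}}/{{.ResourceKind}}_{{.ResourceAPIVersion}}.json' ./manifests/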

2. OPA/Gatekeeper policy — enforce that no Ingress with auth annotations covers the ACME challenge path:

package ingress.acme

violation[{"msg": msg}] {
  ingress := input.review.object
  annotations := ingress.metadata.annotations
  annotations["nginx.ingress.kubernetes.io/auth-url"]
  paths := ingress.spec.rules[_].http.paths[_].path
  startswith(paths, "/.well-known/acme-challenge")
  msg := "ACME challenge path must not be on an Ingress with auth annotations."
}
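
For a pre-commit hook or unit test that runs without a Gatekeeper install, the same rule can be mirrored in Python. This is a sketch that assumes the Ingress manifest has already been parsed into a dict; the function name is illustrative, and like the Rego rule it only looks at the auth-url annotation.

```python
AUTH_ANNOTATION = "nginx.ingress.kubernetes.io/auth-url"
ACME_PREFIX = "/.well-known/acme-challenge"


def acme_auth_violations(ingress: dict) -> list:
    """Return one message per ACME challenge path found on an auth-annotated Ingress."""
    annotations = ingress.get("metadata", {}).get("annotations") or {}
    if AUTH_ANNOTATION not in annotations:
        return []
    messages = []
    for rule in ingress.get("spec", {}).get("rules") or []:
        for entry in (rule.get("http") or {}).get("paths") or []:
            if str(entry.get("path", "")).startswith(ACME_PREFIX):
                messages.append(
                    "ACME challenge path must not be on an Ingress with auth annotations."
                )
    return messages
```

An empty list means the manifest passes; running it over every Ingress in the repository before merge catches the misconfiguration from Fix 1 before it reaches the cluster.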

3. Alert on CertificateRequest failures in your observability stack:

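# Prometheus alerting rule
# Assumes cert-manager's certmanager_http_acme_client_request_count metric with a
# status label; confirm the exact name against your version's /metrics endpoint.
- alert: CertManagerAuthorizationFailure
  expr: sum(increase(certmanager_http_acme_client_request_count{status="403"}[10m])) > 0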
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "cert-manager ACME authorization failure detected"
    description: "Check CertificateRequest and Challenge objects immediately."

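4. Checkov scan on Ingress manifests to flag a missing ingressClassName (which causes the solver Ingress to not be picked up by the correct controller). Checkov's built-in CKV_K8S_21 flags use of the default namespace, not a missing ingressClassName, so this needs a custom policy loaded from a local directory (./custom-checks here is a placeholder):

checkov -f ingress.yaml --external-checks-dir ./custom-checks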

5. Pin cert-manager version in Helm values and gate upgrades through a staging cluster — cert-manager webhook breaking changes have historically caused solver registration failures on minor version bumps.
