Fixing cert-manager ACME 403 Authorization Error: HTTP-01 and DNS-01 Challenge Debugging Guide
Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins
TL;DR
- What broke: The ACME CA (Let's Encrypt or ZeroSSL) sent an HTTP-01 or DNS-01 challenge to your domain and got a 403 Forbidden — meaning the challenge token endpoint is blocked, the Ingress is misconfigured, or the DNS provider API key is invalid/lacks permissions.
- How to fix it: Unblock /.well-known/acme-challenge/ on your Ingress/load balancer, verify that solver selector labels match your Certificate, and confirm DNS API credentials have write access to the zone.
The Incident (What Does the Error Mean?)
Raw error from kubectl describe certificaterequest or cert-manager controller logs:
E0612 cert-manager/controller/certificaterequests "msg"="Error accepting authorization"
"error"="acme: authorization error for domain 'yourdomain.com':
403 urn:ietf:params:acme:error:unauthorized ::
The client lacks sufficient authorization"
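The ACME protocol works by the CA issuing a challenge token that your infrastructure must serve over HTTP (HTTP-01) or publish as a DNS TXT record (DNS-01). The 403 in this error is the ACME server's response to cert-manager: it means validation of that challenge failed, most often because the CA's validation server reached your endpoint and was refused, or because the DNS record was never published. The reason recorded on the Challenge object shows what the CA actually received. Certificate issuance halts completely. If this is a renewal, your existing cert will expire, and Ingress traffic will start throwing TLS handshake errors the moment it does.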
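To see roughly what the CA's validation server sees, you can request a path under the challenge prefix from outside the cluster and read the status code. The following is a minimal sketch, not part of cert-manager: the function name classify_acme_probe and the probe-test token are hypothetical, and the mapping only mirrors the failure modes described in this guide.

```shell
#!/bin/sh
# Hypothetical helper: map the HTTP status seen on the challenge path
# to the most likely cause discussed in this guide.
classify_acme_probe() {
  case "$1" in
    200)     echo "reachable: something answered on the challenge path" ;;
    401|403) echo "blocked: auth middleware or WAF rule in front of the challenge path" ;;
    30?)     echo "redirected: check whether the target is an auth sign-in page" ;;
    404)     echo "routed: path reachable but no solver serving this token" ;;
    000)     echo "unreachable: DNS, firewall, or connection failure" ;;
    *)       echo "unexpected status: $1" ;;
  esac
}

# Usage (run from outside the cluster). "probe-test" is a made-up token,
# so a healthy but idle setup normally answers 404 rather than 200:
#   code=$(curl -s -o /dev/null -w '%{http_code}' \
#     "http://yourdomain.com/.well-known/acme-challenge/probe-test")
#   classify_acme_probe "$code"
```

A 401, 403, or a redirect to a sign-in page on this probe points at Fix 1 below; a 404 means the path is open and the problem lies elsewhere.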
The Attack Vector / Blast Radius
This is not a theoretical risk. The blast radius is immediate and operational:
- TLS certificate expires or never issues. Every service behind this Ingress is now either serving a stale cert (browser warnings) or no cert (hard failure for HSTS-pinned clients).
- HTTP-01 403 root causes cascade: A blanket deny-all auth middleware applied at the Ingress controller level (common with oauth2-proxy or basic-auth annotations) will block the ACME validation bot just like any other unauthenticated request. The CA doesn't retry indefinitely: after ~10 failed attempts the authorization object is marked invalid and cert-manager backs off exponentially.
- DNS-01 403 root causes: Your ExternalDNS or cert-manager webhook (e.g., cert-manager-webhook-cloudflare) is using an API token scoped to read-only, or the token has been rotated and the Secret in-cluster is stale. The DNS provider returns 403, the challenge record never gets created, and validation fails.
- Wildcard certs exclusively use DNS-01. If you're issuing *.yourdomain.com, a broken DNS provider credential takes down wildcard cert issuance across every subdomain simultaneously.
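The wildcard rule can be expressed as a quick check when deciding which solver a name depends on. This is an illustrative sketch only: allowed_challenges is a hypothetical helper, and it assumes the rule stated above, that wildcard names can only be validated with DNS-01.

```shell
#!/bin/sh
# Hypothetical helper: report which challenge types can validate a DNS name.
# Assumes the rule above: wildcard names require DNS-01.
allowed_challenges() {
  case "$1" in
    '*.'*) echo "dns-01" ;;
    *)     echo "http-01 dns-01" ;;
  esac
}

for name in yourdomain.com '*.yourdomain.com'; do
  printf '%s -> %s\n' "$name" "$(allowed_challenges "$name")"
done
```

Any name that reports only dns-01 has no HTTP-01 fallback, so its issuance depends entirely on the DNS provider credential.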
How to Fix It
Diagnosis First — Pull the Actual Challenge State
# Get the Order and Authorization objects — not just the Certificate
kubectl get orders -n <namespace>
kubectl describe order <order-name> -n <namespace>
# Check the Challenge object for the exact solver being used
kubectl get challenges -n <namespace>
kubectl describe challenge <challenge-name> -n <namespace>
The describe challenge output will tell you whether it's HTTP-01 or DNS-01 and the exact URL or DNS record the CA tried to validate.
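When several Challenges are pending, a one-line summary per object is easier to scan than repeated describe calls. A minimal sketch, assuming the field paths of the cert-manager Challenge resource (spec.type, spec.dnsName, status.state, status.reason); summarize_challenges is a hypothetical wrapper name.

```shell
#!/bin/sh
# Hypothetical wrapper: print one summary row per Challenge in a namespace.
# Column paths assume the cert-manager Challenge resource layout.
summarize_challenges() {
  kubectl get challenges -n "$1" \
    -o custom-columns='NAME:.metadata.name,TYPE:.spec.type,DOMAIN:.spec.dnsName,STATE:.status.state,REASON:.status.reason'
}

# Usage: summarize_challenges <namespace>
```

The TYPE column tells you which of the fixes below applies, and REASON usually carries the response the CA saw.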
Fix 1: HTTP-01 — Ingress Auth Middleware Blocking /.well-known/acme-challenge/
The most common cause. An auth annotation on your Ingress applies to all paths, including the ACME challenge path.
 apiVersion: networking.k8s.io/v1
 kind: Ingress
 metadata:
   name: my-app-ingress
   annotations:
+    # Auth annotations stay on the application Ingress. They apply to every
+    # path in this resource, so the ACME challenge path must not live here.
     nginx.ingress.kubernetes.io/auth-url: "https://auth.yourdomain.com/oauth2/auth"
     nginx.ingress.kubernetes.io/auth-signin: "https://auth.yourdomain.com/oauth2/start"
 spec:
   rules:
   - host: yourdomain.com
     http:
       paths:
+      # The challenge path moves to a SEPARATE Ingress with NO auth annotations
-      - path: /.well-known/acme-challenge/
-        pathType: Prefix
-        backend:
-          service:
-            name: cm-acme-http-solver-xxxxx
-            port:
-              number: 8089
       - path: /
         pathType: Prefix
         backend:
           service:
             name: my-app
             port:
               number: 80
Create a dedicated, auth-free Ingress for the solver:
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: cm-acme-challenge
+  namespace: <namespace>
+  # NO auth annotations here, intentionally bare
+spec:
+  ingressClassName: nginx
+  rules:
+  - host: yourdomain.com
+    http:
+      paths:
+      - path: /.well-known/acme-challenge/
+        pathType: Prefix
+        backend:
+          service:
+            name: cm-acme-http-solver-xxxxx # get exact name from: kubectl get svc -n <ns>
+            port:
+              number: 8089
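If your ClusterIssuer solver has a selector with matchLabels, it only handles challenges for Certificate objects that carry those labels. For a Certificate generated from an Ingress, check the labels on the generated Certificate itself. A mismatch means no solver picks up the challenge.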
 apiVersion: cert-manager.io/v1
 kind: ClusterIssuer
 metadata:
   name: letsencrypt-prod
 spec:
   acme:
     server: https://acme-v02.api.letsencrypt.org/directory
     email: <your-email>
     privateKeySecretRef:
       name: letsencrypt-prod-key
     solvers:
     - http01:
         ingress:
           class: nginx
-      selector:
-        matchLabels:
-          use-http01-solver: "true" # This label must exist on your Certificate/Ingress
+      # Remove selector entirely to apply this solver to ALL domains,
+      # or ensure the label exists on every Certificate resource
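The matching rule behind matchLabels is simple: every key/value pair in the selector must be present on the Certificate, and an empty selector matches everything. The sketch below illustrates that rule in plain shell; labels_match is a hypothetical helper, not a cert-manager command, and it takes both sides as comma-separated key=value lists.

```shell
#!/bin/sh
# Hypothetical helper: succeed if every key=value pair in the selector
# (comma-separated) appears in the resource's label list (comma-separated).
labels_match() {
  selector=$1
  labels=",$2,"          # pad with commas so only whole pairs match
  old_ifs=$IFS
  IFS=,
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;                       # pair present, keep checking
      *) IFS=$old_ifs; return 1 ;;          # pair missing, no match
    esac
  done
  IFS=$old_ifs
  return 0
}

labels_match 'use-http01-solver=true' 'app=web,use-http01-solver=true' && echo "solver applies"
labels_match 'use-http01-solver=true' 'app=web' || echo "solver skipped"

# To see the labels actually set on your Certificates:
#   kubectl get certificate -n <namespace> --show-labels
```

If the second case describes your Certificate, either add the label or drop the selector as shown in the diff above.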
Fix 3: DNS-01 — Stale or Insufficient API Token (Cloudflare Example)
# Step 1: Verify the secret value is current
kubectl get secret cloudflare-api-token -n cert-manager -o jsonpath='{.data.api-token}' | base64 -d
# Step 2: Patch with the correct token
kubectl create secret generic cloudflare-api-token \
  --from-literal=api-token=<NEW_TOKEN> \
  -n cert-manager \
  --dry-run=client -o yaml | kubectl apply -f -
# Cloudflare API token MUST have these permissions — not read-only:
# Zone > Zone > Read
+# Zone > DNS > Edit <-- THIS is what was missing
# All zones or specific zone scoping
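Before patching the Secret, it can help to confirm that the token itself is still valid. The sketch below assumes Cloudflare's token verification endpoint (/client/v4/user/tokens/verify) and a token exported as CF_API_TOKEN; token_is_active is a hypothetical helper. Note that this only shows the token is active. It does not prove the token has DNS Edit permission on the zone.

```shell
#!/bin/sh
# Hypothetical helper: read a Cloudflare token-verify JSON response on stdin
# and succeed only if the token is reported as active.
token_is_active() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"active"'
}

# Usage, assuming the token is exported as CF_API_TOKEN:
#   curl -s -H "Authorization: Bearer $CF_API_TOKEN" \
#     https://api.cloudflare.com/client/v4/user/tokens/verify \
#     | token_is_active \
#     && echo "token active" \
#     || echo "token invalid, expired, or revoked"
```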
 apiVersion: cert-manager.io/v1
 kind: ClusterIssuer
 spec:
   acme:
     solvers:
     - dns01:
         cloudflare:
-          apiKeySecretRef: # Legacy Global API Key, avoid this
-            name: cloudflare-api-key
-            key: api-key
+          apiTokenSecretRef: # Scoped API Token, use this
+            name: cloudflare-api-token
+            key: api-token
Enterprise Best Practice
- Use DNS-01 with scoped provider tokens stored in Vault or AWS Secrets Manager, synced via External Secrets Operator. Rotate on a 90-day schedule.
- Never use Global API Keys for Cloudflare/Route53 — scope tokens to the exact hosted zone.
- Deploy cert-manager with a dedicated ServiceAccount under RBAC least-privilege. The solver pods only need to create/delete Challenge and Ingress resources in their own namespace.
- Set renewBefore: 720h (30 days) on all Certificate objects to give yourself enough runway to catch renewal failures before expiry.
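The renewBefore window can also be checked against a live certificate. A minimal sketch, assuming the certificate sits in a standard TLS Secret under the tls.crt key; hours_to_seconds is a hypothetical helper, and <tls-secret> and <namespace> are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: convert an hour-based duration such as "720h" to seconds.
# Only the plain "<N>h" form is handled here.
hours_to_seconds() {
  h=${1%h}
  echo $((h * 3600))
}

hours_to_seconds 720h   # prints 2592000

# Usage against a live TLS Secret. openssl exits non-zero if the
# certificate expires within the given number of seconds:
#   kubectl get secret <tls-secret> -n <namespace> -o jsonpath='{.data.tls\.crt}' \
#     | base64 -d \
#     | openssl x509 -noout -checkend "$(hours_to_seconds 720h)"
```

A non-zero exit here means the certificate is already inside its renewal window, so a stuck Challenge needs attention now rather than later.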
Prevention in CI/CD
1. Validate ClusterIssuer and Certificate manifests pre-deploy with Datree or Kubeconform:
kubeconform -schema-location default -schema-location 'https://raw.githubusercontent.com/datreeio/CRDs-catalog/main/{{.Group}}/{{.ResourceKind}}_{{.ResourceAPIVersion}}.json' ./manifests/
2. OPA/Gatekeeper policy — enforce that no Ingress with auth annotations covers the ACME challenge path:
package ingress.acme

violation[{"msg": msg}] {
  ingress := input.review.object
  annotations := ingress.metadata.annotations
  annotations["nginx.ingress.kubernetes.io/auth-url"]
  path := ingress.spec.rules[_].http.paths[_].path
  startswith(path, "/.well-known/acme-challenge")
  msg := "ACME challenge path must not be on an Ingress with auth annotations."
}
3. Alert on CertificateRequest failures in your observability stack:
# Prometheus alerting rule
- alert: CertManagerAuthorizationFailure
  # Fires when ACME client requests return a 4xx/5xx status
  expr: increase(certmanager_http_acme_client_request_count{status=~"4..|5.."}[10m]) > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "cert-manager ACME authorization failure detected"
    description: "Check CertificateRequest and Challenge objects immediately."
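4. Checkov scan on Ingress manifests. Note that the built-in CKV_K8S_21 check flags resources deployed to the default namespace; a missing ingressClassName (which causes the solver Ingress to not be picked up by the correct controller) is not covered by this check and would need a separate policy:
checkov -f ingress.yaml --check CKV_K8S_21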
5. Pin cert-manager version in Helm values and gate upgrades through a staging cluster — cert-manager webhook breaking changes have historically caused solver registration failures on minor version bumps.