Fixing Linkerd 'Proxy Injection Failed' Identity Trust Domain Mismatch in Production
Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins
TL;DR
- What broke: The identityTrustDomain set in the Linkerd control plane (e.g., cluster.local) does not match the SPIFFE trust domain embedded in the identity issuer certificate (ca.crt), causing the proxy injector webhook to reject all annotated pods.
- How to fix it: Re-align the identityTrustDomain Helm value (--identity-trust-domain when installing with the linkerd CLI) with the SAN URI in your root CA cert, then re-roll affected workloads. If the cert was generated with the wrong domain, you must rotate it.
- Shortcut: Use our Client-Side Sandbox above to paste your linkerd-config ConfigMap and CA cert PEM; it will auto-diff the mismatch and generate the corrected Helm override without sending your certs anywhere.
The Incident (What Does the Error Mean?)
Raw error from kubectl apply or the injector webhook logs:
Error from server: error when creating "deploy.yaml":
admission webhook "linkerd-proxy-injector.linkerd.io" denied the request:
proxy injection failed: identity trust domain mismatch:
issuer has domain "prod.example.com" but control plane expects "cluster.local"
Or from linkerd check:
× issuer cert is signed by the trust anchor
issuer certificate is not signed by any of the trust anchors
see https://linkerd.io/2/checks/#l5d-identity-issuer-cert-signed-by-trust-anchor
Immediate consequence: The proxy injector webhook, a mutating admission webhook, hard-blocks pod creation in every namespace annotated with linkerd.io/inject: enabled. Your deployment rolls out zero replicas. In an existing cluster mid-upgrade, running pods lose the ability to renew their SPIFFE SVIDs, causing mTLS session failures within the SVID TTL window (default: 24h).
The Attack Vector / Blast Radius
This isn't just an ops nuisance — it's a mesh-wide identity collapse.
Why it's dangerous:
- mTLS falls back silently in some configurations. If proxy.defaultInboundPolicy is set to all-unauthenticated, workloads that were injected before the mismatch continue running but with no mutual TLS. An attacker with network access to the pod CIDR can intercept east-west traffic in plaintext.
- SPIFFE SVID chaining breaks. Linkerd's identity service issues X.509 SVIDs scoped to the trust domain (e.g., spiffe://cluster.local/ns/default/sa/myapp). A mismatched domain means the proxy cannot validate peer certificates against the trust bundle, so all peer authentication policies evaluate to DENY or skip, depending on your Server and AuthorizationPolicy resources.
- Blast radius on upgrade: This most commonly surfaces during linkerd upgrade when a custom CA was generated with a hardcoded domain and the new Helm chart defaults differ. Every namespace with injection enabled is simultaneously affected. Rollback requires cert rotation, not just a Helm rollback.
- Audit gap: Because pods fail at admission, no workload logs are generated, so the failure is invisible to application-level alerting. Only webhook audit logs or linkerd check catches it.
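To make the trust-domain scoping concrete, here is a minimal Python sketch that splits an ID shaped like the example above (spiffe://cluster.local/ns/default/sa/myapp) into its parts and compares the trust domains of two identities. The names parse_spiffe_id and same_trust_domain are hypothetical, and the /ns/<namespace>/sa/<service-account> path layout is assumed from that single example. This only illustrates the ID structure; the proxy's real peer validation is cryptographic chain verification, not string comparison.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class WorkloadIdentity(NamedTuple):
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(spiffe_id: str) -> WorkloadIdentity:
    """Split spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    segments = parts.path.strip("/").split("/")
    # Assumed layout (taken from the article's example): ns/<namespace>/sa/<account>
    if len(segments) != 4 or segments[0] != "ns" or segments[2] != "sa":
        raise ValueError(f"unexpected SPIFFE path: {parts.path!r}")
    return WorkloadIdentity(parts.netloc, segments[1], segments[3])


def same_trust_domain(local_id: str, peer_id: str) -> bool:
    """True when both identities sit in the same trust domain."""
    return parse_spiffe_id(local_id).trust_domain == parse_spiffe_id(peer_id).trust_domain
```

Two workloads whose IDs differ only in the host part (cluster.local versus prod.example.com) fail this comparison, which is the string-level view of the mismatch described above.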
How to Fix It
Step 1: Confirm the Mismatch
# Extract the trust domain the control plane expects
# (the values key holds YAML, not JSON, so a plain grep is enough)
kubectl -n linkerd get cm linkerd-config -o jsonpath='{.data.values}' | \
  grep '^identityTrustDomain'
# Extract the trust domain burned into the issuer cert
# (with scheme kubernetes.io/tls the secret key is tls\.crt instead of crt\.pem)
kubectl -n linkerd get secret linkerd-identity-issuer \
  -o jsonpath='{.data.crt\.pem}' | base64 -d | \
  openssl x509 -noout -text | grep -A1 "Subject Alternative Name"
You will see something like:
# ConfigMap says: cluster.local
# Cert SAN says: URI:spiffe://prod.example.com
That delta is your outage.
Basic Fix — Align Helm Value to Existing Cert
If the cert was intentionally generated with prod.example.com and the Helm value is wrong:
# linkerd-values-override.yaml
identity:
  issuer:
    scheme: kubernetes.io/tls
- identityTrustDomain: cluster.local
+ identityTrustDomain: prod.example.com
helm upgrade linkerd-control-plane linkerd/linkerd-control-plane \
-n linkerd \
-f linkerd-values-override.yaml \
--reuse-values
# Force restart the injector and identity controller
kubectl -n linkerd rollout restart deploy/linkerd-proxy-injector
kubectl -n linkerd rollout restart deploy/linkerd-identity
# Validate
linkerd check
Enterprise Best Practice — Rotate the CA to Match Cluster Convention
If you're standardizing on cluster.local (recommended for portability) and the cert is wrong, rotate the trust anchor. Do not skip the step-cli verification.
# Generate new root CA with correct trust domain
step certificate create root.linkerd.cluster.local ca.crt ca.key \
--profile root-ca \
--no-password \
--insecure \
--san "root.linkerd.cluster.local"
# Generate issuer cert signed by new root
step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \
--profile intermediate-ca \
--not-after 8760h \
--no-password \
--insecure \
--ca ca.crt \
--ca-key ca.key
# Helm upgrade with explicit cert injection
identity:
  issuer:
    scheme: kubernetes.io/tls
+ identityTrustDomain: cluster.local
+ identityTrustAnchorsPEM: |
+   <contents of ca.crt>
helm upgrade linkerd-control-plane linkerd/linkerd-control-plane \
-n linkerd \
--set-file identityTrustAnchorsPEM=ca.crt \
--set identity.issuer.tls.crtPEM="$(cat issuer.crt)" \
--set identity.issuer.tls.keyPEM="$(cat issuer.key)" \
--set identityTrustDomain=cluster.local \
--reuse-values
# Re-roll workloads so they pick up new SVIDs
# (this restarts every Deployment in every namespace, meshed or not; narrow the list if needed)
for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
  kubectl -n "$ns" rollout restart deploy 2>/dev/null
done
linkerd check --proxy
⚠️ During the rotation window, pods with old SVIDs (signed by the old CA) and pods with new SVIDs cannot mutually authenticate. Schedule this in a maintenance window or use Linkerd's trust anchor rotation procedure which supports a dual-trust-anchor bundle.
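The dual-trust-anchor bundle mentioned above is, at the file level, just the old and new root certificates concatenated into one PEM. A minimal Python sketch of building that bundle follows; the function names are hypothetical, and it only compares PEM blocks as text, so the same certificate with different line wrapping would not be detected as a duplicate.

```python
import re
from typing import List

# Matches one PEM-encoded certificate, including its BEGIN/END markers.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem(bundle: str) -> List[str]:
    """Return each certificate block found in a PEM bundle."""
    return [block.strip() for block in PEM_BLOCK.findall(bundle)]


def merge_trust_anchors(old_bundle: str, new_bundle: str) -> str:
    """Concatenate two bundles, old anchors first, dropping exact duplicates."""
    merged: List[str] = []
    for block in split_pem(old_bundle) + split_pem(new_bundle):
        if block not in merged:
            merged.append(block)
    return "\n".join(merged) + "\n"
```

The merged text can be written to a file (say bundle.crt, a name chosen here for illustration) and passed with --set-file identityTrustAnchorsPEM=bundle.crt, so that proxies trust both roots while workloads are re-rolled. Remove the old anchor only after every workload carries an SVID from the new chain.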
Prevention in CI/CD
1. Pre-Flight linkerd check in Your Deploy Pipeline
# .github/workflows/deploy.yaml (or equivalent)
- name: Linkerd pre-flight check
  run: |
    # --pre only validates a cluster before installation;
    # a plain check inspects the running control plane
    linkerd check 2>&1 | tee /tmp/linkerd-check.log
    if grep -q "×" /tmp/linkerd-check.log; then
      echo "Linkerd control plane check failed. Blocking deploy."
      exit 1
    fi
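If the pipeline should report which checks failed rather than only that something failed, the log can be post-processed. The sketch below is a hypothetical helper, assuming failing checks are printed on lines that start with the × glyph, as in the sample linkerd check output earlier in this article.

```python
from typing import List


def failed_checks(check_output: str) -> List[str]:
    """Return the description of every line marked with the failure glyph."""
    failures = []
    for line in check_output.splitlines():
        stripped = line.strip()
        # Assumption: a failed check is rendered as "× <description>".
        if stripped.startswith("×"):
            failures.append(stripped.lstrip("×").strip())
    return failures
```

Feeding it the contents of /tmp/linkerd-check.log yields a list that can be printed in the job summary or attached to an alert; an empty list means no failure lines were found.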
2. OPA/Gatekeeper Policy — Enforce Trust Domain Annotation Consistency
# opa-linkerd-trustdomain.rego
package linkerd.trustdomain
violation[{"msg": msg}] {
  input.review.object.kind == "ConfigMap"
  input.review.object.metadata.name == "linkerd-config"
  input.review.object.metadata.namespace == "linkerd"
  # data.values is a YAML document, so match the YAML key/value form
  vals := input.review.object.data.values
  not contains(vals, "identityTrustDomain: cluster.local")
  msg := "linkerd-config identityTrustDomain must be cluster.local"
}
3. cert-manager + Trust Domain Pinning
Use cert-manager with a ClusterIssuer to enforce the correct SAN on every generated cert:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  duration: 8760h
  renewBefore: 720h
  isCA: true
  privateKey:
    algorithm: ECDSA
  dnsNames:
    - identity.linkerd.cluster.local
  uris:
    - spiffe://cluster.local  # <-- THIS must match identityTrustDomain
  issuerRef:
    name: linkerd-trust-anchor
    kind: ClusterIssuer
4. Checkov / Helm Chart Linting
# Render the chart and scan for trust domain consistency
helm template linkerd-control-plane linkerd/linkerd-control-plane \
-f values.yaml > rendered.yaml
checkov -f rendered.yaml --check CKV2_K8S_6
# Custom script: cross-check rendered trust domain vs. CA cert SAN
python3 scripts/validate_linkerd_trust_domain.py rendered.yaml ca.crt
Pin this validation in your Helm pre-upgrade hook and your GitOps reconciliation loop (Flux/ArgoCD pre-sync hook). A 30-second check here prevents a 30-minute outage.
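The custom script scripts/validate_linkerd_trust_domain.py is referenced above but not shown. One hypothetical shape for its core logic is sketched below. It assumes the rendered chart contains a literal identityTrustDomain: <value> line and that the certificate has already been dumped to text with openssl x509 -noout -text, producing URI:spiffe://... SAN entries like the one in Step 1. All function names are illustrative.

```python
import re
from typing import List, Optional


def rendered_trust_domain(rendered_yaml: str) -> Optional[str]:
    """Find the first non-empty identityTrustDomain value in rendered chart output."""
    match = re.search(r"identityTrustDomain:\s*\"?([A-Za-z0-9.-]+)", rendered_yaml)
    return match.group(1) if match else None


def san_trust_domains(openssl_text: str) -> List[str]:
    """Collect trust domains from URI:spiffe://... entries in openssl text output."""
    return re.findall(r"URI:spiffe://([A-Za-z0-9.-]+)", openssl_text)


def trust_domain_matches(rendered_yaml: str, openssl_text: str) -> bool:
    """True only when the rendered domain appears among the cert's SPIFFE SANs."""
    expected = rendered_trust_domain(rendered_yaml)
    return expected is not None and expected in san_trust_domains(openssl_text)
```

A wrapper would read rendered.yaml, capture the openssl output for ca.crt, and exit non-zero when trust_domain_matches returns False. Note that a chart rendered with an empty identityTrustDomain yields None here and is treated as a failure, which may or may not suit your setup.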