Fix ExternalDNS 'Failed to Sync DNS Records': Route53 IAM Permission Error Resolved
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 10–20 mins
TL;DR
- What broke: The ExternalDNS pod cannot write Route53 records because its IAM role is missing route53:ChangeResourceRecordSets or route53:ListHostedZones, or the policy is not scoped to the correct hosted zone ARN.
- How to fix it: Attach a least-privilege IAM policy that grants ChangeResourceRecordSets on the target hosted zone ARN only, plus the read-only ListHostedZones, ListResourceRecordSets, and ListTagsForResource actions.
The Incident (What Does the Error Mean?)
Raw log output from kubectl logs -n kube-system deploy/external-dns:
time="2024-05-10T03:12:44Z" level=error msg="failed to sync DNS records" error="AccessDenied: User: arn:aws:sts::123456789012:assumed-role/external-dns-role/i-0abc123 is not authorized to perform: route53:ChangeResourceRecordSets on resource: arn:aws:route53:::hostedzone/Z0123456ABCDEFGHIJKL"
Immediate consequence: Every DNS record ExternalDNS manages (ingress hostnames, Service LoadBalancer endpoints) stops updating. New deployments get no DNS entry. Existing records go stale as soon as the load balancers behind them change. In a blue/green or canary setup, this is a silent traffic black hole.
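When triaging, it can help to pull the calling principal, the denied action, and the target resource out of the message. The following is a minimal Python sketch, assuming the log line keeps the shape shown above. The function names are hypothetical, and the instance-ID check is only a heuristic: a session name that looks like an EC2 instance ID (i-...) suggests the credentials came from the node instance profile rather than a dedicated role.

```python
import re

# Matches the AccessDenied message shape from the log line above.
# Assumption: other AWS error formats will not match and yield None.
_DENIED = re.compile(
    r'User: (?P<principal>\S+) is not authorized to perform: '
    r'(?P<action>[A-Za-z0-9-]+:[A-Za-z0-9*]+) on resource: (?P<resource>arn:[^\s"]+)'
)


def parse_access_denied(line):
    """Return principal, action, and resource from an AccessDenied log line."""
    match = _DENIED.search(line)
    return match.groupdict() if match else None


def looks_like_instance_profile(principal_arn):
    """Heuristic: assumed-role sessions created by EC2 are named after the instance ID."""
    session_name = principal_arn.rsplit("/", 1)[-1]
    return re.fullmatch(r"i-[0-9a-f]+", session_name) is not None
```

For the log line above, the parsed action is route53:ChangeResourceRecordSets and the session name i-0abc123 points at a node instance profile, which is the situation the IRSA section addresses.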
The Attack Vector / Blast Radius
This is a broken access control misconfiguration, not a breach — but the blast radius is severe:
- All ingress hostnames stop resolving for newly deployed services. Users hit DNS NXDOMAIN or stale IPs pointing at decommissioned load balancers.
- If the fix is applied carelessly (granting route53:* on *), you've now given the ExternalDNS pod full Route53 write access across every hosted zone in the account. A compromised pod (via SSRF, RCE, or a malicious container image) can delete or hijack any domain in your AWS account, including production apex domains.
- In multi-tenant clusters, this is a privilege escalation path: any workload that can assume or laterally move to the ExternalDNS service account can rewrite DNS for the entire organization.
The over-permissive "quick fix" is the actual vulnerability.
How to Fix It
Basic Fix — Scoped IAM Policy
Replace the missing or wildcard policy with this. Substitute Z0123456ABCDEFGHIJKL with your actual hosted zone ID.
{
"Version": "2012-10-17",
"Statement": [
- {
- "Effect": "Allow",
- "Action": "route53:*",
- "Resource": "*"
- }
+ {
+ "Effect": "Allow",
+ "Action": [
+ "route53:ChangeResourceRecordSets"
+ ],
+ "Resource": "arn:aws:route53:::hostedzone/Z0123456ABCDEFGHIJKL"
+ },
+ {
+ "Effect": "Allow",
+ "Action": [
+ "route53:ListHostedZones",
+ "route53:ListResourceRecordSets",
+ "route53:ListTagsForResource"
+ ],
+ "Resource": "*"
+ }
]
}
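ListHostedZones does not support resource-level permissions, so AWS requires * for it. ListResourceRecordSets and ListTagsForResource can be scoped to a hosted zone ARN, but they are read-only and are commonly left on * so ExternalDNS can discover zones. ChangeResourceRecordSets must be scoped to the specific zone.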
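If you manage several zones, the scoped policy can be generated rather than hand-edited. This is a minimal Python sketch, assuming one policy per hosted zone; build_external_dns_policy is a hypothetical helper, and the statements simply mirror the policy above.

```python
import json


def build_external_dns_policy(zone_id):
    """Build the scoped policy shown above for a single hosted zone ID."""
    # Reject empty values and wildcards so the write statement stays zone-locked.
    if not zone_id.isalnum():
        raise ValueError(f"not a concrete hosted zone ID: {zone_id!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["route53:ChangeResourceRecordSets"],
                "Resource": f"arn:aws:route53:::hostedzone/{zone_id}",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "route53:ListHostedZones",
                    "route53:ListResourceRecordSets",
                    "route53:ListTagsForResource",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder zone ID from this article; substitute your own.
    print(json.dumps(build_external_dns_policy("Z0123456ABCDEFGHIJKL"), indent=2))
```

Rejecting non-alphanumeric input is a deliberate guard: passing "*" raises an error instead of silently producing an account-wide write grant.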
Enterprise Best Practice — IRSA + Zone-Locked Policy
Never use node instance profiles for ExternalDNS. Use IAM Roles for Service Accounts (IRSA) with an OIDC trust policy.
Step 1: Annotate the Kubernetes ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
  namespace: kube-system
  annotations:
-   # missing annotation: falling back to node instance profile
+   eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/external-dns-irsa-role
Step 2: IAM Trust Policy (OIDC-bound)
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
- "Service": "ec2.amazonaws.com"
+ "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
},
"Action": "sts:AssumeRoleWithWebIdentity",
+ "Condition": {
+ "StringEquals": {
+ "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:sub": "system:serviceaccount:kube-system:external-dns",
+ "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:aud": "sts.amazonaws.com"
+ }
+ }
}
]
}
The StringEquals condition ensures only the external-dns service account in kube-system can assume this role. No other pod, even on the same node, can use it.
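The condition keys are easy to get wrong because they are derived from the cluster's OIDC issuer URL with the https:// scheme stripped. The following Python sketch shows that derivation, assuming the issuer URL, account ID, namespace, and service account name are known; build_irsa_trust_policy is a hypothetical helper, not part of any AWS SDK.

```python
def build_irsa_trust_policy(account_id, issuer_url, namespace, service_account):
    """Build an OIDC-bound trust policy like the one above."""
    # The provider identifier is the issuer URL without its scheme.
    provider = issuer_url.split("://", 1)[-1].rstrip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Pin the role to exactly one service account.
                        f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }
```

Called with the example account, issuer, kube-system, and external-dns, it should reproduce the trust policy shown in Step 2.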
Prevention in CI/CD
1. Checkov — Block wildcard Route53 actions at PR time
Add to your .checkov.yaml:
check:
  - CKV_AWS_111 # Ensure IAM policies do not allow write access without constraints (aws_iam_policy_document data sources)
  - CKV_AWS_290 # Same constraint check for IAM policy resources; this is what flags route53:* on *
2. OPA/Conftest — Enforce zone-scoped policies
package iam.route53
deny[msg] {
stmt := input.Statement[_]
stmt.Effect == "Allow"
stmt.Action[_] == "route53:ChangeResourceRecordSets"
stmt.Resource == "*"
msg := "route53:ChangeResourceRecordSets must be scoped to a specific hosted zone ARN, not '*'"
}
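Run in CI: conftest test iam-policy.json --policy route53.rego --namespace iam.route53
Conftest evaluates only the main namespace by default, so the --namespace flag is needed for this package to be checked.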
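The Rego rule above matches only the exact action name in a list-form Action. As a complementary illustration, here is a Python sketch of the same check that also covers a string-form Action and wildcard patterns such as route53:*. It assumes the policy has already been loaded as a dict, and it is stricter than the Rego rule in one respect: it flags any Resource containing *, including hostedzone/*. The function name is hypothetical.

```python
from fnmatch import fnmatchcase

WRITE_ACTION = "route53:changeresourcerecordsets"


def _as_list(value):
    """IAM accepts either a bare string or a list for Action and Resource."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def find_unscoped_route53_writes(policy):
    """Return Allow statements that grant the Route53 write action on a wildcard resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    violations = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        # IAM action names are case-insensitive and may contain * wildcards.
        patterns = [a.lower() for a in _as_list(stmt.get("Action"))]
        grants_write = any(fnmatchcase(WRITE_ACTION, p) for p in patterns)
        wildcard_resource = any("*" in r for r in _as_list(stmt.get("Resource")))
        if grants_write and wildcard_resource:
            violations.append(stmt)
    return violations
```

In CI, a non-empty return value would fail the build. The wildcard "quick fix" policy yields one violation, while the scoped policy from the Basic Fix yields none.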
3. Terraform — Use aws_iam_policy with explicit zone ARN interpolation
resource "aws_iam_policy" "external_dns" {
name = "external-dns-route53"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = ["route53:ChangeResourceRecordSets"]
# Interpolate zone ID — never hardcode or use "*"
Resource = "arn:aws:route53:::hostedzone/${var.route53_zone_id}"
},
{
Effect = "Allow"
Action = ["route53:ListHostedZones", "route53:ListResourceRecordSets", "route53:ListTagsForResource"]
Resource = "*"
}
]
})
}
4. Verify IRSA wiring once the pod is running:
# Confirm token projection is working
kubectl exec -n kube-system deploy/external-dns -- \
cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token | cut -d. -f2 | base64 -d | jq .sub
# Expected: system:serviceaccount:kube-system:external-dns
# Dry-run a single sync: exercises zone and record listing without writing any changes
external-dns --dry-run --once --source=ingress --provider=aws --domain-filter=yourdomain.com
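One caveat with the token check: JWT segments are base64url-encoded without padding, so a plain base64 -d can reject them or truncate the output. A minimal Python sketch that restores the padding before decoding is shown below. It assumes you have saved the projected token to a local file (token.jwt is a hypothetical name), and it only inspects claims; it does not verify the signature.

```python
import base64
import json
import sys


def decode_jwt_claims(token):
    """Decode the payload segment of a JWT without verifying its signature."""
    payload = token.strip().split(".")[1]
    # Restore the padding that base64url encoding in JWTs omits.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


if __name__ == "__main__":
    # Usage: python decode_token.py token.jwt
    with open(sys.argv[1]) as handle:
        claims = decode_jwt_claims(handle.read())
    print(claims.get("sub"), claims.get("aud"))
```

The sub claim should read system:serviceaccount:kube-system:external-dns, matching the condition in the trust policy.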