
How to Fix ECS Unable to Pull ECR Image: Task Role vs Execution Role Explained

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 10 mins

TL;DR

  • What broke: ECS cannot pull your container image from ECR because ecr:GetAuthorizationToken, ecr:BatchGetImage, and ecr:GetDownloadUrlForLayer are either missing entirely or attached to the task role instead of the execution role.
  • How to fix it: Add the AWS-managed policy AmazonEC2ContainerRegistryReadOnly (or a scoped inline equivalent) to the executionRoleArn role, not taskRoleArn.

The Incident (What Does the Error Mean?)

Your ECS service is stuck. The task never reaches RUNNING. In the ECS console or CloudWatch logs you see:

CannotPullContainerError: Error response from daemon:
  pull access denied for 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest,
  repository does not exist or may require 'docker login'

ResourceInitializationError: unable to pull secrets or registry auth:
  execution resource retrieval failed: unable to retrieve ecr registry auth:
  service call has been retried 3 time(s):
  RequestError: send request failed
  caused by: Post https://api.ecr.us-east-1.amazonaws.com/: dial tcp: i/o timeout

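Immediate consequence: Every task in the service fails in the PROVISIONING → PENDING phase. Zero containers start. Your deployment is fully blocked. If this is a rollback during an outage, you are now down. Note that the second message ends in a network timeout (dial tcp: i/o timeout): if the execution role permissions turn out to be correct, also check that the task can reach ECR through a NAT gateway or VPC endpoints.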

The ECS agent running on the host (or Fargate's control plane) calls the ECR API before your container code ever runs. That call is authenticated using the execution role — the IAM role attached to executionRoleArn in your task definition. The task role (taskRoleArn) is scoped to permissions your application code needs at runtime (S3, DynamoDB, Secrets Manager, etc.). These two roles are architecturally distinct and non-interchangeable.
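The split between the two roles can be checked mechanically. The following is a minimal Python sketch, not part of any AWS tooling: role_problems is a hypothetical helper that takes a task definition already loaded as a dictionary and reports two wiring mistakes, a missing execution role and one role reused for both fields.

```python
def role_problems(task_def: dict) -> list[str]:
    """Report role-wiring problems in a task definition dict (hypothetical helper)."""
    problems = []
    # Treat a missing key, null, and "" the same way: no execution role configured.
    execution_role = (task_def.get("executionRoleArn") or "").strip()
    task_role = (task_def.get("taskRoleArn") or "").strip()

    if not execution_role:
        problems.append("executionRoleArn is not set: ECS has no role to pull the image with")
    elif execution_role == task_role:
        problems.append("executionRoleArn and taskRoleArn are identical: pull and runtime permissions are mixed")
    return problems


if __name__ == "__main__":
    broken = {
        "family": "my-app",
        "executionRoleArn": "",
        "taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
    }
    for problem in role_problems(broken):
        print(problem)
```

The helper only inspects the task definition itself. It says nothing about which permissions each role actually holds; that needs the IAM checks shown later.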


The Attack Vector / Blast Radius

This is not just a broken deployment — the misconfiguration pattern has a broader blast radius:

1. Overpermissioned task role as a workaround. Engineers under pressure add ECR permissions to the task role because it "seems to work" in testing. Now your application container is running with an IAM role that has ecr:* on Resource: "*". If your application has an SSRF vulnerability or a compromised dependency, an attacker calls the ECS task credentials endpoint (169.254.170.2, at the path in AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, on both Fargate and EC2), retrieves the task role credentials, and uses them to pull every image in your ECR registry, including images containing proprietary code, embedded secrets, or base images you haven't patched. On the EC2 launch type, the instance metadata endpoint (169.254.169.254) additionally exposes the container instance role unless tasks are blocked from reaching it.

2. Wildcard execution role. The inverse problem: an execution role with ecr:* on Resource: "*" grants the ECS agent every ECR action, pulls included, on any ECR repository in the account, including repositories belonging to other teams or environments. A misconfigured task definition pointing at the wrong repository now silently pulls from prod.

3. Cascading deployment failure. In an auto-scaling group or during a traffic spike, ECS launches replacement tasks. Every new task fails image pull. Your service scales to zero healthy tasks. ALB health checks fail. You are fully down with no automatic recovery path until the IAM policy is fixed and tasks are force-redeployed.
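The first two patterns above share one signature: an Allow statement with a wildcard ECR action on every resource. As a rough illustration, the Python sketch below scans a policy document (already parsed from JSON) for that signature. The function name broad_ecr_statements is hypothetical, and the check is deliberately narrow: it matches only the literal actions "*" and "ecr:*" on Resource "*", and ignores NotAction, conditions, and partial wildcards such as ecr:Get*.

```python
def _as_list(value):
    """IAM lets Statement, Action and Resource be a single value or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def broad_ecr_statements(policy: dict) -> list[dict]:
    """Return Allow statements granting wildcard ECR actions on all resources."""
    findings = []
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        # IAM action names are case-insensitive, so compare in lower case.
        actions = [action.lower() for action in _as_list(statement.get("Action"))]
        resources = _as_list(statement.get("Resource"))
        if any(action in ("*", "ecr:*") for action in actions) and "*" in resources:
            findings.append(statement)
    return findings


if __name__ == "__main__":
    wildcard_policy = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "ecr:*", "Resource": "*"}],
    }
    print(len(broad_ecr_statements(wildcard_policy)), "overly broad statement(s)")
```

Run against both the task role and the execution role policies, a non-empty result is a prompt for review rather than proof of a problem.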


How to Fix It (The Solution)

Verify the Root Cause First

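# Check what execution role your task definition is actually using.
# Pass the family name (latest ACTIVE revision) or family:revision, e.g. my-app:7;
# "my-app:latest" is an image tag, not a task definition revision.
aws ecs describe-task-definition \
  --task-definition my-app \
  --query 'taskDefinition.{executionRoleArn:executionRoleArn, taskRoleArn:taskRoleArn}'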

# List policies attached to the execution role
aws iam list-attached-role-policies \
  --role-name ecsTaskExecutionRole

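# Simulate the API calls ECS makes, against the repository the task pulls from
# (a repository ARN is also matched by policies that use Resource "*")
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/ecsTaskExecutionRole \
  --action-names ecr:GetAuthorizationToken ecr:BatchGetImage ecr:GetDownloadUrlForLayer \
  --resource-arns arn:aws:ecr:us-east-1:123456789012:repository/my-app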

Basic Fix: Attach the AWS Managed Policy

- # Execution role has no ECR permissions — or ECR perms are on the task role
- aws iam attach-role-policy \
-   --role-name ecsTaskRole \
-   --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

+ # Attach ECR read access to the EXECUTION role
+ aws iam attach-role-policy \
+   --role-name ecsTaskExecutionRole \
+   --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

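The managed policy AmazonEC2ContainerRegistryReadOnly grants read-only ECR actions on all repositories, including: ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability, ecr:GetDownloadUrlForLayer, ecr:GetRepositoryPolicy, ecr:DescribeRepositories, ecr:ListImages, ecr:DescribeImages, ecr:BatchGetImage. AWS revises managed policies over time (later versions add actions such as ecr:GetLifecyclePolicy, ecr:ListTagsForResource, and ecr:DescribeImageScanFindings), so check the current policy document for the exact list.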


Enterprise Best Practice: Scoped Inline Policy

Do not use the managed policy in production. Scope the execution role to your specific repository ARN.

- {
-   "Version": "2012-10-17",
-   "Statement": [
-     {
-       "Effect": "Allow",
-       "Action": "ecr:*",
-       "Resource": "*"
-     }
-   ]
- }

+ {
+   "Version": "2012-10-17",
+   "Statement": [
+     {
+       "Sid": "AllowECRAuthToken",
+       "Effect": "Allow",
+       "Action": "ecr:GetAuthorizationToken",
+       "Resource": "*"
+     },
+     {
+       "Sid": "AllowECRImagePull",
+       "Effect": "Allow",
+       "Action": [
+         "ecr:BatchGetImage",
+         "ecr:GetDownloadUrlForLayer",
+         "ecr:BatchCheckLayerAvailability"
+       ],
+       "Resource": "arn:aws:ecr:us-east-1:123456789012:repository/my-app"
+     }
+   ]
+ }

Note: ecr:GetAuthorizationToken is a regional, account-level API call — it cannot be scoped to a repository ARN. It must remain Resource: "*". This is an AWS API constraint, not a security gap. All other ECR actions should be scoped to the specific repository ARN.
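When several services each need their own execution role, writing this policy by hand invites copy-paste drift. The Python sketch below generates the same two-statement shape for a list of repository ARNs. The function name scoped_pull_policy is hypothetical, and the output is a plain dictionary that you would still need to serialize and attach with your own tooling.

```python
import json


def scoped_pull_policy(repository_arns: list[str]) -> dict:
    """Build an execution-role policy limited to pulling from the given repositories."""
    if not repository_arns:
        raise ValueError("at least one repository ARN is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # The token call cannot be scoped to a repository, so it stays on "*".
                "Sid": "AllowECRAuthToken",
                "Effect": "Allow",
                "Action": "ecr:GetAuthorizationToken",
                "Resource": "*",
            },
            {
                # Pull actions are restricted to the listed repositories only.
                "Sid": "AllowECRImagePull",
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchCheckLayerAvailability",
                ],
                "Resource": sorted(repository_arns),
            },
        ],
    }


if __name__ == "__main__":
    policy = scoped_pull_policy(["arn:aws:ecr:us-east-1:123456789012:repository/my-app"])
    print(json.dumps(policy, indent=2))
```

Sorting the ARNs keeps the generated document stable between runs, which avoids spurious diffs when the policy is stored in version control.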

Also verify your task definition references the correct role ARN:

  {
    "family": "my-app",
-   "executionRoleArn": "",
+   "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    "taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
    ...
  }



Prevention in CI/CD

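1. Checkov: flag task definitions that reuse one role for both fields at plan time:

checkov -d ./terraform --check CKV_AWS_249
# CKV_AWS_249: Ensure that the Execution Role ARN and the Task Role ARN are different in ECS Task definitions
# Note: CKV_AWS_97 covers EFS transit encryption in ECS task definitions, not the execution role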

2. Terraform: enforce execution role attachment as a required variable:

- resource "aws_ecs_task_definition" "app" {
-   family = "my-app"
-   # execution_role_arn omitted — silent failure at deploy time
- }

+ resource "aws_ecs_task_definition" "app" {
+   family             = "my-app"
+   execution_role_arn = var.execution_role_arn  # enforced, no default
+   task_role_arn      = var.task_role_arn
+ }
+
+ variable "execution_role_arn" {
+   type        = string
+   description = "IAM role ARN for ECS agent image pull and secrets access. Required."
+   # No default — forces explicit declaration in every environment
+ }

3. OPA/Conftest policy — block task definitions with ECR permissions on the task role:

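package ecs.task_definition

# A raw task definition carries only role ARNs, not IAM policies.
# This rule assumes the pipeline has resolved the policy documents attached
# to taskRoleArn and added them to the input as taskRolePolicies
# (a hypothetical field name).
deny[msg] {
  statement := input.taskRolePolicies[_].Statement[_]
  statement.Effect == "Allow"
  allowed := actions(statement)
  action := allowed[_]
  startswith(lower(action), "ecr:")
  msg := sprintf("Task role '%v' allows '%v'. Move ECR pull permissions to executionRoleArn.", [input.taskRoleArn, action])
}

# IAM allows Action to be a string or an array; normalize to an array.
actions(statement) = [statement.Action] { is_string(statement.Action) }
actions(statement) = statement.Action { is_array(statement.Action) }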

4. AWS Config Rule — continuous compliance:

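Enable the managed rule ecs-task-definition-user-for-host-mode for general task definition hygiene (it checks user and privilege settings on task definitions that use host networking and does not inspect roles), and write a custom Config rule that validates executionRoleArn is non-null and that the attached policies include ecr:GetAuthorizationToken for any task definition referencing an ECR image URI.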
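The decision logic of such a custom rule is small enough to sketch. The Python function below is an illustration under stated assumptions, not a complete Config rule: evaluate_task_definition and its allowed_actions parameter are hypothetical names, the caller is assumed to have already collected the Allow actions from the execution role's policies, and the Lambda handler and the reporting call back to AWS Config are left out. The ECR image pattern is also an assumption that covers the standard registry hostname format only.

```python
import re
from fnmatch import fnmatchcase

# Matches private ECR image URIs such as
# 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
ECR_IMAGE = re.compile(r"^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com(\.cn)?/")
REQUIRED_ACTION = "ecr:getauthorizationtoken"


def evaluate_task_definition(task_def: dict, allowed_actions: list[str]) -> str:
    """Return an AWS Config compliance type for one task definition."""
    images = [c.get("image") or "" for c in task_def.get("containerDefinitions", [])]
    if not any(ECR_IMAGE.match(image) for image in images):
        return "NOT_APPLICABLE"  # no ECR image, so no ECR pull permission is needed
    if not task_def.get("executionRoleArn"):
        return "NON_COMPLIANT"
    # Policy actions may contain wildcards (ecr:*, ecr:Get*), so match as patterns.
    granted = any(fnmatchcase(REQUIRED_ACTION, p.lower()) for p in allowed_actions)
    return "COMPLIANT" if granted else "NON_COMPLIANT"


if __name__ == "__main__":
    task_def = {
        "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
        "containerDefinitions": [
            {"image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest"}
        ],
    }
    print(evaluate_task_definition(task_def, ["ecr:GetAuthorizationToken"]))
```

Keeping the evaluation as a pure function makes it easy to unit test separately from the AWS calls that gather its inputs.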

5. Pre-deploy smoke test in your pipeline:

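# Before deploying, simulate the pull permission (TASK_DEF and REPO_ARN come from the pipeline)
EXEC_ROLE=$(aws ecs describe-task-definition \
  --task-definition "$TASK_DEF" \
  --query 'taskDefinition.executionRoleArn' --output text)

# --output text prints "None" when no execution role is set
if [ -z "$EXEC_ROLE" ] || [ "$EXEC_ROLE" = "None" ]; then
  echo "FAIL: task definition has no executionRoleArn" >&2
  exit 1
fi

# Fail on a simulation error or on any decision other than "allowed"
if ! DENIED=$(aws iam simulate-principal-policy \
    --policy-source-arn "$EXEC_ROLE" \
    --action-names ecr:GetAuthorizationToken ecr:BatchGetImage ecr:GetDownloadUrlForLayer \
    --resource-arns "$REPO_ARN" \
    --query 'EvaluationResults[?EvalDecision!=`allowed`].EvalActionName' \
    --output text) || [ -n "$DENIED" ]; then
  echo "FAIL: ECR pull will fail (${DENIED:-simulation error})" >&2
  exit 1
fi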

This typically completes in a few seconds and catches the misconfiguration before a single ECS task is launched.
