Fixing AWS STS GetCallerIdentity 'Invalid Security Token' Error for Temporary Credentials
Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 5–15 mins
TL;DR
- What broke: sts:GetCallerIdentity rejected your temporary credentials because the X-Amz-Security-Token is expired, malformed, revoked, or the system clock is skewed beyond AWS's 5-minute tolerance.
- How to fix it: Reissue the temporary credentials via sts:AssumeRole or refresh the instance/ECS/EKS metadata endpoint; verify NTP sync; confirm the session token is passed as the third credential parameter, not omitted.
- Shortcut: Use our Client-Side Sandbox above to auto-refactor your credential initialization code. No data leaves your browser.
The Incident (What Does the Error Mean?)
Raw error output from AWS CLI or SDK:
An error occurred (InvalidClientTokenId) when calling the GetCallerIdentity operation:
The request was rejected because the security token is invalid.
or via Boto3/SDK:
botocore.exceptions.ClientError: An error occurred (InvalidClientTokenId) when calling the GetCallerIdentity operation:
The security token included in the request is invalid.
Immediate consequence: Every downstream call gated on identity verification fails. This includes service-to-service auth handshakes, Vault AWS auth backends, IRSA token validation in EKS, and any CI/CD pipeline step that assumes a role before deploying. The pipeline is dead until credentials are valid.
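Before retrying, it helps to branch on the error code instead of treating every failure the same way. The following is a minimal sketch, assuming you have caught a botocore ClientError and pass in its .response dict. The diagnose helper and the cause descriptions are illustrative assumptions, not AWS-provided text. ExpiredToken and SignatureDoesNotMatch are included because expiry and clock skew can surface under those codes rather than InvalidClientTokenId.

```python
# Hypothetical triage helper: maps an error code to the most likely cause.
# Input shape matches botocore's ClientError.response: {"Error": {"Code": ..., "Message": ...}}
LIKELY_CAUSES = {
    "InvalidClientTokenId": "Access key or session token is missing, malformed, or no longer recognized.",
    "ExpiredToken": "Temporary credentials have passed their expiration time; reissue them.",
    "SignatureDoesNotMatch": "Wrong secret key or a skewed system clock; check NTP sync.",
}


def diagnose(error_response: dict) -> str:
    """Return a human-readable hint for the error code in a ClientError response."""
    code = error_response.get("Error", {}).get("Code", "")
    return LIKELY_CAUSES.get(
        code, f"Unrecognized error code {code!r}; inspect the full message."
    )
```

In an except block you would call diagnose(exc.response) and log the hint, never the credentials themselves.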
The Attack Vector / Blast Radius
This error is not just an annoyance — it signals a credential lifecycle failure that has real security implications:
- If the token was revoked: Someone with IAM write access may have revoked the role's active sessions, deleted the role, or rewritten its trust policy. STS has no RevokeSession API; the IAM console's "Revoke active sessions" action attaches an inline deny policy (AWSRevokeOlderSessions) to the role, which CloudTrail records as PutRolePolicy. Treat this as a potential credential compromise event until confirmed otherwise. Check CloudTrail immediately for DeleteRole, UpdateAssumeRolePolicy, or PutRolePolicy events.
- If the token is expired and your code silently retries with stale credentials: Depending on your SDK retry logic, you may be logging the full AWS_SESSION_TOKEN value in retry error traces, exposing a token that was valid until recently in plaintext logs.
- Clock skew >5 minutes: AWS SigV4 signing rejects requests outside a ±5-minute window. On EC2 instances with chrony misconfigured, or in air-gapped environments, this silently breaks all IAM auth. An attacker who can manipulate the system clock on a compromised host can deliberately push it outside the window to deny service.
- Blast radius: Any service using this identity (S3 bucket access, Secrets Manager reads, ECR pulls, DynamoDB operations) is fully blocked. In a microservices mesh where one service assumes a role and vends sub-tokens, the failure cascades to every dependent service.
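To limit the log-exposure risk described above, one option is to redact session tokens before a log record is written. This is a minimal sketch using only the standard logging module. The filter class, the helper name, and the regex are assumptions for illustration, and the pattern only covers simple NAME=value or NAME: value pairs, not quoted dict reprs.

```python
import logging
import re

# Matches pairs such as AWS_SESSION_TOKEN=abc or X-Amz-Security-Token: abc
_TOKEN_PATTERN = re.compile(
    r"(?i)(aws_session_token|x-amz-security-token)(\s*[=:]\s*)(\S+)"
)


def redact_session_tokens(text: str) -> str:
    """Replace the token value while keeping the key name for debugging."""
    return _TOKEN_PATTERN.sub(r"\1\2[REDACTED]", text)


class RedactSessionTokenFilter(logging.Filter):
    """Logging filter that scrubs session tokens from the rendered message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render msg % args first, then store the scrubbed result.
        record.msg = redact_session_tokens(record.getMessage())
        record.args = ()
        return True  # never drop the record, only rewrite it
```

Attach it with logging.getLogger().addFilter(RedactSessionTokenFilter()) on the loggers that emit retry traces.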
How to Fix It (The Solution)
Root Cause Checklist (run in this order)
- Check token expiry: STS temporary credentials last from 15 minutes up to 12 hours for AssumeRole (bounded by DurationSeconds and the role's maximum session duration) and up to 36 hours for GetSessionToken or GetFederationToken.
- Verify clock sync: timedatectl status or chronyc tracking. Offset must be <5 minutes.
- Confirm session token is present: Temporary credentials require three components: AccessKeyId, SecretAccessKey, AND SessionToken. Omitting the session token is one of the most common causes of this error.
- Check partition/region: A token issued in aws-cn or aws-us-gov is invalid in the standard aws partition.
- Check CloudTrail for revocation: DeleteRole, UpdateAssumeRolePolicy, DetachRolePolicy, or PutRolePolicy (the event recorded when sessions are revoked from the IAM console) in the last hour.
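The clock-sync step in the checklist can be scripted. The sketch below parses the "System time" line of chronyc tracking output and compares the offset to the 5-minute limit. The function names and the sign convention (positive means the local clock is fast) are assumptions made for this example.

```python
import re

MAX_SKEW_SECONDS = 300  # the 5-minute tolerance cited in the checklist

# Example line: "System time     : 0.000012345 seconds fast of NTP time"
_SYSTEM_TIME = re.compile(
    r"System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time"
)


def parse_chrony_offset(tracking_output: str) -> float:
    """Return the clock offset in seconds; positive if fast, negative if slow."""
    match = _SYSTEM_TIME.search(tracking_output)
    if match is None:
        raise ValueError("no 'System time' line found in chronyc output")
    seconds = float(match.group(1))
    return seconds if match.group(2) == "fast" else -seconds


def skew_is_safe(offset_seconds: float, limit: float = MAX_SKEW_SECONDS) -> bool:
    """True when the absolute offset is inside the signing window."""
    return abs(offset_seconds) < limit
```

On a host you could feed it subprocess.run(["chronyc", "tracking"], capture_output=True, text=True).stdout and fail a health check when skew_is_safe returns False.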
Basic Fix — Boto3 (Python)
import boto3
- # WRONG: Omitting session token for temporary credentials
- client = boto3.client(
-     'sts',
-     aws_access_key_id='ASIA....',
-     aws_secret_access_key='wJalrXUtn....',
- )
+ # CORRECT: All three components required for temporary credentials
+ client = boto3.client(
+     'sts',
+     aws_access_key_id='ASIA....',
+     aws_secret_access_key='wJalrXUtn....',
+     aws_session_token='IQoJb3JpZ2lu...',  # Required for STS-issued creds
+ )
response = client.get_caller_identity()
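A cheap guard against the omission shown above is to validate the credential set before building a client. This is a hedged sketch: check_credentials is a hypothetical helper, and it relies on the documented key prefixes (ASIA for temporary STS keys, AKIA for long-term IAM user keys).

```python
from typing import List, Optional


def check_credentials(
    access_key_id: Optional[str],
    secret_access_key: Optional[str],
    session_token: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the set looks consistent."""
    problems = []
    key = access_key_id or ""
    if not key:
        problems.append("access key id is missing")
    if not secret_access_key:
        problems.append("secret access key is missing")
    # Temporary (STS-issued) keys are unusable without their session token.
    if key.startswith("ASIA") and not session_token:
        problems.append("temporary key (ASIA prefix) supplied without a session token")
    # A long-term key combined with a token usually means mixed-up credential sets.
    if key.startswith("AKIA") and session_token:
        problems.append("long-term key (AKIA prefix) paired with a session token")
    return problems
```

Calling it at startup and refusing to proceed on a non-empty result turns a confusing runtime error into an explicit configuration failure.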
Basic Fix — AWS CLI credential refresh
- # Stale assume-role output hardcoded in ~/.aws/credentials
- [default]
- aws_access_key_id = ASIA...OLD
- aws_secret_access_key = oldSecret
- # aws_session_token line missing entirely
+ # Re-issue via CLI and export fresh credentials
+ export $(aws sts assume-role \
+ --role-arn arn:aws:iam::123456789012:role/MyRole \
+ --role-session-name debug-session \
+ --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
+ --output text | awk '{print "AWS_ACCESS_KEY_ID="$1" AWS_SECRET_ACCESS_KEY="$2" AWS_SESSION_TOKEN="$3}')
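If the awk one-liner feels brittle, the same step can be done by parsing the JSON that aws sts assume-role prints by default. The sketch below assumes that JSON is passed in as a string; exports_from_assume_role is a hypothetical helper name.

```python
import json
import shlex


def exports_from_assume_role(cli_json: str) -> str:
    """Turn `aws sts assume-role` JSON output into shell export lines."""
    creds = json.loads(cli_json)["Credentials"]
    pairs = {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }
    # shlex.quote guards against characters the shell would otherwise interpret.
    return "\n".join(
        f"export {name}={shlex.quote(value)}" for name, value in pairs.items()
    )
```

The output can be passed to eval in the calling shell. Avoid writing it to a file or a log, since it contains live credentials.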
Enterprise Best Practice — Auto-Refresh with SDK Credential Providers
Never hardcode or manually manage temporary credentials in production. Use the SDK's built-in provider chain, which handles refresh automatically.
- # ANTI-PATTERN: Manually managing short-lived credentials in application code
- creds = get_creds_from_vault()  # Fetched once at startup, never refreshed
- client = boto3.client('sts',
-     aws_access_key_id=creds['AccessKeyId'],
-     aws_secret_access_key=creds['SecretAccessKey'],
-     aws_session_token=creds['SessionToken']
- )
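+ # BEST PRACTICE: Wrap sts:AssumeRole in botocore RefreshableCredentials
+ import boto3
+ from botocore.credentials import RefreshableCredentials
+ from botocore.session import get_session
+
+ def refresh_credentials():
+     # Called once at startup and again by botocore when the credentials near expiry
+     sts = boto3.client('sts')
+     response = sts.assume_role(
+         RoleArn='arn:aws:iam::123456789012:role/MyRole',
+         RoleSessionName='auto-refresh-session',
+         DurationSeconds=3600
+     )
+     c = response['Credentials']
+     return {
+         'access_key': c['AccessKeyId'],
+         'secret_key': c['SecretAccessKey'],
+         'token': c['SessionToken'],
+         'expiry_time': c['Expiration'].isoformat()
+     }
+
+ refreshable = RefreshableCredentials.create_from_metadata(
+     metadata=refresh_credentials(),
+     refresh_using=refresh_credentials,
+     method='sts-assume-role'
+ )
+
+ # Attach the refreshable credentials to a session so clients actually use them.
+ # Note: _credentials is a private botocore attribute. This is a widely used
+ # workaround rather than a public API, so pin and test your botocore version.
+ botocore_session = get_session()
+ botocore_session._credentials = refreshable
+ session = boto3.Session(botocore_session=botocore_session)
+ client = session.client('sts')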
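The idea behind automatic refresh is simply to renew before the expiry time rather than after a call fails. As a rough sketch of that decision (the needs_refresh name and the 5-minute default margin are assumptions for illustration, not the SDK's internal values):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_refresh(
    expiration: datetime,
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(minutes=5),
) -> bool:
    """True when the credentials expire within `margin` (or already have)."""
    now = now or datetime.now(timezone.utc)
    return expiration - now <= margin
```

Both datetimes must be timezone-aware. The Expiration value returned by assume_role in boto3 is already timezone-aware, so it can be passed in directly.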
For EKS/IRSA: Ensure AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN env vars are injected by the pod identity webhook. The SDK handles the rest.
For EC2/ECS: Do not set AWS_ACCESS_KEY_ID manually. Delete the env var and let the instance metadata provider (IMDSv2 on EC2) or the container credentials provider (on ECS) serve auto-rotating credentials.
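Because environment variables are consulted before web identity and metadata providers in the SDK's credential chain, a stray static variable can silently shadow the auto-rotating source. The following is a minimal sketch of a startup audit; audit_credential_env is a hypothetical helper that takes any mapping, such as os.environ.

```python
from typing import List, Mapping

STATIC_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")
IRSA_VARS = ("AWS_WEB_IDENTITY_TOKEN_FILE", "AWS_ROLE_ARN")


def audit_credential_env(env: Mapping[str, str]) -> List[str]:
    """Report env settings that would shadow or break auto-rotating credentials."""
    findings = []
    static = [name for name in STATIC_VARS if env.get(name)]
    if static:
        findings.append("static credential variables set: " + ", ".join(static))
    present = [name for name in IRSA_VARS if env.get(name)]
    # IRSA needs both variables; exactly one present points to a broken injection.
    if present and len(present) < len(IRSA_VARS):
        missing = [name for name in IRSA_VARS if name not in present]
        findings.append("incomplete IRSA configuration, missing: " + ", ".join(missing))
    return findings
```

Running audit_credential_env(os.environ) in a container entrypoint and logging the findings makes this class of misconfiguration visible before the first AWS call.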
Prevention in CI/CD
1. Flag hardcoded credentials with Checkov
# .checkov.yml: run the checks that flag hardcoded AWS credential patterns
check:
  - CKV_AWS_45   # Ensure no hardcoded secrets in Lambda environment variables
  - CKV_SECRET_2 # AWS access key found in source
2. OPA policy — block static credentials in ECS task definitions
package aws.ecs

deny[msg] {
    container := input.containerDefinitions[_]
    env := container.environment[_]
    regex.match(`^AWS_(ACCESS_KEY|SECRET|SESSION_TOKEN)`, env.name)
    msg := sprintf("Container '%v' hardcodes AWS credentials. Use task IAM role instead.", [container.name])
}
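For teams that do not run OPA, the same rule can be approximated in a plain script. This sketch mirrors the policy above in Python, assuming the task definition has been loaded from JSON into a dict. The function name is hypothetical and the regex is copied from the Rego rule.

```python
import re
from typing import List

# Same pattern as the Rego rule: AWS_ACCESS_KEY*, AWS_SECRET*, AWS_SESSION_TOKEN*
_CREDENTIAL_ENV = re.compile(r"^AWS_(ACCESS_KEY|SECRET|SESSION_TOKEN)")


def find_hardcoded_credentials(task_definition: dict) -> List[str]:
    """Return one violation message per container env var that looks like a credential."""
    violations = []
    for container in task_definition.get("containerDefinitions", []):
        for env in container.get("environment", []):
            name = env.get("name", "")
            if _CREDENTIAL_ENV.match(name):
                violations.append(
                    f"Container '{container.get('name')}' hardcodes {name}. "
                    "Use task IAM role instead."
                )
    return violations
```

A CI step can load each task definition with json.load and fail the build when the returned list is non-empty.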
3. GitHub Actions — use OIDC, never store STS tokens as secrets
- # WRONG: Storing long-lived or manually-refreshed STS tokens in GitHub Secrets
- env:
-   AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
-   AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
-   AWS_SESSION_TOKEN: ${{ secrets.AWS_SESSION_TOKEN }}  # Expires, breaks pipeline
+ # CORRECT: GitHub OIDC federation, zero stored secrets, auto-rotating
+ permissions:
+   id-token: write
+   contents: read
+ steps:
+   - uses: aws-actions/configure-aws-credentials@v4
+     with:
+       role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsRole
+       aws-region: us-east-1
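The OIDC workflow only works if the role's trust policy accepts GitHub's identity provider. As a hedged sketch, the function below builds such a trust policy document. The repository name is a placeholder, the helper name is hypothetical, and the sub pattern shown allows any branch or tag of that repository, which you may want to narrow.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str) -> dict:
    """Build a role trust policy allowing one GitHub repo to assume the role via OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                    # Restrict to a single repository (placeholder value).
                    "StringLike": {f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:*"},
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(github_oidc_trust_policy("123456789012", "example-org/example-repo"), indent=2))
```

Without the sub condition, any repository on GitHub could request this role, so treat that line as the security-critical part of the policy.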
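4. NTP drift monitoring with AWS Config and CloudWatch
Enable the ec2-instance-profile-attached AWS Config managed rule so that every instance receives auto-rotating role credentials. That rule does not check time sync, so monitor clock drift separately with a CloudWatch agent metric (chrony_tracking_system_time_offset here; the exact name depends on how your agent is configured to publish chrony data). Alert if the offset exceeds 240 seconds, before AWS's 300-second hard cutoff.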
# Terraform: CloudWatch alarm for NTP drift
resource "aws_cloudwatch_metric_alarm" "ntp_drift" {
  alarm_name          = "ntp-offset-critical"
  metric_name         = "chrony_tracking_system_time_offset"
  namespace           = "CWAgent"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 2
  threshold           = 240
  comparison_operator = "GreaterThanThreshold"
  alarm_actions       = [aws_sns_topic.ops_alerts.arn]
}
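One caveat worth testing: if the offset metric is published as a signed value (an assumption, since it depends on how the agent reports chrony data), a GreaterThanThreshold alarm will not fire when the clock runs slow. The sketch below models the alarm's two-period evaluation on absolute values so both directions are caught. It is a simplified illustration, not CloudWatch's actual evaluation engine.

```python
from typing import Sequence


def alarm_state(
    offsets: Sequence[float],
    threshold: float = 240.0,
    evaluation_periods: int = 2,
) -> str:
    """Evaluate the most recent datapoints the way the alarm above is configured."""
    recent = list(offsets)[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    # Compare magnitudes so a slow clock (negative offset) also breaches.
    breaching = all(abs(value) > threshold for value in recent)
    return "ALARM" if breaching else "OK"
```

If your metric is signed, publishing its absolute value or adding a second alarm with LessThanThreshold at -240 achieves the same coverage in CloudWatch.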