How to Fix Terraform S3 Backend 'Failed to Lock State: Lock ID Mismatch' with DynamoDB
Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 10–20 mins
TL;DR
- What broke: Terraform attempted to acquire a DynamoDB state lock but found the LockID in the DynamoDB table doesn't match the lock it's trying to create or release, usually because of a stale lock from a crashed apply, a concurrent run, or a backend reconfiguration that changed the lock key path.
- How to fix it: Identify the orphaned lock item in DynamoDB, verify no active Terraform process owns it, then force-unlock using terraform force-unlock <LOCK_ID> or delete the DynamoDB item directly via AWS CLI.
- Fast path: Use our Client-Side Sandbox above to auto-refactor this: paste your backend block and the DynamoDB lock item JSON, and get the exact unlock commands and corrected HCL generated locally in your browser.
The Incident (What Does the Error Mean?)
Raw error output from a failed terraform apply or terraform plan:
Error: Failed to lock state
Error acquiring the state lock: ConditionalCheckFailedException: The conditional request failed
Lock Info:
ID: a3f1c2d4-7e89-4b12-a1d0-5f6e7c8b9012
Path: s3://my-tf-state-bucket/prod/terraform.tfstate
Operation: OperationTypePlan
Who: ci-runner@gitlab-runner-prod
Version: 1.6.3
Created: 2024-11-14 03:22:11.408742 +0000 UTC
Info:
Terraform acquires a state lock to protect the state from being written
by multiple users at the same time. Please resolve the issue above and
try again. For most commands, you can disable locking with the
"-lock=false" flag, but this is not recommended.
Immediate consequence: Every subsequent plan, apply, or destroy is blocked. Your CI/CD pipeline is dead. If this is a production environment, no infrastructure changes can be safely deployed until the lock is cleared. The DynamoDB ConditionalCheckFailedException means Terraform's PutItem with a condition expression (attribute_not_exists(LockID)) failed — a lock record already exists for this state path.
The Attack Vector / Blast Radius
This is not just an inconvenience — understand the full failure surface:
Scenario 1 — Stale lock from a crashed CI runner (most common):
A GitLab/GitHub Actions runner was killed mid-apply (OOM kill, spot instance termination, pipeline timeout). Terraform never ran its deferred unlock. The DynamoDB item persists indefinitely. Every subsequent pipeline run hits ConditionalCheckFailedException.
Scenario 2 — Backend path drift:
Someone changed the key parameter in the S3 backend config (e.g., renamed workspace path or environment prefix). Terraform now derives a different LockID composite key (<bucket>/<key>), but the old lock item in DynamoDB still holds the previous path. The table entry and the new configuration refer to different logical paths, so terraform force-unlock run against the new configuration cannot find the old lock, and any runner still using the old key stays blocked.
Scenario 3 — Concurrent apply from two runners: Two CI pipelines triggered simultaneously (e.g., a merge and a manual run). One acquired the lock legitimately. The second is now blocked. This is the correct behavior — but if the first runner is stuck or dead, you still end up with an orphaned lock.
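To make Scenario 2 concrete, here is a minimal Python sketch of how the lock key follows the backend configuration. It assumes the <bucket>/<key> composition described above; the helper name lock_id is hypothetical and not part of Terraform.

```python
def lock_id(bucket: str, key: str) -> str:
    """Compose the DynamoDB LockID as <bucket>/<key> (assumed from the scenario above)."""
    return f"{bucket}/{key}"


bucket = "my-tf-state-bucket"
old_id = lock_id(bucket, "environments/prod/terraform.tfstate")
new_id = lock_id(bucket, "prod/terraform.tfstate")

# A lock item stored under one ID is never matched by runs that use the other.
print(old_id)
print(new_id)
print("same lock item:", old_id == new_id)
```

When a lock seems impossible to clear, comparing the LockID in the table with the one your current backend block would produce is a quick first check.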
Blast radius if you use -lock=false carelessly:
- Two concurrent apply operations write to the same .tfstate simultaneously → state file corruption.
- Corrupted state = Terraform loses track of real infrastructure → duplicate resources, failed destroys, billing blowout.
- In a worst case with remote state and no versioning on the S3 bucket, the corrupted state is unrecoverable without manual reconciliation.
DynamoDB table structure — what you're actually fighting:
The lock item stored in DynamoDB looks like this:
{
  "LockID": { "S": "my-tf-state-bucket/prod/terraform.tfstate" },
  "Info": {
    "S": "{\"ID\":\"a3f1c2d4-7e89-4b12-a1d0-5f6e7c8b9012\",\"Operation\":\"OperationTypePlan\",\"Who\":\"ci-runner@gitlab-runner-prod\",\"Version\":\"1.6.3\",\"Created\":\"2024-11-14T03:22:11.408742Z\"}"
  }
}
The LockID attribute is the partition key of the DynamoDB table. Terraform uses a conditional write (attribute_not_exists(LockID)) to acquire it atomically. A mismatch or stale entry breaks this atomic check.
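The conditional write can be modeled in a few lines of Python. This is an in-memory sketch of the semantics only, not Terraform's or DynamoDB's implementation; the LockTable class and its method names are hypothetical stand-ins for PutItem with attribute_not_exists(LockID) and for DeleteItem.

```python
class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class LockTable:
    """In-memory model of a lock table keyed by LockID (illustrative only)."""

    def __init__(self):
        self._items = {}

    def put_if_absent(self, lock_id, info):
        # Mirrors PutItem with ConditionExpression attribute_not_exists(LockID).
        if lock_id in self._items:
            raise ConditionalCheckFailed(lock_id)
        self._items[lock_id] = info

    def delete(self, lock_id):
        # Mirrors DeleteItem: removing a missing key is a no-op.
        self._items.pop(lock_id, None)


table = LockTable()
path = "my-tf-state-bucket/prod/terraform.tfstate"

# First runner acquires the lock, then crashes without releasing it.
table.put_if_absent(path, {"ID": "a3f1c2d4-7e89-4b12-a1d0-5f6e7c8b9012"})

try:
    table.put_if_absent(path, {"ID": "second-runner"})
    second_acquired = True
except ConditionalCheckFailed:
    second_acquired = False  # what every later pipeline run sees

table.delete(path)  # clearing the stale item
table.put_if_absent(path, {"ID": "second-runner"})  # now succeeds
print("blocked while stale lock existed:", not second_acquired)
```

The model shows why the error never resolves on its own: nothing removes the item unless the owning process releases it or someone deletes it.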
How to Fix It (The Solution)
Step 1 — Confirm the lock is truly orphaned
Before you touch anything, verify no active Terraform process legitimately holds this lock:
# Inspect the current lock item directly
aws dynamodb get-item \
  --table-name terraform-state-locks \
  --key '{"LockID": {"S": "my-tf-state-bucket/prod/terraform.tfstate"}}' \
  --region us-east-1
Check the Created timestamp in the Info field. If it's older than your longest possible apply run time and no pipeline is currently active — it's stale. Kill it.
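The timestamp check can be scripted. The following Python sketch parses the JSON that aws dynamodb get-item prints and compares the lock's age against a maximum apply window. The function names and the 2-hour window are assumptions for illustration, and the Created value is treated as UTC.

```python
import json
import re
from datetime import datetime, timedelta, timezone


def lock_created_at(get_item_response: dict) -> datetime:
    """Extract the lock's Created time from `aws dynamodb get-item` JSON output."""
    info = json.loads(get_item_response["Item"]["Info"]["S"])
    # Keep whole seconds only, so fractional digits and the trailing "Z" don't matter.
    stamp = re.match(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}", info["Created"]).group(0)
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def is_stale(get_item_response: dict, max_apply: timedelta, now: datetime) -> bool:
    """True when the lock is older than the longest apply you consider possible."""
    return now - lock_created_at(get_item_response) > max_apply


# Sample response built from the lock item shown earlier in this article.
response = {
    "Item": {
        "LockID": {"S": "my-tf-state-bucket/prod/terraform.tfstate"},
        "Info": {"S": json.dumps({
            "ID": "a3f1c2d4-7e89-4b12-a1d0-5f6e7c8b9012",
            "Who": "ci-runner@gitlab-runner-prod",
            "Created": "2024-11-14T03:22:11.408742Z",
        })},
    }
}

now = datetime(2024, 11, 14, 6, 0, 0, tzinfo=timezone.utc)
print(is_stale(response, timedelta(hours=2), now))
```

An old timestamp alone is not proof: still confirm that no pipeline is running before removing the lock.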
Basic Fix — terraform force-unlock
Use the Lock ID from the error output (the UUID, not the path):
terraform force-unlock a3f1c2d4-7e89-4b12-a1d0-5f6e7c8b9012
Terraform will prompt for confirmation. This calls DynamoDB DeleteItem on the lock record. Only run this if you are 100% certain no active process owns the lock.
Nuclear Option — Direct DynamoDB DeleteItem
Use this when terraform force-unlock itself fails (e.g., backend config is broken or the workspace is inaccessible):
aws dynamodb delete-item \
  --table-name terraform-state-locks \
  --key '{"LockID": {"S": "my-tf-state-bucket/prod/terraform.tfstate"}}' \
  --region us-east-1
Enterprise Best Practice — Fix the Root Cause (Backend Config Drift)
If the mismatch is caused by a changed key path in your backend config, the fix is to align the backend configuration. Here's the corrected diff:
 terraform {
   backend "s3" {
     bucket         = "my-tf-state-bucket"
-    key            = "environments/prod/terraform.tfstate"
+    key            = "prod/terraform.tfstate"
     region         = "us-east-1"
     dynamodb_table = "terraform-state-locks"
     encrypt        = true
+    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/your-key-id"
   }
 }
After correcting the key path, run terraform init -reconfigure to force backend re-initialization:
terraform init -reconfigure
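This will NOT destroy state. It re-registers the backend pointer without copying state between keys; if the state itself has to move to the new key, use terraform init -migrate-state instead. Then verify the old stale lock is gone and run your plan.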
Enterprise Best Practice — DynamoDB Table with TTL-Based Auto-Expiry
Prevent permanent orphaned locks by adding a TTL attribute to your DynamoDB lock table. Terraform does not natively set TTL, but you can implement a Lambda or add it via IaC:
 resource "aws_dynamodb_table" "terraform_locks" {
   name         = "terraform-state-locks"
   billing_mode = "PAY_PER_REQUEST"
   hash_key     = "LockID"

   attribute {
     name = "LockID"
     type = "S"
   }

+  # ExpiresAt is not a key or index attribute, so it must not be declared
+  # in an attribute block; the ttl block alone enables expiry on it.
+  ttl {
+    attribute_name = "ExpiresAt"
+    enabled        = true
+  }
+
   tags = {
     Name        = "terraform-state-locks"
     Environment = "prod"
   }
 }
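Note: You'll need a sidecar Lambda or CI step that writes ExpiresAt = now() + 3600 (a Unix epoch timestamp in seconds) onto lock items after Terraform creates them, since Terraform's S3 backend does not write TTL natively. DynamoDB removes expired items in the background rather than at the exact expiry time, so treat TTL as a safety net and pick a window longer than your longest apply. This is a custom wrapper pattern used in large-scale multi-team Terraform deployments.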
Prevention in CI/CD
1. Always wrap Terraform in a lock-aware CI pattern:
# .gitlab-ci.yml: safe Terraform apply pattern
terraform-apply:
  script:
    - terraform init
    - terraform plan -out=tfplan
    - terraform apply tfplan
  after_script:
    # Force-unlock on job cancellation or failure using a stored lock ID.
    # Terraform does not write .terraform/lock-id itself; an earlier step in
    # your pipeline has to capture the ID into this (hypothetical) file.
    - |
      LOCK_ID=$(cat .terraform/lock-id 2>/dev/null || echo "")
      if [ -n "$LOCK_ID" ]; then
        terraform force-unlock -force "$LOCK_ID" || true
      fi
  interruptible: false  # CRITICAL: stops newer pipelines from auto-canceling this job mid-apply
2. Enable S3 bucket versioning — non-negotiable:
 resource "aws_s3_bucket_versioning" "tf_state" {
   bucket = aws_s3_bucket.tf_state.id

   versioning_configuration {
-    status = "Disabled"
+    status = "Enabled"
   }
 }
Without versioning, a corrupted state write is permanent. With versioning, you can roll back to the last known-good state.
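As a sketch of how a rollback candidate might be located, the Python function below picks the newest non-current version from a listing shaped like the JSON output of aws s3api list-object-versions. The function name and the version IDs are made up for illustration, and "previous" is not automatically "known-good": inspect the candidate before restoring it.

```python
def previous_version_id(listing: dict, key: str):
    """Return the VersionId of the newest non-current version of `key`, or None."""
    older = [
        v for v in listing.get("Versions", [])
        if v["Key"] == key and not v["IsLatest"]
    ]
    if not older:
        return None
    # ISO 8601 timestamps in one consistent format sort correctly as strings.
    return max(older, key=lambda v: v["LastModified"])["VersionId"]


# Placeholder listing; real VersionId values are opaque strings assigned by S3.
listing = {
    "Versions": [
        {"Key": "prod/terraform.tfstate", "VersionId": "v3", "IsLatest": True,
         "LastModified": "2024-11-14T03:25:00+00:00"},
        {"Key": "prod/terraform.tfstate", "VersionId": "v2", "IsLatest": False,
         "LastModified": "2024-11-13T18:02:10+00:00"},
        {"Key": "prod/terraform.tfstate", "VersionId": "v1", "IsLatest": False,
         "LastModified": "2024-11-10T09:41:55+00:00"},
    ]
}

print(previous_version_id(listing, "prod/terraform.tfstate"))
```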
3. Checkov policy to enforce DynamoDB locking is configured:
checkov -d . --check CKV_TF_3 # Checks: ensure S3 backend uses DynamoDB for locking
4. OPA/Sentinel policy — enforce lock table is always set:
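# opa/terraform_backend_lock.rego
# Assumes the input document exposes the backend block at
# input.configuration.backend_config; adapt the path to your own input.
package terraform.backend

import rego.v1

deny contains msg if {
	backend := input.configuration.backend_config
	backend.type == "s3"
	not backend.config.dynamodb_table
	msg := "S3 backend MUST specify a DynamoDB lock table. No exceptions."
}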
5. Monitor for stale locks with a CloudWatch alarm:
Set up a scheduled Lambda that scans the DynamoDB lock table for items older than 2 hours and fires a CloudWatch alarm or PagerDuty alert. A lock older than your maximum apply window is definitionally orphaned and should auto-page on-call.
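The core of such a Lambda could look like the Python sketch below. It only covers the filtering step over items already returned by a table scan; the scan itself and the alerting call are omitted. The function name, the sample staging entry, and the digest value are illustrative assumptions, and Created is treated as UTC.

```python
import json
import re
from datetime import datetime, timedelta, timezone

STAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def find_stale_locks(items, max_age=timedelta(hours=2), now=None):
    """Return lock items whose Created time is older than max_age."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for item in items:
        raw = item.get("Info", {}).get("S")
        if not raw:
            continue  # no lock info, e.g. state checksum entries stored beside locks
        info = json.loads(raw)
        match = STAMP.match(info.get("Created", ""))
        if not match:
            continue  # unparseable timestamp: skip rather than guess
        created = datetime.strptime(match.group(0), "%Y-%m-%dT%H:%M:%S")
        age = now - created.replace(tzinfo=timezone.utc)
        if age > max_age:
            stale.append({"LockID": item["LockID"]["S"], "ID": info.get("ID"),
                          "Who": info.get("Who"), "age": age})
    return stale


items = [
    {"LockID": {"S": "my-tf-state-bucket/prod/terraform.tfstate"},
     "Info": {"S": json.dumps({"ID": "a3f1c2d4-7e89-4b12-a1d0-5f6e7c8b9012",
                               "Who": "ci-runner@gitlab-runner-prod",
                               "Created": "2024-11-14T03:22:11.408742Z"})}},
    {"LockID": {"S": "my-tf-state-bucket/staging/terraform.tfstate"},
     "Info": {"S": json.dumps({"ID": "fresh-lock", "Who": "dev@laptop",
                               "Created": "2024-11-14T05:45:00Z"})}},
    {"LockID": {"S": "my-tf-state-bucket/prod/terraform.tfstate-md5"},
     "Digest": {"S": "0123abcd"}},
]

now = datetime(2024, 11, 14, 6, 0, 0, tzinfo=timezone.utc)
for lock in find_stale_locks(items, now=now):
    print(f"STALE {lock['LockID']} held by {lock['Who']} for {lock['age']}")
```

Reporting the Who and ID fields in the alert gives on-call what they need to confirm the owner is dead and to run terraform force-unlock.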