How to Fix Terraform S3 Backend State Lock Error: 'State Locked by Another Process'
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5–15 mins
TL;DR
- What broke: Terraform cannot acquire the state lock because a DynamoDB lock record for your S3 backend still exists. It was either left behind by a crashed CI runner or a killed terminal session, or it is legitimately held by a concurrent terraform apply.
- How to fix it: Retrieve the LockID from the error output or from DynamoDB directly, then run terraform force-unlock <LOCK_ID> after confirming no legitimate process holds it.
The Incident (What Does the Error Mean?)
Raw error output:
Error: Error locking state: Error acquiring the state lock: ConditionalCheckFailedException: The conditional request failed
state file: s3://my-tf-state-bucket/prod/terraform.tfstate
Lock Info:
ID: a1b2c3d4-e5f6-7890-abcd-ef1234567890
Path: s3://my-tf-state-bucket/prod/terraform.tfstate
Operation: OperationTypeApply
Who: runner@ci-node-07
Version: 1.5.7
Created: 2024-01-15 03:42:11.823456789 +0000 UTC
Info:
Terraform acquires a state lock to protect the state from being written
by multiple users at the same time. Please resolve the issue above and try
again. For most commands, you can disable locking with the "-lock=false"
flag, but this is not recommended.
Immediate consequence: Every terraform plan, apply, and destroy is dead until the lock is cleared. Your pipeline is halted. If a team member bypasses this with -lock=false, you risk concurrent state writes and state corruption — which is significantly worse than the lock itself.
The Attack Vector / Blast Radius
This is not just an inconvenience. The blast radius has two failure modes:
1. Operational: Stale lock from a dead process
CI runners get OOM-killed. Engineers force-quit mid-apply (a second Ctrl+C, or kill -9, skips Terraform's graceful shutdown). GitHub Actions jobs time out. In each case Terraform never gets the chance to release the lock, and the DynamoDB lock table retains the LockID record indefinitely. Your entire team is now blocked on a ghost process.
2. Security: The -lock=false temptation
Under pressure during an outage, engineers reach for terraform apply -lock=false. This disables the only concurrency control protecting your state file. If two engineers or two CI jobs run simultaneously with locking disabled:
- State writes race and one apply silently overwrites the other
- Resources get orphaned — created in AWS but absent from state
- The next terraform apply sees drift and destroys live infrastructure to reconcile
If your S3 bucket is misconfigured (no versioning, no MFA delete), a corrupted state write is unrecoverable: there is no earlier object version to roll back to. You are now manually reconstructing state for a production environment.
How to Fix It
Step 1: Verify No Legitimate Process Holds the Lock
Before force-unlocking, confirm that the user and host named in the lock metadata (the Who field) no longer have a Terraform process running:
# Check the raw lock record in DynamoDB
aws dynamodb get-item \
--table-name terraform-state-locks \
--key '{"LockID": {"S": "my-tf-state-bucket/prod/terraform.tfstate"}}'
Inspect the Who and Created fields, which sit inside the item's Info attribute (the lock metadata stored as a JSON string). If the timestamp is more than 15 minutes old and the CI job is confirmed dead, proceed.
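The age check above can be scripted. This is a minimal sketch, not a definitive implementation: it assumes GNU date (for the -d option) and a Created value in RFC 3339 UTC form such as 2024-01-15T03:42:11.823456789Z. The function names lock_age_seconds and is_stale are illustrative, not part of Terraform or the AWS CLI.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a lock is older than a threshold.
# Assumes GNU date and an RFC 3339 UTC timestamp; names are illustrative.

# lock_age_seconds CREATED [NOW_EPOCH] -> prints the lock age in seconds
lock_age_seconds() {
  local created="$1" now="${2:-$(date -u +%s)}" created_epoch
  # Drop the trailing Z and any fractional seconds, then restore the Z.
  created="${created%Z}"
  created="${created%%.*}Z"
  created_epoch=$(date -u -d "$created" +%s) || return 1
  echo $(( now - created_epoch ))
}

# is_stale CREATED THRESHOLD_SECONDS [NOW_EPOCH] -> exit 0 if older than threshold
is_stale() {
  local age
  age=$(lock_age_seconds "$1" "${3:-}") || return 2
  [ "$age" -gt "$2" ]
}

# Example: the Created value would normally come from the Info attribute, e.g.
#   aws dynamodb get-item --table-name terraform-state-locks --key '...' \
#     --query 'Item.Info.S' --output text | jq -r '.Created'
created="2024-01-15T03:42:11.823456789Z"
if is_stale "$created" 900; then
  echo "Lock is older than 15 minutes: confirm the job is dead, then unlock"
fi
```

Age alone proves nothing about whether the process is still running; treat a stale result as a prompt to check the CI job, not as permission to unlock.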
Basic Fix: Force Unlock
# Use the LockID from the error output exactly
terraform force-unlock a1b2c3d4-e5f6-7890-abcd-ef1234567890
Terraform will prompt for confirmation. This deletes the DynamoDB lock record. Your next terraform plan will proceed normally.
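If you capture Terraform's output in CI, you can pull the lock ID out of the log instead of copying it by hand. The sketch below assumes the "Lock Info:" layout shown in the error output earlier; extract_lock_id is a hypothetical helper name, and plan.log is an assumed file name.

```shell
#!/usr/bin/env bash
# Sketch: extract the lock ID from captured Terraform error output.
# Assumes an "ID: <uuid>" line as in the Lock Info block; name is illustrative.
extract_lock_id() {
  # Print the first 36-character UUID found on a line of the form "ID: <uuid>".
  sed -nE 's/^[[:space:]]*ID:[[:space:]]*([0-9a-fA-F-]{36})[[:space:]]*$/\1/p' | head -n 1
}

# Example with a trimmed copy of the error text:
sample='Lock Info:
  ID:        a1b2c3d4-e5f6-7890-abcd-ef1234567890
  Path:      s3://my-tf-state-bucket/prod/terraform.tfstate'
printf '%s\n' "$sample" | extract_lock_id
```

In a pipeline this might look like terraform plan 2>&1 | tee plan.log followed by extract_lock_id < plan.log, with the result passed to terraform force-unlock only after the Step 1 check.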
Manual DynamoDB fallback (if Terraform CLI is unavailable):
aws dynamodb delete-item \
--table-name terraform-state-locks \
--key '{"LockID": {"S": "my-tf-state-bucket/prod/terraform.tfstate"}}'
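A plain delete-item removes whatever lock is there, including one a new process acquired a second ago. A slightly safer variant, sketched below, adds a condition so the delete only succeeds if the record still contains the lock ID you inspected. It assumes the lock metadata lives in an Info string attribute that includes the ID; the function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: conditional delete of a stale lock record.
# Assumes the Info attribute is a JSON string containing the lock ID.

# lock_key_json BUCKET KEY -> DynamoDB key JSON for the lock record
lock_key_json() {
  printf '{"LockID": {"S": "%s/%s"}}' "$1" "$2"
}

# delete_lock_if_id_matches TABLE BUCKET KEY LOCK_ID
delete_lock_if_id_matches() {
  aws dynamodb delete-item \
    --table-name "$1" \
    --key "$(lock_key_json "$2" "$3")" \
    --condition-expression 'contains(#info, :id)' \
    --expression-attribute-names '{"#info": "Info"}' \
    --expression-attribute-values "{\":id\": {\"S\": \"$4\"}}"
}

# Show the key that would be sent for the example backend:
lock_key_json my-tf-state-bucket prod/terraform.tfstate; echo
```

Calling delete_lock_if_id_matches terraform-state-locks my-tf-state-bucket prod/terraform.tfstate a1b2c3d4-e5f6-7890-abcd-ef1234567890 should fail with a ConditionalCheckFailedException if someone else has taken the lock in the meantime, which is the outcome you want.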
Enterprise Best Practice: Hardened Backend Config
The root cause is often an under-configured backend. Here is the diff between a typical broken config and a production-hardened one:
terraform {
backend "s3" {
bucket = "my-tf-state-bucket"
key = "prod/terraform.tfstate"
region = "us-east-1"
- # No encryption
- # No versioning enforced at backend level
- # No KMS key
- dynamodb_table = "terraform-locks"
+ encrypt = true
+ kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/your-cmk-key-id"
+ dynamodb_table = "terraform-state-locks"
+ # Enforce versioning and access logging via separate aws_s3_bucket resources
}
}
# DynamoDB table for locking
resource "aws_dynamodb_table" "tf_locks" {
name = "terraform-state-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
+ point_in_time_recovery {
+ enabled = true
+ }
+
+ server_side_encryption {
+ enabled = true
+ }
+
+ ttl {
+ attribute_name = "TimeToExist"
+ enabled = true
+ }
}
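The TTL block is the key enterprise addition, but it does nothing on its own. DynamoDB only expires items that carry a numeric epoch-seconds value in the named attribute (TimeToExist here), and Terraform does not write that attribute to its lock records. Something else, such as your CI wrapper, has to stamp each lock record with an expiry (e.g., 4 hours out) before TTL can act as a dead-man's switch. DynamoDB also deletes expired items in the background rather than at the exact expiry time, so treat TTL as a backstop for stale locks, not a replacement for your cleanup job.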
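For the TTL setting to remove anything, the lock record needs a numeric epoch-seconds value in TimeToExist. The sketch below shows one way a CI wrapper might stamp that value after Terraform has taken the lock. It is an assumption-laden illustration: ttl_epoch and stamp_lock_ttl are hypothetical helpers, and the 4-hour window should exceed your longest apply.

```shell
#!/usr/bin/env bash
# Sketch: stamp an expiry on an existing lock record so DynamoDB TTL can act on it.
# Helper names are illustrative; TimeToExist matches the ttl block above.

# ttl_epoch HOURS [NOW_EPOCH] -> prints the expiry time as epoch seconds
ttl_epoch() {
  local hours="$1" now="${2:-$(date -u +%s)}"
  echo $(( now + hours * 3600 ))
}

# stamp_lock_ttl TABLE LOCK_ID_KEY HOURS
# LOCK_ID_KEY is the item key, e.g. my-tf-state-bucket/prod/terraform.tfstate
stamp_lock_ttl() {
  aws dynamodb update-item \
    --table-name "$1" \
    --key "{\"LockID\": {\"S\": \"$2\"}}" \
    --update-expression 'SET #ttl = :ttl' \
    --condition-expression 'attribute_exists(LockID)' \
    --expression-attribute-names '{"#ttl": "TimeToExist"}' \
    --expression-attribute-values "{\":ttl\": {\"N\": \"$(ttl_epoch "$3")\"}}"
}

# Example: expiry four hours after a fixed reference time
ttl_epoch 4 1705290131
```

The attribute_exists condition keeps the call from creating a phantom record when no lock is held.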
Prevention in CI/CD
1. Always wrap Terraform in a lock-aware CI wrapper
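#!/bin/bash
# ci-terraform-apply.sh
set -euo pipefail
LOCK_KEY='{"LockID": {"S": "my-tf-state-bucket/prod/terraform.tfstate"}}'
cleanup() {
  local status=$?
  # On success Terraform has already released its own lock.
  [ "$status" -eq 0 ] && return 0
  echo "Runner exiting with status $status: releasing lock if this runner holds it"
  # Read the lock ID from DynamoDB, but only if Who names this runner (assumed user@host).
  LOCK_ID=$(aws dynamodb get-item --table-name terraform-state-locks --key "$LOCK_KEY" \
    --query 'Item.Info.S' --output text 2>/dev/null \
    | jq -r --arg who "$(whoami)@$(hostname)" 'select(.Who == $who) | .ID' 2>/dev/null || true)
  [ -z "$LOCK_ID" ] || terraform force-unlock -force "$LOCK_ID" || true
}
trap cleanup EXIT
trap 'exit 130' SIGINT; trap 'exit 143' SIGTERM
terraform apply -auto-approve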
2. Checkov: Enforce S3 versioning and encryption on state buckets
# .checkov.yaml
check:
  - CKV_AWS_21 # S3 versioning enabled
  - CKV_AWS_119 # DynamoDB encrypted at rest
  - CKV_AWS_28 # DynamoDB point-in-time recovery
Run in your pipeline:
checkov -d . --check CKV_AWS_21,CKV_AWS_119,CKV_AWS_28 --hard-fail-on HIGH
3. OPA Policy: Block -lock=false in Atlantis/Terraform Cloud
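# policy/no_lock_disable.rego
package terraform.deny_lock_false
import rego.v1
# input.flags is assumed to be the list of CLI flags passed to Terraform
deny contains msg if {
  some flag in input.flags
  flag == "-lock=false"
  msg := "POLICY VIOLATION: -lock=false is prohibited in automated pipelines. Resolve the stale lock manually."
}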
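If your pipeline calls Terraform through a shell wrapper rather than a policy engine, the same rule can be enforced before Terraform ever starts. This is a minimal sketch with a hypothetical helper name; it catches only the literal -lock=false and --lock=false spellings, so treat it as a guardrail rather than a complete check.

```shell
#!/usr/bin/env bash
# Sketch: refuse to run Terraform when locking is disabled on the command line.
# has_lock_disabled is an illustrative name, not a Terraform feature.
has_lock_disabled() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      -lock=false|--lock=false) return 0 ;;
    esac
  done
  return 1
}

# Example: inspect the arguments a CI job intends to pass to Terraform
if has_lock_disabled apply -auto-approve -lock=false; then
  echo "POLICY VIOLATION: -lock=false is prohibited in automated pipelines."
fi
```

In a real wrapper you would call has_lock_disabled "$@" and exit non-zero before invoking terraform "$@".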
4. CloudWatch Alarm on stale locks
# Alert if any DynamoDB lock record is older than 30 minutes
aws cloudwatch put-metric-alarm \
--alarm-name "TerraformStaleLock" \
--metric-name "OldestUnprocessedAge" \
--namespace "TerraformLocks" \
--statistic Maximum \
--period 300 \
--threshold 1800 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts
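CloudWatch has no built-in metric for lock age, so this alarm only fires if something publishes one. Pair it with a Lambda that queries DynamoDB for lock records, reads their Created timestamps, and publishes the age of the oldest record in seconds as OldestUnprocessedAge in the TerraformLocks namespace. Set the threshold above your longest expected apply window so the alarm pages on-call only for locks that are truly stale.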
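The logic such a Lambda (or a scheduled CI job) would run can be sketched in shell. The sketch assumes GNU date, RFC 3339 UTC Created timestamps, and the metric and namespace names from the alarm above; oldest_lock_age and publish_oldest_age are illustrative helper names.

```shell
#!/usr/bin/env bash
# Sketch: compute the age of the oldest lock and publish it as a custom metric.
# Assumes GNU date and RFC 3339 UTC timestamps; helper names are illustrative.

# oldest_lock_age [NOW_EPOCH] -> reads one Created timestamp per line on stdin,
# prints the largest age in seconds (0 if there are no locks)
oldest_lock_age() {
  local now="${1:-$(date -u +%s)}" oldest=0 ts epoch age
  while IFS= read -r ts; do
    [ -n "$ts" ] || continue
    # Drop the trailing Z and fractional seconds, then restore the Z.
    ts="${ts%Z}"
    ts="${ts%%.*}Z"
    epoch=$(date -u -d "$ts" +%s) || continue
    age=$(( now - epoch ))
    if [ "$age" -gt "$oldest" ]; then oldest=$age; fi
  done
  echo "$oldest"
}

# publish_oldest_age AGE_SECONDS -> sends the value to CloudWatch
publish_oldest_age() {
  aws cloudwatch put-metric-data \
    --namespace "TerraformLocks" \
    --metric-name "OldestUnprocessedAge" \
    --unit Seconds \
    --value "$1"
}

# Example with two sample timestamps and a fixed "now":
printf '%s\n' "2024-01-15T03:42:11.823456789Z" "2024-01-15T03:50:00Z" \
  | oldest_lock_age 1705291031
```

The timestamps could be fed from a table scan, for example aws dynamodb scan --table-name terraform-state-locks --filter-expression 'attribute_exists(Info)' --query 'Items[].Info.S' --output json | jq -r '.[] | fromjson | .Created', with the result passed to publish_oldest_age.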