Is it safe to run `terraform force-unlock` in production?

Only if you have confirmed that no other Terraform process is actively holding the lock. Check your CI system and Terraform Cloud runs, and ask your team before executing. Force-unlocking while a legitimate apply is in flight lets a second writer in and can corrupt your state file. If in doubt, wait 10 minutes and check CloudTrail for recent S3 PutObject calls to the state file path (these are S3 data events, so they only appear if data event logging is enabled on a trail).

Why does the lock persist after my CI job fails?

Terraform acquires the DynamoDB lock at the start of an operation and releases it on clean exit. When a runner is OOM-killed, times out, or receives SIGKILL (not SIGTERM), the Terraform process never reaches its cleanup code. The lock record in DynamoDB has no automatic expiry: the S3 backend writes no expiry attribute to the lock item, and DynamoDB TTL only removes items that carry one, so the record stays until something deletes it.

Can I use `-lock=false` as a temporary workaround during an incident?

No. Using `-lock=false` removes the only concurrency guard on your state. If any other process writes the state at the same time (for example, a second apply from another pipeline), you risk one write silently overwriting the other. The correct path is always: confirm the locking process is dead, then run `terraform force-unlock <LOCK_ID>`. The 30 seconds this takes is worth it versus hours of state reconstruction.

How to Fix Terraform S3 Backend State Lock Error: 'State Locked by Another Process'

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5–15 mins

TL;DR

What broke: Terraform cannot acquire the state lock because a DynamoDB lock record for your S3 backend still exists. Usually it was left behind by a crashed CI runner or a killed terminal session; sometimes a concurrent terraform apply legitimately holds it.
How to fix it: Retrieve the lock ID from the error output or from DynamoDB directly, then run terraform force-unlock <LOCK_ID> after confirming no legitimate process holds it.

The Incident (What Does the Error Mean?)

Raw error output:

Error: Error locking state: Error acquiring the state lock: ConditionalCheckFailedException: The conditional request failed
	state file: s3://my-tf-state-bucket/prod/terraform.tfstate
	Lock Info:
	  ID:        a1b2c3d4-e5f6-7890-abcd-ef1234567890
	  Path:      s3://my-tf-state-bucket/prod/terraform.tfstate
	  Operation: OperationTypeApply
	  Who:       runner@ci-node-07
	  Version:   1.5.7
	  Created:   2024-01-15 03:42:11.823456789 +0000 UTC
	  Info:

Terraform acquires a state lock to protect the state from being written
by multiple users at the same time. Please resolve the issue above and try
again. For most commands, you can disable locking with the "-lock=false"
flag, but this is not recommended.

Immediate consequence: Every terraform plan, apply, and destroy is dead until the lock is cleared. Your pipeline is halted. If a team member bypasses this with -lock=false, you risk concurrent state writes and state corruption — which is significantly worse than the lock itself.
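
If the error was captured to a log, the lock ID can be pulled out mechanically instead of copied by hand. The following is a minimal sketch, not an official Terraform feature: `extract_lock_id` is a hypothetical helper name, and it assumes the error text has the `ID:` line layout shown above (run Terraform with -no-color so no colour codes end up in the log).

```shell
#!/bin/bash
# Hypothetical helper: print the lock ID found in Terraform's lock error text.
# Reads the error text on stdin; prints nothing if no "ID:" field is present.
extract_lock_id() {
  # Scan every field so a leading border character on the line does not matter.
  awk '{ for (i = 1; i < NF; i++) if ($i == "ID:") { print $(i + 1); exit } }'
}

# Example usage (commented out so sourcing this file has no side effects):
#   terraform plan -no-color 2> tf-error.log || extract_lock_id < tf-error.log
```

The helper only reads text; it never unlocks anything, so it is safe to run before you have confirmed who holds the lock.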

The Attack Vector / Blast Radius

This is not just an inconvenience. The blast radius has two failure modes:

1. Operational: Stale lock from a dead process

CI runners get OOM-killed. Engineers hit Ctrl+C a second time to force-quit a slow apply. GitHub Actions jobs time out and the runner is torn down. In every case, Terraform never gets to release the lock. The DynamoDB lock table retains the LockID record indefinitely. Your entire team is now blocked on a ghost process.

2. Security: The -lock=false temptation

Under pressure during an outage, engineers reach for terraform apply -lock=false. This disables the only concurrency control protecting your state file. If two engineers or two CI jobs run simultaneously with locking disabled:

State writes race and one apply silently overwrites the other
Resources get orphaned: created in AWS but absent from state
The next terraform apply plans against an incomplete state and may duplicate or destroy live infrastructure to reconcile

If your S3 bucket is also misconfigured (no versioning, no MFA delete), there is no earlier state version to roll back to, and a corrupted state write is effectively unrecoverable. You are now manually reconstructing state for a production environment.

How to Fix It

Step 1: Verify No Legitimate Process Holds the Lock

Before force-unlocking, confirm that the user and host named in the lock's Who field (the lock metadata carries no PID) belong to a process that is actually dead:

# Check the raw lock record in DynamoDB
aws dynamodb get-item \
  --table-name terraform-state-locks \
  --key '{"LockID": {"S": "my-tf-state-bucket/prod/terraform.tfstate"}}'

The lock metadata is stored as a JSON string in the item's Info attribute. Inspect its Who and Created fields. If the timestamp is more than 15 minutes old and the CI job is confirmed dead, proceed.
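
The age check can be scripted. This is a sketch under stated assumptions: `lock_created` and `lock_age_seconds` are hypothetical helper names, the Info string is assumed to be compact JSON with a UTC `Created` value such as `2024-01-15T03:42:11.823456789Z`, and GNU date is assumed for timestamp parsing.

```shell
#!/bin/bash
# Hypothetical helpers for judging whether a lock is stale. Assumes GNU date.

# Pull the "Created" value out of the Info JSON string read from stdin.
lock_created() {
  sed -n 's/.*"Created":"\([^"]*\)".*/\1/p'
}

# Print the lock's age in seconds.
# $1 = Created timestamp (RFC 3339, UTC), $2 = current Unix epoch (defaults to now).
lock_age_seconds() {
  local created="$1" now="${2:-$(date -u +%s)}"
  # Drop fractional seconds ("11.823456789Z" -> "11Z") before parsing.
  local trimmed="${created%%.*}"
  [ "$trimmed" = "$created" ] || trimmed="${trimmed}Z"
  echo $(( now - $(date -u -d "$trimmed" +%s) ))
}

# Example usage (commented out; requires AWS credentials):
#   info="$(aws dynamodb get-item --table-name terraform-state-locks \
#     --key '{"LockID": {"S": "my-tf-state-bucket/prod/terraform.tfstate"}}' \
#     --query 'Item.Info.S' --output text)"
#   lock_age_seconds "$(printf '%s\n' "$info" | lock_created)"
```

An old timestamp is only one signal. A long-running apply can legitimately hold a lock well past 15 minutes, so the result should be read together with the state of the CI job.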

Basic Fix: Force Unlock

# Use the LockID from the error output exactly
terraform force-unlock a1b2c3d4-e5f6-7890-abcd-ef1234567890

Terraform will prompt for confirmation. This deletes the DynamoDB lock record. Your next terraform plan will proceed normally.

Manual DynamoDB fallback (if Terraform CLI is unavailable):

aws dynamodb delete-item \
  --table-name terraform-state-locks \
  --key '{"LockID": {"S": "my-tf-state-bucket/prod/terraform.tfstate"}}'
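
A plain delete-item removes whatever lock is present at that key, including a fresh lock taken by a new process after you last looked. A safer variant, sketched here as an assumption rather than an established procedure, makes the delete conditional on the Info attribute still containing the lock ID you inspected. `delete_lock_if_unchanged` is a hypothetical helper name; the `contains` condition and the expression attribute flags are standard DynamoDB delete-item options.

```shell
#!/bin/bash
# Hypothetical helper: delete the lock item only if it still carries the
# lock ID that was inspected earlier.
# $1 = table name, $2 = LockID key (<bucket>/<key>), $3 = lock ID from the error.
delete_lock_if_unchanged() {
  local table="$1" lock_key="$2" lock_id="$3"
  aws dynamodb delete-item \
    --table-name "$table" \
    --key "{\"LockID\": {\"S\": \"${lock_key}\"}}" \
    --condition-expression 'contains(#info, :id)' \
    --expression-attribute-names '{"#info": "Info"}' \
    --expression-attribute-values "{\":id\": {\"S\": \"${lock_id}\"}}"
}

# Example usage (commented out; requires AWS credentials):
#   delete_lock_if_unchanged terraform-state-locks \
#     my-tf-state-bucket/prod/terraform.tfstate \
#     a1b2c3d4-e5f6-7890-abcd-ef1234567890
```

If another process has replaced the lock in the meantime, the condition fails and the call returns ConditionalCheckFailedException instead of deleting a live lock.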

Enterprise Best Practice: Hardened Backend Config

The stale lock itself comes from a dead process, but an under-configured backend makes the fallout worse. Here is the diff between a typical under-configured setup and a production-hardened one:

terraform {
  backend "s3" {
    bucket         = "my-tf-state-bucket"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
-   # No encryption
-   # No versioning enforced at backend level
-   # No KMS key
-   dynamodb_table = "terraform-locks"
+   encrypt        = true
+   kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/your-cmk-key-id"
+   dynamodb_table = "terraform-state-locks"
+   # Enforce versioning and access logging via separate aws_s3_bucket resources
  }
}

# DynamoDB table for locking
resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

+ point_in_time_recovery {
+   enabled = true
+ }
+
+ server_side_encryption {
+   enabled = true
+ }
+
+ ttl {
+   attribute_name = "TimeToExist"
+   enabled        = true
+ }
}

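The TTL block is meant as a dead-man's switch, but it only works if each lock item carries a TimeToExist attribute holding a Unix epoch expiry (e.g., 4 hours ahead). Terraform's S3 backend writes only LockID and Info to the lock item, so a separate job has to stamp that attribute; items without it never expire. DynamoDB also deletes expired items in the background rather than at the exact expiry time, so treat TTL as a backstop for a failed cleanup job, not as the primary unlock path.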
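
For TTL to have any effect, something has to write the expiry value onto the lock item, because the S3 backend does not do so itself. The sketch below shows one way a scheduled job could do that. `ttl_epoch` and `stamp_lock_ttl` are hypothetical helper names, and the attribute name TimeToExist is taken from the table definition above.

```shell
#!/bin/bash
# Hypothetical helpers: stamp an expiry on an existing lock item.

# Print the Unix epoch that is $2 hours after $1 (epoch seconds).
ttl_epoch() {
  echo $(( $1 + $2 * 3600 ))
}

# $1 = table name, $2 = LockID key (<bucket>/<key>), $3 = hours until expiry.
stamp_lock_ttl() {
  local table="$1" lock_key="$2" hours="$3" expires
  expires="$(ttl_epoch "$(date -u +%s)" "$hours")"
  # attribute_exists keeps the update from creating an item when no lock is held.
  aws dynamodb update-item \
    --table-name "$table" \
    --key "{\"LockID\": {\"S\": \"${lock_key}\"}}" \
    --update-expression 'SET TimeToExist = :ttl' \
    --condition-expression 'attribute_exists(LockID)' \
    --expression-attribute-values "{\":ttl\": {\"N\": \"${expires}\"}}"
}

# Example usage (commented out; requires AWS credentials):
#   stamp_lock_ttl terraform-state-locks my-tf-state-bucket/prod/terraform.tfstate 4
```

Pick an expiry comfortably longer than your longest legitimate apply, since an expired item is removed whether or not its owner is still running.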

Prevention in CI/CD

1. Always wrap Terraform in a lock-aware CI wrapper

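#!/bin/bash
# ci-terraform-apply.sh
set -euo pipefail

# Forward SIGTERM/SIGINT to Terraform so it stops gracefully and releases its
# own lock. SIGKILL and OOM kills cannot be trapped and still leave a stale lock.
forward_signal() {
  echo "Runner exiting: asking Terraform to stop and release its lock"
  kill -TERM "${TF_PID:-}" 2>/dev/null || true
}
trap forward_signal SIGTERM SIGINT

terraform apply -auto-approve &
TF_PID=$!
status=0
wait "$TF_PID" || status=$?
# A trapped signal interrupts wait early; wait again until Terraform has exited.
if kill -0 "$TF_PID" 2>/dev/null; then
  status=0; wait "$TF_PID" || status=$?
fi
exit "$status"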

2. Checkov: Enforce S3 versioning and encryption on state buckets

# .checkov.yaml
check:
  - CKV_AWS_21   # S3 versioning enabled
  - CKV_AWS_119  # DynamoDB encrypted at rest
  - CKV_AWS_28   # DynamoDB point-in-time recovery

Run in your pipeline:

checkov -d . --check CKV_AWS_21,CKV_AWS_119,CKV_AWS_28 --hard-fail-on HIGH

3. OPA Policy: Block -lock=false in Atlantis/Terraform Cloud

# policy/no_lock_disable.rego
package terraform.deny_lock_false

deny[msg] {
  input.flags[_] == "-lock=false"
  msg := "POLICY VIOLATION: -lock=false is prohibited in automated pipelines. Resolve the stale lock manually."
}

4. CloudWatch Alarm on stale locks

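# Alert if any DynamoDB lock record is older than 30 minutes.
# OldestUnprocessedAge in the TerraformLocks namespace is a custom metric:
# nothing emits it unless you publish it yourself (in seconds).
aws cloudwatch put-metric-alarm \
  --alarm-name "TerraformStaleLock" \
  --metric-name "OldestUnprocessedAge" \
  --namespace "TerraformLocks" \
  --statistic Maximum \
  --period 300 \
  --threshold 1800 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts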

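Neither Terraform nor DynamoDB emits this metric, so pair the alarm with a scheduled Lambda that queries DynamoDB for lock records and publishes the age of the oldest one as OldestUnprocessedAge. Set the alarm threshold to your longest expected apply window so that on-call is paged automatically when a Created timestamp exceeds it.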
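
The publishing half of that job might look like the sketch below, written in shell rather than as a Lambda so it stays in the same language as the rest of this page. `oldest_age_seconds` and `publish_oldest_lock_age` are hypothetical helper names. The namespace and metric name match the alarm above, and the creation times are assumed to have already been read from DynamoDB and converted to Unix epochs.

```shell
#!/bin/bash
# Hypothetical helpers: publish the age of the oldest lock as a custom metric.

# Print the largest age in seconds.
# $1 = current Unix epoch, remaining args = lock creation times as epochs.
# Prints 0 when no creation times are given (no locks held).
oldest_age_seconds() {
  local now="$1" oldest=0 created age
  shift
  for created in "$@"; do
    age=$(( now - created ))
    if (( age > oldest )); then oldest=$age; fi
  done
  echo "$oldest"
}

# Takes the same arguments as oldest_age_seconds and sends the result to CloudWatch.
publish_oldest_lock_age() {
  aws cloudwatch put-metric-data \
    --namespace "TerraformLocks" \
    --metric-name "OldestUnprocessedAge" \
    --unit Seconds \
    --value "$(oldest_age_seconds "$@")"
}

# Example usage (commented out; requires AWS credentials):
#   publish_oldest_lock_age "$(date -u +%s)" 1705290131 1705291000
```

Publishing 0 when no locks exist keeps the metric continuous, so the alarm does not sit in an insufficient-data state between incidents.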