Fixing Terraform ResourceInUseException on DynamoDB Table Updates: Retry Logic and Root Cause
Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 10 mins
TL;DR
- What broke: Terraform issued a DynamoDB UpdateTable API call while AWS was already processing a concurrent operation (index backfill, autoscaling policy update, or a prior GSI modification) on the same table, and AWS rejected it with ResourceInUseException.
- How to fix it: Add explicit depends_on chains between DynamoDB resources, use lifecycle { ignore_changes } for attributes managed out-of-band, and wrap destructive changes in a retry loop via null_resource + AWS CLI.
- Shortcut: Use our Client-Side Sandbox below to auto-refactor this: paste your failing aws_dynamodb_table block and get corrected HCL instantly.
The Incident (What Does the Error Mean?)
Raw error from terraform apply:
Error: updating DynamoDB Table (my-production-table): ResourceInUseException:
Table is being updated: my-production-table
status code: 400, request id: 4a1b2c3d-xxxx-xxxx-xxxx-ffffffffffff
The DynamoDB control plane serializes table mutations: only one UpdateTable operation can be in flight per table at any moment. When Terraform attempts a second mutation (adding a GSI, modifying billing mode, enabling streams) while the table status is anything other than ACTIVE, AWS rejects it outright. Terraform does not retry this by default. Your pipeline dies, the state is partially written, and your next apply may attempt to re-diff a table that is still mid-transition.
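One way to act on the ACTIVE-only rule is a pre-flight gate in the pipeline. The following is a minimal sketch, not part of the original setup: the function names table_status and wait_for_active, the default of 15 attempts, and the 60-second interval are all assumptions chosen for illustration. It polls describe-table and returns success only once the table reports ACTIVE.

```shell
# Prints the current TableStatus (for example CREATING, UPDATING, ACTIVE).
table_status() {
  aws dynamodb describe-table \
    --table-name "$1" \
    --query 'Table.TableStatus' \
    --output text
}

# wait_for_active TABLE [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 once the table is ACTIVE, 1 if attempts run out,
# and 2 if the status lookup itself fails.
wait_for_active() {
  local table="$1" max="${2:-15}" interval="${3:-60}" attempt=1 status
  while [ "$attempt" -le "$max" ]; do
    status="$(table_status "$table")" || return 2
    if [ "$status" = "ACTIVE" ]; then
      return 0
    fi
    echo "Table $table is $status; retrying in ${interval}s ($attempt/$max)" >&2
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "Table $table never reached ACTIVE after $max attempts" >&2
  return 1
}
```

A pipeline step could then run: wait_for_active my-production-table && terraform apply. The gate narrows the race window but does not close it, since another actor can still start an update between the check and the apply.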
The Attack Vector / Blast Radius
This is a cascading deployment failure pattern, not a one-time blip:
- State drift: A failed update can leave Terraform state recording only part of the change, out of step with the real table. Subsequent plan outputs become unreliable.
- GSI backfill deadlock: If the blocked operation was a GSI addition, the index never gets created. Dependent Lambda functions or application queries against that GSI begin returning ResourceNotFoundException at runtime, a failure that only surfaces once traffic hits the missing index.
- Autoscaling race: DynamoDB Application Autoscaling registers scaling policies asynchronously. If Terraform manages both the table and its autoscaling targets in the same apply, the autoscaling API can trigger an UpdateTable milliseconds before Terraform's own call. Because this is a race it will not fire on every run, but it shows up regularly in large workspaces at Terraform's default -parallelism=10.
- CI/CD pipeline lock: Most pipelines have no retry logic on provider errors. A single ResourceInUseException blocks the entire release train until a human intervenes.
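The missing pipeline retry can be sketched as a small wrapper. This is an illustrative helper under stated assumptions, not the author's pipeline: the name retry_on_in_use and its argument order are hypothetical. It re-runs a command only when the output contains ResourceInUseException and fails fast on any other error.

```shell
# retry_on_in_use MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Output is buffered per attempt (not streamed) so it can be inspected.
retry_on_in_use() {
  local max="$1" interval="$2" attempt=1 log
  shift 2
  log="$(mktemp)"
  while [ "$attempt" -le "$max" ]; do
    if "$@" >"$log" 2>&1; then
      cat "$log"
      rm -f "$log"
      return 0
    fi
    cat "$log" >&2
    # Retry only the transient error; anything else is a real failure.
    if ! grep -q 'ResourceInUseException' "$log"; then
      rm -f "$log"
      return 1
    fi
    echo "Attempt $attempt/$max hit ResourceInUseException; sleeping ${interval}s" >&2
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  rm -f "$log"
  return 1
}
```

Example invocation: retry_on_in_use 5 60 terraform apply -auto-approve -parallelism=1. Re-running apply after a partial failure re-plans against the current table, so review the resulting plan in environments where an unattended re-apply is not acceptable.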
How to Fix It (The Solution)
Basic Fix — Serialize Operations with depends_on and Reduce Parallelism
If you are managing multiple DynamoDB resources or autoscaling attachments in the same workspace, force serialization:
  resource "aws_dynamodb_table" "main" {
    name           = "my-production-table"
    billing_mode   = "PROVISIONED" # autoscaling and capacity settings only apply to provisioned tables
    read_capacity  = 5
    write_capacity = 5
    hash_key       = "pk"

    attribute {
      name = "pk"
      type = "S" # key type assumed for illustration
    }

+   lifecycle {
+     ignore_changes = [
+       # Ignore read_capacity/write_capacity mutated by autoscaling out-of-band
+       read_capacity,
+       write_capacity,
+     ]
+   }
  }

  resource "aws_appautoscaling_target" "dynamodb_table_read" {
    ...
+   depends_on = [aws_dynamodb_table.main]
  }
Run apply with reduced parallelism to prevent race conditions:
- terraform apply
+ terraform apply -parallelism=1
Enterprise Best Practice — Retry Wrapper via null_resource
For pipelines that cannot tolerate manual intervention, wrap the sensitive update in a polling retry using AWS CLI inside a null_resource. This is especially critical for GSI additions on large tables where backfill can take 20–40 minutes.
-resource "aws_dynamodb_table" "main" {
-  name           = "my-production-table"
-  billing_mode   = "PROVISIONED"
-  read_capacity  = 5
-  write_capacity = 5
-
-  global_secondary_index {
-    name            = "gsi-email"
-    hash_key        = "email"
-    projection_type = "ALL"
-    read_capacity   = 5
-    write_capacity  = 5
-  }
-}
+resource "aws_dynamodb_table" "main" {
+  name           = "my-production-table"
+  billing_mode   = "PROVISIONED"
+  read_capacity  = 5
+  write_capacity = 5
+
+  # Do NOT declare GSIs here if adding them post-creation.
+  # Manage GSI additions via null_resource below to control retry logic.
+
+  lifecycle {
+    ignore_changes = [global_secondary_index, read_capacity, write_capacity]
+  }
+}
+
+resource "null_resource" "add_gsi_email" {
+  depends_on = [aws_dynamodb_table.main]
+
+  triggers = {
+    gsi_definition = "gsi-email-v1"
+  }
+
+  provisioner "local-exec" {
+    command = <<-EOT
+      set -e
+      MAX_RETRIES=15
+      RETRY_INTERVAL=60
+      for i in $(seq 1 $MAX_RETRIES); do
+        STATUS=$(aws dynamodb describe-table \
+          --table-name my-production-table \
+          --query 'Table.TableStatus' \
+          --output text)
+        if [ "$STATUS" = "ACTIVE" ]; then
+          echo "Table ACTIVE. Applying GSI update..."
+          aws dynamodb update-table \
+            --table-name my-production-table \
+            --attribute-definitions AttributeName=email,AttributeType=S \
+            --global-secondary-index-updates \
+            '[{"Create":{"IndexName":"gsi-email","KeySchema":[{"AttributeName":"email","KeyType":"HASH"}],"Projection":{"ProjectionType":"ALL"},"ProvisionedThroughput":{"ReadCapacityUnits":5,"WriteCapacityUnits":5}}}]'
+          exit 0
+        fi
+        echo "Table status: $STATUS. Waiting $RETRY_INTERVAL seconds... ($i/$MAX_RETRIES)"
+        sleep $RETRY_INTERVAL
+      done
+      # Fail the provisioner if the table never became ACTIVE, so the apply does not report a false success.
+      echo "Table never reached ACTIVE after $MAX_RETRIES attempts. GSI not created." >&2
+      exit 1
+    EOT
+  }
+}
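The update-table call returns as soon as the request is accepted, while the backfill continues in the background. A follow-up check can wait for the index itself before dependent deployments proceed. The sketch below is an assumption-laden illustration: the helper names index_status and wait_for_index and the default limits are hypothetical, and it relies on the AWS CLI printing "None" in text mode when the queried index is not listed.

```shell
# Prints the IndexStatus of one GSI (CREATING, UPDATING, DELETING, or ACTIVE).
index_status() {
  aws dynamodb describe-table \
    --table-name "$1" \
    --query "Table.GlobalSecondaryIndexes[?IndexName=='$2'].IndexStatus | [0]" \
    --output text
}

# wait_for_index TABLE INDEX [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 once the index is ACTIVE, 1 if attempts run out,
# and 2 if the status lookup itself fails.
wait_for_index() {
  local table="$1" index="$2" max="${3:-40}" interval="${4:-60}" attempt=1 status
  while [ "$attempt" -le "$max" ]; do
    status="$(index_status "$table" "$index")" || return 2
    if [ "$status" = "ACTIVE" ]; then
      echo "Index $index on $table is ACTIVE"
      return 0
    fi
    echo "Index $index is $status; retrying in ${interval}s ($attempt/$max)" >&2
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "Index $index did not reach ACTIVE after $max attempts" >&2
  return 1
}
```

For the example above this would be called as wait_for_index my-production-table gsi-email. The default of 40 attempts at 60 seconds covers the 20 to 40 minute backfill range mentioned earlier; larger tables may need more.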
Prevention in CI/CD
1. Checkov policy — block concurrent GSI declarations:
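Add a Checkov custom check that runs against the plan JSON and flags any plan that bundles more than one DynamoDB table change (for example, several GSI additions) into a single run. Checkov and terraform validate are static checks and cannot see the live TableStatus, so verify that the table is ACTIVE in a separate pipeline step before apply.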
2. OPA / Sentinel policy — enforce -parallelism=1 for DynamoDB workspaces:
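# policy/dynamodb_serial.rego
package terraform.dynamodb # package name is illustrative

# The plan JSON does not record the -parallelism CLI flag, so this rule assumes
# the pipeline also passes the value as a Terraform variable named "parallelism".
deny[msg] {
    input.resource_changes[_].type == "aws_dynamodb_table"
    input.terraform_version # confirm plan file context
    not input.variables.parallelism.value == "1"
    msg := "DynamoDB table changes MUST be applied with -parallelism=1 to prevent ResourceInUseException."
}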
3. AWS EventBridge alerting on UpdateTable API calls:
Set up a CloudTrail → EventBridge rule that fires an SNS alert whenever UpdateTable is called on production tables. This gives your team visibility into out-of-band mutations (autoscaling, console changes) that will cause the next Terraform apply to fail.
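The rule described above could be created with the AWS CLI roughly as follows. This is a sketch under assumptions: the function names, the target Id "sns-alert", and the rule and topic arguments are hypothetical, and the pattern matches UpdateTable on every table in the account and region rather than production tables only. The SNS topic also needs a resource policy that lets events.amazonaws.com publish to it, which is not shown.

```shell
# Event pattern for UpdateTable calls recorded by CloudTrail.
update_table_event_pattern() {
  cat <<'JSON'
{
  "source": ["aws.dynamodb"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["dynamodb.amazonaws.com"],
    "eventName": ["UpdateTable"]
  }
}
JSON
}

# create_update_table_alert RULE_NAME SNS_TOPIC_ARN
# Creates the EventBridge rule, then attaches the SNS topic as its target.
create_update_table_alert() {
  local rule_name="$1" topic_arn="$2"
  aws events put-rule \
    --name "$rule_name" \
    --event-pattern "$(update_table_event_pattern)" || return 1
  aws events put-targets \
    --rule "$rule_name" \
    --targets "Id=sns-alert,Arn=$topic_arn"
}
```

Narrowing the pattern to specific tables would mean adding a filter on the request parameters inside "detail"; check a sample CloudTrail event from your account for the exact field names before relying on one.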
4. Terraform precondition block (Terraform ≥ 1.2):
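resource "aws_dynamodb_table" "main" {
  ...
  lifecycle {
    precondition {
      # Assumes data.aws_dynamodb_table.main_check exposes a table_status attribute;
      # confirm this against the schema of your AWS provider version.
      condition     = data.aws_dynamodb_table.main_check.table_status == "ACTIVE"
      error_message = "DynamoDB table is not in ACTIVE state. Aborting to prevent ResourceInUseException."
    }
  }
}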
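Pair this with a data "aws_dynamodb_table" source block that reads current state before any mutation is attempted. The data source lookup fails if the table does not exist yet, so this guard only suits tables that have already been created.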