Why does Terraform not automatically retry ResourceInUseException on DynamoDB?

The AWS Terraform provider retries certain throttling errors (ThrottlingException, ProvisionedThroughputExceededException) but treats ResourceInUseException as a hard client error because it indicates a logical conflict, not a transient rate limit. The provider cannot know how long the conflicting operation will take — GSI backfills on large tables can run for hours — so it fails fast and defers retry logic to the operator.

Can I use terraform apply -refresh-only to recover from a partial apply caused by this error?

Yes, but carefully. Run terraform apply -refresh-only first to sync state with the actual AWS resource status (the standalone terraform refresh command is deprecated in favor of this flag), then run terraform plan to inspect the diff. If the table is still in UPDATING status, do not apply. Wait for the table to return to ACTIVE (poll with: aws dynamodb describe-table --table-name <table-name> --query 'Table.TableStatus') before re-running apply.
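The wait for ACTIVE described above can be scripted rather than checked by hand. The following is a minimal sketch, not a definitive implementation: wait_for_table_active, MAX_ATTEMPTS, and POLL_SECONDS are names invented for this example, and the defaults (30 polls, 20 seconds apart) are arbitrary assumptions to tune for your tables.

```shell
#!/bin/sh
# Poll a DynamoDB table until it reports ACTIVE, or give up after a fixed
# number of attempts. Function and variable names are illustrative only.
wait_for_table_active() {
  table_name="$1"
  max_attempts="${MAX_ATTEMPTS:-30}"   # assumed default, tune as needed
  poll_seconds="${POLL_SECONDS:-20}"   # assumed default, tune as needed
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Return 2 if describe-table itself fails (credentials, missing table, ...)
    status=$(aws dynamodb describe-table \
      --table-name "$table_name" \
      --query 'Table.TableStatus' \
      --output text) || return 2
    if [ "$status" = "ACTIVE" ]; then
      return 0
    fi
    echo "Table $table_name is $status (attempt $attempt/$max_attempts)" >&2
    sleep "$poll_seconds"
    attempt=$((attempt + 1))
  done
  return 1  # still not ACTIVE after max_attempts polls
}
```

A recovery run could then read: wait_for_table_active my-production-table && terraform plan. The distinct return codes let a pipeline tell "still updating" (1) apart from "could not query the table at all" (2).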

Does switching DynamoDB billing mode from PROVISIONED to PAY_PER_REQUEST also trigger ResourceInUseException?

Yes. Billing mode changes are implemented internally as an UpdateTable call and transition the table through an intermediate non-ACTIVE state. If you have autoscaling targets, stream configurations, or GSI modifications in the same Terraform plan, they will all race against this billing mode transition. Always isolate billing mode changes into a separate targeted apply: terraform apply -target=aws_dynamodb_table.main before applying dependent resources.
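One way to sequence the isolated apply is sketched below, under stated assumptions: apply_billing_mode_first is a hypothetical helper name, -auto-approve is used on the assumption that this runs in a non-interactive pipeline, and the wait relies on the AWS CLI's built-in table-exists waiter, which polls DescribeTable until the table reports ACTIVE and gives up after a bounded number of polls.

```shell
#!/bin/sh
# Apply the table change on its own, wait for the table to settle, then
# apply everything else. The helper name is illustrative only.
apply_billing_mode_first() {
  table_address="$1"   # e.g. aws_dynamodb_table.main
  table_name="$2"      # e.g. my-production-table

  # Phase 1: only the table resource, so nothing else races its UpdateTable call.
  terraform apply -auto-approve -target="$table_address" || return 1

  # Phase 2: block until DescribeTable reports TableStatus ACTIVE.
  aws dynamodb wait table-exists --table-name "$table_name" || return 1

  # Phase 3: the remaining resources (autoscaling targets and so on).
  terraform apply -auto-approve
}
```

Called as apply_billing_mode_first aws_dynamodb_table.main my-production-table, the function stops at the first failing phase, so dependent resources are never applied against a table that is still mid-transition.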

Fixing Terraform ResourceInUseException on DynamoDB Table Updates: Retry Logic and Root Cause

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 10 mins

TL;DR

What broke: Terraform issued a DynamoDB UpdateTable API call while AWS was already processing a concurrent operation (index backfill, autoscaling policy update, or a prior GSI modification) on the same table — AWS rejected it with ResourceInUseException.
How to fix it: Add explicit depends_on chains between DynamoDB resources, use lifecycle { ignore_changes } for attributes managed out-of-band, and wrap destructive changes in a retry loop via null_resource + AWS CLI.

The Incident (What Does the Error Mean?)

Raw error from terraform apply:

Error: updating DynamoDB Table (my-production-table): ResourceInUseException:
  Table is being updated: my-production-table
    status code: 400, request id: 4a1b2c3d-xxxx-xxxx-xxxx-ffffffffffff

AWS DynamoDB is a single-writer-per-table control plane. Only one UpdateTable operation can be in-flight at any moment. When Terraform attempts a second mutation — adding a GSI, modifying billing mode, enabling streams — while the table status is anything other than ACTIVE, AWS hard-rejects it. Terraform does not retry this by default. Your pipeline dies, the state is partially written, and your next apply may attempt to re-diff a table that is still mid-transition.

The Attack Vector / Blast Radius

This is a cascading deployment failure pattern, not a one-time blip:

State drift: When the update fails partway through, Terraform's state may no longer match the table's real configuration. Until the table settles and state is refreshed, subsequent plan outputs are unreliable.
GSI backfill deadlock: If the blocked operation was a GSI addition, the index never gets created. Dependent Lambda functions or application queries against that GSI then fail at runtime because the index does not exist, a failure the deployment pipeline itself never reports.
Autoscaling race: Application Auto Scaling registers scaling policies asynchronously. If Terraform manages both the table and its autoscaling targets in the same apply, the autoscaling service can trigger an UpdateTable call just before Terraform's own. Large workspaces hit this race regularly at Terraform's default parallelism of 10.
CI/CD pipeline lock: Most pipelines have no retry logic on provider errors. A single ResourceInUseException blocks the entire release train until a human intervenes.
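For pipelines with no retry logic of their own, a thin wrapper can re-run a step only when this specific error appears. The sketch below is one possible shape, not a standard tool: retry_on_resource_in_use, MAX_ATTEMPTS, and BACKOFF_SECONDS are hypothetical names, and the defaults (5 attempts, 60 seconds apart) are assumptions. It buffers the command's output, so nothing is printed until each attempt finishes.

```shell
#!/bin/sh
# Re-run a command only when its output mentions ResourceInUseException.
# Any other failure is returned immediately. Names are illustrative only.
retry_on_resource_in_use() {
  max_attempts="${MAX_ATTEMPTS:-5}"    # assumed default
  backoff="${BACKOFF_SECONDS:-60}"     # assumed default
  attempt=1
  while :; do
    # Capture stdout and stderr together so the error text can be inspected.
    if output=$("$@" 2>&1); then rc=0; else rc=$?; fi
    printf '%s\n' "$output"
    if [ "$rc" -eq 0 ]; then
      return 0
    fi
    case "$output" in
      *ResourceInUseException*) ;;   # known conflict: eligible for retry
      *) return "$rc" ;;             # anything else: surface it right away
    esac
    if [ "$attempt" -ge "$max_attempts" ]; then
      return "$rc"
    fi
    echo "ResourceInUseException on attempt $attempt/$max_attempts; sleeping ${backoff}s" >&2
    sleep "$backoff"
    attempt=$((attempt + 1))
  done
}
```

A pipeline step would call it as retry_on_resource_in_use terraform apply -auto-approve. Matching on the error text keeps unrelated failures, such as permission errors, from being retried and masked.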

How to Fix It (The Solution)

Basic Fix — Serialize Operations with `depends_on` and Reduce Parallelism

If you are managing multiple DynamoDB resources or autoscaling attachments in the same workspace, force serialization:

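 resource "aws_dynamodb_table" "main" {
   name           = "my-production-table"
   billing_mode   = "PROVISIONED"  # autoscaling only applies to provisioned capacity
   read_capacity  = 5
   write_capacity = 5
   hash_key       = "pk"

   # Every key attribute needs a definition; the string type is assumed here.
   attribute {
     name = "pk"
     type = "S"
   }

+  lifecycle {
+    ignore_changes = [
+      # Ignore read_capacity/write_capacity mutated by autoscaling out-of-band
+      read_capacity,
+      write_capacity,
+    ]
+  }
 }

 resource "aws_appautoscaling_target" "dynamodb_table_read" {
   ...
+  depends_on = [aws_dynamodb_table.main]
 }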

Run apply with reduced parallelism to prevent race conditions:

- terraform apply
+ terraform apply -parallelism=1

Enterprise Best Practice — Retry Wrapper via `null_resource`

For pipelines that cannot tolerate manual intervention, wrap the sensitive update in a polling retry using the AWS CLI inside a null_resource. This is especially critical for GSI additions on large tables, where backfill can take 20–40 minutes or longer.

-resource "aws_dynamodb_table" "main" {
-  name         = "my-production-table"
-  billing_mode = "PROVISIONED"
-  read_capacity  = 5
-  write_capacity = 5
-
-  global_secondary_index {
-    name            = "gsi-email"
-    hash_key        = "email"
-    projection_type = "ALL"
-    read_capacity   = 5
-    write_capacity  = 5
-  }
-}

+resource "aws_dynamodb_table" "main" {
+  name         = "my-production-table"
+  billing_mode = "PROVISIONED"
+  read_capacity  = 5
+  write_capacity = 5
+
+  # Do NOT declare GSIs here if adding them post-creation.
+  # Manage GSI additions via null_resource below to control retry logic.
+
+  lifecycle {
+    ignore_changes = [global_secondary_index, read_capacity, write_capacity]
+  }
+}
+
+resource "null_resource" "add_gsi_email" {
+  depends_on = [aws_dynamodb_table.main]
+
+  triggers = {
+    gsi_definition = "gsi-email-v1"
+  }
+
+  provisioner "local-exec" {
+    command = <<-EOT
+      set -e
+      MAX_RETRIES=15
+      RETRY_INTERVAL=60
+      for i in $(seq 1 $MAX_RETRIES); do
+        STATUS=$(aws dynamodb describe-table \
+          --table-name my-production-table \
+          --query 'Table.TableStatus' \
+          --output text)
+        if [ "$STATUS" = "ACTIVE" ]; then
+          echo "Table ACTIVE. Applying GSI update..."
+          aws dynamodb update-table \
+            --table-name my-production-table \
+            --attribute-definitions AttributeName=email,AttributeType=S \
+            --global-secondary-index-updates \
+              '[{"Create":{"IndexName":"gsi-email","KeySchema":[{"AttributeName":"email","KeyType":"HASH"}],"Projection":{"ProjectionType":"ALL"},"ProvisionedThroughput":{"ReadCapacityUnits":5,"WriteCapacityUnits":5}}}]'
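+          exit 0
+        fi
+        echo "Table status: $STATUS. Waiting $RETRY_INTERVAL seconds... ($i/$MAX_RETRIES)"
+        sleep $RETRY_INTERVAL
+      done
+      # Fail the provisioner instead of exiting 0 when the table never became ACTIVE.
+      echo "Table still not ACTIVE after $MAX_RETRIES checks; GSI was not created." >&2
+      exit 1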
+    EOT
+  }
+}

Prevention in CI/CD

1. Checkov policy — block concurrent GSI declarations:

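Add a Checkov custom check (or a comparable gate on the terraform show -json plan output) that flags any plan modifying more than one attribute of the same DynamoDB table in a single run. Static tools such as Checkov and terraform validate cannot see the live TableStatus, so pair the check with a pipeline step that calls describe-table and refuses to apply unless the table is ACTIVE.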

2. OPA / Sentinel policy — enforce -parallelism=1 for DynamoDB workspaces:

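# policy/dynamodb_serial.rego
# -parallelism is a CLI flag and is not recorded in the plan JSON. This rule
# assumes the pipeline mirrors it into a Terraform input variable named
# "parallelism", which then appears under input.variables in the plan JSON.
package terraform.dynamodb  # package name is illustrative

deny[msg] {
  rc := input.resource_changes[_]
  rc.type == "aws_dynamodb_table"
  rc.change.actions[_] != "no-op"  # skip tables the plan leaves untouched
  input.terraform_version  # confirm plan file context
  not input.variables.parallelism.value == "1"
  msg := "DynamoDB table changes MUST be applied with -parallelism=1 to prevent ResourceInUseException."
}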

3. AWS EventBridge alerting on UpdateTable API calls:

Set up a CloudTrail → EventBridge rule that fires an SNS alert whenever UpdateTable is called on production tables. This gives your team visibility into out-of-band mutations (autoscaling, console changes) that will cause the next Terraform apply to fail.
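The rule described above could be created from the AWS CLI roughly as follows. This is a sketch under assumptions: both function names are invented for the example, the rule name and SNS topic ARN are supplied by the caller, and it presumes CloudTrail management events are already being recorded and that the topic's access policy lets EventBridge publish to it. The pattern matches every UpdateTable call in the account and does not narrow to specific tables.

```shell
#!/bin/sh
# Print the EventBridge pattern that matches CloudTrail records of
# DynamoDB UpdateTable calls. Function names are illustrative only.
update_table_event_pattern() {
  cat <<'JSON'
{
  "source": ["aws.dynamodb"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["dynamodb.amazonaws.com"],
    "eventName": ["UpdateTable"]
  }
}
JSON
}

# Create the rule and route matches to an existing SNS topic.
create_update_table_alert() {
  rule_name="$1"       # caller-chosen rule name
  sns_topic_arn="$2"   # ARN of an SNS topic that already exists
  aws events put-rule \
    --name "$rule_name" \
    --event-pattern "$(update_table_event_pattern)" || return 1
  aws events put-targets \
    --rule "$rule_name" \
    --targets "Id=sns-alert,Arn=$sns_topic_arn"
}
```

Keeping the pattern in its own function makes it easy to review or diff in code review before the rule is created.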

4. Terraform precondition block (Terraform ≥ 1.2):

resource "aws_dynamodb_table" "main" {
  ...
  lifecycle {
    precondition {
      condition     = can(regex("^ACTIVE$", data.aws_dynamodb_table.main_check.table_status))
      error_message = "DynamoDB table is not in ACTIVE state. Aborting to prevent ResourceInUseException."
    }
  }
}

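Pair this with a data "aws_dynamodb_table" source block (named main_check in the snippet above) that reads current state before any mutation is attempted. Confirm that your AWS provider version exposes table_status on that data source, and note that the lookup fails if the table does not exist yet, so this guard only fits tables that have already been created.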