How to Fix Terraform Plan Showing Destroy After State Drift (Without Nuking Production)

Threat/Impact Level: CRITICAL | Downtime Risk: HIGH | Time to Fix: 15–45 mins depending on resource type

TL;DR

  • What broke: Terraform's state file diverged from real infrastructure (manual change, out-of-band automation, or a botched state mv), so terraform plan now shows a -/+ destroy-and-recreate or a flat destroy for a live resource.
  • How to fix it: Reconcile state via terraform import, terraform state mv, or a refresh-only apply (terraform apply -refresh-only, which replaces the deprecated terraform refresh), then patch the config to match reality before running apply.

The Incident (What Does the Error Mean?)

You run terraform plan and see this:

# aws_db_instance.primary will be destroyed
# (because aws_db_instance.primary is not in configuration)
  - resource "aws_db_instance" "primary" {
      - identifier        = "prod-postgres-01"
      - instance_class    = "db.r6g.2xlarge"
      - allocated_storage = 500
      ...
    }

Plan: 0 to add, 0 to change, 1 to destroy.

Or the subtler destroy-recreate variant:

# aws_security_group.app_sg must be replaced
-/+ resource "aws_security_group" "app_sg" {
      ~ id   = "sg-0abc123" -> (known after apply)
      ~ name = "app-sg-prod" -> "app-sg-prod-v2"  # forces replacement
    }

Immediate consequence: If terraform apply runs — in a pipeline, by a junior engineer, or via auto-apply in Terraform Cloud — that resource is gone or recreated. For stateful resources (RDS, ElastiCache, EBS volumes, security groups with live ingress rules), this is a production outage or a data-loss event.
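
For quick triage, the at-risk addresses can be pulled straight out of the plan text. The sketch below is a rough helper, not a CI gate: it assumes output produced with -no-color and matches only the two header phrasings shown above ("will be destroyed" and "must be replaced"). The function name risky_addresses is illustrative, and because the human-readable plan format is not a stable interface, automated gates should read the JSON plan instead.

```shell
#!/bin/sh
# Reads plan text on stdin and prints "<action> <address>" for every
# resource header that announces a destroy or a replacement.
risky_addresses() {
  sed -n -E \
    -e 's/^[[:space:]]*# ([^[:space:]]+) will be destroyed$/destroy \1/p' \
    -e 's/^[[:space:]]*# ([^[:space:]]+) must be replaced$/replace \1/p'
}

# Usage (assumes terraform is installed and initialized):
#   terraform plan -no-color | risky_addresses
```

An empty result only means neither phrasing was found; it is not proof that the plan is safe.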


The Attack Vector / Blast Radius

State drift is deceptively dangerous because the destroy is legitimate from Terraform's perspective: it is doing exactly what the difference between configuration and state dictates. The blast radius depends on resource type:

Resource             Destroy Consequence
aws_db_instance      Data loss if the final snapshot is skipped; RTO in hours
aws_security_group   Delete is rejected while ENIs are attached (DependencyViolation), so the apply stalls; forcing it through cuts traffic to dependent workloads
aws_iam_role         Attached workloads lose permissions; cascading auth failures
aws_s3_bucket        Bucket deletion purges objects if force_destroy = true
aws_eks_node_group   Node pool drained; workloads evicted

Root causes of drift — know which one you're dealing with:

  1. Manual console/CLI change — someone added a tag, resized an instance, or renamed a resource outside Terraform.
  2. State file desync — two engineers ran terraform apply against different state backends, or a remote state lock failed silently.
  3. Resource rename/refactor in .tf — a moved {} block was missing after renaming a resource address.
  4. Workspace or backend misconfiguration — plan is running against the wrong workspace, pointing at a stale state file.
  5. Provider upgrade — new provider version reads an attribute differently, causing a computed diff that forces replacement.

How to Fix It (The Solution)

Step 0: Halt the Pipeline Immediately

If this is in CI/CD, kill the apply job now. Set a manual approval gate. Do not let auto-apply proceed.


Basic Fix — Re-import the Drifted Resource

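If the resource exists in AWS but has fallen out of state, Terraform no longer knows it manages it, and the plan proposes a create for something that already exists. Import the live resource so state matches reality. (If the plan instead reports that the resource "is not in configuration", as in the first example, the resource block itself was deleted or renamed: restore the block, or handle the rename as described after the commands.)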

# Find the real resource ID from AWS CLI
aws rds describe-db-instances --query 'DBInstances[*].DBInstanceIdentifier'

# Re-import it into the correct state address
terraform import aws_db_instance.primary prod-postgres-01

# Re-run plan — verify zero destroys before apply
terraform plan -out=tfplan
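
On Terraform 1.5 or later, the same import can be declared in configuration so that it goes through plan and code review instead of mutating state from a terminal. The helper below is a small sketch that prints such an import block. The function name import_block and the file name imports.tf in the usage comment are illustrative choices, not conventions Terraform requires.

```shell
#!/bin/sh
# Prints a config-driven import block for one resource.
#   $1 = resource address in the configuration
#   $2 = provider-side ID of the existing resource
import_block() {
  printf 'import {\n  to = %s\n  id = "%s"\n}\n' "$1" "$2"
}

# Usage:
#   import_block aws_db_instance.primary prod-postgres-01 >> imports.tf
#   terraform plan -out=tfplan   # the plan should now list an import, not a create
```

Once the apply has succeeded, the import block can be deleted from the configuration.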

If the resource was renamed in .tf without a moved block:

terraform state mv aws_db_instance.old_name aws_db_instance.primary
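
Choosing between import and state mv comes down to what is already tracked: terraform import refuses an address that is already in state, and terraform state mv needs the old address to exist. The sketch below encodes that decision. It only prints a suggested command and never runs one. The name suggest_fix and its argument order are made up for this example.

```shell
#!/bin/sh
# Suggests a reconciliation command based on `terraform state list` output.
#   stdin = output of `terraform state list`
#   $1    = old address (pass "" if there was no rename)
#   $2    = address used in the current configuration
#   $3    = provider-side ID of the real resource
suggest_fix() (
  old="$1"; new="$2"; id="$3"
  state="$(cat)"
  if printf '%s\n' "$state" | grep -Fxq -- "$new"; then
    # Already tracked: the diff comes from config or real-world changes.
    echo "# $new is already in state; fix the configuration instead"
  elif [ -n "$old" ] && printf '%s\n' "$state" | grep -Fxq -- "$old"; then
    echo "terraform state mv $old $new"
  else
    echo "terraform import $new $id"
  fi
)

# Usage:
#   terraform state list | suggest_fix aws_db_instance.old_name aws_db_instance.primary prod-postgres-01
```

Printing rather than executing keeps a human in the loop, which suits a state operation against production.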

Enterprise Best Practice — Prevent Drift with moved Blocks + Lifecycle Guards

 resource "aws_db_instance" "primary" {
   identifier        = "prod-postgres-01"
   instance_class    = "db.r6g.2xlarge"
   allocated_storage = 500
   engine            = "postgres"
+
+  lifecycle {
+    prevent_destroy = true
+    ignore_changes  = [engine_version, maintenance_window]
+  }
 }

+# If you renamed from aws_db_instance.rds_main — always add this
+moved {
+  from = aws_db_instance.rds_main
+  to   = aws_db_instance.primary
+}

For security group replacement drift (name change forcing destroy):

 resource "aws_security_group" "app_sg" {
-  name        = "app-sg-prod-v2"
+  name        = "app-sg-prod"        # match the real existing name exactly
   description = "Application tier SG"
   vpc_id      = var.vpc_id
+
+  lifecycle {
+    create_before_destroy = true
+    prevent_destroy       = true
+  }
 }

For workspace/backend drift — always explicitly target:

# Confirm you are in the correct workspace before ANY plan
terraform workspace show
terraform workspace select prod

# Refresh state against real infra before planning
terraform apply -refresh-only
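
The workspace check is easy to skip under pressure, so it can be made a hard precondition of the plan. A minimal sketch follows, assuming the production workspace is named prod as in the commands above; the function name assert_workspace is hypothetical.

```shell
#!/bin/sh
# Fails unless the active workspace matches the expected one.
#   $1 = expected workspace name
#   $2 = actual workspace name, e.g. "$(terraform workspace show)"
assert_workspace() {
  if [ "$1" != "$2" ]; then
    echo "refusing to plan: workspace is '$2', expected '$1'" >&2
    return 1
  fi
}

# Usage:
#   assert_workspace prod "$(terraform workspace show)" && terraform plan -out=tfplan
```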


Prevention in CI/CD

1. Block Destroys of Stateful Resources via OPA/Sentinel

# OPA policy: deny any plan containing destroys on protected resource types.
# Input is the JSON plan from: terraform show -json tfplan
package terraform.destroy_guard

import rego.v1

protected_types := {"aws_db_instance", "aws_s3_bucket", "aws_elasticache_cluster", "aws_eks_cluster"}

deny contains msg if {
  some rc in input.resource_changes
  "delete" in rc.change.actions   # also matches replacements (delete + create)
  rc.type in protected_types
  msg := sprintf("BLOCKED: destroy of protected resource %s (%s)", [rc.address, rc.type])
}

2. Run terraform plan -detailed-exitcode in CI and Gate on Destroys

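# GitHub Actions example
- name: Terraform Plan
  run: |
    # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
    # Capture the code so that exit 2 does not fail the step before the destroy check runs.
    rc=0
    terraform plan -out=tfplan -detailed-exitcode || rc=$?
    if [ "$rc" -eq 1 ]; then
      echo "::error::terraform plan failed."
      exit 1
    fi
    # Count resources whose planned actions include "delete" (plain destroys and replacements).
    DESTROYS=$(terraform show -json tfplan | jq '[
      (.resource_changes // [])[] | select(.change.actions | any(. == "delete"))
    ] | length')
    if [ "$DESTROYS" -gt 0 ]; then
      echo "::error::Plan contains $DESTROYS destroy(s). Manual approval required."
      exit 1
    fi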

3. Drift Detection on a Schedule

# Run refresh-only plan daily; alert if drift detected
terraform plan -refresh-only -detailed-exitcode
# Exit code 2 = diff detected. Wire this to PagerDuty or Slack.
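
To wire the scheduled check to an alert, the exit code needs to be captured rather than allowed to abort the job. The sketch below maps the three documented outcomes of -detailed-exitcode to a status word. The function name drift_status is illustrative, and notify in the usage comment stands in for whatever PagerDuty or Slack hook is in use.

```shell
#!/bin/sh
# Translates the exit code of `terraform plan -refresh-only -detailed-exitcode`
# into a status word that a notifier can switch on.
drift_status() {
  case "$1" in
    0) echo "clean" ;;   # plan succeeded, no differences
    2) echo "drift" ;;   # plan succeeded and found differences
    *) echo "error" ;;   # 1 or anything else: the plan itself failed
  esac
}

# Usage in the scheduled job (notify is a hypothetical alert hook):
#   rc=0
#   terraform plan -refresh-only -detailed-exitcode -input=false -no-color >/dev/null || rc=$?
#   status="$(drift_status "$rc")"
#   [ "$status" = "clean" ] || notify "terraform drift check: $status"
```

Treating a failed plan as its own status keeps a broken backend or expired credentials from being mistaken for a clean run.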

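4. Checkov Custom Policy for Missing prevent_destroy

# CKV_TF_1 checks that module sources are pinned to a commit hash; it does not look at prevent_destroy.
# Enforce the lifecycle guard with a custom policy instead. The directory and check ID below are placeholders.
checkov -d . --external-checks-dir ./checkov-policies --check CKV2_CUSTOM_1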

5. Lock State Aggressively

  • Use state locking for S3 backends (a DynamoDB lock table, or the backend's native S3 lockfile on Terraform versions that support it); never skip it.
  • Enable Terraform Cloud run triggers with lock timeouts.
  • Never manually edit .tfstate — use terraform state subcommands exclusively.
