How to Fix Terraform Plan Showing Destroy After State Drift (Without Nuking Production)
Threat/Impact Level: CRITICAL | Downtime Risk: HIGH | Time to Fix: 15–45 mins depending on resource type
TL;DR
- What broke: Terraform's state file diverged from real infrastructure (manual change, out-of-band automation, or a botched state mv), so terraform plan now shows a -/+ destroy-and-recreate or a flat destroy for a live resource.
- How to fix it: Reconcile state via terraform import, terraform state mv, or a refresh-only run (terraform apply -refresh-only), then patch the config to match reality before running apply.
The Incident (What Does the Error Mean?)
You run terraform plan and see this:
# aws_db_instance.primary will be destroyed
# (because aws_db_instance.primary is not in configuration)
- resource "aws_db_instance" "primary" {
- identifier = "prod-postgres-01"
- instance_class = "db.r6g.2xlarge"
- allocated_storage = 500
...
}
Plan: 0 to add, 0 to change, 1 to destroy.
Or the subtler destroy-recreate variant:
# aws_security_group.app_sg must be replaced
-/+ resource "aws_security_group" "app_sg" {
~ id = "sg-0abc123" -> (known after apply)
~ name = "app-sg-prod" -> "app-sg-prod-v2" # forces replacement
}
Immediate consequence: If terraform apply runs — in a pipeline, by a junior engineer, or via auto-apply in Terraform Cloud — that resource is gone or recreated. For stateful resources (RDS, ElastiCache, EBS volumes, security groups with live ingress rules), this is a production outage or a data-loss event.
The Attack Vector / Blast Radius
State drift is deceptively dangerous because the destroy is legitimate from Terraform's perspective — it is doing exactly what the math says. The blast radius depends on resource type:
| Resource | Destroy Consequence |
|---|---|
| aws_db_instance | Data loss if skip_final_snapshot = true (no final snapshot is taken); RTO in hours |
| aws_security_group | Delete stalls on DependencyViolation while ENIs still use the group; if dependents are re-pointed to a new group that lacks the live rules, traffic drops |
| aws_iam_role | Attached workloads lose permissions; cascading auth failures |
| aws_s3_bucket | With force_destroy = true, every object is deleted along with the bucket; without it, deleting a non-empty bucket fails |
| aws_eks_node_group | Node pool drained; workloads evicted |
Root causes of drift — know which one you're dealing with:
- Manual console/CLI change: someone added a tag, resized an instance, or renamed a resource outside Terraform.
- State file desync: two engineers ran terraform apply against different state backends, or a remote state lock failed silently.
- Resource rename/refactor in .tf: no moved {} block was added after renaming a resource address.
- Workspace or backend misconfiguration: plan is running against the wrong workspace, pointing at a stale state file.
- Provider upgrade: a new provider version reads an attribute differently, causing a computed diff that forces replacement.
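For the provider-upgrade cause in particular, pinning the provider keeps a new release from silently changing how attributes are read; upgrades then happen deliberately with terraform init -upgrade followed by a reviewed plan. The constraint below is only an example, not a recommendation for a specific release:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # example constraint: any 5.x release, never 6.0
    }
  }
}
```

Committing the generated .terraform.lock.hcl alongside this pins the exact selected version across laptops and CI.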
How to Fix It (The Solution)
Step 0: Halt the Pipeline Immediately
If this is in CI/CD, kill the apply job now. Set a manual approval gate. Do not let auto-apply proceed.
Basic Fix — Re-import the Drifted Resource
If the resource exists in AWS and its block is still in your configuration, but it has fallen out of state (plan then proposes to create it, and apply would collide with the live instance):
# Find the real resource ID from AWS CLI
aws rds describe-db-instances --query 'DBInstances[*].DBInstanceIdentifier'
# Re-import it into the correct state address
terraform import aws_db_instance.primary prod-postgres-01
# Re-run plan — verify zero destroys before apply
terraform plan -out=tfplan
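On Terraform 1.5 and later, the same import can also be declared in configuration with an import block, so it appears in terraform plan for review before anything is written to state. A minimal sketch using the identifier from the commands above:

```hcl
# Config-driven import (Terraform >= 1.5).
# terraform plan lists the import; terraform apply performs it.
# The block can be deleted after a successful apply.
import {
  to = aws_db_instance.primary
  id = "prod-postgres-01"
}
```

Confirm the plan shows the instance as an import with no replace or destroy before applying.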
If the resource was renamed in .tf without a moved block:
terraform state mv aws_db_instance.old_name aws_db_instance.primary
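When the plan says a resource is "not in configuration" because its block was deleted on purpose, and the live resource should keep running outside this state, Terraform 1.7 and later can drop it from state without destroying it. A sketch reusing the aws_db_instance.primary address from the incident output:

```hcl
# Terraform >= 1.7: forget the resource instead of destroying it.
# Only valid once the aws_db_instance.primary resource block is gone from config;
# Terraform rejects a removed block whose address still has a resource block.
removed {
  from = aws_db_instance.primary

  lifecycle {
    destroy = false
  }
}
```

On older versions, terraform state rm aws_db_instance.primary does the same thing imperatively, without a reviewable plan step.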
Enterprise Best Practice — Prevent Drift with moved Blocks + Lifecycle Guards
resource "aws_db_instance" "primary" {
identifier = "prod-postgres-01"
instance_class = "db.r6g.2xlarge"
allocated_storage = 500
engine = "postgres"
+
+ lifecycle {
+ prevent_destroy = true
+ ignore_changes = [engine_version, maintenance_window]
+ }
}
+# If you renamed from aws_db_instance.rds_main — always add this
+moved {
+ from = aws_db_instance.rds_main
+ to = aws_db_instance.primary
+}
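prevent_destroy only stops Terraform itself. As a second layer, the AWS provider's aws_db_instance also accepts deletion_protection, which the RDS API enforces no matter who issues the delete, plus final-snapshot settings for the case where a delete does go through. Shown here as the same resource block with those arguments added (not a second block); the snapshot name is a placeholder:

```hcl
resource "aws_db_instance" "primary" {
  identifier        = "prod-postgres-01"
  instance_class    = "db.r6g.2xlarge"
  allocated_storage = 500
  engine            = "postgres"
  # (other existing arguments unchanged)

  # Enforced by RDS: deleting the instance fails until this is set back to false.
  deletion_protection = true

  # If a delete ever succeeds, take a final snapshot first.
  skip_final_snapshot       = false
  final_snapshot_identifier = "prod-postgres-01-final" # placeholder name

  lifecycle {
    prevent_destroy = true
    ignore_changes  = [engine_version, maintenance_window]
  }
}
```

On an existing instance these arguments should plan as an in-place ~ update; stop if the plan shows -/+ instead.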
For security group replacement drift (name change forcing destroy):
resource "aws_security_group" "app_sg" {
- name = "app-sg-prod-v2"
+ name = "app-sg-prod" # match the real existing name exactly
description = "Application tier SG"
vpc_id = var.vpc_id
+
+ lifecycle {
+ # No create_before_destroy: SG names are unique per VPC, so a same-name replacement cannot coexist with this group
+ prevent_destroy = true
+ }
}
For workspace/backend drift, always confirm and select the workspace explicitly:
# Confirm you are in the correct workspace before ANY plan
terraform workspace show
terraform workspace select prod
# Refresh state against real infra before planning
terraform apply -refresh-only
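Selecting the workspace by hand still depends on someone reading the output. One way to make a workspace/credentials mismatch fail loudly is the AWS provider's allowed_account_ids argument keyed on terraform.workspace. The account IDs and region below are placeholders, and this guard catches a workspace pointed at the wrong AWS account, not a stale state file in the right one:

```hcl
locals {
  # Placeholder mapping: workspace name -> the only AWS account it may touch.
  workspace_accounts = {
    prod    = "111111111111"
    staging = "222222222222"
  }
}

provider "aws" {
  region = "us-east-1" # placeholder
  # Provider configuration errors if the current credentials belong to any other
  # account. A workspace missing from the map fails the lookup, so it fails closed.
  allowed_account_ids = [local.workspace_accounts[terraform.workspace]]
}
```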
Prevention in CI/CD
1. Block Destroys of Stateful Resources via OPA/Sentinel
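# OPA policy: deny any plan containing destroys on protected resource types.
# Input is the plan JSON, e.g. from: terraform show -json tfplan > plan.json
# import rego.v1 enables v1 syntax on 0.x releases that support it; it is a no-op on OPA 1.0+.
package terraform.destroy_guard
import rego.v1
protected_types := {"aws_db_instance", "aws_s3_bucket", "aws_elasticache_cluster", "aws_eks_cluster"}
# "delete" appears in actions for plain destroys and for -/+ replacements.
deny contains msg if {
  some rc in input.resource_changes
  "delete" in rc.change.actions
  rc.type in protected_types
  msg := sprintf("BLOCKED: destroy of protected resource %s (%s)", [rc.address, rc.type])
}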
2. Run terraform plan -detailed-exitcode in CI and Gate on Destroys
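# GitHub Actions example
- name: Terraform Plan
  run: |
    # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
    # The step's bash shell runs with -e, so capture the code instead of letting 2 abort the step.
    PLAN_EXIT=0
    terraform plan -out=tfplan -detailed-exitcode || PLAN_EXIT=$?
    if [ "$PLAN_EXIT" -eq 1 ]; then exit 1; fi
    # Count changes whose actions include "delete" (plain destroys and -/+ replacements).
    DESTROYS=$(terraform show -json tfplan | jq '[.resource_changes[]? | select(.change.actions | any(. == "delete"))] | length')
    if [ "$DESTROYS" -gt 0 ]; then
      echo "::error::Plan contains $DESTROYS destroy(s). Manual approval required."
      exit 1
    fi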
3. Drift Detection on a Schedule
# Run refresh-only plan daily; alert if drift detected
terraform plan -refresh-only -detailed-exitcode
# Exit code 2 = diff detected. Wire this to PagerDuty or Slack.
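On Terraform Cloud, the scheduled check can be handed to the platform instead of a cron job: the tfe provider's tfe_workspace resource has an assessments_enabled argument that turns on health assessments, which include drift detection. Availability depends on your organization's plan tier, and the names below are placeholders:

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

resource "tfe_workspace" "prod" {
  name         = "prod-infra" # placeholder workspace name
  organization = "my-org"     # placeholder organization
  # Periodically run health assessments (drift detection) on this workspace.
  assessments_enabled = true
}
```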
4. Checkov Rule for Missing prevent_destroy
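# CKV_TF_1 is Checkov's module-source commit-hash check; it does not look at prevent_destroy.
# Enforcing prevent_destroy needs a custom check; ./custom_checks is a placeholder directory for it.
checkov -d . --external-checks-dir ./custom_checks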
5. Lock State Aggressively
- Use DynamoDB state locking for S3 backends — never skip it.
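- Let Terraform Cloud's per-workspace run queue serialize runs; on other backends, pass -lock-timeout (for example -lock-timeout=5m) so a concurrent run waits for the lock instead of failing immediately.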
- Never manually edit .tfstate; use terraform state subcommands exclusively.
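A minimal S3 backend with DynamoDB locking might look like the sketch below. Bucket, key, region, and table names are placeholders. The lock table needs a string partition key named LockID, and it is usually created in a separate bootstrap configuration, since a backend cannot depend on resources tracked in its own state:

```hcl
# Root module: state in S3, locks in DynamoDB.
terraform {
  backend "s3" {
    bucket         = "example-tf-state"       # placeholder
    key            = "prod/terraform.tfstate" # placeholder
    region         = "us-east-1"              # placeholder
    dynamodb_table = "terraform-locks"        # placeholder; enables state locking
    encrypt        = true
  }
}

# Bootstrap module (applied separately, once): the lock table itself.
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Recent Terraform releases add S3-native locking (use_lockfile) and deprecate dynamodb_table, so check the S3 backend documentation for your version before choosing.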