How to Fix Terraform KMS Key Pending Deletion Error: Cannot Update Key in Pending Deletion State

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 10 mins

TL;DR

  • What broke: Terraform is targeting a KMS key scheduled for deletion (key state PendingDeletion). AWS rejects most mutating API calls against a key in that state, including EnableKey and EnableKeyRotation, with KMSInvalidStateException, causing terraform apply to hard-fail.
  • How to fix it: Cancel the pending deletion via aws kms cancel-key-deletion --key-id <key-id>, re-enable the key with aws kms enable-key --key-id <key-id> (cancelling leaves it disabled), then re-run terraform apply. If the key is intentionally gone, remove the resource from state and re-provision.

The Incident (What Does the Error Mean?)

Raw error output from terraform apply:

│ Error: updating KMS Key (arn:aws:kms:us-east-1:123456789012:key/abcd-1234): 
│ KMSInvalidStateException: arn:aws:kms:us-east-1:123456789012:key/abcd-1234 
│ is pending deletion.
│   with aws_kms_key.primary,
│   on kms.tf line 4, in resource "aws_kms_key" "primary":
│    4: resource "aws_kms_key" "primary" {

Immediate consequence: terraform apply exits non-zero. Any downstream resources depending on this key — S3 bucket SSE, RDS storage encryption, Secrets Manager secrets, EBS volumes — are now blocked from being created or updated in the same plan. Your pipeline is dead until this is resolved.


The Attack Vector / Blast Radius

This is not just a nuisance error. Here is the full blast radius:

  • Data accessibility cliff: Once the waiting period (7 to 30 days) expires, KMS deletes the key, and any data encrypted under it, such as S3 objects, EBS snapshots, and RDS instances, becomes permanently unrecoverable. AWS cannot restore a deleted key, so there is no support escalation path after deletion completes.
  • Terraform state drift: If a human manually scheduled the key for deletion outside of Terraform (via Console or CLI), Terraform's state still believes the key is ENABLED. Every subsequent plan will attempt mutations, every apply will fail. The state is now lying to you.
  • CI/CD pipeline cascade: A single failed apply in a shared Terraform workspace (Terraform Cloud, Atlantis) blocks all other PRs queued against that workspace. One bad key deletion can freeze infra deployments for an entire team.
  • Silent encryption failures: Services like AWS Backup or cross-account replication jobs that rely on this key will start throwing KMSInvalidStateException in their own logs — often silently, with no alerting unless you have CloudWatch alarms on KMS API errors.
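
To gauge how far the problem extends, it can help to sweep the region for every key already in this state. The function below is a minimal sketch, assuming a configured AWS CLI and permission to call kms:ListKeys and kms:DescribeKey; the name list_pending_deletion_keys is illustrative, not an AWS command.

```shell
#!/usr/bin/env bash
# Sketch: print the ID of every KMS key in the current region whose state
# is PendingDeletion. Function name is hypothetical.
list_pending_deletion_keys() {
  local key_id state
  # list-keys returns key IDs; --output text prints them whitespace-separated.
  for key_id in $(aws kms list-keys --query 'Keys[].KeyId' --output text); do
    state=$(aws kms describe-key --key-id "$key_id" \
      --query 'KeyMetadata.KeyState' --output text)
    if [ "$state" = "PendingDeletion" ]; then
      echo "$key_id"
    fi
  done
}
```

Calling list_pending_deletion_keys with no arguments prints one key ID per line, which can then be compared against the keys your Terraform state manages.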

How to Fix It (The Solution)

Step 0: Confirm the Key State

aws kms describe-key --key-id arn:aws:kms:us-east-1:123456789012:key/abcd-1234 \
  --query 'KeyMetadata.{State:KeyState,DeletionDate:DeletionDate}'

If output shows "State": "PendingDeletion", proceed below.


Basic Fix: Cancel Pending Deletion

# Cancel the scheduled deletion — this is immediate and reversible
aws kms cancel-key-deletion --key-id arn:aws:kms:us-east-1:123456789012:key/abcd-1234

# Re-enable the key (cancelling deletion sets state to DISABLED)
aws kms enable-key --key-id arn:aws:kms:us-east-1:123456789012:key/abcd-1234

# Now re-run Terraform
terraform apply
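
The cancel and enable steps can be wrapped so they only run when the key state calls for them. This is a hedged sketch rather than a definitive script: recover_kms_key is a hypothetical helper name, and it assumes the caller has kms:DescribeKey, kms:CancelKeyDeletion, and kms:EnableKey on the key.

```shell
#!/usr/bin/env bash
# Sketch: bring a key back to Enabled, skipping steps that are not needed.
# Usage: recover_kms_key <key-id-or-arn>
recover_kms_key() {
  local key_id="$1" state
  state=$(aws kms describe-key --key-id "$key_id" \
    --query 'KeyMetadata.KeyState' --output text) || return 1

  if [ "$state" = "PendingDeletion" ]; then
    aws kms cancel-key-deletion --key-id "$key_id" >/dev/null || return 1
    state="Disabled"  # cancelling a deletion leaves the key disabled
  fi

  if [ "$state" = "Disabled" ]; then
    aws kms enable-key --key-id "$key_id" || return 1
  fi

  # Print the state AWS reports after the calls above.
  aws kms describe-key --key-id "$key_id" \
    --query 'KeyMetadata.KeyState' --output text
}
```

Running it against a key that is already enabled makes no mutating calls, so it should be safe to place in front of terraform apply in a recovery runbook.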

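If the deletion was intentional and you need a new key, remove the stale resource from state and let Terraform create a fresh key. Anything encrypted under the old key must be decrypted or re-encrypted before its deletion window expires: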

terraform state rm aws_kms_key.primary
terraform apply  # provisions a new key, new ARN — update all dependent resources

Enterprise Best Practice: Prevent Accidental Deletion via Terraform HCL

The root cause is often a missing lifecycle block or an explicit deletion_window_in_days set too low. Fix the Terraform resource definition:

 resource "aws_kms_key" "primary" {
   description             = "Primary encryption key for production workloads"
   enable_key_rotation     = true
-  deletion_window_in_days = 7
+  deletion_window_in_days = 30
+
+  lifecycle {
+    prevent_destroy = true
+  }
+
+  tags = {
+    Environment = "production"
+    ManagedBy   = "terraform"
+  }
 }

 resource "aws_kms_alias" "primary" {
   name          = "alias/production-primary"
   target_key_id = aws_kms_key.primary.key_id
 }

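Why prevent_destroy = true matters: This causes terraform plan to error loudly if any configuration change would result in destroying this resource, before apply is ever attempted. It is your last line of defense against accidental key destruction via IaC. It does not stop someone from scheduling deletion in the Console or CLI, and it no longer applies if the resource block itself is removed from the configuration.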

Why 30 days: AWS allows a window of 7 to 30 days, and 30 is also the default when deletion_window_in_days is omitted. The maximum gives your incident response team time to detect the deletion via CloudTrail/CloudWatch and cancel it before data becomes inaccessible.



Prevention in CI/CD

1. Checkov Policy (runs in your pipeline pre-apply)

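Checkov rule CKV_AWS_7 already enforces key rotation, but you need a custom check for prevent_destroy. Checkov loads custom YAML policies from a directory passed with --external-checks-dir rather than from .checkov.yml. Save a policy like this in that directory, and confirm that your Checkov version exposes the lifecycle block to attribute checks:

metadata:
  name: "KMS keys must have prevent_destroy lifecycle"
  id: "CKV_CUSTOM_KMS_01"
  category: "GENERAL_SECURITY"
definition:
  cond_type: "attribute"
  resource_types:
    - "aws_kms_key"
  attribute: "lifecycle.prevent_destroy"
  operator: "equals"
  value: true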

2. OPA / Conftest Policy

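package terraform.kms

# Input is assumed to be plan JSON from `terraform show -json <planfile>`.
# Caution: confirm that your input actually exposes lifecycle under
# change.after. Terraform's plan JSON is not known to include lifecycle
# meta-arguments there, in which case this first rule flags every KMS key.
deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_kms_key"
  not resource.change.after.lifecycle[_].prevent_destroy
  msg := sprintf("KMS key '%v' must have lifecycle.prevent_destroy = true", [resource.address])
}

# deletion_window_in_days is a regular argument, so it is present in change.after.
deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_kms_key"
  resource.change.after.deletion_window_in_days < 14
  msg := sprintf("KMS key '%v' deletion_window_in_days must be >= 14", [resource.address])
}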
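
One way to feed a plan to the policy above is to render it as JSON and hand it to Conftest. The wrapper below is a sketch under stated assumptions: the Rego file lives in a ./policy directory, the file names tfplan.binary and tfplan.json are placeholders, and check_plan_with_conftest is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch: plan, convert the plan to JSON, then evaluate it with Conftest.
# Usage: check_plan_with_conftest [plan-file] [plan-json]
check_plan_with_conftest() {
  local plan_file="${1:-tfplan.binary}" plan_json="${2:-tfplan.json}"

  # Save the plan so the exact same changes are evaluated and later applied.
  terraform plan -out="$plan_file" -input=false || return 1

  # Render the saved plan in Terraform's JSON plan format.
  terraform show -json "$plan_file" > "$plan_json" || return 1

  # --namespace must match the Rego package name (terraform.kms).
  conftest test "$plan_json" --policy policy --namespace terraform.kms
}
```

The function returns Conftest's exit status, so a CI step can fail the build on any deny result.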

3. CloudWatch Alarm on KMS Scheduled Deletion

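# Counts CloudTrail-logged KMS errors for calls against a key pending deletion.
# Assumes CloudTrail delivers to a CloudWatch Logs group; the reference
# aws_cloudwatch_log_group.cloudtrail is a placeholder for your own.
resource "aws_cloudwatch_log_metric_filter" "kms_pending_deletion" {
  name           = "kms-key-pending-deletion"
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name
  pattern        = "{ $.eventSource = kms* && $.errorMessage = \"* is pending deletion.\" }"

  metric_transformation {
    name      = "KMSKeyPendingDeletionErrorCount"
    namespace = "CloudTrailLogMetrics"
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "kms_pending_deletion" {
  alarm_name          = "kms-key-pending-deletion"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  metric_name         = "KMSKeyPendingDeletionErrorCount"
  namespace           = "CloudTrailLogMetrics"
  period              = 300
  statistic           = "Sum"
  threshold           = 1
  alarm_actions       = [aws_sns_topic.alerts.arn]
}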
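
When an alert fires or a key turns up pending deletion, CloudTrail can usually show who scheduled it. The lookup below is a minimal sketch, assuming CloudTrail management events are available for the region (lookup-events covers roughly the last 90 days); find_schedule_key_deletion_events is an illustrative name.

```shell
#!/usr/bin/env bash
# Sketch: list recent ScheduleKeyDeletion calls with their time and caller.
find_schedule_key_deletion_events() {
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventName,AttributeValue=ScheduleKeyDeletion \
    --query 'Events[].[EventTime,Username]' \
    --output text
}
```

Each output line pairs an event time with the principal that made the call, which is a starting point for deciding whether to cancel the deletion or let it proceed.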

4. Terraform Sentinel Policy (Terraform Cloud/Enterprise)

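import "tfplan/v2" as tfplan

# Managed KMS keys that appear in the plan.
kms_keys = filter tfplan.resource_changes as _, rc {
  rc.type is "aws_kms_key" and rc.mode is "managed"
}

# Assumes change.after exposes a lifecycle list; confirm this against real
# plan data before enforcing, since plan data may omit lifecycle settings.
violations = filter kms_keys as _, key {
  not (any (key.change.after.lifecycle else []) as lc { lc.prevent_destroy is true })
}

for violations as _, key {
  print("FAIL: KMS key", key.address, "must have prevent_destroy = true")
}

# Single main rule: pass only when no key is in violation.
main = rule { length(violations) is 0 }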

Deploy all four layers: Checkov in PR checks, OPA in the CI gate before terraform plan output is trusted, Sentinel in Terraform Cloud/Enterprise runs, and the CloudWatch alarm as your runtime safety net.