What is the maximum Terraform state file size for S3 backends with DynamoDB locking?

S3 itself has no practical object size limit for state files, and the DynamoDB lock table never holds the state body. The table stores only small lock items keyed by `LockID` (plus an MD5 digest of the state), so DynamoDB's 400KB per-item cap does not limit state size. The operational failure point is typically the HTTP PUT to S3 timing out for files over ~50MB on slow CI runners, or Terraform Cloud's 413 response at large payload thresholds. As an engineering rule of thumb, keep each workspace's state file under 5MB.

Will running `terraform state rm` on data sources destroy the actual cloud resources?

No. `terraform state rm` only removes the resource from Terraform's state tracking. It does not call any destroy API. Data sources are read-only by definition and have no destroy lifecycle. After removal, Terraform will simply re-fetch the data source on the next `plan` or `apply`. For managed resources (`resource` blocks, not `data`), `terraform state rm` also does not destroy the cloud resource. It just makes Terraform forget the resource exists, so if the block is still in the configuration, subsequent plans will show it as a new resource to create.

How do I migrate specific resources from a monolith state to a new child workspace without downtime?

Use `terraform state mv -state= -state-out= ` to move resources between local state files, then push the destination state with `terraform state push`. `moved` blocks (Terraform 1.1+) only rewrite addresses inside a single state, so they cannot carry a resource across workspaces; declare them in the destination workspace's HCL afterwards if a resource's address changes there, so that rename is code-reviewed and idempotent. Always take a backup with `terraform state pull > backup-$(date +%s).tfstate` before any state surgery. State commands lock the state by default (`-lock=true`); do not disable locking during the operation, so concurrent modifications are blocked.

How to Fix Terraform Remote Backend Migration Failure: State Snapshot Too Large

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 30–90 mins

TL;DR

What broke: terraform init or terraform state push aborted mid-migration because the serialized .tfstate JSON blob exceeds what the remote backend, or the network path to it, will accept (S3+DynamoDB: neither S3 nor the lock table imposes a hard state size limit, but large uploads time out, and this guide treats ~5MB as the safe ceiling; Terraform Cloud: 512MB soft cap but HTTP timeouts hit first; Azure Blob: chunking failures on single PUT).
How to fix it: Decompose the monolith state into child workspaces, surgically remove orphaned or read-only data resources from state, and compress outputs. Then re-attempt migration with TF_LOG=DEBUG to confirm payload size.

The Incident (What Does the Error Mean?)

Raw error output during terraform init backend migration or terraform state push:

Error: Failed to save state

Error saving state: failed to upload state: state snapshot too large:
the state file size (47.3 MB) exceeds the maximum allowed size (5 MB).
Migrating state from local to remote failed: state snapshot too large

or on Terraform Cloud:

╷
│ Error: Error uploading state
│
│ Error uploading state: request body too large
│ (413 Request Entity Too Large)
╵

Immediate consequence: The migration aborts and your state remains local. Any subsequent terraform plan or terraform apply run by a teammate against the remote backend will see an empty or stale state. With an empty state, Terraform proposes to create every managed resource again; with a stale state, it can propose destroying or replacing live infrastructure.

The Attack Vector / Blast Radius

This is not just an inconvenience. The failure mode is a split-brain state condition:

Engineer A's local state has 2,400 resources. Migration fails.
Engineer B runs terraform plan against the now-initialized remote backend (which has zero or outdated state).
Terraform diffs zero known resources against the configuration and outputs a plan to create 2,400 resources that already exist. If the remote state is outdated rather than empty, the plan can instead destroy and recreate resources whose recorded attributes no longer match.
An inattentive terraform apply -auto-approve in CI/CD then creates duplicates or collides with existing names, and in the stale-state case deletes production RDS instances, EKS node groups, or VPC peering connections.
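
A guard along the following lines can catch a split-brain plan before it is applied. This is a minimal sketch, assuming the plan was exported with terraform show -json on a saved plan file; the function names (summarize_actions, is_suspicious) and the thresholds are hypothetical and should be tuned per workspace.

```python
import json


def summarize_actions(plan):
    """Count planned actions in the JSON emitted by `terraform show -json <planfile>`."""
    counts = {"create": 0, "update": 0, "delete": 0, "replace": 0}
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if "create" in actions and "delete" in actions:
            counts["replace"] += 1  # destroy-and-recreate
        elif "create" in actions:
            counts["create"] += 1
        elif "delete" in actions:
            counts["delete"] += 1
        elif "update" in actions:
            counts["update"] += 1
        # "no-op" and "read" entries are intentionally ignored
    return counts


def is_suspicious(plan, max_creates=50, max_destructive=0):
    """Flag plans that look like they were computed against empty or stale state.

    The thresholds are illustrative assumptions, not Terraform defaults.
    """
    counts = summarize_actions(plan)
    destructive = counts["delete"] + counts["replace"]
    return counts["create"] > max_creates or destructive > max_destructive


# Illustrative plan fragment; real input would come from json.load() on the exported plan.
sample_plan = {
    "resource_changes": [
        {"address": "aws_vpc.main", "change": {"actions": ["create"]}},
        {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
    ]
}
print(summarize_actions(sample_plan))
print(is_suspicious(sample_plan))
```

In CI, a non-zero exit when is_suspicious returns True would stop the pipeline before terraform apply runs.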

Root causes ranked by frequency:

Monorepo state: hundreds of modules managed in a single root workspace, each with dozens of outputs stored in state.
data source abuse: data.aws_ami, data.aws_instances, data.kubernetes_all_namespaces storing full API responses in state.
Terraform output values containing entire rendered templates, kubeconfig blobs, or base64-encoded certificates.
Accumulated null_resource / terraform_data with large triggers maps never pruned after deprecation.
Provider schemas stored in state (pre-1.0 Terraform versions serialized full provider schemas).

How to Fix It (The Solution)

Step 0 — Diagnose the Bloat

# Get state size before touching anything
terraform state pull > current.tfstate
du -sh current.tfstate
wc -c current.tfstate

# Find the heaviest resources by serialized size (largest first, top 20)
jq -c '[.resources[] | {type, name, size: (tojson | length)} | select(.size > 10000)] | sort_by(-.size) | .[:20][]' current.tfstate

# Count resources per module (root-module resources have a null module)
jq '[.resources[] | .module] | group_by(.) | map({module: .[0], count: length}) | sort_by(-.count)' current.tfstate
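
Where jq is not available on the runner, the same diagnosis can be sketched in Python. This assumes the version 4 state layout (a top-level resources list whose entries carry mode, type, name, and an optional module); the helper names are hypothetical.

```python
import json
from collections import Counter


def resource_address(res):
    """Build an address such as module.x.data.aws_ami.ubuntu from a state resource entry."""
    parts = []
    if res.get("module"):
        parts.append(res["module"])
    if res.get("mode") == "data":
        parts.append("data")
    parts.append(f'{res["type"]}.{res["name"]}')
    return ".".join(parts)


def heaviest_resources(state, top=20):
    """Return (serialized_bytes, address) pairs, largest first."""
    sized = [
        (len(json.dumps(res, separators=(",", ":"))), resource_address(res))
        for res in state.get("resources", [])
    ]
    sized.sort(reverse=True)
    return sized[:top]


def resources_per_module(state):
    """Return (module, count) pairs, most populated module first."""
    counts = Counter(res.get("module", "(root)") for res in state.get("resources", []))
    return counts.most_common()


# Illustrative state fragment; real input would be json.load(open("current.tfstate")).
sample_state = {
    "resources": [
        {"module": "module.networking", "mode": "managed", "type": "aws_vpc",
         "name": "main", "instances": [{"attributes": {"id": "vpc-123"}}]},
        {"mode": "data", "type": "aws_ami", "name": "ubuntu",
         "instances": [{"attributes": {"description": "x" * 500}}]},
    ]
}
for size, address in heaviest_resources(sample_state):
    print(size, address)
print(resources_per_module(sample_state))
```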

Basic Fix — Purge Data Sources and Orphaned Resources from State

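Data sources do not need to be carried through the migration. They are re-read on the next plan or apply, and Terraform writes them back to state at that point, so this step shrinks the snapshot for the migration rather than permanently. Remove them: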

# List all data sources currently tracked in state
terraform state list | grep '^data\.' 

# Remove them — they will be re-read on next plan, not destroyed
terraform state rm $(terraform state list | grep '^data\.')

# Remove known orphaned null_resources
terraform state rm null_resource.legacy_bootstrap
terraform state rm terraform_data.deprecated_trigger
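
To judge how much a data-source purge would actually save before running it, the share of the snapshot held by data sources can be estimated offline. A minimal sketch, assuming the version 4 state layout where each resource entry has a mode of managed or data; bytes_by_mode and data_source_share are hypothetical helper names.

```python
import json


def bytes_by_mode(state):
    """Sum serialized bytes of resource entries, grouped by mode (managed or data)."""
    totals = {"managed": 0, "data": 0}
    for res in state.get("resources", []):
        mode = res.get("mode", "managed")
        size = len(json.dumps(res, separators=(",", ":")))
        totals[mode] = totals.get(mode, 0) + size
    return totals


def data_source_share(state):
    """Fraction of resource bytes attributable to data sources (0.0 for an empty state)."""
    totals = bytes_by_mode(state)
    total = sum(totals.values())
    return totals["data"] / total if total else 0.0


# Illustrative state fragment; real input would be json.load(open("current.tfstate")).
sample_state = {
    "resources": [
        {"mode": "managed", "type": "aws_vpc", "name": "main",
         "instances": [{"attributes": {"id": "vpc-123"}}]},
        {"mode": "data", "type": "aws_instances", "name": "all",
         "instances": [{"attributes": {"ids": ["i-%04d" % n for n in range(200)]}}]},
    ]
}
print(bytes_by_mode(sample_state))
print(round(data_source_share(sample_state), 2))
```

If the share is small, the purge is unlikely to get the snapshot under the limit, and decomposition is the better use of time.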

Trim bloated outputs from your root module:

- output "full_kubeconfig" {
-   value = module.eks.kubeconfig_raw   # 180KB base64 blob stored in state
- }
+ output "cluster_endpoint" {
+   value = module.eks.cluster_endpoint  # just the endpoint string
+ }
+
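+ # Do not route the blob through aws_ssm_parameter or a similar resource:
+ # its value attribute is persisted in state as well, and SSM parameters are
+ # capped at 4 KB (standard tier) or 8 KB (advanced tier).
+ # Generate the kubeconfig on demand instead:
+ #   aws eks update-kubeconfig --name <cluster-name> --region <region>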
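
To find which outputs are worth trimming, the outputs map of the pulled state can be ranked by serialized size. A minimal sketch, assuming the version 4 state layout where outputs maps each name to an object with a value field; largest_outputs is a hypothetical helper name.

```python
import json


def largest_outputs(state, top=10):
    """Return (serialized_bytes, output_name) pairs, largest first."""
    sized = [
        (len(json.dumps(out.get("value"))), name)
        for name, out in state.get("outputs", {}).items()
    ]
    sized.sort(reverse=True)
    return sized[:top]


# Illustrative state fragment; real input would be json.load(open("current.tfstate")).
sample_state = {
    "outputs": {
        "full_kubeconfig": {"value": "A" * 1000, "type": "string"},
        "cluster_endpoint": {"value": "https://example.invalid", "type": "string"},
    }
}
for size, name in largest_outputs(sample_state):
    print(size, name)
```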

Enterprise Best Practice — Decompose the Monolith into Isolated Workspaces

The permanent fix is state decomposition. Split the monolith by lifecycle and blast radius:

# BEFORE: single root module managing everything
# ./main.tf — 2,400 resources, 47MB state
- module "networking" { source = "./modules/networking" }
- module "eks" { source = "./modules/eks" }
- module "rds" { source = "./modules/rds" }
- module "app_workloads" { source = "./modules/apps" }

# AFTER: four independent workspaces with remote state data sources
# workspace: infra-networking (changes once a quarter)
+ module "networking" { source = "./modules/networking" }

# workspace: infra-eks (changes weekly)
+ data "terraform_remote_state" "networking" {
+   backend = "s3"
+   config = {
+     bucket = "my-tfstate"
+     key    = "infra-networking/terraform.tfstate"
+     region = "us-east-1"
+   }
+ }
+ module "eks" {
+   source     = "./modules/eks"
+   vpc_id     = data.terraform_remote_state.networking.outputs.vpc_id
+   subnet_ids = data.terraform_remote_state.networking.outputs.private_subnets
+ }

# workspace: infra-rds
+ module "rds" { source = "./modules/rds" }

# workspace: app-workloads (changes daily — CI/CD)
+ module "app_workloads" { source = "./modules/apps" }

Migrate resources between state files without destroying them using terraform state mv across workspaces:

# Pull source state
terraform state pull > monolith.tfstate

# In the new networking workspace:
cd ../infra-networking
terraform init

# Move resources across state files
terraform state mv \
  -state=../monolith/monolith.tfstate \
  -state-out=terraform.tfstate \
  module.networking.aws_vpc.main \
  module.networking.aws_vpc.main

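# Repeat for all networking resources, then push the destination state
terraform state push terraform.tfstate

# Push the trimmed source state back too, so the monolith stops tracking the
# moved resources; remove their module block from the monolith config first
cd ../monolith
terraform state push monolith.tfstate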
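
Before pushing either file, it is worth confirming that every moved address left the source state and arrived in the destination. A minimal sketch, assuming both files use the version 4 state layout; verify_move and the other helper names are hypothetical, and instance index keys are not checked.

```python
import json


def resource_address(res):
    """Build an address such as module.x.aws_vpc.main from a state resource entry."""
    parts = []
    if res.get("module"):
        parts.append(res["module"])
    if res.get("mode") == "data":
        parts.append("data")
    parts.append(f'{res["type"]}.{res["name"]}')
    return ".".join(parts)


def addresses(state):
    """Set of resource addresses tracked in a state document."""
    return {resource_address(res) for res in state.get("resources", [])}


def verify_move(source_after, dest_after, moved):
    """Return (still_in_source, missing_in_dest) for the addresses that were moved."""
    moved = set(moved)
    still_in_source = sorted(moved & addresses(source_after))
    missing_in_dest = sorted(moved - addresses(dest_after))
    return still_in_source, missing_in_dest


# Illustrative fragments; real input would be the two .tfstate files after state mv.
source_after = {"resources": [
    {"module": "module.rds", "mode": "managed", "type": "aws_db_instance", "name": "main"},
]}
dest_after = {"resources": [
    {"module": "module.networking", "mode": "managed", "type": "aws_vpc", "name": "main"},
]}
print(verify_move(source_after, dest_after, ["module.networking.aws_vpc.main"]))
```

Two empty lists mean the move is consistent; anything else should block the push.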

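moved blocks (Terraform 1.1+) do not move resources between workspaces; they rewrite addresses within a single state. Use them after the cross-state move if the destination configuration flattens or renames an address, so that part of the refactor is declarative and peer-reviewable: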

+ # In infra-networking/moved.tf
+ moved {
+   from = module.networking.aws_vpc.main
+   to   = aws_vpc.main
+ }


Prevention in CI/CD

1. Enforce state size limits as a pipeline gate:

# .github/workflows/terraform.yml
- name: Check state size before migration
  run: |
    terraform state pull > /tmp/current.tfstate
    SIZE=$(wc -c < /tmp/current.tfstate)
    MAX=4194304  # 4MB hard gate (headroom under the 5MB safe limit used in this guide)
    if [ "$SIZE" -gt "$MAX" ]; then
      echo "ERROR: State file ${SIZE} bytes exceeds ${MAX} byte gate."
      echo "Run: terraform state list | grep '^data\.' | xargs terraform state rm"
      exit 1
    fi

2. Checkov policy to ban data-source-in-output anti-pattern:

# checkov custom check: no raw data source values in outputs
from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck
# ... flag output blocks whose value references data.* and length > threshold

3. OPA/Conftest policy to enforce workspace decomposition:

# policy/state_size.rego
package terraform.state

deny[msg] {
  count(input.resources) > 500
  msg := sprintf("Workspace has %d resources. Max 500 per workspace. Decompose into child workspaces.", [count(input.resources)])
}

deny[msg] {
  r := input.resources[_]
  r.mode == "data"
  # data sources should not persist in pushed state snapshots
  msg := sprintf("Data source %s.%s found in state snapshot. Remove with terraform state rm.", [r.type, r.name])
}

4. Atlantis / Terraform Cloud run policy:

Set TF_CLI_ARGS_state_push="-lock-timeout=60s" and add a pre-plan hook that runs the size check script above. Fail the run before it ever attempts migration.

5. Periodic state audits via cron:

#!/bin/bash
# cron: 0 6 * * 1 /opt/scripts/tfstate-audit.sh
for workspace in networking eks rds apps; do
  # Skip a workspace whose directory is missing instead of auditing the wrong one
  cd "/infra/$workspace" || continue
  terraform state pull | wc -c | awk -v ws="$workspace" '{if($1>3000000) print "WARN: "ws" state is "$1" bytes"}'
done