Why does `terraform plan` succeed but `terraform apply` throws 'Error: invalid index'?

During `terraform plan`, the value of a data source result or resource attribute is often marked as '(known after apply)' — a deferred unknown. Terraform cannot evaluate the index expression against an unknown list, so it skips the bounds check. At apply-time, the data source is resolved to a real (empty) list and the index evaluation runs for the first time, triggering the error. This is a fundamental limitation of Terraform's two-phase evaluation model.

What is the difference between using `try()` and a `length() > 0` guard for this error?

`try(some_list[0], null)` silently returns null on any evaluation error, including type mismatches and other unrelated issues — it can mask bugs. A `length(some_list) > 0 ? some_list[0] : null` guard is explicit and only handles the empty-list case. For production code, prefer the explicit guard or a `precondition` block so failures are loud and attributable. Reserve `try()` for genuinely optional lookups where null is a valid and expected outcome.
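
A minimal side-by-side sketch of the two patterns may make the difference concrete. The variable name `subnet_ids` and the local names are hypothetical stand-ins for any list that may be empty.

```hcl
# Hypothetical input; in practice this could be a data source result.
variable "subnet_ids" {
  type    = list(string)
  default = []
}

locals {
  # try(): falls back to null if the index expression raises any
  # dynamic evaluation error, not only when the list is empty.
  first_via_try = try(var.subnet_ids[0], null)

  # Explicit guard: yields null only when the list has no elements.
  first_via_guard = length(var.subnet_ids) > 0 ? var.subnet_ids[0] : null
}
```

With the default empty list both locals evaluate to null. They differ only in which other failures they would also hide.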

Can this error cause Terraform state corruption if it happens mid-apply?

It can cause state divergence, which is distinct from corruption. Terraform walks a dependency graph rather than reading the file top to bottom, so resources that finished before the failing expression was evaluated are written to state, while resources that depend on it (or had not started yet) are not. The state file itself remains internally consistent, but your configuration and your deployed infrastructure now describe different realities. After fixing the index guard, audit which resources were created, re-run apply to create the rest, and import or replace anything that does not match state. If using a remote backend with state locking (S3+DynamoDB), verify the lock was released after the failed apply before retrying.

How to Fix Terraform 'Error: invalid index' on a Zero-Length List During Apply

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 10 mins

TL;DR

What broke: Terraform attempted to access index [0] (or [N]) on a list that returned zero elements at apply-time — a runtime index-out-of-bounds that halts the entire terraform apply run.
How to fix it: Guard every list index access with a length() check, use the try() built-in, or use one() for single-element lists to safely short-circuit evaluation.

The Incident (What Does the Error Mean?)

The raw error looks like this:

Error: Invalid index

  on main.tf line 14, in resource "aws_instance" "web":
  14:   subnet_id = data.aws_subnets.selected.ids[0]
    |----------------
    | data.aws_subnets.selected.ids is empty list of string

The given key does not identify an element in this collection value.

Indexing in Terraform is strict: an out-of-range index is an error, not a null. When the HCL expression some_list[0] is evaluated during the apply graph walk, the runtime checks whether index 0 exists. If the list has zero elements, Terraform raises a hard Invalid index error and stops the run. There is no rollback: anything already created stays in place, and everything that depends on the failing expression is never attempted.

This is often not a plan-time error. When the list's contents are only known after a prior resource is created or a deferred data source is read, terraform plan passes cleanly and the failure surfaces at apply. If the list is already known to be empty during plan, the same error is reported by plan instead.
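
One way to reproduce the plan-passes, apply-fails behaviour is sketched below. The resource names and CIDR are illustrative assumptions, not taken from the failing configuration above.

```hcl
# Hypothetical reproduction: a data source whose filter depends on a
# resource created in the same run.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

data "aws_subnets" "selected" {
  # aws_vpc.main.id is unknown until the VPC exists, so Terraform
  # defers reading this data source until apply.
  filter {
    name   = "vpc-id"
    values = [aws_vpc.main.id]
  }
}

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  # Plan: ids is "(known after apply)", so the index cannot be checked.
  # Apply: a freshly created VPC has no subnets, ids is empty, and this
  # line raises "Invalid index".
  subnet_id = data.aws_subnets.selected.ids[0]
}
```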

The Attack Vector / Blast Radius

This failure mode is deceptively catastrophic in automated pipelines:

CI/CD pipeline halts mid-apply. Any resources provisioned before the failing resource are now live. Resources that depend on it are not. Your environment is left partially applied.
Environment-dependent data sources. A data.aws_subnets or data.aws_security_groups filter that returns results in one region/account returns zero in another. The same code in a different environment fails only at apply.
Chained module failures. If the zero-length list comes from a module output, every downstream module that consumes that output via index access will also fail, cascading across your entire workspace.
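State lock contention. If the apply process is killed or loses connectivity before it can clean up (for example, a cancelled CI job), the state lock on an S3+DynamoDB backend stays held. It does not expire on its own: other operators are blocked until someone confirms no run is active and releases it with terraform force-unlock LOCK_ID. A run that exits normally with this error releases the lock itself.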
Automated DR/scale-out failures. In autoscaling or blue/green patterns, this error in a for_each or count-driven resource means the scale-out run fails and no new capacity is provisioned until someone notices the failed pipeline.

The core danger: terraform plan reports success. The engineer merges the PR. The apply pipeline fires. Production provisioning halts halfway through.
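
For the chained-module case, one possible containment is to make the producing module refuse to export an empty list. The sketch below assumes Terraform ≥ 1.2 and a hypothetical module layout; the names `private` and `private_subnet_ids` are illustrative.

```hcl
# modules/network/outputs.tf (hypothetical module layout)
variable "environment" {
  type = string
}

data "aws_subnets" "private" {
  filter {
    name   = "tag:Environment"
    values = [var.environment]
  }
}

output "private_subnet_ids" {
  description = "Subnet IDs for downstream modules. Never empty."
  value       = data.aws_subnets.private.ids

  # Output blocks accept precondition directly (no lifecycle wrapper).
  precondition {
    condition     = length(data.aws_subnets.private.ids) > 0
    error_message = "No subnets tagged Environment=${var.environment}; refusing to export an empty list."
  }
}
```

Downstream modules then see a single error at the source instead of one Invalid index error per consumer.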

How to Fix It (The Solution)

Root Cause Pattern

The error usually traces back to one of these three patterns:

Direct index on a data source result: data.aws_subnets.app.ids[0]
Variable default not guaranteed non-empty: var.availability_zones[0]
Resource attribute list of unknown length: aws_lb.main.subnets[0]

Basic Fix — `length()` Guard with `null` Fallback

 resource "aws_instance" "web" {
   ami           = "ami-0c55b159cbfafe1f0"
   instance_type = "t3.micro"
-  subnet_id     = data.aws_subnets.selected.ids[0]
+  subnet_id     = length(data.aws_subnets.selected.ids) > 0 ? data.aws_subnets.selected.ids[0] : null
 }

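⚠️ Setting subnet_id = null is the same as omitting the argument. If the argument is required, this causes a separate validation error; if it is optional (as subnet_id is on aws_instance), the provider falls back to its default behavior, which may place the resource somewhere you did not intend. Combine this with a precondition block (see Enterprise fix below) to surface the failure early with a human-readable message.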

Basic Fix — `try()` Built-in (Terraform ≥ 0.12.20)

 locals {
-  primary_subnet = data.aws_subnets.selected.ids[0]
+  primary_subnet = try(data.aws_subnets.selected.ids[0], null)
 }

try() evaluates each argument in order and returns the first one that does not produce an error. Use sparingly — it silently swallows errors and can mask real misconfigurations if overused.

Basic Fix — `one()` for Single-Element Lists (Terraform ≥ 0.15)

 locals {
-  vpc_id = data.aws_vpcs.target.ids[0]
+  vpc_id = one(data.aws_vpcs.target.ids)
 }

one() returns null if the list is empty, the single element if the list has exactly one item, and throws an error if the list has more than one element — which is exactly the constraint you want for data sources that should return a unique result.
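
The three cases can be sketched with literal values. The VPC IDs here are made-up placeholders.

```hcl
locals {
  # Empty collection: the result is null.
  no_match = one([])

  # Exactly one element: the result is that element, "vpc-0a1".
  single_match = one(["vpc-0a1"])

  # More than one element is an error, so this line stays commented out:
  # too_many = one(["vpc-0a1", "vpc-0b2"])
}
```

Because the empty case yields null rather than an error, anything that consumes the result still needs to handle null, for example with a precondition.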

Enterprise Best Practice — `precondition` Block (Terraform ≥ 1.2)

Don't silently null-out. Fail fast with a clear error message that tells the operator exactly what filter produced zero results.

 data "aws_subnets" "selected" {
   filter {
     name   = "tag:Environment"
     values = [var.environment]
   }
 }
 
 resource "aws_instance" "web" {
   ami           = "ami-0c55b159cbfafe1f0"
   instance_type = "t3.micro"
   subnet_id     = data.aws_subnets.selected.ids[0]
 
+  lifecycle {
+    precondition {
+      condition     = length(data.aws_subnets.selected.ids) > 0
+      error_message = "No subnets found matching tag Environment=${var.environment}. Verify the tag exists in this account/region before applying."
+    }
+  }
 }

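Terraform checks preconditions before it evaluates the resource's arguments, so the operator sees this message instead of the raw Invalid index error, and it tells the on-call engineer exactly what to check. If the data source result is known during plan, the check fails at plan time and nothing is changed. If it is only known during apply, the check fails before this resource's create/update action runs, but resources applied earlier in the same run remain in place.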
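
A variation, sketched here under the assumption of Terraform ≥ 1.2, attaches the check to the data source itself with a postcondition, so every consumer of the list is covered by one rule. The `self` reference refers to the data source being checked.

```hcl
variable "environment" {
  type = string
}

data "aws_subnets" "selected" {
  filter {
    name   = "tag:Environment"
    values = [var.environment]
  }

  lifecycle {
    # Evaluated right after the data source is read; self.ids is the
    # list this data source just returned.
    postcondition {
      condition     = length(self.ids) > 0
      error_message = "No subnets found matching tag Environment=${var.environment}."
    }
  }
}
```

Whether to guard at the producer (postcondition) or at each consumer (precondition) is a design choice: the producer-side check is written once, while consumer-side checks can carry resource-specific messages.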

Enterprise Best Practice — Variable Validation Block

If the empty list originates from an input variable, reject it at the variable level:

 variable "availability_zones" {
   type        = list(string)
   description = "List of AZs to deploy into. Must contain at least one entry."
+
+  validation {
+    condition     = length(var.availability_zones) > 0
+    error_message = "availability_zones must contain at least one AZ. Received empty list."
+  }
 }

This fires at terraform plan time, not apply — stopping the pipeline before any API calls are made.


Prevention in CI/CD

1. terraform validate is not enough. It does not evaluate data source results or variable values. It will not catch this class of error.

2. Checkov — Static Analysis

Add to your pipeline:

checkov -d . --check CKV_TF_1 --compact

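Note that CKV_TF_1 is a built-in check for pinned module sources; it does not detect unguarded index access. Checkov's built-in checks focus on security and compliance misconfigurations, so write a custom Checkov check for index-without-length-guard patterns in your organization's policy repo.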

3. TFLint — Rule-Based Linting

tflint --enable-rule=terraform_required_version

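The terraform_required_version rule only enforces that a required_version constraint is declared; it does not inspect index expressions. Combine it with the tflint-ruleset-aws plugin, and add a custom rule that flags any expression matching the regex \w+\[\d+\] that is not wrapped in try() or preceded by a length() guard.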
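
A minimal `.tflint.hcl` sketch for the setup described above. The plugin version shown is a placeholder assumption; pin it to whichever release your team has vetted.

```hcl
# .tflint.hcl
plugin "aws" {
  enabled = true
  # Placeholder version: replace with the release you actually use.
  version = "0.30.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

# Same effect as passing --enable-rule=terraform_required_version
rule "terraform_required_version" {
  enabled = true
}
```

Run `tflint --init` once to download the plugin before linting.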

4. OPA/Conftest — Policy as Code

# policy/no_bare_index.rego
package terraform

deny[msg] {
  # Flag any resource attribute that uses a bare numeric index
  # on a data source output without a try() wrapper
  walk(input.resource, [path, value])
  is_string(value)
  regex.match(`data\.\w+\.\w+\.\w+\[\d+\]`, value)
  not regex.match(`try\(`, value)
  msg := sprintf("Bare list index without try() guard at path: %v", [path])
}

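# The policy is declared in package terraform; conftest only evaluates package main unless told otherwise
conftest test main.tf.json --policy policy/ --namespace terraform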

5. terraform plan -detailed-exitcode in CI

Always run plan against a staging workspace that mirrors production data source state. A plan that passes in one account can still fail in another if the data sources return different cardinalities.

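# .github/workflows/terraform.yml
- name: Terraform Plan
  run: |
    # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
    # Capture the code so a pending diff (2) does not fail the step.
    code=0
    terraform plan -detailed-exitcode -out=tfplan || code=$?
    if [ "$code" -eq 1 ]; then exit 1; fi
  env:
    TF_VAR_environment: staging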

6. Pin Terraform version in .terraform-version or required_version

The one() built-in and precondition lifecycle blocks require Terraform ≥ 0.15 and ≥ 1.2 respectively. Lock your version to prevent regressions when pipelines run on older Terraform binaries:

terraform {
  required_version = ">= 1.2.0"
}