Initializing Enclave...

How to Fix PostgreSQL Transaction ID Wraparound: Diagnosing and Resolving vacuum_freeze_min_age Errors Before Database Shutdown

Threat/Impact Level: CRITICAL | Downtime Risk: HIGH | Time to Fix: 30–120 mins (depends on table bloat)

TL;DR

  • What broke: One or more tables/databases have a transaction ID (relfrozenxid) age approaching the 2^31 (~2.1 billion) hard limit. PostgreSQL will enter read-only self-protection mode and refuse all writes — or crash entirely.
  • How to fix it: Immediately run VACUUM FREEZE on the offending tables, tune vacuum_freeze_min_age and autovacuum_freeze_max_age, and verify autovacuum is not blocked.
  • Shortcut: Use our Client-Side Sandbox above to paste your pg_stat_user_tables or postgresql.conf and auto-generate the exact VACUUM commands and config diff for your instance.

The Incident (What Does the Error Mean?)

ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
  Or use the sandbox to auto-generate the VACUUM FREEZE command sequence.

PostgreSQL uses 32-bit transaction IDs (XIDs). The usable space is 2^31 = 2,147,483,648 transactions in each direction. When a table's relfrozenxid ages beyond vacuum_freeze_min_age without being frozen, Postgres starts warning. When it crosses autovacuum_freeze_max_age (default: 200M transactions), autovacuum is forced. If autovacuum is blocked, disabled, or undersized, and the age hits ~2.1B, Postgres shuts down writes to prevent XID collision — where a new transaction ID wraps around and becomes indistinguishable from an ancient one, silently corrupting visibility data.

Check your current XID age right now:

-- Find databases closest to wraparound
SELECT datname, age(datfrozenxid) AS xid_age,
       2147483647 - age(datfrozenxid) AS xids_remaining
FROM pg_database
ORDER BY xid_age DESC;

-- Find the specific tables causing the problem
SELECT schemaname, relname, age(relfrozenxid) AS xid_age
FROM pg_stat_user_tables
ORDER BY xid_age DESC
LIMIT 20;

If xid_age is above 1,500,000,000 — you are in a production emergency right now.


The Attack Vector / Blast Radius

This is not a slow degradation. The failure mode is a cliff edge:

  1. Autovacuum silently starved — long-running transactions, vacuum_cost_delay set too high, or autovacuum = off in a cloud parameter group meant freeze vacuums never completed.
  2. Table bloat compounds the problem — a 500GB table with dead tuples takes hours to VACUUM FREEZE. If you're already at 1.9B XID age, you may not have hours.
  3. Postgres enters read-only mode — at autovacuum_freeze_max_age + 10M XIDs remaining, Postgres emits warnings. At the hard limit, it refuses all DML. Your application sees ERROR: database is not accepting commands.
  4. Single-user mode recovery — if you miss the window, recovery requires stopping the postmaster and running postgres --single to force freeze, causing full downtime.
  5. Cascading app failure — connection pools fill with failed transactions, queues back up, and any service without a circuit breaker falls over entirely.

Blast radius: Total write outage for all applications on the affected PostgreSQL instance. Not just one database — the entire cluster is at risk if shared catalog tables are affected.


How to Fix It

Step 0: Triage — Confirm the Age

SELECT max(age(relfrozenxid)) FROM pg_class WHERE relkind = 'r';
  • < 150,000,000 → Tune autovacuum, monitor.
  • 150M–500M → Schedule aggressive VACUUM FREEZE during low-traffic window.
  • 500M–1.5BRun manual VACUUM FREEZE immediately on top offenders.
  • > 1.5BProduction emergency. Page your DBA. Start VACUUM FREEZE now.

Basic Fix — Emergency Manual Freeze

-- Run on the specific offending table first
VACUUM FREEZE VERBOSE your_schema.your_bloated_table;

-- Then freeze the entire database (will take time on large DBs)
VACUUM FREEZE VERBOSE;

-- Monitor progress in another session
SELECT pid, phase, heap_blks_total, heap_blks_scanned, heap_blks_vacuumed
FROM pg_stat_progress_vacuum;

⚠️ VACUUM FREEZE on a large table causes significant I/O. Run during off-peak hours if XID age allows. If not, run it anyway — a slow vacuum is better than a write shutdown.

Enterprise Best Practice — postgresql.conf Tuning

# postgresql.conf

- autovacuum_freeze_max_age = 200000000
+ autovacuum_freeze_max_age = 150000000

- vacuum_freeze_min_age = 50000000
+ vacuum_freeze_min_age = 10000000

- vacuum_freeze_table_age = 150000000
+ vacuum_freeze_table_age = 100000000

- autovacuum_vacuum_cost_delay = 20ms
+ autovacuum_vacuum_cost_delay = 2ms

- autovacuum_vacuum_cost_limit = 200
+ autovacuum_vacuum_cost_limit = 800

- autovacuum_max_workers = 3
+ autovacuum_max_workers = 6

# Add per-table override for your highest-churn tables
# ALTER TABLE high_churn_table SET (autovacuum_freeze_max_age = 100000000);

After editing, reload config without restart:

SELECT pg_reload_conf();

-- Verify the new values are live
SHOW autovacuum_freeze_max_age;
SHOW vacuum_freeze_min_age;

For RDS/Aurora/Cloud SQL: Apply these via your parameter group and reboot the instance if static parameters are involved. Check that rds.force_autovacuum_logging_level is set to warning so freeze vacuums are visible in logs.


💡 Tired of pasting proprietary configs into ChatGPT? Generic AI tools log your company's ARNs, DB strings, and private keys. StackEngine is a zero-backend, pure Client-Side WASM utility. Drop your failing config into the sandbox above. We redact your secrets locally in the browser and auto-generate the refactored code using your own API key.


Prevention in CI/CD

1. Monitoring Alert (Prometheus + postgres_exporter)

# Alert fires at 750M XID age — gives you time to act
- alert: PostgresXIDWraparoundRisk
  expr: pg_database_xid_age > 750000000
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "XID age {{ $value }} on {{ $labels.datname }} — wraparound risk"
    runbook: "https://your-wiki/runbooks/postgres-wraparound"

2. Scheduled Freeze Job (Kubernetes CronJob)

apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-freeze-check
spec:
  schedule: "0 2 * * 0"  # Weekly, Sunday 2AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: psql
            image: postgres:15-alpine
            command:
            - /bin/sh
            - -c
            - |
              psql $DATABASE_URL -c "
                SELECT schemaname || '.' || relname, age(relfrozenxid)
                FROM pg_stat_user_tables
                WHERE age(relfrozenxid) > 500000000
                ORDER BY age(relfrozenxid) DESC;
              "

3. Terraform / IaC Guard (Checkov custom policy)

# checkov custom check: ensure autovacuum_freeze_max_age is not default or higher
from checkov.common.models.enums import CheckResult
from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck

class PostgresFreezeMaxAgeCheck(BaseResourceCheck):
    def __init__(self):
        super().__init__(
            name="Ensure autovacuum_freeze_max_age is below 200M",
            id="CKV_PG_FREEZE_001",
            supported_resources=["aws_db_parameter_group", "google_sql_database_instance"]
        )

    def scan_resource_conf(self, conf):
        params = conf.get("parameter", [])
        for p in params:
            if p.get("name") == "autovacuum_freeze_max_age":
                if int(p.get("value", 200000000)) >= 200000000:
                    return CheckResult.FAILED
        return CheckResult.PASSED

4. Long-Running Transaction Guard

Long transactions block autovacuum from cleaning dead tuples, accelerating XID age. Add this query to your monitoring:

-- Kill transactions running longer than 1 hour (tune threshold)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND query_start < now() - interval '1 hour'
  AND pid <> pg_backend_pid();

Enforce this automatically via idle_in_transaction_session_timeout = '30min' in postgresql.conf.

- #idle_in_transaction_session_timeout = 0
+ idle_in_transaction_session_timeout = '1800000'  # 30 minutes in ms

Related Diagnostics

"Part of the Performance Utility Matrix."

View all 219 Performance Tools →