How to Fix PostgreSQL Transaction ID Wraparound: Diagnosing and Resolving vacuum_freeze_min_age Errors Before Database Shutdown
Threat/Impact Level: CRITICAL | Downtime Risk: HIGH | Time to Fix: 30–120 mins (depends on table bloat)
TL;DR
- What broke: One or more tables/databases have a transaction ID (
relfrozenxid) age approaching the2^31(~2.1 billion) hard limit. PostgreSQL will enter read-only self-protection mode and refuse all writes — or crash entirely. - How to fix it: Immediately run
VACUUM FREEZEon the offending tables, tunevacuum_freeze_min_ageandautovacuum_freeze_max_age, and verify autovacuum is not blocked. - Shortcut: Use our Client-Side Sandbox above to paste your
pg_stat_user_tablesorpostgresql.confand auto-generate the exactVACUUMcommands and config diff for your instance.
The Incident (What Does the Error Mean?)
ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and vacuum that database in single-user mode.
Or use the sandbox to auto-generate the VACUUM FREEZE command sequence.
PostgreSQL uses 32-bit transaction IDs (XIDs). The usable space is 2^31 = 2,147,483,648 transactions in each direction. When a table's relfrozenxid ages beyond vacuum_freeze_min_age without being frozen, Postgres starts warning. When it crosses autovacuum_freeze_max_age (default: 200M transactions), autovacuum is forced. If autovacuum is blocked, disabled, or undersized, and the age hits ~2.1B, Postgres shuts down writes to prevent XID collision — where a new transaction ID wraps around and becomes indistinguishable from an ancient one, silently corrupting visibility data.
Check your current XID age right now:
-- Find databases closest to wraparound
SELECT datname, age(datfrozenxid) AS xid_age,
2147483647 - age(datfrozenxid) AS xids_remaining
FROM pg_database
ORDER BY xid_age DESC;
-- Find the specific tables causing the problem
SELECT schemaname, relname, age(relfrozenxid) AS xid_age
FROM pg_stat_user_tables
ORDER BY xid_age DESC
LIMIT 20;
If xid_age is above 1,500,000,000 — you are in a production emergency right now.
The Attack Vector / Blast Radius
This is not a slow degradation. The failure mode is a cliff edge:
- Autovacuum silently starved — long-running transactions,
vacuum_cost_delayset too high, orautovacuum = offin a cloud parameter group meant freeze vacuums never completed. - Table bloat compounds the problem — a 500GB table with dead tuples takes hours to
VACUUM FREEZE. If you're already at 1.9B XID age, you may not have hours. - Postgres enters read-only mode — at
autovacuum_freeze_max_age + 10MXIDs remaining, Postgres emits warnings. At the hard limit, it refuses all DML. Your application seesERROR: database is not accepting commands. - Single-user mode recovery — if you miss the window, recovery requires stopping the postmaster and running
postgres --singleto force freeze, causing full downtime. - Cascading app failure — connection pools fill with failed transactions, queues back up, and any service without a circuit breaker falls over entirely.
Blast radius: Total write outage for all applications on the affected PostgreSQL instance. Not just one database — the entire cluster is at risk if shared catalog tables are affected.
How to Fix It
Step 0: Triage — Confirm the Age
SELECT max(age(relfrozenxid)) FROM pg_class WHERE relkind = 'r';
< 150,000,000→ Tune autovacuum, monitor.150M–500M→ Schedule aggressive VACUUM FREEZE during low-traffic window.500M–1.5B→ Run manual VACUUM FREEZE immediately on top offenders.> 1.5B→ Production emergency. Page your DBA. Start VACUUM FREEZE now.
Basic Fix — Emergency Manual Freeze
-- Run on the specific offending table first
VACUUM FREEZE VERBOSE your_schema.your_bloated_table;
-- Then freeze the entire database (will take time on large DBs)
VACUUM FREEZE VERBOSE;
-- Monitor progress in another session
SELECT pid, phase, heap_blks_total, heap_blks_scanned, heap_blks_vacuumed
FROM pg_stat_progress_vacuum;
⚠️
VACUUM FREEZEon a large table causes significant I/O. Run during off-peak hours if XID age allows. If not, run it anyway — a slow vacuum is better than a write shutdown.
Enterprise Best Practice — postgresql.conf Tuning
# postgresql.conf
- autovacuum_freeze_max_age = 200000000
+ autovacuum_freeze_max_age = 150000000
- vacuum_freeze_min_age = 50000000
+ vacuum_freeze_min_age = 10000000
- vacuum_freeze_table_age = 150000000
+ vacuum_freeze_table_age = 100000000
- autovacuum_vacuum_cost_delay = 20ms
+ autovacuum_vacuum_cost_delay = 2ms
- autovacuum_vacuum_cost_limit = 200
+ autovacuum_vacuum_cost_limit = 800
- autovacuum_max_workers = 3
+ autovacuum_max_workers = 6
# Add per-table override for your highest-churn tables
# ALTER TABLE high_churn_table SET (autovacuum_freeze_max_age = 100000000);
After editing, reload config without restart:
SELECT pg_reload_conf();
-- Verify the new values are live
SHOW autovacuum_freeze_max_age;
SHOW vacuum_freeze_min_age;
For RDS/Aurora/Cloud SQL: Apply these via your parameter group and reboot the instance if static parameters are involved. Check that rds.force_autovacuum_logging_level is set to warning so freeze vacuums are visible in logs.
💡 Tired of pasting proprietary configs into ChatGPT? Generic AI tools log your company's ARNs, DB strings, and private keys. StackEngine is a zero-backend, pure Client-Side WASM utility. Drop your failing config into the sandbox above. We redact your secrets locally in the browser and auto-generate the refactored code using your own API key.
Prevention in CI/CD
1. Monitoring Alert (Prometheus + postgres_exporter)
# Alert fires at 750M XID age — gives you time to act
- alert: PostgresXIDWraparoundRisk
expr: pg_database_xid_age > 750000000
for: 5m
labels:
severity: critical
annotations:
summary: "XID age {{ $value }} on {{ $labels.datname }} — wraparound risk"
runbook: "https://your-wiki/runbooks/postgres-wraparound"
2. Scheduled Freeze Job (Kubernetes CronJob)
apiVersion: batch/v1
kind: CronJob
metadata:
name: postgres-freeze-check
spec:
schedule: "0 2 * * 0" # Weekly, Sunday 2AM
jobTemplate:
spec:
template:
spec:
containers:
- name: psql
image: postgres:15-alpine
command:
- /bin/sh
- -c
- |
psql $DATABASE_URL -c "
SELECT schemaname || '.' || relname, age(relfrozenxid)
FROM pg_stat_user_tables
WHERE age(relfrozenxid) > 500000000
ORDER BY age(relfrozenxid) DESC;
"
3. Terraform / IaC Guard (Checkov custom policy)
# checkov custom check: ensure autovacuum_freeze_max_age is not default or higher
from checkov.common.models.enums import CheckResult
from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck
class PostgresFreezeMaxAgeCheck(BaseResourceCheck):
def __init__(self):
super().__init__(
name="Ensure autovacuum_freeze_max_age is below 200M",
id="CKV_PG_FREEZE_001",
supported_resources=["aws_db_parameter_group", "google_sql_database_instance"]
)
def scan_resource_conf(self, conf):
params = conf.get("parameter", [])
for p in params:
if p.get("name") == "autovacuum_freeze_max_age":
if int(p.get("value", 200000000)) >= 200000000:
return CheckResult.FAILED
return CheckResult.PASSED
4. Long-Running Transaction Guard
Long transactions block autovacuum from cleaning dead tuples, accelerating XID age. Add this query to your monitoring:
-- Kill transactions running longer than 1 hour (tune threshold)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND query_start < now() - interval '1 hour'
AND pid <> pg_backend_pid();
Enforce this automatically via idle_in_transaction_session_timeout = '30min' in postgresql.conf.
- #idle_in_transaction_session_timeout = 0
+ idle_in_transaction_session_timeout = '1800000' # 30 minutes in ms