Initializing Enclave...

Fixing PostgreSQL 'out of shared memory' Error: max_locks_per_transaction Exceeded

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5–15 mins (requires PostgreSQL restart or reload)

TL;DR

  • What broke: PostgreSQL's shared lock table is full. max_locks_per_transaction * (max_connections + max_prepared_transactions) defines the total lock table slots. You've exceeded that ceiling — every new transaction requiring a lock is now dead on arrival.
  • How to fix it: Increase max_locks_per_transaction in postgresql.conf and reload/restart. The default of 64 is catastrophically undersized for workloads using partitioned tables, pg_dump, or large batch operations.
  • Shortcut: Use our Client-Side Sandbox above to auto-refactor your postgresql.conf — paste your config and get a corrected file back instantly.

The Incident (What Does the Error Mean?)

Raw error output:

ERROR:  out of shared memory
DETAIL:  max_locks_per_transaction 64 exceeded

PostgreSQL maintains a fixed-size, shared-memory lock table allocated at startup. The total number of lock objects that can exist simultaneously is hard-capped at:

max_locks_per_transaction × (max_connections + max_prepared_transactions)

With defaults (max_locks_per_transaction = 64, max_connections = 100), that's 6,400 total lock slots. Sounds like a lot. It isn't. A single pg_dump on a schema with 200 tables burns through 200+ lock slots in one transaction. A partitioned table with 100 child partitions? Every query touching it locks every partition. You hit the wall. PostgreSQL cannot allocate a new lock object. The transaction dies. If this hits your primary write path, your application is down.


The Attack Vector / Blast Radius

This is not a subtle degradation. It is a hard failure. The blast radius depends on which transaction type hits the ceiling first:

Scenario 1 — Partitioned Tables (most common in modern schemas): PostgreSQL locks every partition when a query touches a partitioned table, even for a single-row lookup in some planner paths. A table with 128 partitions × 2 concurrent transactions = 256 locks. Add 10 concurrent users and you're at 2,560 locks for a single table. This is why teams migrating to declarative partitioning suddenly hit this error with no other config changes.

Scenario 2 — pg_dump / pg_restore in production: pg_dump opens a single transaction and acquires ACCESS SHARE locks on every table in the target schema simultaneously. A schema with 300 tables will attempt to hold 300+ locks in one transaction. On a default config, this alone will trigger the error and block your backup pipeline silently — or worse, crash it mid-dump, leaving you with a corrupt backup file.

Scenario 3 — Long-running batch transactions: Any ETL or migration script that opens a single transaction and touches hundreds of tables (e.g., TRUNCATE cascades, foreign key validation passes) will exhaust the lock table. The failure is non-deterministic — it depends on what else is running concurrently — making it intermittent and hellish to reproduce in staging.

Cascading failure path: Lock table exhaustion → new transactions fail with ERROR → application connection pool fills with failed/retrying connections → connection count spikes → max_connections exhausted → full application outage.


How to Fix It (The Solution)

Step 1: Audit Your Current Lock Pressure

Run this on your live instance before changing anything:

-- How many locks are currently held, grouped by transaction
SELECT
    pid,
    COUNT(*) AS lock_count,
    usename,
    application_name,
    LEFT(query, 80) AS query_snippet
FROM pg_locks
JOIN pg_stat_activity USING (pid)
GROUP BY pid, usename, application_name, query
ORDER BY lock_count DESC
LIMIT 20;

-- Check your current effective lock table ceiling
SHOW max_locks_per_transaction;
SHOW max_connections;
SHOW max_prepared_transactions;

If any single PID is holding > 50 locks, that's your culprit workload.


Basic Fix — Increase max_locks_per_transaction

Edit postgresql.conf. The value must be set before startup — it controls shared memory allocation. A reload (pg_reload_conf()) is not sufficient; a full restart is required.

# postgresql.conf

- max_locks_per_transaction = 64
+ max_locks_per_transaction = 256

Then restart:

# Systemd
sudo systemctl restart postgresql

# pg_ctl
pg_ctl restart -D /var/lib/postgresql/data

# Docker / Kubernetes — update your env var or ConfigMap, then rolling restart
POSTGRES_MAX_LOCKS_PER_TRANSACTION=256

Sizing guidance:

Workload Type Recommended Value
Default OLTP, no partitioning 64 (default, acceptable)
Partitioned tables (< 50 partitions) 128
Partitioned tables (50–200 partitions) 256
Heavy pg_dump, large schemas (200+ tables) 512
Extreme partitioning or multi-schema ETL 1024

⚠️ Memory cost: Each increment of max_locks_per_transaction costs roughly (max_connections + max_prepared_transactions) × ~250 bytes. Increasing from 64 to 256 on a 100-connection instance costs ~4.8 MB of shared memory. This is negligible. Do not let this stop you.


Enterprise Best Practice — Structural Fixes (Don't Just Bump the Number)

Bumping max_locks_per_transaction is the immediate fix. It is not the root cause fix. In production, you must also address the workload pattern.

1. Isolate pg_dump to a dedicated replica:

# pg_dump command

- pg_dump -h primary-db.internal -U postgres mydb > backup.sql
+ pg_dump -h replica-db.internal -U postgres mydb > backup.sql
# --lock-wait-timeout also prevents dump from hanging on lock contention
+ pg_dump -h replica-db.internal --lock-wait-timeout=30s -U postgres mydb > backup.sql

2. For partitioned tables — use LOCK_TIMEOUT and connection-level settings:

# For batch jobs that touch many partitions, set at session level:

- BEGIN;
- -- touches 150 partitions, explodes lock table
- SELECT * FROM events WHERE created_at > NOW() - INTERVAL '1 year';

+ BEGIN;
+ -- Declare intent early, fail fast if lock ceiling is near
+ SET lock_timeout = '5s';
+ SET idle_in_transaction_session_timeout = '30s';
+ SELECT * FROM events WHERE created_at > NOW() - INTERVAL '1 year';

3. PgBouncer transaction pooling — reduces effective concurrent lock holders:

# pgbouncer.ini

- pool_mode = session
+ pool_mode = transaction
# Transaction mode means connections return to the pool between statements,
# dramatically reducing simultaneous lock holders at the PostgreSQL level.

4. Kubernetes / Helm — inject the parameter correctly:

# values.yaml (Bitnami PostgreSQL chart)

 postgresql:
   postgresqlConfiguration:
-    max_locks_per_transaction: "64"
+    max_locks_per_transaction: "256"
+    max_prepared_transactions: "0"  # Set to 0 if not using 2PC; reduces lock table size

💡 Tired of pasting proprietary configs into ChatGPT? Generic AI tools log your company's ARNs, DB strings, and private keys. StackEngine is a zero-backend, pure Client-Side WASM utility. Drop your failing config into the sandbox above. We redact your secrets locally in the browser and auto-generate the refactored code using your own API key.


Prevention in CI/CD

This error is 100% preventable with automated config linting. The default of 64 should be treated as a misconfiguration in any environment using partitioning or schemas with > 100 tables.

1. Checkov — IaC Policy for Terraform aws_db_parameter_group

# checkov custom check: check_pg_max_locks.py
from checkov.common.models.enums import CheckResult, CheckCategories
from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck

class PGMaxLocksCheck(BaseResourceCheck):
    def __init__(self):
        name = "Ensure max_locks_per_transaction is >= 128 for production RDS"
        id = "CKV_CUSTOM_PG_001"
        supported_resources = ['aws_db_parameter_group']
        categories = [CheckCategories.GENERAL_SECURITY]
        super().__init__(name=name, id=id, categories=categories,
                         supported_resources=supported_resources)

    def scan_resource_conf(self, conf):
        params = conf.get('parameter', [])
        for p in params:
            if isinstance(p, list): p = p[0]
            if p.get('name') == ['max_locks_per_transaction']:
                val = int(p.get('value', [64])[0])
                return CheckResult.PASSED if val >= 128 else CheckResult.FAILED
        return CheckResult.FAILED  # not set = default 64 = fail

2. OPA / Conftest Policy

# policy/pg_locks.rego
package postgresql

deny[msg] {
    input.resource_type == "aws_db_parameter_group"
    param := input.config.parameter[_]
    param.name == "max_locks_per_transaction"
    to_number(param.value) < 128
    msg := sprintf("max_locks_per_transaction is %v — must be >= 128 for partitioned workloads", [param.value])
}

deny[msg] {
    input.resource_type == "aws_db_parameter_group"
    not any_locks_param
    msg := "max_locks_per_transaction is not explicitly set — defaulting to 64 is unsafe for production"
}

any_locks_param {
    param := input.config.parameter[_]
    param.name == "max_locks_per_transaction"
}

3. GitHub Actions — Pre-merge Config Validation

# .github/workflows/pg-config-lint.yml
name: PostgreSQL Config Audit
on: [pull_request]
jobs:
  pg-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check max_locks_per_transaction
        run: |
          VALUE=$(grep 'max_locks_per_transaction' infra/postgresql.conf | awk -F= '{print $2}' | tr -d ' ')
          if [ -z "$VALUE" ] || [ "$VALUE" -lt 128 ]; then
            echo "::error::max_locks_per_transaction is '$VALUE' — must be >= 128. See runbook: docs/pg-locks-runbook.md"
            exit 1
          fi

4. Monitoring — Alert Before You Hit the Wall

Don't wait for the ERROR. Alert at 70% lock table utilization:

-- Prometheus / pg_exporter custom query
-- Add to queries.yaml
pg_lock_table_utilization:
  query: |
    SELECT
      COUNT(*) AS total_locks,
      current_setting('max_locks_per_transaction')::int *
        (current_setting('max_connections')::int +
         current_setting('max_prepared_transactions')::int) AS lock_table_ceiling,
      ROUND(
        COUNT(*) * 100.0 /
        (current_setting('max_locks_per_transaction')::int *
         (current_setting('max_connections')::int +
          current_setting('max_prepared_transactions')::int)), 2
      ) AS utilization_pct
    FROM pg_locks;
  metrics:
    - utilization_pct:
        usage: GAUGE
        description: "Lock table utilization percentage — alert at 70"

Set your alerting rule: fire PagerDuty at utilization_pct > 70. By the time you hit 100%, your on-call is already getting ERROR calls from the application layer.

Related Diagnostics

"Part of the Performance Utility Matrix."

View all 219 Performance Tools →