Initializing Enclave...

Resolving PostgreSQL 'Shared Lock Conflicted with Exclusive Lock' Errors in High-Concurrency Production Systems

Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

  • What broke: A transaction holding a ShareLock (e.g., from a SELECT FOR SHARE, FK check, or CREATE INDEX CONCURRENTLY) is blocked by — or directly conflicts with — another transaction demanding an ExclusiveLock or AccessExclusiveLock (e.g., ALTER TABLE, TRUNCATE, LOCK TABLE, or a long-running UPDATE).
  • How to fix it: Identify the blocking PID via pg_locks + pg_stat_activity, kill or wait out the offending transaction, then restructure your migration and application lock acquisition order to prevent re-occurrence.
  • Use our Client-Side Sandbox above to paste your migration SQL or ORM-generated queries and auto-refactor the lock acquisition sequence.

The Incident (What Does the Error Mean?)

Raw error output from PostgreSQL logs or your application:

ERROR:  deadlock detected
DETAIL:  Process 47382 waits for ShareLock on transaction 9201; blocked by process 47401.
         Process 47401 waits for ExclusiveLock on relation 16423 of database 16384; blocked by process 47382.
HINT:  See server log for query details.

or the non-deadlock blocking variant:

ERROR:  canceling statement due to lock timeout
CONTEXT:  while updating tuple (0,42) in relation "orders"
DETAIL:  shared lock conflicted with exclusive lock on relation "orders"

Immediate consequence: The relation (table or index) is completely inaccessible for the duration of the conflict. In high-throughput OLTP systems, this means connection pool exhaustion within seconds as threads pile up waiting on the lock. Your app starts throwing 500s or connection timeout errors before the DBA even opens a terminal.


The Attack Vector / Blast Radius

This is a cascading availability failure, not just a slow query.

Lock hierarchy that causes this:

Operation Lock Mode Acquired
SELECT FOR SHARE RowShareLock
SELECT FOR UPDATE RowShareLock
FK constraint check RowShareLock
UPDATE / DELETE RowExclusiveLock
CREATE INDEX (non-concurrent) ShareLock
ALTER TABLE, TRUNCATE AccessExclusiveLock
LOCK TABLE ... IN EXCLUSIVE MODE ExclusiveLock

AccessExclusiveLock conflicts with every other lock mode — including plain AccessShareLock held by a vanilla SELECT. This means a single ALTER TABLE in a migration script, if it runs while the table has any active readers or writers, blocks the entire table and queues every subsequent query behind it.

Blast radius timeline:

  1. Migration runs ALTER TABLE orders ADD COLUMN ... → acquires AccessExclusiveLock, waits for existing transactions to finish.
  2. A long-running SELECT (analytics query, ORM N+1, etc.) holds AccessShareLock — migration waits.
  3. New application queries arrive, queue behind the ALTER TABLE lock waiter.
  4. Connection pool saturates. PgBouncer or app pool hits max connections.
  5. Application returns 500/503 across all endpoints touching orders.

The most dangerous pattern: ALTER TABLE inside an explicit transaction block with other DML. The lock is held for the entire transaction duration.


How to Fix It (The Solution)

Step 1: Identify the Blocker Right Now

SELECT
  blocked.pid,
  blocked.query AS blocked_query,
  blocked.wait_event_type,
  blocked.wait_event,
  blocking.pid AS blocking_pid,
  blocking.query AS blocking_query,
  blocking.state AS blocking_state,
  now() - blocking.query_start AS blocking_duration
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE cardinality(pg_blocking_pids(blocked.pid)) > 0;

If the blocker is an idle-in-transaction session or a runaway analytics query:

SELECT pg_terminate_backend(<blocking_pid>);

Basic Fix: Set lock_timeout and statement_timeout

Never run DDL or long transactions without a timeout. This prevents indefinite blocking:

- ALTER TABLE orders ADD COLUMN metadata jsonb;

+ SET lock_timeout = '2s';
+ SET statement_timeout = '30s';
+ ALTER TABLE orders ADD COLUMN metadata jsonb;

If the lock cannot be acquired within 2 seconds, the statement fails fast instead of queuing and starving the connection pool.

Enterprise Best Practice: Zero-Downtime DDL with pg_repack or CONCURRENTLY

For index creation — never block reads:

- CREATE INDEX idx_orders_user_id ON orders(user_id);

+ CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);

For ALTER TABLE on large tables — use pg_repack or a multi-phase shadow-table migration instead of direct DDL.

For FK constraints — add them as NOT VALID first, then validate separately:

- ALTER TABLE order_items
-   ADD CONSTRAINT fk_order FOREIGN KEY (order_id) REFERENCES orders(id);

+ -- Phase 1: Add constraint without full table scan (takes ShareRowExclusiveLock briefly)
+ ALTER TABLE order_items
+   ADD CONSTRAINT fk_order FOREIGN KEY (order_id) REFERENCES orders(id)
+   NOT VALID;
+
+ -- Phase 2: Validate in a separate, low-priority transaction
+ ALTER TABLE order_items VALIDATE CONSTRAINT fk_order;

For application-level lock ordering — always acquire locks in a consistent global order to prevent deadlocks:

- -- Transaction A: locks accounts then transfers
- BEGIN;
- SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
- SELECT * FROM transfers WHERE id = 99 FOR UPDATE;
- COMMIT;
-
- -- Transaction B: locks transfers then accounts (DEADLOCK)
- BEGIN;
- SELECT * FROM transfers WHERE id = 99 FOR UPDATE;
- SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
- COMMIT;

+ -- Both transactions acquire locks in the same order (accounts → transfers)
+ BEGIN;
+ SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
+ SELECT * FROM transfers WHERE id = 99 FOR UPDATE;
+ COMMIT;

💡 Tired of pasting proprietary configs into ChatGPT? Generic AI tools log your company's ARNs, DB strings, and private keys. StackEngine is a zero-backend, pure Client-Side WASM utility. Drop your failing config into the sandbox above. We redact your secrets locally in the browser and auto-generate the refactored code using your own API key.


Prevention in CI/CD

1. Lint Migrations with squawk

squawk is a PostgreSQL migration linter that catches dangerous lock patterns before they hit production:

# Install
npm install -g squawk-cli

# Lint a migration file
squawk migration_20240801_add_metadata.sql

# Example output:
# migration.sql:3:1: warning: ban-drop-column
# migration.sql:7:1: error: require-concurrent-index-creation

Add to your CI pipeline:

# .github/workflows/db-lint.yml
- name: Lint PostgreSQL Migrations
  run: squawk migrations/*.sql

2. Enforce lock_timeout at the Role Level

Set sane defaults at the database role level so no application session can hold locks indefinitely:

ALTER ROLE app_user SET lock_timeout = '3s';
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '30s';
ALTER ROLE app_user SET statement_timeout = '60s';

3. Monitor Lock Wait Events in Production

Alert before the connection pool saturates:

-- Prometheus-compatible query for pg_exporter custom metrics
SELECT count(*) AS lock_waiters
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
  AND state = 'active';

Set a PagerDuty/Alertmanager alert if lock_waiters > 3 for more than 10 seconds.

4. OPA / Policy Gate for Migration PRs

If you use a migration framework (Flyway, Liquibase), add an OPA policy that rejects any migration containing bare ALTER TABLE or CREATE INDEX without CONCURRENTLY or a preceding SET lock_timeout:

package db.migrations

deny[msg] {
  input.sql_statements[_] == regex.match(`(?i)^\s*ALTER TABLE`, input.sql_statements[_])
  not regex.match(`(?i)lock_timeout`, input.preamble)
  msg := "ALTER TABLE requires SET lock_timeout in preamble. Use pg_repack or phased migration."
}

Related Diagnostics

"Part of the Performance Utility Matrix."

View all 219 Performance Tools →