How to Fix PostgreSQL Deadlock on Concurrent Same-Row UPDATE: Root Cause & Solutions
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–45 mins depending on transaction complexity
TL;DR
- What broke: Two or more concurrent transactions acquired row-level locks in opposite orders on the same row, causing PostgreSQL to abort one transaction with
ERROR: deadlock detected. - How to fix it: Enforce a consistent lock acquisition order across all transactions using
SELECT FOR UPDATE, or implement optimistic locking with aversioncolumn to eliminate contention entirely. - Fast path: Use our Client-Side Sandbox below to auto-refactor your transaction logic — paste your SQL and let it detect the conflicting lock sequence.
The Incident (What does the error mean?)
ERROR: deadlock detected
DETAIL: Process 47382 waits for ShareLock on transaction 9021; blocked by process 47401.
Process 47401 waits for ShareLock on transaction 9020; blocked by process 47382.
HINT: See server log for query details.
CONTEXT: while updating tuple (0,42) in relation "orders"
PostgreSQL's deadlock detector runs every deadlock_timeout (default 1 second). When it detects a cycle in the wait graph, it kills the youngest transaction — your application receives an exception, that request fails, and any uncommitted work is rolled back. Under high concurrency, this fires repeatedly, creating a failure storm.
The Attack Vector / Blast Radius
This is not a one-off. The blast radius escalates fast:
- Connection pool exhaustion: Transactions waiting for locks hold connections. Under load, your pool (PgBouncer, RDS Proxy) saturates. New requests queue, then time out.
- Retry amplification: Naive retry logic re-enters the same deadlock cycle. Each retry re-acquires locks in the same broken order. You get exponential retry storms.
- Data integrity risk: If your application doesn't correctly handle the rolled-back transaction (i.e., doesn't retry the entire unit of work), you get partial writes — order created, payment not decremented.
- Cascading table bloat: Aborted transactions leave dead tuples. Under deadlock storms,
autovacuumcan't keep up, leading to table bloat and index degradation.
Root cause pattern — the classic inversion:
Transaction A: UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- locks row 1
UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- waits for row 2
Transaction B: UPDATE accounts SET balance = balance - 50 WHERE id = 2; -- locks row 2
UPDATE accounts SET balance = balance + 50 WHERE id = 1; -- waits for row 1 ← DEADLOCK
How to Fix It
Basic Fix — Enforce Consistent Lock Ordering
Always acquire locks on rows in the same deterministic order (e.g., ascending id) across all transactions.
-- Transaction handling a transfer between two accounts
BEGIN;
- -- Inconsistent order: Transaction A locks id=1 first, Transaction B locks id=2 first
- UPDATE accounts SET balance = balance - 100 WHERE id = :from_account;
- UPDATE accounts SET balance = balance + 100 WHERE id = :to_account;
+ -- Consistent order: always lock the lower ID first
+ UPDATE accounts SET balance = balance - 100
+ WHERE id = LEAST(:from_account, :to_account);
+ UPDATE accounts SET balance = balance + 100
+ WHERE id = GREATEST(:from_account, :to_account);
COMMIT;
For application-layer ordering:
- # Python — no ordering, deadlock-prone
- cursor.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s", (amount, from_id))
- cursor.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s", (amount, to_id))
+ # Python — sort IDs before locking
+ ids_in_order = sorted([from_id, to_id])
+ cursor.execute("SELECT id FROM accounts WHERE id = ANY(%s) ORDER BY id FOR UPDATE", (ids_in_order,))
+ cursor.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s", (amount, from_id))
+ cursor.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s", (amount, to_id))
Enterprise Best Practice — Optimistic Locking with Version Column
For high-read, low-contention workloads, eliminate pessimistic locking entirely. Use a version column — the UPDATE only succeeds if the row hasn't changed since you read it.
-- Schema change
+ ALTER TABLE orders ADD COLUMN version INTEGER NOT NULL DEFAULT 0;
-- Application UPDATE
- UPDATE orders
- SET status = 'processed', total = :new_total
- WHERE id = :order_id;
+ UPDATE orders
+ SET status = 'processed', total = :new_total, version = version + 1
+ WHERE id = :order_id
+ AND version = :expected_version; -- fails silently if row was modified concurrently
+
+ -- Check rows_affected in application code:
+ -- if rows_affected == 0: raise OptimisticLockException and retry with fresh read
Application-layer retry wrapper (Python):
- def update_order(order_id, new_total):
- db.execute("UPDATE orders SET total = %s WHERE id = %s", (new_total, order_id))
+ MAX_RETRIES = 5
+ def update_order(order_id, new_total):
+ for attempt in range(MAX_RETRIES):
+ row = db.fetchone("SELECT version FROM orders WHERE id = %s", (order_id,))
+ affected = db.execute(
+ "UPDATE orders SET total = %s, version = version + 1 "
+ "WHERE id = %s AND version = %s",
+ (new_total, order_id, row['version'])
+ )
+ if affected == 1:
+ return # success
+ time.sleep(0.05 * attempt) # exponential backoff
+ raise Exception("Optimistic lock failed after max retries")
💡 Tired of pasting proprietary configs into ChatGPT? Generic AI tools log your company's ARNs, DB strings, and private keys. StackEngine is a zero-backend, pure Client-Side WASM utility. Drop your failing config into the sandbox above. We redact your secrets locally in the browser and auto-generate the refactored code using your own API key.
Prevention in CI/CD
1. Detect lock ordering issues in code review with static analysis:
# .pre-commit-config.yaml — custom hook to flag unordered multi-row UPDATEs
- repo: local
hooks:
- id: check-lock-order
name: Flag potential deadlock patterns
entry: grep -rn "UPDATE.*WHERE id" --include="*.sql" --include="*.py"
language: system
pass_filenames: false
2. Set deadlock_timeout aggressively in staging to surface issues early:
- # postgresql.conf (default — too slow to catch in staging)
- deadlock_timeout = 1s
+ # postgresql.conf (staging only — surface deadlocks faster)
+ deadlock_timeout = 100ms
+ log_lock_waits = on
+ lock_timeout = 5s # fail fast rather than queue indefinitely
3. Integration test with concurrent workers:
# pytest-xdist parallel test — run the same transaction from 10 threads simultaneously
import concurrent.futures
def test_no_deadlock_under_concurrency():
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
futures = [executor.submit(update_order, order_id=1, new_total=i*10) for i in range(10)]
results = [f.result() for f in concurrent.futures.as_completed(futures)]
# Assert no exceptions were raised
4. Monitor in production — alert before the storm:
-- Query to detect current lock waits in pg_stat_activity
SELECT
pid,
wait_event_type,
wait_event,
state,
query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY state_change;
Feed this into your Prometheus postgres_exporter and alert when pg_locks waiting count exceeds threshold. Deadlock storms are always preceded by a spike in lock waits — catch it there.