How to Fix PostgreSQL 'work_mem Exceeded Limit: Sort Method Disk Hash Join' and Stop Queries Spilling to Disk
Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 15 mins
TL;DR
- What broke:
work_memis too low for the sort or hash join operation — PostgreSQL wrote intermediate results to disk (Disk: XXXkBin EXPLAIN ANALYZE), causing 10–1000x query slowdowns and I/O spikes. - How to fix it: Increase
work_memat the session, role, or global level to keep the operation in RAM; or rewrite the query to use an index that eliminates the sort entirely. - Shortcut: Use our Client-Side Sandbox above to paste your
EXPLAIN ANALYZEoutput and auto-generate the correctedwork_memvalue and query refactor.
The Incident (What Does the Error Mean?)
Raw log output:
LOG: work_mem exceeded limit: sort method disk
LOG: work_mem exceeded limit: hash join
Or in EXPLAIN (ANALYZE, BUFFERS) output:
Sort (cost=184772.56..187272.56 rows=1000000 width=128)
Sort Method: external merge Disk: 142384kB
-> Hash Join (cost=...) (actual rows=...)
Buckets: 1024 Batches: 32 Memory Usage: 4096kB
Batches: 32 means PostgreSQL split the hash join into 32 disk-based passes. Sort Method: external merge means the sort never fit in RAM. Every disk pass is synchronous I/O. On a shared-storage cloud instance (EBS gp2, Azure Premium SSD), this will saturate IOPS and cause queue depth blowout for every concurrent query hitting the same tablespace.
The Attack Vector / Blast Radius
This is not an isolated slow query. The cascading failure path:
- One undersized analytical query triggers disk sort → I/O wait spikes on the data volume.
- Concurrent OLTP queries (INSERTs, simple SELECTs) begin waiting on I/O —
pg_stat_activityfills withwait_event_type: IO,wait_event: DataFileRead. - Connection pool exhaustion — queries pile up, PgBouncer or application pool hits max connections.
- Full application outage — your web tier starts returning 500s or timeouts while PostgreSQL is doing nothing but disk thrashing.
On RDS/Aurora, this also burns burst IOPS credits (gp2 volumes). Once credits are exhausted, you're throttled to baseline — typically 100 IOPS for a 33GB volume. A single bad query can cost every other tenant on your application their performance budget.
work_mem is per-operation, per-query. A query with 3 sort nodes and 2 hash joins can consume 5 × work_mem simultaneously. Setting work_mem = 256MB globally on a 32-core server with max_connections = 200 means theoretical peak RAM consumption: 200 × 5 × 256MB = 256GB. Size this with that math in mind.
How to Fix It
Step 1: Confirm the Spill
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT /* your slow query here */;
Look for:
Sort Method: external merge Disk: NNNkB→ sort spillBatches: Nwhere N > 1 → hash join spillBuffers: temp read=NNN written=NNN→ disk temp usage
Basic Fix — Session-Level
Set work_mem for the specific session or transaction before running the offending query:
- -- No session override; using global default of 4MB
- SET work_mem = '4MB';
+ -- Size to fit the operation; check 'Disk: NNNkB' in EXPLAIN and add 20% headroom
+ SET work_mem = '256MB';
For a specific role running reporting queries:
- -- Role has no override, inherits 4MB global
+ ALTER ROLE reporting_user SET work_mem = '128MB';
Enterprise Best Practice — postgresql.conf + Per-Query Tuning
Global baseline (keep this conservative — see blast radius math above):
# postgresql.conf
- work_mem = 4MB
+ work_mem = 16MB # Safe global baseline for OLTP
Application-level override for known heavy queries (connection pool middleware or query preamble):
- -- No preamble; query uses global 16MB
+ SET LOCAL work_mem = '512MB'; -- Inside BEGIN/COMMIT block, reverts after transaction
SELECT
customer_id,
SUM(order_total)
FROM orders
GROUP BY customer_id
ORDER BY SUM(order_total) DESC;
Eliminate the sort entirely with an index (the real enterprise fix):
- -- Full sequential scan + sort on unindexed column
- SELECT * FROM events ORDER BY created_at DESC LIMIT 100;
+ -- Create index to satisfy ORDER BY without a sort node
+ CREATE INDEX CONCURRENTLY idx_events_created_at_desc ON events (created_at DESC);
+ SELECT * FROM events ORDER BY created_at DESC LIMIT 100;
For hash joins — if the inner relation is repeatedly large, consider a merge join with sorted input or a materialized CTE:
- SELECT a.*, b.* FROM large_table a JOIN another_large b ON a.id = b.fk_id;
+ -- Force merge join if both sides can be index-scanned in order
+ SET enable_hashjoin = off;
+ SET enable_mergejoin = on;
+ SELECT a.*, b.* FROM large_table a JOIN another_large b ON a.id = b.fk_id;
💡 Tired of pasting proprietary configs into ChatGPT? Generic AI tools log your company's ARNs, DB strings, and private keys. StackEngine is a zero-backend, pure Client-Side WASM utility. Drop your failing config into the sandbox above. We redact your secrets locally in the browser and auto-generate the refactored code using your own API key.
Prevention in CI/CD
1. Auto-EXPLAIN in Staging
Enable auto_explain to catch disk spills before they reach production:
# postgresql.conf (staging)
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '1s'
auto_explain.log_analyze = on
auto_explain.log_buffers = on
Pipe postgresql.log through a grep in your CI pipeline:
# Fail CI if any explain plan shows disk sort or hash batches > 1
grep -E "(external merge.*Disk:|Batches: [2-9][0-9]*)" /var/log/postgresql/postgresql.log \
&& echo "DISK SPILL DETECTED — failing build" && exit 1
2. pgBadger / Query Regression Gate
Run pgBadger against staging query logs as a CI step. Set a threshold:
pgbadger /var/log/postgresql/*.log \
--outfile report.html \
--top 20 \
--tempfile-threshold 10MB # Flag any query writing >10MB to temp
Fail the pipeline if tempfile_threshold is breached by any query in the test suite.
3. Terraform / Ansible Guard for RDS Parameter Groups
If managing RDS parameter groups via Terraform, enforce a minimum work_mem:
resource "aws_db_parameter_group" "postgres" {
name = "prod-pg15"
family = "postgres15"
- parameter {
- name = "work_mem"
- value = "4096" # 4MB — default, causes disk spills on any join
- }
+ parameter {
+ name = "work_mem"
+ value = "16384" # 16MB baseline; override per-role for analytics
+ apply_method = "immediate"
+ }
}
4. Checkov Policy (IaC Scan)
Add a custom Checkov check to your repo to reject any RDS parameter group with work_mem < 8192:
# checkov/custom_checks/check_work_mem.py
from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck
from checkov.common.models.enums import CheckResult, CheckCategories
class WorkMemCheck(BaseResourceCheck):
def __init__(self):
super().__init__(
name="Ensure work_mem is not below 8MB",
id="CKV_CUSTOM_PG_001",
categories=[CheckCategories.GENERAL_SECURITY],
supported_resources=["aws_db_parameter_group"]
)
def scan_resource_conf(self, conf):
params = conf.get("parameter", [])
for p in params:
if p.get("name") == ["work_mem"]:
if int(p["value"][0]) < 8192:
return CheckResult.FAILED
return CheckResult.PASSED