Initializing Enclave...

How to Fix PostgreSQL 'work_mem Exceeded Limit: Sort Method Disk Hash Join' and Stop Queries Spilling to Disk

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 15 mins

TL;DR

  • What broke: work_mem is too low for the sort or hash join operation — PostgreSQL wrote intermediate results to disk (Disk: XXXkB in EXPLAIN ANALYZE), causing 10–1000x query slowdowns and I/O spikes.
  • How to fix it: Increase work_mem at the session, role, or global level to keep the operation in RAM; or rewrite the query to use an index that eliminates the sort entirely.
  • Shortcut: Use our Client-Side Sandbox above to paste your EXPLAIN ANALYZE output and auto-generate the corrected work_mem value and query refactor.

The Incident (What Does the Error Mean?)

Raw log output:

LOG: work_mem exceeded limit: sort method disk
LOG: work_mem exceeded limit: hash join

Or in EXPLAIN (ANALYZE, BUFFERS) output:

Sort  (cost=184772.56..187272.56 rows=1000000 width=128)
      Sort Method: external merge  Disk: 142384kB
  ->  Hash Join  (cost=...) (actual rows=...)
        Buckets: 1024  Batches: 32  Memory Usage: 4096kB

Batches: 32 means PostgreSQL split the hash join into 32 disk-based passes. Sort Method: external merge means the sort never fit in RAM. Every disk pass is synchronous I/O. On a shared-storage cloud instance (EBS gp2, Azure Premium SSD), this will saturate IOPS and cause queue depth blowout for every concurrent query hitting the same tablespace.


The Attack Vector / Blast Radius

This is not an isolated slow query. The cascading failure path:

  1. One undersized analytical query triggers disk sort → I/O wait spikes on the data volume.
  2. Concurrent OLTP queries (INSERTs, simple SELECTs) begin waiting on I/O — pg_stat_activity fills with wait_event_type: IO, wait_event: DataFileRead.
  3. Connection pool exhaustion — queries pile up, PgBouncer or application pool hits max connections.
  4. Full application outage — your web tier starts returning 500s or timeouts while PostgreSQL is doing nothing but disk thrashing.

On RDS/Aurora, this also burns burst IOPS credits (gp2 volumes). Once credits are exhausted, you're throttled to baseline — typically 100 IOPS for a 33GB volume. A single bad query can cost every other tenant on your application their performance budget.

work_mem is per-operation, per-query. A query with 3 sort nodes and 2 hash joins can consume 5 × work_mem simultaneously. Setting work_mem = 256MB globally on a 32-core server with max_connections = 200 means theoretical peak RAM consumption: 200 × 5 × 256MB = 256GB. Size this with that math in mind.


How to Fix It

Step 1: Confirm the Spill

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT /* your slow query here */;

Look for:

  • Sort Method: external merge Disk: NNNkB → sort spill
  • Batches: N where N > 1 → hash join spill
  • Buffers: temp read=NNN written=NNN → disk temp usage

Basic Fix — Session-Level

Set work_mem for the specific session or transaction before running the offending query:

- -- No session override; using global default of 4MB
- SET work_mem = '4MB';
+ -- Size to fit the operation; check 'Disk: NNNkB' in EXPLAIN and add 20% headroom
+ SET work_mem = '256MB';

For a specific role running reporting queries:

- -- Role has no override, inherits 4MB global
+ ALTER ROLE reporting_user SET work_mem = '128MB';

Enterprise Best Practice — postgresql.conf + Per-Query Tuning

Global baseline (keep this conservative — see blast radius math above):

# postgresql.conf
- work_mem = 4MB
+ work_mem = 16MB          # Safe global baseline for OLTP

Application-level override for known heavy queries (connection pool middleware or query preamble):

- -- No preamble; query uses global 16MB
+ SET LOCAL work_mem = '512MB';  -- Inside BEGIN/COMMIT block, reverts after transaction
  SELECT
    customer_id,
    SUM(order_total)
  FROM orders
  GROUP BY customer_id
  ORDER BY SUM(order_total) DESC;

Eliminate the sort entirely with an index (the real enterprise fix):

- -- Full sequential scan + sort on unindexed column
- SELECT * FROM events ORDER BY created_at DESC LIMIT 100;
+ -- Create index to satisfy ORDER BY without a sort node
+ CREATE INDEX CONCURRENTLY idx_events_created_at_desc ON events (created_at DESC);
+ SELECT * FROM events ORDER BY created_at DESC LIMIT 100;

For hash joins — if the inner relation is repeatedly large, consider a merge join with sorted input or a materialized CTE:

- SELECT a.*, b.* FROM large_table a JOIN another_large b ON a.id = b.fk_id;
+ -- Force merge join if both sides can be index-scanned in order
+ SET enable_hashjoin = off;
+ SET enable_mergejoin = on;
+ SELECT a.*, b.* FROM large_table a JOIN another_large b ON a.id = b.fk_id;

💡 Tired of pasting proprietary configs into ChatGPT? Generic AI tools log your company's ARNs, DB strings, and private keys. StackEngine is a zero-backend, pure Client-Side WASM utility. Drop your failing config into the sandbox above. We redact your secrets locally in the browser and auto-generate the refactored code using your own API key.


Prevention in CI/CD

1. Auto-EXPLAIN in Staging

Enable auto_explain to catch disk spills before they reach production:

# postgresql.conf (staging)
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '1s'
auto_explain.log_analyze = on
auto_explain.log_buffers = on

Pipe postgresql.log through a grep in your CI pipeline:

# Fail CI if any explain plan shows disk sort or hash batches > 1
grep -E "(external merge.*Disk:|Batches: [2-9][0-9]*)" /var/log/postgresql/postgresql.log \
  && echo "DISK SPILL DETECTED — failing build" && exit 1

2. pgBadger / Query Regression Gate

Run pgBadger against staging query logs as a CI step. Set a threshold:

pgbadger /var/log/postgresql/*.log \
  --outfile report.html \
  --top 20 \
  --tempfile-threshold 10MB   # Flag any query writing >10MB to temp

Fail the pipeline if tempfile_threshold is breached by any query in the test suite.

3. Terraform / Ansible Guard for RDS Parameter Groups

If managing RDS parameter groups via Terraform, enforce a minimum work_mem:

 resource "aws_db_parameter_group" "postgres" {
   name   = "prod-pg15"
   family = "postgres15"

-  parameter {
-    name  = "work_mem"
-    value = "4096"  # 4MB — default, causes disk spills on any join
-  }
+  parameter {
+    name         = "work_mem"
+    value        = "16384"  # 16MB baseline; override per-role for analytics
+    apply_method = "immediate"
+  }
 }

4. Checkov Policy (IaC Scan)

Add a custom Checkov check to your repo to reject any RDS parameter group with work_mem < 8192:

# checkov/custom_checks/check_work_mem.py
from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck
from checkov.common.models.enums import CheckResult, CheckCategories

class WorkMemCheck(BaseResourceCheck):
    def __init__(self):
        super().__init__(
            name="Ensure work_mem is not below 8MB",
            id="CKV_CUSTOM_PG_001",
            categories=[CheckCategories.GENERAL_SECURITY],
            supported_resources=["aws_db_parameter_group"]
        )

    def scan_resource_conf(self, conf):
        params = conf.get("parameter", [])
        for p in params:
            if p.get("name") == ["work_mem"]:
                if int(p["value"][0]) < 8192:
                    return CheckResult.FAILED
        return CheckResult.PASSED

Related Diagnostics

"Part of the Performance Utility Matrix."

View all 219 Performance Tools →