How to Fix PostgreSQL 'could not receive data from client: Resource temporarily unavailable' Error

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins


TL;DR

  • What broke: PostgreSQL received EAGAIN on a non-blocking client socket: the kernel had no data ready and the connection was abandoned. Likely root causes: exhausted max_connections, disabled TCP keepalives, undersized socket buffers, or a misconfigured connection pooler handing over half-open connections.
  • How to fix it: Lower max_connections and retune work_mem, enforce pgBouncer transaction pooling, set PostgreSQL's tcp_keepalives_* parameters, and raise the OS-level net.core.somaxconn and socket buffer sysctl values.

The Incident (What Does the Error Mean?)

Raw log output:

2024-01-15 03:42:17 UTC [14832]: FATAL: could not receive data from client: Resource temporarily unavailable
2024-01-15 03:42:17 UTC [14832]: DETAIL: Client connection lost during query execution.

Resource temporarily unavailable is the human-readable form of EAGAIN (errno 11 on Linux). PostgreSQL uses non-blocking I/O on client sockets. When recv() is called on a socket set to O_NONBLOCK and the kernel's receive buffer is empty, the syscall returns EAGAIN instead of blocking. PostgreSQL treats this as a fatal client disconnect and kills the backend process.
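The EAGAIN behavior described above can be reproduced outside PostgreSQL. The following is a minimal sketch using Python's standard library, not PostgreSQL's own code path; the function name is made up for illustration.

```python
import errno
import os
import socket


def recv_on_empty_nonblocking_socket() -> int:
    """Return the errno raised when reading an empty non-blocking socket."""
    # socketpair() gives two connected sockets without touching the network.
    reader, writer = socket.socketpair()
    try:
        reader.setblocking(False)  # same effect as setting O_NONBLOCK
        try:
            reader.recv(1024)  # nothing has been written yet
        except BlockingIOError as exc:
            return exc.errno
        return 0  # only reached if data was somehow available
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    code = recv_on_empty_nonblocking_socket()
    # Prints the numeric errno, its symbolic name, and the message text.
    print(code, errno.errorcode.get(code), os.strerror(code))
```

On Linux the message text printed here should match the one in the PostgreSQL log line.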

Immediate consequence: The query in flight is rolled back. The client application sees a broken pipe or connection reset. Under load, this cascades — every new connection attempt hits the same wall, and your application's connection pool starts throwing errors en masse within seconds.


The Attack Vector / Blast Radius

This is an availability failure, not a confidentiality breach, but the blast radius is severe:

  1. Connection slot exhaustion (max_connections hit): PostgreSQL hard-caps connections. Each idle connection in your app pool holds a backend process (~5–10 MB RSS). At 200 connections on a 2 GB RDS instance, the backends alone can claim 1–2 GB, so the instance runs out of memory before it executes the query. The kernel starts refusing accept() calls, half-open TCP connections pile up in the backlog, and EAGAIN fires on every read.

  2. pgBouncer misconfigured in session mode: Session pooling holds a server connection for the lifetime of a client connection. Under burst traffic, pgBouncer queues clients. Queued clients send a query, then the TCP keepalive timer expires before pgBouncer forwards it. PostgreSQL sees a socket with no data — EAGAIN.

  3. Kernel socket buffer starvation: net.core.rmem_max and net.core.rmem_default are set too low on the host. Under high-throughput COPY or bulk INSERT, the small receive buffer fills and drains in bursts, the sender stalls, and PostgreSQL's non-blocking read finds nothing ready and fails with EAGAIN.

  4. Cascading failure pattern: EAGAIN → backend process exits → connection slot freed → next queued connection claims it → same EAGAIN if root cause is buffer/keepalive-related → thundering herd. Your monitoring shows connection count oscillating wildly while throughput drops to zero.
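The memory arithmetic behind item 1 can be sketched as below. The 512 MB reserved for shared_buffers and the OS is an assumed figure for illustration, and both function names are hypothetical.

```python
def backend_memory_mb(connections: int, per_backend_mb: float) -> float:
    """Total resident memory claimed by PostgreSQL backend processes."""
    return connections * per_backend_mb


def fits_in_ram(connections: int, per_backend_mb: float,
                total_ram_mb: float, reserved_mb: float = 0.0) -> bool:
    """True if all backends fit in RAM after subtracting reserved memory."""
    available = total_ram_mb - reserved_mb
    return backend_memory_mb(connections, per_backend_mb) <= available


if __name__ == "__main__":
    # The article's range of 5 to 10 MB per backend on a 2 GB instance.
    for per_backend in (5, 10):
        used = backend_memory_mb(200, per_backend)
        print(f"200 backends x {per_backend} MB = {used:.0f} MB of 2048 MB")
    # Assumption: 512 MB is reserved for shared_buffers and the OS.
    print("fits:", fits_in_ram(200, 10, 2048, reserved_mb=512))
```

Measure the real per-backend RSS on your own instance before relying on these numbers.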


How to Fix It (The Solution)

Fix 1: Tune postgresql.conf — Connection Limits and Keepalives

# postgresql.conf

- max_connections = 500
+ max_connections = 100

- work_mem = 4MB
+ work_mem = 16MB

- tcp_keepalives_idle = 0
+ tcp_keepalives_idle = 60

- tcp_keepalives_interval = 0
+ tcp_keepalives_interval = 10

- tcp_keepalives_count = 0
+ tcp_keepalives_count = 6

- tcp_user_timeout = 0
+ tcp_user_timeout = 60000

Why: Dropping max_connections forces you to use a pooler (the correct architecture), and with fewer backends each one can afford a larger work_mem. tcp_keepalives_idle = 60 tells the kernel to start probing a connection after 60 seconds of idleness, so half-open connections are detected and reaped before PostgreSQL wastes a backend process on them. tcp_user_timeout is in milliseconds (PostgreSQL 12+): 60000 ensures that unacknowledged sent data triggers a connection reset within 60 seconds instead of hanging indefinitely.
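To see what the keepalive values above add up to, here is a rough worked calculation. It uses the common approximation (idle time plus one interval per unanswered probe); exact kernel timing can differ slightly, and the function names are illustrative only.

```python
def dead_peer_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate seconds from last traffic until a dead peer is dropped.

    The kernel waits `idle` seconds, then sends up to `count` probes spaced
    `interval` seconds apart before giving up on the connection.
    """
    return idle + interval * count


def user_timeout_seconds(tcp_user_timeout_ms: int) -> float:
    """tcp_user_timeout is configured in milliseconds; convert to seconds."""
    return tcp_user_timeout_ms / 1000


if __name__ == "__main__":
    # Values from the postgresql.conf diff above.
    print(dead_peer_detection_seconds(idle=60, interval=10, count=6), "s")
    print(user_timeout_seconds(60000), "s")
```

With the settings from Fix 1, an idle half-open connection is reaped after roughly two minutes.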


Fix 2: pgBouncer — Switch to Transaction Pooling

# pgbouncer.ini

 [pgbouncer]
- pool_mode = session
+ pool_mode = transaction

- max_client_conn = 1000
+ max_client_conn = 5000

- default_pool_size = 20
+ default_pool_size = 25

- reserve_pool_size = 0
+ reserve_pool_size = 5

- reserve_pool_timeout = 5
+ reserve_pool_timeout = 3

- server_idle_timeout = 600
+ server_idle_timeout = 60

- client_idle_timeout = 0
+ client_idle_timeout = 30

- tcp_keepalive = 0
+ tcp_keepalive = 1

- tcp_keepidle = 0
+ tcp_keepidle = 60

- tcp_keepintvl = 0
+ tcp_keepintvl = 10

- tcp_keepcnt = 0
+ tcp_keepcnt = 6

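Why: Transaction pooling releases the server connection back to the pool after each transaction commit/rollback. A 5000-client pgBouncer instance can multiplex onto 25 PostgreSQL backends per database/user pair (plus 5 reserve connections). Session-level state (SET, advisory locks, LISTEN/NOTIFY) does not survive across transactions in this mode, so audit the application for it first. client_idle_timeout = 30 kills zombie client connections before they accumulate and exhaust pgBouncer's own file descriptor limit.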
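A quick sanity check for the pool sizes above: pgBouncer keeps one pool per database/user pair, so the worst-case number of server connections grows with the number of pools. The sketch below assumes PostgreSQL's default superuser_reserved_connections of 3; the function names are hypothetical.

```python
def max_server_connections(pools: int, default_pool_size: int,
                           reserve_pool_size: int) -> int:
    """Worst-case backends pgBouncer can open across all pools."""
    return pools * (default_pool_size + reserve_pool_size)


def fits_postgres(pools: int, default_pool_size: int, reserve_pool_size: int,
                  max_connections: int, superuser_reserved: int = 3) -> bool:
    """True if pgBouncer's worst case stays within PostgreSQL's usable slots."""
    usable = max_connections - superuser_reserved
    needed = max_server_connections(pools, default_pool_size, reserve_pool_size)
    return needed <= usable


if __name__ == "__main__":
    # Values from the diffs above: pool 25 + reserve 5, max_connections = 100.
    for pools in (1, 3, 4):
        ok = fits_postgres(pools, 25, 5, max_connections=100)
        print(f"{pools} pool(s): {max_server_connections(pools, 25, 5)} backends, fits={ok}")
```

If the check fails for your number of database/user pairs, lower default_pool_size or cap the total with pgBouncer's max_db_connections.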


Fix 3: OS Kernel Tuning (Linux sysctl)

# /etc/sysctl.d/99-postgres.conf

- # net.core.somaxconn not set (default: 128 before Linux 5.4, 4096 since)
+ net.core.somaxconn = 65535

- # net.ipv4.tcp_max_syn_backlog not set (default: scales with RAM, minimum 128)
+ net.ipv4.tcp_max_syn_backlog = 65535

- # net.core.rmem_max not set (default: 212992)
+ net.core.rmem_max = 134217728

- # net.core.wmem_max not set (default: 212992)
+ net.core.wmem_max = 134217728

- # net.ipv4.tcp_rmem not set
+ net.ipv4.tcp_rmem = 4096 87380 134217728

- # net.ipv4.tcp_wmem not set
+ net.ipv4.tcp_wmem = 4096 65536 134217728

- # net.ipv4.tcp_fin_timeout not set (default: 60)
+ net.ipv4.tcp_fin_timeout = 15

Apply immediately without reboot:

sysctl -p /etc/sysctl.d/99-postgres.conf

Why: somaxconn = 128 is the default on kernels older than 5.4 (newer kernels ship 4096). Under burst traffic, a backlog that small fills in milliseconds. New connections get RST. Existing connections waiting in the backlog time out. PostgreSQL's recv() on those sockets returns EAGAIN. Raising this to 65535 and increasing socket buffer sizes gives the kernel room to absorb connection bursts.
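To confirm the values actually took effect after sysctl -p, a small check like the following can read them back from /proc/sys. This is a sketch for Linux hosts only; the target table mirrors the single-value keys from the diff above, and the helper names are made up for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Single-value targets from the sysctl diff above.
TARGETS: Dict[str, int] = {
    "net.core.somaxconn": 65535,
    "net.ipv4.tcp_max_syn_backlog": 65535,
    "net.core.rmem_max": 134217728,
    "net.core.wmem_max": 134217728,
}


def proc_path(key: str) -> Path:
    """Map a sysctl key such as net.core.somaxconn to its /proc/sys file."""
    return Path("/proc/sys") / key.replace(".", "/")


def read_sysctl(key: str) -> int:
    """Read the first integer of a sysctl value from /proc/sys (Linux only)."""
    return int(proc_path(key).read_text().split()[0])


def below_target(read: Callable[[str], int] = read_sysctl,
                 targets: Dict[str, int] = TARGETS) -> List[str]:
    """Return the keys whose current value is lower than the target."""
    return [key for key, want in targets.items() if read(key) < want]


if __name__ == "__main__":
    try:
        print("below target:", below_target())
    except OSError as exc:
        print("could not read /proc/sys (not Linux?):", exc)
```

The reader function is injectable so the comparison logic can be exercised without touching the host.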


Enterprise Best Practice: Connection Pooling Architecture

For production systems handling >50 req/s:

# Application connection string

- DATABASE_URL="postgresql://user:pass@postgres-host:5432/mydb?sslmode=require"
+ DATABASE_URL="postgresql://user:pass@pgbouncer-host:6432/mydb?sslmode=require&application_name=myapp"

# Application pool config (example: SQLAlchemy)
- pool_size=50
- max_overflow=100
- pool_timeout=30
- pool_recycle=-1
+ pool_size=10
+ max_overflow=20
+ pool_timeout=10
+ pool_recycle=1800
+ pool_pre_ping=True

pool_pre_ping=True issues a SELECT 1 before handing a connection to the application. Stale connections that would return EAGAIN are detected and replaced before they cause a query failure.
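The pool settings above are keyword arguments to SQLAlchemy's create_engine. A minimal sketch follows, assuming SQLAlchemy and a PostgreSQL driver are installed; make_engine and POOL_KWARGS are illustrative names, and the import is done lazily so the constants can be inspected without the library.

```python
from typing import Any, Dict

# Pool settings from the diff above.
POOL_KWARGS: Dict[str, Any] = {
    "pool_size": 10,        # connections kept open per process
    "max_overflow": 20,     # extra connections allowed under burst
    "pool_timeout": 10,     # seconds to wait for a free connection
    "pool_recycle": 1800,   # replace connections older than 30 minutes
    "pool_pre_ping": True,  # test each connection before handing it out
}


def max_connections_per_process(kwargs: Dict[str, Any] = POOL_KWARGS) -> int:
    """Upper bound on connections one application process can open."""
    return kwargs["pool_size"] + kwargs["max_overflow"]


def make_engine(url: str):
    """Create an engine pointed at pgBouncer using the pool settings above."""
    # Imported here so this module loads even where SQLAlchemy is absent.
    from sqlalchemy import create_engine
    return create_engine(url, **POOL_KWARGS)


if __name__ == "__main__":
    print("max connections per process:", max_connections_per_process())
```

Multiply the per-process bound by your worker count and keep the result below pgBouncer's max_client_conn.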




Prevention in CI/CD

1. Checkov — Scan IaC for Missing Keepalive Config

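If you provision RDS or Cloud SQL via Terraform, start with Checkov's built-in RDS checks, then add a custom check for keepalive parameters (the built-ins do not cover them):

checkov -d . --check CKV_AWS_129,CKV_AWS_161
# CKV_AWS_129: Ensure RDS logging is enabled
# CKV_AWS_161: Ensure RDS uses IAM authentication
# Confirm check IDs for your installed version with: checkov --list

For custom checks, write a YAML policy that flags any aws_db_parameter_group resource missing tcp_keepalives_idle, and load it with --external-checks-dir.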

2. OPA/Conftest — Enforce pgBouncer Pool Mode

# policy/pgbouncer.rego
package pgbouncer

deny[msg] {
  input.pgbouncer.pool_mode == "session"
  msg := "pgBouncer pool_mode must be 'transaction' or 'statement'. Session mode causes EAGAIN under burst load."
}

deny[msg] {
  to_number(input.pgbouncer.client_idle_timeout) == 0
  msg := "client_idle_timeout must be > 0. Zombie connections exhaust file descriptors and cause EAGAIN."
}

conftest test pgbouncer.ini --policy policy/
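For pipelines without conftest, the same two rules can be approximated with Python's standard configparser. This is a sketch, not a replacement for the Rego policy; unlike the Rego rules it also treats a missing key as pgBouncer's default (session mode, timeout 0). The function name is hypothetical.

```python
import configparser
from typing import List


def check_pgbouncer_ini(text: str) -> List[str]:
    """Return a list of policy violations found in pgbouncer.ini content."""
    parser = configparser.ConfigParser(
        interpolation=None,   # pgbouncer.ini values may contain '%'
        strict=False,         # tolerate duplicate keys
        allow_no_value=True,  # tolerate directive lines without '='
    )
    parser.read_string(text)
    if not parser.has_section("pgbouncer"):
        return ["missing [pgbouncer] section"]
    section = parser["pgbouncer"]
    problems: List[str] = []
    # A missing key falls back to pgBouncer's default value.
    pool_mode = (section.get("pool_mode") or "session").strip()
    if pool_mode == "session":
        problems.append("pool_mode must be 'transaction' or 'statement'")
    idle = float(section.get("client_idle_timeout") or "0")
    if idle == 0:
        problems.append("client_idle_timeout must be > 0")
    return problems


if __name__ == "__main__":
    sample = "[pgbouncer]\npool_mode = session\nclient_idle_timeout = 0\n"
    for problem in check_pgbouncer_ini(sample):
        print("DENY:", problem)
```

Call it from a CI step with the file contents and fail the job when the returned list is non-empty.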

3. Load Test Gate in CI

Add a pgbench smoke test as a required CI step before deploying connection pool config changes:

# .github/workflows/db-load-test.yml
- name: pgbench connection stress test
  run: |
    pgbench -h $PGBOUNCER_HOST -p 6432 -U $PGUSER \
      -c 200 -j 8 -T 30 \
      --no-vacuum $PGDATABASE 2>&1 | tee pgbench.log
    grep -E 'FATAL|EAGAIN|temporarily unavailable' pgbench.log && exit 1 || exit 0

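This runs 200 concurrent clients for 30 seconds and fails the pipeline if pgbench's captured output contains FATAL or "temporarily unavailable" errors. pgbench only reports what the client sees, so check the PostgreSQL server log separately for backend-side errors.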

4. Alerting — Catch It Before It Cascades

# prometheus alert
- alert: PostgresEAGAINErrors
  # pg_stat_activity_count is a gauge, so compare it directly instead of wrapping it in rate()
  expr: sum(pg_stat_activity_count{state="idle in transaction"}) > 10
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "High idle-in-transaction count — EAGAIN risk imminent"
    description: "{{ $value }} connections idle in transaction. Check pgBouncer pool_mode and client_idle_timeout."
