How to Fix PostgreSQL 'could not receive data from client: Resource temporarily unavailable' Error
Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins
TL;DR
- What broke: PostgreSQL received EAGAIN on a non-blocking client socket; the kernel had no data ready and the connection was abandoned. Root causes: exhausted max_connections, unset or too-lax tcp_keepalives_* settings, undersized socket buffers, or a misconfigured connection pooler leaving half-open connections.
- How to fix it: Tune max_connections, work_mem, and the tcp_keepalives_* settings in postgresql.conf, enforce pgBouncer transaction pooling, and raise OS-level sysctl limits such as net.core.somaxconn and the socket buffer sizes.
The Incident (What Does the Error Mean?)
Raw log output:
2024-01-15 03:42:17 UTC [14832]: FATAL: could not receive data from client: Resource temporarily unavailable
2024-01-15 03:42:17 UTC [14832]: DETAIL: Client connection lost during query execution.
Resource temporarily unavailable is the human-readable form of errno 11 (EAGAIN) on Linux. PostgreSQL uses non-blocking I/O on client sockets. When recv() is called on a socket set to O_NONBLOCK and the kernel's socket receive buffer is empty, the syscall returns -1 with errno set to EAGAIN instead of blocking. PostgreSQL treats this as a fatal client disconnect and kills the backend process.
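The errno behaviour described above can be reproduced outside PostgreSQL. This minimal sketch uses only the Python standard library: it opens a local socket pair, marks one end non-blocking, and calls recv() before anything has been written. The kernel reports EAGAIN instead of blocking. It demonstrates only the syscall semantics, not PostgreSQL's own handling of the error; the function name is illustrative.

```python
import errno
import os
import socket


def recv_on_empty_socket() -> int:
    """Return the errno raised by recv() on an empty, non-blocking socket."""
    reader, writer = socket.socketpair()
    try:
        reader.setblocking(False)  # sets O_NONBLOCK on the file descriptor
        try:
            reader.recv(4096)  # nothing was written, so the receive buffer is empty
        except BlockingIOError as exc:  # Python's exception for EAGAIN/EWOULDBLOCK
            return exc.errno
        return 0  # not expected: an empty non-blocking recv() raises
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    code = recv_on_empty_socket()
    print(code, os.strerror(code))
    print(code == errno.EAGAIN)
```

On Linux with glibc the first line reads `11 Resource temporarily unavailable`, the same string that appears in the PostgreSQL log.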
Immediate consequence: The query in flight is rolled back. The client application sees a broken pipe or connection reset. Under load, this cascades — every new connection attempt hits the same wall, and your application's connection pool starts throwing errors en masse within seconds.
The Attack Vector / Blast Radius
This is an availability failure, not a confidentiality breach, but the blast radius is severe:
- Connection slot exhaustion (max_connections hit): PostgreSQL hard-caps connections. Each idle connection in your app pool holds a backend process (~5–10 MB RSS). At 200 connections on a 2 GB RDS instance, those backends alone can consume most of the RAM, so you hit OOM before the first query runs. Once the cap is reached, PostgreSQL rejects new connections ("sorry, too many clients already"), half-open TCP connections pile up in the backlog, and EAGAIN fires on every read.
- pgBouncer misconfigured in session mode: Session pooling holds a server connection for the lifetime of a client connection. Under burst traffic, pgBouncer queues clients. A queued client sends a query, then the TCP keepalive timer expires before pgBouncer forwards it. PostgreSQL sees a socket with no data: EAGAIN.
- Kernel socket buffer starvation: net.core.rmem_max and net.core.rmem_default are set too low on the host. Under high-throughput COPY or bulk INSERT, the undersized buffers throttle the TCP window, the client's sends stall, and PostgreSQL's non-blocking read finds the receive buffer empty and fails with EAGAIN.
- Cascading failure pattern: EAGAIN → backend process exits → connection slot freed → next queued connection claims it → same EAGAIN if the root cause is buffer- or keepalive-related → thundering herd. Your monitoring shows the connection count oscillating wildly while throughput drops to zero.
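The arithmetic behind the first failure mode can be sketched as follows. It is a back-of-the-envelope model only: it assumes a flat per-backend RSS (the ~5–10 MB figure above) and ignores shared_buffers, work_mem allocations, and the OS itself, all of which compete for the same RAM. The function names and the `backend_share` planning fraction are hypothetical, not PostgreSQL settings.

```python
def backend_memory_mb(connections: int, rss_per_backend_mb: float) -> float:
    """Memory held by backends alone: PostgreSQL runs one process per connection."""
    return connections * rss_per_backend_mb


def connection_budget(ram_mb: float, rss_per_backend_mb: float, backend_share: float) -> int:
    """How many backends fit if only `backend_share` of RAM may go to them.

    backend_share is an assumed planning fraction (e.g. 0.25), not a
    PostgreSQL parameter.
    """
    if not 0 < backend_share <= 1:
        raise ValueError("backend_share must be in (0, 1]")
    return int(ram_mb * backend_share // rss_per_backend_mb)


if __name__ == "__main__":
    # 200 connections at ~10 MB each on a 2 GB (2048 MB) instance
    print(backend_memory_mb(200, 10))         # 2000 (MB): nearly all of the RAM
    print(connection_budget(2048, 10, 0.25))  # 51 backends if a quarter of RAM is budgeted
```

Under these assumptions the instance can afford only a few dozen backends, which is why the fixes below put a pooler in front of PostgreSQL rather than raising max_connections.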
How to Fix It (The Solution)
Fix 1: Tune postgresql.conf — Connection Limits and Keepalives
# postgresql.conf
- max_connections = 500
+ max_connections = 100
- work_mem = 4MB
+ work_mem = 16MB
- tcp_keepalives_idle = 0
+ tcp_keepalives_idle = 60
- tcp_keepalives_interval = 0
+ tcp_keepalives_interval = 10
- tcp_keepalives_count = 0
+ tcp_keepalives_count = 6
- tcp_user_timeout = 0
+ tcp_user_timeout = 60000
Why: Dropping max_connections forces you to use a pooler (correct architecture). tcp_keepalives_idle = 60 tells the kernel to probe idle connections after 60 seconds, detecting and reaping half-open connections before PostgreSQL wastes a backend process on them. tcp_user_timeout ensures a stalled send triggers a connection reset within 60 seconds instead of hanging indefinitely.
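The three keepalive settings combine into a worst-case detection window: the kernel waits tcp_keepalives_idle seconds, then sends up to tcp_keepalives_count probes tcp_keepalives_interval seconds apart, so with the values above a dead peer is declared after 60 + 6 × 10 = 120 seconds. The sketch below computes that window and applies the same policy to a client-side socket. It assumes a Linux host: the TCP_KEEP* and TCP_USER_TIMEOUT constants are platform-dependent, hence the hasattr guards. The helper names are illustrative; clients built on libpq can instead set the equivalent keepalives_idle, keepalives_interval, keepalives_count, and tcp_user_timeout connection parameters.

```python
import socket


def dead_peer_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Worst-case time before keepalive probing declares an idle peer dead."""
    return idle + interval * count


def make_client_socket(idle: int = 60, interval: int = 10, count: int = 6,
                       user_timeout_ms: int = 60000) -> socket.socket:
    """Create a TCP socket with the same keepalive policy as the postgresql.conf diff."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These options exist on Linux; the guards let the sketch run elsewhere too.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    if hasattr(socket, "TCP_USER_TIMEOUT"):  # milliseconds, like tcp_user_timeout
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_ms)
    return sock


if __name__ == "__main__":
    print(dead_peer_detection_seconds(60, 10, 6))  # 120
    with make_client_socket() as s:
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
```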
Fix 2: pgBouncer — Switch to Transaction Pooling
# pgbouncer.ini
[pgbouncer]
- pool_mode = session
+ pool_mode = transaction
- max_client_conn = 1000
+ max_client_conn = 5000
- default_pool_size = 20
+ default_pool_size = 25
- reserve_pool_size = 0
+ reserve_pool_size = 5
- reserve_pool_timeout = 5
+ reserve_pool_timeout = 3
- server_idle_timeout = 600
+ server_idle_timeout = 60
- client_idle_timeout = 0
+ client_idle_timeout = 30
- tcp_keepalive = 0
+ tcp_keepalive = 1
- tcp_keepidle = 0
+ tcp_keepidle = 60
- tcp_keepintvl = 0
+ tcp_keepintvl = 10
- tcp_keepcnt = 0
+ tcp_keepcnt = 6
Why: Transaction pooling releases the server connection back to the pool as soon as each transaction commits or rolls back. A pgBouncer instance accepting 5000 clients can multiplex them onto 25 server connections per database/user pair (plus up to 5 more from the reserve pool). client_idle_timeout = 30 closes zombie client connections before they accumulate and exhaust pgBouncer's own file descriptor limit.
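The multiplexing claim can be illustrated with a toy model. In this asyncio sketch, an asyncio.Semaphore stands in for pgBouncer's default_pool_size server connections, and each client coroutine holds a slot only for the duration of one short transaction, as transaction pooling does. It is a simplified model, not pgBouncer's implementation: there is no reserve pool and no network, and the "transaction" is a single event-loop yield.

```python
import asyncio


async def simulate(clients: int = 5000, pool_size: int = 25) -> tuple[int, int]:
    """Return (peak server connections in use, transactions completed)."""
    server_slots = asyncio.Semaphore(pool_size)  # stand-in for default_pool_size
    in_use = 0
    peak = 0
    completed = 0

    async def client() -> None:
        nonlocal in_use, peak, completed
        # Transaction pooling: a server connection is held only inside a transaction.
        async with server_slots:
            in_use += 1
            peak = max(peak, in_use)
            await asyncio.sleep(0)  # the "transaction"; yields to other clients
            in_use -= 1
            completed += 1
        # Slot released at COMMIT/ROLLBACK; the client itself stays connected.

    await asyncio.gather(*(client() for _ in range(clients)))
    return peak, completed


if __name__ == "__main__":
    peak, completed = asyncio.run(simulate())
    print(f"{completed} transactions served by at most {peak} server connections")
```

Holding the semaphore for a client's whole lifetime instead would model session mode, where most of the 5000 clients would sit queued behind 25 long-lived slots.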
Fix 3: OS Kernel Tuning (Linux sysctl)
# /etc/sysctl.d/99-postgres.conf
- # net.core.somaxconn not set (default: 128)
+ net.core.somaxconn = 65535
- # net.ipv4.tcp_max_syn_backlog not set (default: 128)
+ net.ipv4.tcp_max_syn_backlog = 65535
- # net.core.rmem_max not set (default: 212992)
+ net.core.rmem_max = 134217728
- # net.core.wmem_max not set (default: 212992)
+ net.core.wmem_max = 134217728
- # net.ipv4.tcp_rmem not set
+ net.ipv4.tcp_rmem = 4096 87380 134217728
- # net.ipv4.tcp_wmem not set
+ net.ipv4.tcp_wmem = 4096 65536 134217728
- # net.ipv4.tcp_fin_timeout not set (default: 60)
+ net.ipv4.tcp_fin_timeout = 15
Apply immediately without reboot:
sysctl -p /etc/sysctl.d/99-postgres.conf
Why: somaxconn = 128 was the Linux default before kernel 5.4 (newer kernels default to 4096). Under burst traffic, a 128-entry accept backlog fills in milliseconds. New connection attempts are dropped (or reset, if net.ipv4.tcp_abort_on_overflow = 1), and connections already waiting in the backlog time out. PostgreSQL's recv() on those sockets returns EAGAIN. Raising the limit to 65535 and increasing socket buffer sizes gives the kernel room to absorb connection bursts.
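After running sysctl -p, it is worth confirming that the values actually took effect. The following sketch compares the live kernel values under /proc/sys against a sysctl.conf-style file and reports any key whose live value differs. It assumes a Linux host and a plain `key = value` file with `#` or `;` comments; the function names are hypothetical. Multi-value keys such as tcp_rmem are compared field by field, so tabs and spaces do not matter.

```python
from pathlib import Path


def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and # or ; comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = " ".join(value.split())  # collapse tabs/spaces
    return settings


def sysctl_path(key: str) -> Path:
    """Map a dotted sysctl key (e.g. net.core.somaxconn) to its /proc/sys file."""
    return Path("/proc/sys") / key.replace(".", "/")


def matches(live: str, target: str) -> bool:
    """Compare values field by field, ignoring whitespace differences."""
    return live.split() == target.split()


def report(conf_text: str) -> list[str]:
    """List keys from the file whose running value is missing or different."""
    problems: list[str] = []
    for key, target in parse_sysctl_conf(conf_text).items():
        path = sysctl_path(key)
        if not path.exists():
            problems.append(f"{key}: not available on this host")
            continue
        live = path.read_text().strip()
        if not matches(live, target):
            problems.append(f"{key}: live {live!r} != target {target!r}")
    return problems


if __name__ == "__main__":
    conf = Path("/etc/sysctl.d/99-postgres.conf")
    if conf.exists():
        for problem in report(conf.read_text()):
            print(problem)
```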
Enterprise Best Practice: Connection Pooling Architecture
For production systems handling >50 req/s:
# Application connection string
- DATABASE_URL="postgresql://user:pass@postgres-host:5432/mydb?sslmode=require"
+ DATABASE_URL="postgresql://user:pass@pgbouncer-host:6432/mydb?sslmode=require&application_name=myapp"
# Application pool config (example: SQLAlchemy)
- pool_size=50
- max_overflow=100
- pool_timeout=30
- pool_recycle=-1
+ pool_size=10
+ max_overflow=20
+ pool_timeout=10
+ pool_recycle=1800
+ pool_pre_ping=True
pool_pre_ping=True makes SQLAlchemy test each connection with a lightweight ping (typically SELECT 1) when it is checked out of the pool. Stale connections that would return EAGAIN are detected and replaced before they cause a query failure.
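The checkout-time ping can be illustrated independently of SQLAlchemy. The toy pool below is not SQLAlchemy's code and only shows the idea: before handing out a pooled connection, it calls a caller-supplied ping function, and if the ping fails it discards the stale connection and opens a fresh one. The `connect` and `ping` callables are hypothetical hooks; with a real driver, `ping` would run something like SELECT 1.

```python
from collections import deque
from typing import Callable, Deque, Generic, TypeVar

Conn = TypeVar("Conn")


class PrePingPool(Generic[Conn]):
    """Minimal pool that validates a connection on every checkout."""

    def __init__(self, connect: Callable[[], Conn], ping: Callable[[Conn], bool]) -> None:
        self._connect = connect
        self._ping = ping
        self._idle: Deque[Conn] = deque()
        self.discarded = 0  # stale connections caught by the ping

    def checkout(self) -> Conn:
        while self._idle:
            conn = self._idle.popleft()
            if self._ping(conn):  # e.g. SELECT 1 with a real driver
                return conn
            self.discarded += 1  # dead socket: drop it instead of failing a query
        return self._connect()

    def checkin(self, conn: Conn) -> None:
        self._idle.append(conn)


if __name__ == "__main__":
    alive: dict[int, bool] = {}
    counter = iter(range(1, 1000))

    def connect() -> int:
        conn_id = next(counter)
        alive[conn_id] = True
        return conn_id

    pool = PrePingPool(connect, ping=lambda c: alive[c])
    first = pool.checkout()
    pool.checkin(first)
    alive[first] = False  # simulate the server side closing the socket
    second = pool.checkout()
    print(first, second, pool.discarded)  # 1 2 1
```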
Prevention in CI/CD
1. Checkov — Scan IaC for Missing Keepalive Config
If you provision RDS or Cloud SQL via Terraform, start with Checkov's built-in RDS checks:
checkov -d . --check CKV_AWS_129,CKV_AWS_161
# CKV_AWS_129: Ensure RDS has deletion protection
# CKV_AWS_161: Ensure RDS uses IAM authentication
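Neither built-in check inspects keepalive parameters, so enforce those with a custom policy: a YAML custom check, loaded with --external-checks-dir, that flags any aws_db_parameter_group resource missing tcp_keepalives_idle.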
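As a stand-in for a Checkov custom policy, the same rule can be sketched in plain Python against the JSON that `terraform show -json` produces for a saved plan. This assumes the plan JSON layout in which resources sit under planned_values.root_module (and nested child_modules), and each aws_db_parameter_group lists its parameter blocks as objects with a `name` key. It is a hypothetical helper, not Checkov's API, and it checks only tcp_keepalives_idle, as the rule above describes.

```python
import json
import sys
from typing import Any, Iterator

REQUIRED = "tcp_keepalives_idle"


def iter_resources(module: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield resources from a plan module and all nested child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def missing_keepalive(plan: dict[str, Any]) -> list[str]:
    """Addresses of aws_db_parameter_group resources that do not set tcp_keepalives_idle."""
    root = plan.get("planned_values", {}).get("root_module", {})
    offenders = []
    for res in iter_resources(root):
        if res.get("type") != "aws_db_parameter_group":
            continue
        params = res.get("values", {}).get("parameter") or []
        if not any(p.get("name") == REQUIRED for p in params):
            offenders.append(res.get("address", "<unknown>"))
    return offenders


# Usage: terraform show -json plan.out > plan.json && python check_keepalive.py plan.json
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        bad = missing_keepalive(json.load(fh))
    for address in bad:
        print(f"FAIL {address}: parameter group does not set {REQUIRED}")
    sys.exit(1 if bad else 0)
```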
2. OPA/Conftest — Enforce pgBouncer Pool Mode
# policy/pgbouncer.rego
package pgbouncer
deny[msg] {
input.pgbouncer.pool_mode == "session"
msg := "pgBouncer pool_mode must be 'transaction' or 'statement'. Session mode causes EAGAIN under burst load."
}
deny[msg] {
to_number(input.pgbouncer.client_idle_timeout) == 0
msg := "client_idle_timeout must be > 0. Zombie connections exhaust file descriptors and cause EAGAIN."
}
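# conftest evaluates only the "main" package by default, so name the policy's package
conftest test pgbouncer.ini --policy policy/ --namespace pgbouncer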
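Teams without conftest in the pipeline can enforce the same two rules with Python's standard-library configparser, which reads pgbouncer.ini's INI layout. This sketch also treats a missing pool_mode as session (pgBouncer's default) and a missing client_idle_timeout as 0 (also the default). The Rego rules above miss these cases because they only match keys that are present. The function name and the command-line usage are illustrative.

```python
import configparser
import sys

ALLOWED_POOL_MODES = {"transaction", "statement"}


def check_pgbouncer_ini(text: str) -> list[str]:
    """Return policy violations for the [pgbouncer] section of pgbouncer.ini."""
    # interpolation=None: values may legitimately contain '%'
    parser = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    parser.read_string(text)
    if not parser.has_section("pgbouncer"):
        return ["missing [pgbouncer] section"]
    section = parser["pgbouncer"]
    errors = []
    # pgBouncer's default pool_mode is session, so an absent key counts as session.
    pool_mode = (section.get("pool_mode") or "session").strip().lower()
    if pool_mode not in ALLOWED_POOL_MODES:
        errors.append(f"pool_mode must be 'transaction' or 'statement', got {pool_mode!r}")
    # client_idle_timeout defaults to 0 (disabled) when absent.
    if float(section.get("client_idle_timeout") or 0) <= 0:
        errors.append("client_idle_timeout must be > 0")
    return errors


# Usage: python check_pgbouncer.py pgbouncer.ini  (exit code 1 on violations)
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        problems = check_pgbouncer_ini(fh.read())
    for problem in problems:
        print(f"FAIL {problem}")
    sys.exit(1 if problems else 0)
```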
3. Load Test Gate in CI
Add a pgbench smoke test as a required CI step before deploying connection pool config changes:
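# .github/workflows/db-load-test.yml
- name: pgbench connection stress test
  run: |
    set -o pipefail
    # Initialise the pgbench tables (skip if the CI database is already seeded)
    pgbench -i -h "$PGBOUNCER_HOST" -p 6432 -U "$PGUSER" "$PGDATABASE"
    pgbench -h "$PGBOUNCER_HOST" -p 6432 -U "$PGUSER" \
      -c 200 -j 8 -T 30 \
      --no-vacuum "$PGDATABASE" 2>&1 | tee pgbench.log
    # Fail the step if pgbench's output shows connection errors
    if grep -E 'FATAL|EAGAIN|temporarily unavailable' pgbench.log; then exit 1; fi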
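This runs 200 concurrent clients for 30 seconds and fails the pipeline if pgbench's output contains FATAL, EAGAIN, or "temporarily unavailable" messages. Note that pgbench.log captures pgbench's client-side output, not the PostgreSQL server log.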
4. Alerting — Catch It Before It Cascades
# prometheus alert rule (pg_stat_activity_count is a gauge, so compare it directly instead of using rate())
- alert: PostgresEAGAINErrors
  expr: pg_stat_activity_count{state="idle in transaction"} > 10
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "High idle-in-transaction count: EAGAIN risk imminent"
    description: "{{ $value }} connections idle in transaction. Check pgBouncer pool_mode and client_idle_timeout."