Fixing PostgreSQL 'no buffer space available' Error: Kernel Buffer Tuning & Connection Management Guide
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–45 mins depending on whether a kernel restart is required
TL;DR
- What broke: The kernel could not allocate buffer space for a PostgreSQL client socket (the memory behind SO_SNDBUF/SO_RCVBUF), so a send() or recv() on that socket failed and the backend process dropped the connection.
- How to fix it: Tune net.core.rmem_max, net.core.wmem_max, and net.ipv4.tcp_mem via sysctl; reduce max_connections in PostgreSQL and front it with pgBouncer in transaction-pooling mode.
- CTA: Use our Client-Side Sandbox above to paste your sysctl -a and postgresql.conf output. It will auto-generate the corrected diff locally without sending your config anywhere.
The Incident (What Does the Error Mean?)
Raw log output from postgresql.log:
LOG: could not receive data from client: no buffer space available
LOG: unexpected EOF on client connection with an open transaction
This is not a PostgreSQL bug. This is the Linux kernel telling the PostgreSQL backend process that recv() or send() on the client socket failed with ENOBUFS. The kernel socket buffer pool — shared across all sockets on the host — is depleted. PostgreSQL has no choice but to abort that backend. Any open transaction is rolled back. The client receives a hard disconnect.
Immediate consequence: In-flight transactions are lost. If this happens under peak load, it cascades — every new connection attempt may also fail, effectively taking the database offline from the application's perspective even though pg_isready may still return healthy.
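To make the failure mode concrete, the sketch below shows how ENOBUFS reaches a program as an operating-system error code rather than a database-level error. It is an illustration in Python, not PostgreSQL's own C code, and the helper name `is_buffer_exhaustion` is hypothetical.

```python
import errno
import os


def is_buffer_exhaustion(exc: OSError) -> bool:
    """Return True if an OSError carries the ENOBUFS error code."""
    return exc.errno == errno.ENOBUFS


# Simulate the error a failed recv()/send() would raise; no real socket is used.
simulated = OSError(errno.ENOBUFS, os.strerror(errno.ENOBUFS))

if is_buffer_exhaustion(simulated):
    # The message text comes from the platform's strerror() table.
    print("kernel reported:", simulated.strerror)
```

The point of the check is that the error originates below the database: the same errno would appear for any process on the host whose socket call hit the exhausted pool.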
The Attack Vector / Blast Radius
This is a resource exhaustion failure, not a security exploit, but the blast radius is severe:
- Connection storm amplification: Applications using naive retry logic (no exponential backoff) detect the dropped connection and immediately reconnect. Each reconnect allocates new socket buffers, which accelerates buffer exhaustion. From a monitoring perspective, this feedback loop looks identical to a SYN flood.
- Transaction integrity risk: Any BEGIN block that was mid-flight is rolled back on the server side without the client being told. If the application does not correctly handle OperationalError and re-validate state, it can carry on as if writes were committed when they were actually rolled back.
- Cascading service failure: Every microservice sharing this PostgreSQL host (common in under-architected setups) loses its connection pool simultaneously. A single buffer exhaustion event can take down unrelated services.
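The retry feedback loop described above is usually broken with capped exponential backoff plus jitter. The following is a minimal sketch under stated assumptions: `connect_with_backoff` and its parameters are hypothetical names, and `connect` stands in for whatever your driver uses to open a connection.

```python
import random
import time


def connect_with_backoff(connect, max_attempts=5, base_delay=0.5, max_delay=30.0,
                         sleep=time.sleep, retry_on=(OSError,)):
    """Call connect() until it succeeds, using capped exponential backoff with full jitter."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The cap doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: a random delay in [0, cap] keeps clients from retrying in lockstep.
            sleep(random.uniform(0, cap))
```

With psycopg2, for example, this could be called as `connect_with_backoff(lambda: psycopg2.connect(dsn), retry_on=(psycopg2.OperationalError,))`, where `dsn` is your own connection string. After a reconnect, the application should still re-check any state touched by the interrupted transaction, since that work was rolled back.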
Root causes to rule out in order:
1. max_connections set too high (each idle connection holds a socket buffer allocation)
2. No connection pooler: raw application connections go directly to Postgres
3. net.core.rmem_default/wmem_default at kernel defaults (too low for high-concurrency DB workloads)
4. Running Postgres on a shared host with other high-socket-count services (Redis, Kafka, app servers)
5. tcp_mem pressure threshold breached (check cat /proc/net/sockstat)
How to Fix It
Step 0: Confirm the Diagnosis
# Check current socket buffer pressure
cat /proc/net/sockstat
# Look for: sockets: used <N> — if this is in the tens of thousands, you have your answer
# Check kernel TCP memory pressure
cat /proc/net/sockstat | grep TCP
# Check current buffer limits
sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_mem
# Check Postgres connection count vs max_connections
psql -U postgres -c "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"
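If you want to script the checks above, the sketch below parses the text of /proc/net/sockstat and compares the TCP page count with the tcp_mem pressure threshold (the middle of the three values). The function names and the sample text are illustrative assumptions, and the sample numbers are made up.

```python
def parse_sockstat(text):
    """Parse /proc/net/sockstat text into {section: {field: int}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        section, _, rest = line.partition(":")
        tokens = rest.split()
        # Fields are alternating name/value pairs, e.g. "inuse 50 orphan 0".
        stats[section.strip()] = {k: int(v) for k, v in zip(tokens[0::2], tokens[1::2])}
    return stats


def tcp_pressure_ratio(stats, tcp_mem_pressure_pages):
    """Ratio of TCP pages in use (the "mem" field) to the tcp_mem pressure threshold."""
    return stats["TCP"]["mem"] / tcp_mem_pressure_pages


# Made-up sample in the same layout the kernel uses.
SAMPLE = """sockets: used 41250
TCP: inuse 38000 orphan 12 tw 900 alloc 38100 mem 125709
UDP: inuse 4 mem 2
"""

stats = parse_sockstat(SAMPLE)
print(stats["sockets"]["used"], tcp_pressure_ratio(stats, 251418))
```

On a real host you would pass `open("/proc/net/sockstat").read()` instead of `SAMPLE`. A ratio approaching 1.0 means the kernel is close to entering TCP memory pressure.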
Basic Fix — Kernel Buffer Tuning
# /etc/sysctl.conf
- net.core.rmem_default = 212992
- net.core.wmem_default = 212992
- net.core.rmem_max = 212992
- net.core.wmem_max = 212992
- net.ipv4.tcp_mem = 188562 251418 377124
+ net.core.rmem_default = 31457280
+ net.core.wmem_default = 31457280
+ net.core.rmem_max = 134217728
+ net.core.wmem_max = 134217728
+ net.ipv4.tcp_mem = 786432 1048576 26777216
+ net.core.somaxconn = 65535
+ net.ipv4.tcp_max_syn_backlog = 65535
Apply without reboot:
sysctl -p /etc/sysctl.conf
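One detail worth checking before applying these values: net.ipv4.tcp_mem is measured in memory pages, not bytes, whereas the net.core.* buffer settings are in bytes. The sketch below converts a tcp_mem triple to bytes. The 4096-byte page size is an assumption; on Linux, `os.sysconf("SC_PAGE_SIZE")` reports the real value. `tcp_mem_to_bytes` is a hypothetical helper name.

```python
def tcp_mem_to_bytes(tcp_mem, page_size=4096):
    """Convert a 'low pressure high' tcp_mem string from pages to bytes."""
    parts = [int(p) for p in tcp_mem.split()]
    if len(parts) != 3:
        raise ValueError("tcp_mem must contain exactly three values")
    return tuple(p * page_size for p in parts)


# The proposed setting from the diff above.
low, pressure, high = tcp_mem_to_bytes("786432 1048576 26777216")
for label, value in (("low", low), ("pressure", pressure), ("high", high)):
    print(f"{label}: {value / 2**30:.1f} GiB")
```

By this arithmetic, with 4 KiB pages the proposed thresholds work out to 3 GiB, 4 GiB, and roughly 102 GiB, so it is worth comparing the upper limit against the host's physical RAM before adopting it unchanged.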
Enterprise Best Practice — Connection Pooling + PostgreSQL Hardening
Kernel tuning alone is a band-aid if max_connections = 500 with no pooler. The correct architecture is: Application → pgBouncer (transaction mode) → PostgreSQL (max_connections = 100–200).
# postgresql.conf
- max_connections = 500
+ max_connections = 150
# Fewer backends = fewer socket buffers held open permanently
# pgBouncer handles the fan-out
- work_mem = 4MB
+ work_mem = 16MB
# With fewer connections, you can afford more work_mem per query
# pgbouncer.ini
- pool_mode = session
+ pool_mode = transaction
# Transaction pooling recycles connections aggressively
# A server connection is only held for the duration of a transaction, not the client session
- max_client_conn = 100
+ max_client_conn = 5000
# Each client connection is still a socket on the pgBouncer host, but it no longer maps one-to-one to a PostgreSQL backend and its server-side socket
+ server_idle_timeout = 600
+ client_idle_timeout = 0
+ tcp_keepalive = 1
+ tcp_keepidle = 60
+ tcp_keepintvl = 10
+ tcp_keepcnt = 5
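A quick sanity check of the pooler settings can be scripted. The sketch below reads a pgbouncer.ini with Python's standard configparser and flags two conditions that would undercut the architecture above. `check_pgbouncer` is a hypothetical helper, and it assumes a single database/user pair, since default_pool_size applies per pair. The fallbacks mirror pgBouncer's documented defaults (session mode, pool size 20).

```python
import configparser


def check_pgbouncer(ini_text, pg_max_connections):
    """Return a list of human-readable problems found in a pgbouncer.ini."""
    # interpolation=None so '%' characters in values are left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    cfg = parser["pgbouncer"]

    problems = []
    if cfg.get("pool_mode", fallback="session") != "transaction":
        problems.append("pool_mode is not 'transaction'")
    pool_size = cfg.getint("default_pool_size", fallback=20)
    if pool_size > pg_max_connections:
        problems.append(
            f"default_pool_size={pool_size} exceeds max_connections={pg_max_connections}"
        )
    return problems


# Illustrative config; the default_pool_size value is an example, not from the guide.
SAMPLE_INI = """
[pgbouncer]
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 50
"""

print(check_pgbouncer(SAMPLE_INI, pg_max_connections=150))
```

Keep in mind that transaction pooling does not preserve session state between transactions, so features such as session-level SET or advisory locks need review before switching modes.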
Prevention in CI/CD
1. Terraform / Ansible: Enforce sysctl at provisioning time
# Ansible role: roles/postgres_host/tasks/kernel.yml
- # No sysctl tuning — relying on OS defaults
+ - name: Set PostgreSQL host kernel buffer parameters
+   sysctl:
+     name: "{{ item.key }}"
+     value: "{{ item.value }}"
+     state: present
+     reload: yes
+   loop:
+     - { key: 'net.core.rmem_max', value: '134217728' }
+     - { key: 'net.core.wmem_max', value: '134217728' }
+     - { key: 'net.ipv4.tcp_mem', value: '786432 1048576 26777216' }
2. OPA / Conftest Policy: Block high max_connections without a pooler annotation
# policy/postgres_connections.rego
package postgres
deny[msg] {
    input.postgresql_conf.max_connections > 300
    not input.annotations["pooler-enabled"] == "true"
    msg := sprintf("max_connections=%d exceeds 300 without a declared connection pooler. Risk: ENOBUFS exhaustion.", [input.postgresql_conf.max_connections])
}
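To show the input shape this policy expects and make its logic easy to unit test, here is the same rule re-expressed in Python. This is not how OPA evaluates Rego; it is only a plain-Python mirror of the rule, and the function name `deny` and the sample documents are illustrative.

```python
def deny(doc):
    """Return violation messages for a parsed config document, mirroring the Rego rule."""
    messages = []
    max_conn = doc["postgresql_conf"]["max_connections"]
    # The annotation must be the string "true", exactly as in the policy.
    pooler_declared = doc.get("annotations", {}).get("pooler-enabled") == "true"
    if max_conn > 300 and not pooler_declared:
        messages.append(
            f"max_connections={max_conn} exceeds 300 without a declared "
            "connection pooler. Risk: ENOBUFS exhaustion."
        )
    return messages


# Hypothetical inputs in the structure the policy reads.
blocked = {"postgresql_conf": {"max_connections": 500}, "annotations": {}}
allowed = {"postgresql_conf": {"max_connections": 500},
           "annotations": {"pooler-enabled": "true"}}

print(deny(blocked))
print(deny(allowed))
```

Note that a missing annotations key is treated the same as a missing pooler declaration, which matches how the `not ... == "true"` expression behaves when the path is undefined.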
3. Monitoring: Alert before exhaustion, not after
# Prometheus alerting rule
- alert: PostgresSocketBufferPressure
  expr: node_sockstat_sockets_used > 50000
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Host socket count approaching buffer exhaustion threshold"
    runbook: "https://wiki.internal/runbooks/postgres-enobufs"
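- alert: PostgresConnectionSaturation
  # pg_stat_activity_count carries per-database and per-state labels, so sum it per instance before dividing
  expr: sum by (instance) (pg_stat_activity_count) / on (instance) pg_settings_max_connections > 0.85
  for: 1m
  labels:
    severity: critical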
4. Load test gate in CI
Run pgbench at 2x expected peak concurrency in your staging pipeline. If could not receive data from client appears in the Postgres logs during the load test, fail the pipeline. Do not let this reach production.
# In CI pipeline (GitHub Actions, GitLab CI, etc.)
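pgbench -U postgres -c 200 -j 8 -T 60 mydb 2>&1 | tee pgbench.log
# Run the same check against the Postgres server log as well; its path depends on your setup
if grep -qi "no buffer space" pgbench.log; then echo "BUFFER EXHAUSTION DETECTED, FAILING BUILD"; exit 1; fi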
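Because the error is written to the Postgres server log as well as to client output, a gate that scans several log files may be more reliable than a single grep. The following is a minimal sketch; `find_enobufs_lines` and `scan_logs` are hypothetical names, and the log paths are whatever your pipeline produces.

```python
import re

# Case-insensitive, since the capitalization of the OS message can vary.
ENOBUFS_PATTERN = re.compile(r"no buffer space available", re.IGNORECASE)


def find_enobufs_lines(lines):
    """Return the lines that mention buffer exhaustion, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if ENOBUFS_PATTERN.search(line)]


def scan_logs(paths):
    """Scan each log file and return 1 if any line matches, else 0 (usable as an exit code)."""
    hits = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            hits.extend(find_enobufs_lines(fh))
    for hit in hits:
        print(hit)
    return 1 if hits else 0
```

In a pipeline step this could be wired up as `sys.exit(scan_logs(sys.argv[1:]))` and invoked with both pgbench.log and the server log path, so the build fails when either contains the message.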