Why does 'no buffer space available' only appear under high load and not during normal operation?

Under normal load, the kernel's socket buffer memory has headroom. Each PostgreSQL backend holds open a socket to its client, consuming a portion of that memory. When concurrent connections spike (due to a traffic surge, a connection leak, or a retry storm), the aggregate buffer demand exceeds the kernel's system-wide TCP memory limit (net.ipv4.tcp_mem), while net.core.rmem_max and net.core.wmem_max only cap what a single socket may request. The ENOBUFS error surfaces only at that exhaustion threshold, which is why it appears intermittent until load is consistently high enough to breach it.

Will increasing net.core.rmem_max permanently fix this, or is it just a band-aid?

It buys time, but it is not a permanent fix if the root cause is too many direct connections to PostgreSQL. Kernel buffer space is finite. If max_connections is 500 and all 500 are active, you will eventually exhaust buffers again at a higher load threshold. The durable fix is to reduce max_connections to 100–200 and deploy pgBouncer in transaction-pooling mode in front of PostgreSQL. This limits the number of actual kernel sockets to the database while allowing thousands of application-side logical connections.
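A rough back-of-the-envelope sketch of why a higher limit only moves the threshold. It assumes every connection fills both its receive and send buffer to a given size, which is a deliberate worst case (the kernel allocates socket memory on demand), and it assumes a 4 KiB page size when converting net.ipv4.tcp_mem, which is expressed in pages. The function names and the 4 MiB buffer size are illustrative, not taken from any real system.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; check with `getconf PAGESIZE`


def worst_case_buffer_bytes(connections: int, rcvbuf: int, sndbuf: int) -> int:
    """Upper bound on buffer memory if every connection fills both buffers."""
    return connections * (rcvbuf + sndbuf)


def tcp_mem_limit_bytes(tcp_mem: str, page_size: int = PAGE_SIZE) -> int:
    """Convert the third (hard limit) field of net.ipv4.tcp_mem from pages to bytes."""
    _low, _pressure, high = (int(field) for field in tcp_mem.split())
    return high * page_size


def connections_until_limit(tcp_mem: str, rcvbuf: int, sndbuf: int) -> int:
    """How many fully loaded connections fit under the tcp_mem hard limit."""
    return tcp_mem_limit_bytes(tcp_mem) // (rcvbuf + sndbuf)


if __name__ == "__main__":
    tcp_mem = "188562 251418 377124"   # example kernel default, in pages
    per_direction = 4 * 1024 * 1024    # hypothetical 4 MiB buffer per direction
    print("worst case for 500 connections:",
          worst_case_buffer_bytes(500, per_direction, per_direction), "bytes")
    print("fully loaded connections under the limit:",
          connections_until_limit(tcp_mem, per_direction, per_direction))
```

With these illustrative numbers the hard limit is reached well before 500 fully loaded connections. Raising the limits increases that count, but only lowering the number of sockets removes the ceiling as a concern.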

How do I tell if pgBouncer itself is contributing to the buffer exhaustion?

Run 'cat /proc/net/sockstat' and check the 'sockets: used' count on the pgBouncer host specifically. Also check 'SHOW POOLS;' in the pgBouncer admin console — look for sv_idle and sv_active counts against your PostgreSQL server. If pgBouncer is in session mode rather than transaction mode, it holds one server-side socket open per client connection, which defeats the purpose. Switch to pool_mode = transaction in pgbouncer.ini and verify that server connection count to Postgres drops significantly.
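The check described above can be sketched as a small summary over the rows returned by SHOW POOLS. This is a minimal sketch that assumes each row has already been fetched into a dictionary keyed by the column names (database, cl_active, cl_waiting, sv_active, sv_idle, pool_mode); how the rows are fetched, and the helper name summarize_pools, are left as assumptions.

```python
from typing import Dict, Iterable, List


def summarize_pools(rows: Iterable[Dict[str, object]]) -> Dict[str, object]:
    """Total client and server connections, and list pools not in transaction mode."""
    server_conns = 0
    client_conns = 0
    non_transaction: List[str] = []
    for row in rows:
        server_conns += int(row["sv_active"]) + int(row["sv_idle"])
        client_conns += int(row["cl_active"]) + int(row["cl_waiting"])
        if row.get("pool_mode") != "transaction":
            non_transaction.append(str(row["database"]))
    return {
        "server_conns": server_conns,
        "client_conns": client_conns,
        "non_transaction_pools": non_transaction,
    }


# Made-up rows for illustration only.
EXAMPLE_ROWS = [
    {"database": "app", "cl_active": 900, "cl_waiting": 20,
     "sv_active": 35, "sv_idle": 5, "pool_mode": "transaction"},
    {"database": "reports", "cl_active": 60, "cl_waiting": 0,
     "sv_active": 60, "sv_idle": 0, "pool_mode": "session"},
]

if __name__ == "__main__":
    print(summarize_pools(EXAMPLE_ROWS))
```

A pool whose server connection count tracks its client count one-to-one, like the hypothetical reports pool here, is the pattern to look for.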

Fixing PostgreSQL 'no buffer space available' Error: Kernel Buffer Tuning & Connection Management Guide

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–45 mins depending on whether a PostgreSQL restart is required (changing max_connections needs one; the sysctl changes do not)

TL;DR

What broke: The kernel's socket buffer memory, which backs each connection's send/receive buffers (SO_SNDBUF/SO_RCVBUF), is exhausted. The kernel cannot queue data to or from clients, so the affected backend process drops the connection.
How to fix it: Tune net.core.rmem_max, net.core.wmem_max, and net.ipv4.tcp_mem via sysctl; reduce max_connections in PostgreSQL and front it with pgBouncer in transaction-pooling mode.

The Incident (What Does the Error Mean?)

Raw log output from postgresql.log:

LOG:  could not receive data from client: no buffer space available
LOG:  unexpected EOF on client connection with an open transaction

This is not a PostgreSQL bug. This is the Linux kernel telling the PostgreSQL backend process that recv() or send() on the client socket failed with ENOBUFS. The kernel socket buffer pool — shared across all sockets on the host — is depleted. PostgreSQL has no choice but to abort that backend. Any open transaction is rolled back. The client receives a hard disconnect.

Immediate consequence: In-flight transactions are lost. If this happens under peak load, it cascades — every new connection attempt may also fail, effectively taking the database offline from the application's perspective even though pg_isready may still return healthy.

The Attack Vector / Blast Radius

This is a resource exhaustion failure, not a security exploit, but the blast radius is severe:

Connection storm amplification: Applications using naive retry logic (no exponential backoff) detect the dropped connection and immediately reconnect. Each reconnect allocates new socket buffers, which accelerates buffer exhaustion. From a monitoring perspective, the resulting feedback loop can look much like a SYN flood.
Transaction integrity risk: Any transaction that was in flight is rolled back on the server without the client being told. If the application does not correctly handle OperationalError and re-validate state, you get lost writes: data that the application believes was committed was not.
Cascading service failure: Every microservice sharing this PostgreSQL host (common in under-architected setups) loses its connection pool simultaneously. A single buffer exhaustion event can take down unrelated services.
Root causes to rule out in order:
- max_connections set too high (each idle connection holds a socket buffer allocation)
- No connection pooler — raw application connections directly to Postgres
- net.core.rmem_default / wmem_default at kernel defaults (too low for high-concurrency DB workloads)
- Running Postgres on a shared host with other high-socket-count services (Redis, Kafka, app servers)
- tcp_mem pressure threshold breached — check cat /proc/net/sockstat
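To make the point about naive retry logic concrete, here is a minimal sketch of reconnecting with exponential backoff and full jitter. The connect callable, the retry_on tuple, and the default delays are all assumptions for illustration; in a real application retry_on would be the database driver's connection error class (for example its OperationalError).

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    source = rng if rng is not None else random
    return source.uniform(0.0, min(cap, base * (2 ** attempt)))


def connect_with_backoff(connect: Callable[[], T],
                         max_attempts: int = 6,
                         retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                         sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds, sleeping a growing, jittered delay between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(backoff_delay(attempt))
    raise AssertionError("loop always returns or raises")


if __name__ == "__main__":
    # Show the upper bound of the delay window for the first few attempts.
    for attempt in range(6):
        print(attempt, min(30.0, 0.5 * 2 ** attempt))
```

The jitter matters as much as the exponent: without it, every client that was dropped at the same moment retries at the same moment.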

How to Fix It

Step 0: Confirm the Diagnosis

# Check overall socket usage
cat /proc/net/sockstat
# Look for: sockets: used <N>. If this is in the tens of thousands, you have your answer

# Check kernel TCP memory pressure: compare the "mem" field (in pages) against net.ipv4.tcp_mem
grep TCP /proc/net/sockstat

# Check current buffer limits
sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_mem

# Check Postgres connection count vs max_connections
psql -U postgres -c "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"
psql -U postgres -c "SHOW max_connections;"
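For repeated checks, the same diagnosis can be scripted. The sketch below parses the text of /proc/net/sockstat and compares the TCP "mem" figure (in pages) with the three net.ipv4.tcp_mem thresholds (also in pages). The sample text is made up, and the function names and the "ok" / "pressure" / "exhausted" labels are illustrative choices, not kernel terminology.

```python
from typing import Dict


def parse_sockstat(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/sockstat into {"TCP": {"inuse": ..., "mem": ...}, ...}."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "NAME: key value ..."
        fields = rest.split()
        stats[name.strip()] = {
            fields[i]: int(fields[i + 1]) for i in range(0, len(fields) - 1, 2)
        }
    return stats


def tcp_memory_state(stats: Dict[str, Dict[str, int]], tcp_mem: str) -> str:
    """Classify TCP memory use against the low/pressure/high values of tcp_mem."""
    _low, pressure, high = (int(value) for value in tcp_mem.split())
    used = stats["TCP"]["mem"]
    if used >= high:
        return "exhausted"
    if used >= pressure:
        return "pressure"
    return "ok"


# Made-up sample in the format of /proc/net/sockstat.
SAMPLE = """\
sockets: used 48211
TCP: inuse 41020 orphan 12 tw 903 alloc 41377 mem 260000
UDP: inuse 4 mem 2
"""

if __name__ == "__main__":
    stats = parse_sockstat(SAMPLE)
    print(stats["sockets"]["used"], tcp_memory_state(stats, "188562 251418 377124"))
```

On a real host, pass the contents of /proc/net/sockstat and the output of `sysctl -n net.ipv4.tcp_mem` instead of the sample values.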

Basic Fix — Kernel Buffer Tuning

# /etc/sysctl.conf

- net.core.rmem_default = 212992
- net.core.wmem_default = 212992
- net.core.rmem_max = 212992
- net.core.wmem_max = 212992
- net.ipv4.tcp_mem = 188562 251418 377124
+ net.core.rmem_default = 31457280
+ net.core.wmem_default = 31457280
+ net.core.rmem_max = 134217728
+ net.core.wmem_max = 134217728
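+ net.ipv4.tcp_mem = 786432 1048576 26777216
+ net.core.somaxconn = 65535
+ net.ipv4.tcp_max_syn_backlog = 65535
# tcp_mem is measured in memory pages (typically 4 KiB), not bytes. With 4 KiB pages the third value above is roughly 102 GiB, so scale all three values to the host's RAM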

Apply without reboot:

sysctl -p /etc/sysctl.conf
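After applying, it is worth confirming that the running kernel actually reports the new values. A minimal sketch, assuming the usual mapping of a sysctl key such as net.core.rmem_max to the file /proc/sys/net/core/rmem_max; the helper names and the EXPECTED dictionary are illustrative and should mirror whatever values you chose.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path(root).joinpath(*key.split("."))


def read_sysctl(key: str) -> str:
    """Read the live value of a sysctl key from /proc/sys."""
    return sysctl_path(key).read_text()


def find_mismatches(expected: Dict[str, str],
                    read: Callable[[str], str] = read_sysctl
                    ) -> Dict[str, Tuple[str, str]]:
    """Return {key: (wanted, actual)} for every key whose live value differs."""
    mismatches: Dict[str, Tuple[str, str]] = {}
    for key, want in expected.items():
        got = " ".join(read(key).split())        # normalise tabs and newlines
        if got != " ".join(want.split()):
            mismatches[key] = (want, got)
    return mismatches


# Values from the tuning step; adjust to match your own sysctl.conf.
EXPECTED = {
    "net.core.rmem_max": "134217728",
    "net.core.wmem_max": "134217728",
}

if __name__ == "__main__":
    try:
        for key, (want, got) in find_mismatches(EXPECTED).items():
            print(f"{key}: expected {want}, kernel reports {got}")
    except OSError as exc:
        # Not Linux, or the key is not visible from this namespace or container.
        print("could not read sysctl values:", exc)
```

The read parameter exists so the comparison can be exercised without touching /proc, for example in a unit test.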

Enterprise Best Practice — Connection Pooling + PostgreSQL Hardening

Kernel tuning alone is a band-aid if max_connections = 500 with no pooler. The correct architecture is: Application → pgBouncer (transaction mode) → PostgreSQL (max_connections = 100–200).

# postgresql.conf

- max_connections = 500
+ max_connections = 150
# Fewer backends = fewer socket buffers held open permanently
# pgBouncer handles the fan-out

- work_mem = 4MB
+ work_mem = 16MB
# With fewer connections, you can afford more work_mem per query

# pgbouncer.ini

- pool_mode = session
+ pool_mode = transaction
# Transaction pooling recycles connections aggressively
# A server connection is only held for the duration of a transaction, not the client session

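- max_client_conn = 100
+ max_client_conn = 5000
# Each client connection is still a socket on the pgBouncer host, but not a socket or backend process on Postgres
# Raise the file descriptor limit (ulimit -n) for the pgBouncer process to match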

+ server_idle_timeout = 600
+ client_idle_timeout = 0
+ tcp_keepalive = 1
+ tcp_keepidle = 60
+ tcp_keepintvl = 10
+ tcp_keepcnt = 5
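A configuration like the one above is easy to regress, so a small lint step can help. The sketch below uses Python's standard configparser to read pgbouncer.ini text and flag a pool_mode other than transaction or a disabled tcp_keepalive. It assumes the file is plain INI without inline comments, treats a missing pool_mode as session (pgBouncer's default), and the function name and messages are illustrative.

```python
import configparser
from typing import List


def check_pgbouncer_ini(text: str) -> List[str]:
    """Return a list of human-readable problems found in pgbouncer.ini text."""
    parser = configparser.ConfigParser(
        interpolation=None,      # values may contain '%'
        strict=False,            # tolerate repeated keys
        allow_no_value=True,     # tolerate directive lines without '='
        delimiters=("=",),
    )
    parser.read_string(text)
    if not parser.has_section("pgbouncer"):
        return ["no [pgbouncer] section found"]
    section = parser["pgbouncer"]
    problems: List[str] = []
    pool_mode = (section.get("pool_mode") or "session").strip()
    if pool_mode != "transaction":
        problems.append(f"pool_mode is {pool_mode}, expected transaction")
    if (section.get("tcp_keepalive") or "1").strip() != "1":
        problems.append("tcp_keepalive is disabled")
    return problems


# Hypothetical config text for illustration.
GOOD_INI = """\
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 5000
tcp_keepalive = 1
"""

if __name__ == "__main__":
    print(check_pgbouncer_ini(GOOD_INI) or "no problems found")
```

The function returns an empty list when nothing is wrong, so it can gate a deployment step by checking whether the result is non-empty.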


Prevention in CI/CD

1. Terraform / Ansible: Enforce sysctl at provisioning time

# Ansible role: roles/postgres_host/tasks/kernel.yml

- # No sysctl tuning — relying on OS defaults
+ - name: Set PostgreSQL host kernel buffer parameters
+   sysctl:
+     name: "{{ item.key }}"
+     value: "{{ item.value }}"
+     state: present
+     reload: yes
+   loop:
+     - { key: 'net.core.rmem_max', value: '134217728' }
+     - { key: 'net.core.wmem_max', value: '134217728' }
+     - { key: 'net.ipv4.tcp_mem', value: '786432 1048576 26777216' }

2. OPA / Conftest Policy: Block high max_connections without a pooler annotation

# policy/postgres_connections.rego
# Conftest evaluates package main by default; run it with --namespace postgres for this package
package postgres

deny[msg] {
  input.postgresql_conf.max_connections > 300
  not input.annotations["pooler-enabled"] == "true"
  msg := sprintf("max_connections=%d exceeds 300 without a declared connection pooler. Risk: ENOBUFS exhaustion.", [input.postgresql_conf.max_connections])
}

3. Monitoring: Alert before exhaustion, not after

# Prometheus alerting rule
- alert: PostgresSocketBufferPressure
  expr: node_sockstat_sockets_used > 50000
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Host socket count approaching buffer exhaustion threshold"
    runbook: "https://wiki.internal/runbooks/postgres-enobufs"

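- alert: PostgresConnectionSaturation
  # Sum the per-database/per-state series first; otherwise the label sets on each side of the division do not match
  expr: sum by (instance) (pg_stat_activity_count) / on (instance) pg_settings_max_connections > 0.85
  for: 1m
  labels:
    severity: critical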

4. Load test gate in CI

Run pgbench at 2x expected peak concurrency in your staging pipeline. If 'could not receive data from client' appears in the Postgres logs during the load test, fail the pipeline. Do not let this reach production.

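# In CI pipeline (GitHub Actions, GitLab CI, etc.)
pgbench -U postgres -c 200 -j 8 -T 60 mydb 2>&1 | tee pgbench.log
# The error is logged by the server, so also scan the Postgres log (PG_LOG is a placeholder path)
if grep -qi "no buffer space" pgbench.log "$PG_LOG"; then
  echo "BUFFER EXHAUSTION DETECTED, FAILING BUILD"; exit 1
fi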