How to Fix PostgreSQL 'Out of File Descriptors' Error: Tuning ulimit and max_files_per_process
Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 15 mins
TL;DR
- What broke: PostgreSQL exhausted the OS file descriptor limit. Every new connection, WAL segment open, and table file access requires an FD. When the process hits its ceiling, it cannot open any file — queries die, connections are refused, the cluster goes read-unavailable.
- How to fix it: Raise
LimitNOFILEin the systemd service unit to ≥65536, setfs.file-maxat the kernel level, and tunemax_files_per_processinpostgresql.confto match. - Shortcut: Use our Client-Side Sandbox above — paste your
postgresql.conforsystemctl cat postgresqloutput and get auto-refactored config with zero data leaving your browser.
The Incident (What Does the Error Mean?)
Raw error from postgresql.log:
FATAL: out of file descriptors: could not open file "base/16384/1259": %m
LOG: server process (PID 31892) was terminated by signal 6: Aborted
DETAIL: Failed process was running: SELECT * FROM pg_class;
Or from pg_log during high-concurrency:
ERROR: could not open relation with OID 12345: out of file descriptors
WARNING: out of file descriptors: %m; release and retry
Immediate consequence: PostgreSQL cannot open heap files, WAL segments, or socket files. Every in-flight transaction on that backend dies. If the postmaster itself hits the limit, accept() on new connections fails — the entire cluster becomes unreachable. Replication standby lag spikes immediately as WAL receiver loses its file handle.
The Attack Vector / Blast Radius
This is a cascading resource exhaustion failure, not a single-point error.
Why it happens faster than you expect:
Each PostgreSQL backend holds open FDs for:
- Its client socket
- Every relation (table/index) it touches in a query
- WAL segment files
- Shared memory control files
pg_stattemp files under heavyANALYZE
With max_connections = 200 and a moderately complex OLAP query touching 40 relations, you can consume 8,000+ FDs before a single application thread complains. Default ulimit -n on most Linux distros is 1024. Default systemd LimitNOFILE for the postgres unit is often inherited at 1024 unless explicitly overridden — even if /etc/security/limits.conf is set correctly, systemd ignores PAM limits.
Blast radius:
- Active queries abort mid-execution → application errors, partial writes, potential data inconsistency if not wrapped in transactions.
- Autovacuum workers die → table bloat accumulates → query plans degrade → performance death spiral.
- WAL archiver cannot open next segment → replication lag → standbys diverge → failover risk.
pg_dumpandpg_basebackupfail silently or corrupt mid-stream.- Connection poolers (PgBouncer) enter error loops, multiplying reconnect storms against the already-degraded postmaster.
How to Fix It
Step 1 — Diagnose Current FD Usage
# Check the postgres process FD limit
ps aux | grep postgres | head -1 # get PID
cat /proc/<PID>/limits | grep 'open files'
# Count currently open FDs
ls /proc/<PID>/fd | wc -l
# Kernel-wide current usage vs max
cat /proc/sys/fs/file-nr
# Output: <allocated> <unused> <max>
# Check systemd effective limit
systemctl show postgresql -p LimitNOFILE
Basic Fix — Raise the Limit Now (Zero-Downtime for systemd)
# Apply kernel max immediately (survives until reboot)
sudo sysctl -w fs.file-max=2097152
# Persist across reboots
echo 'fs.file-max = 2097152' | sudo tee /etc/sysctl.d/99-postgres-fd.conf
sudo sysctl -p /etc/sysctl.d/99-postgres-fd.conf
# Override the systemd unit (no full restart needed for limit bump on Linux 5.x+)
sudo systemctl edit postgresql
In the override editor:
- # (no LimitNOFILE set — inherits systemd default of 1024)
+ [Service]
+ LimitNOFILE=65536
sudo systemctl daemon-reload
sudo systemctl restart postgresql # required for LimitNOFILE to take effect
Enterprise Best Practice — Full Stack Tuning
/etc/sysctl.d/99-postgres-fd.conf:
- # fs.file-max not set (kernel default ~100000 on small instances)
+ fs.file-max = 2097152
+ fs.nr_open = 2097152
/etc/security/limits.d/99-postgres.conf (for non-systemd or PAM sessions):
- # no postgres-specific limits
+ postgres soft nofile 65536
+ postgres hard nofile 1048576
/etc/systemd/system/postgresql.service.d/override.conf:
[Service]
- # LimitNOFILE absent
+ LimitNOFILE=1048576
postgresql.conf:
- max_files_per_process = 1000 # default
+ max_files_per_process = 65536
- max_connections = 500 # oversized, driving FD exhaustion
+ max_connections = 100 # pair with PgBouncer for pooling
Rule of thumb:
LimitNOFILE≥max_connections × 10 + max_files_per_process + 1000(headroom for WAL + autovacuum workers).
💡 Tired of pasting proprietary configs into ChatGPT? Generic AI tools log your company's ARNs, DB strings, and private keys. StackEngine is a zero-backend, pure Client-Side WASM utility. Drop your failing config into the sandbox above. We redact your secrets locally in the browser and auto-generate the refactored code using your own API key.
Prevention in CI/CD
1. Ansible/Terraform Enforcement
If you provision PostgreSQL hosts via Terraform + cloud-init or Ansible, gate the deployment:
- # No FD validation in provisioner
+ resource "null_resource" "postgres_fd_check" {
+ provisioner "remote-exec" {
+ inline = [
+ "[ $(cat /proc/sys/fs/file-max) -ge 2097152 ] || (echo 'FAIL: fs.file-max too low' && exit 1)",
+ "[ $(systemctl show postgresql -p LimitNOFILE | cut -d= -f2) -ge 65536 ] || (echo 'FAIL: LimitNOFILE too low' && exit 1)"
+ ]
+ }
+ }
2. Prometheus Alerting Rule
# Alert before you hit the wall, not after
- alert: PostgresFileDescriptorExhaustion
expr: |
process_open_fds{job="postgresql"} / process_max_fds{job="postgresql"} > 0.80
for: 2m
labels:
severity: critical
annotations:
summary: "Postgres FD usage above 80% on {{ $labels.instance }}"
runbook: "https://your-wiki/postgres-fd-runbook"
3. Checkov / Custom OPA Policy (for Ansible playbooks)
# checkov custom check: ensure postgres systemd override sets LimitNOFILE
from checkov.common.models.enums import CheckResult
from checkov.ansible.checks.base_ansible_check import BaseAnsibleCheck
class PostgresLimitNOFILE(BaseAnsibleCheck):
def __init__(self):
super().__init__(
name="Ensure LimitNOFILE >= 65536 in postgres systemd override",
check_id="CKV_ANSIBLE_POSTGRES_001"
)
def check_resource_configuration(self, configuration):
limit = configuration.get("LimitNOFILE", 0)
return CheckResult.PASSED if int(limit) >= 65536 else CheckResult.FAILED
4. Load Test Gate in CI
Run pgbench at 2× expected peak connections in your staging pipeline. Fail the pipeline if pg_log emits out of file descriptors during the run:
pgbench -c 150 -j 8 -T 60 mydb 2>&1 | tee /tmp/pgbench.log
grep -i 'file descriptor' /tmp/pgbench.log && echo 'PIPELINE FAIL: FD exhaustion detected' && exit 1
Bottom line: This error is 100% preventable. The default OS and systemd limits are vestigial from 2005-era single-process assumptions. Any PostgreSQL deployment expecting more than ~50 concurrent connections needs these values set on day one, validated in CI, and monitored in production.