Fixing Nginx 'upstream failed (13: Permission denied)' on Unix Sockets Owned by www-data
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5–10 mins
TL;DR
- What broke: Nginx worker processes run as a different user (e.g., nginx) than the socket owner (www-data), so the kernel rejects the connect() syscall with EACCES.
- How to fix it: Align the Nginx worker user with the socket owner, or add the Nginx user to the www-data group and set the socket permissions to 0660.
The Incident (What Does the Error Mean?)
Raw error from /var/log/nginx/error.log:
2024/01/15 03:42:17 [crit] 18743#18743: *1 connect() to unix:/run/gunicorn/app.sock
failed (13: Permission denied) while connecting to upstream,
client: 10.0.0.1, server: example.com, request: "POST /api/submit HTTP/1.1",
upstream: "http://unix:/run/gunicorn/app.sock:/api/submit"
The Nginx worker process attempted a connect() syscall on the Unix domain socket, and the kernel's DAC (Discretionary Access Control) check failed: the worker's effective UID/GID lacks write permission on the socket inode. On Linux, connect() needs write permission on the socket itself plus search (x) permission on every directory in its path. Every proxied request returns 502 Bad Gateway, so everything served through this upstream is down.
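To make the permission check concrete, here is a minimal Python sketch of the owner/group/other decision the kernel applies to connect(). The function name `can_connect` is hypothetical, and the model is deliberately simplified: it ignores ACLs, SELinux/AppArmor, and directory search permission. The UID/GID numbers (101 for nginx, 33 for www-data) come from the sample `id nginx` output in this article.

```python
import stat


def can_connect(sock_mode, sock_uid, sock_gid, proc_uid, proc_gids):
    """Approximate the DAC decision for connect() on a Unix socket.

    Only the write bit matters for connect(). This simplified model ignores
    ACLs, LSMs (SELinux/AppArmor), and search permission on parent directories.
    """
    if proc_uid == 0:
        return True  # root normally bypasses DAC via CAP_DAC_OVERRIDE
    if proc_uid == sock_uid:
        return bool(sock_mode & stat.S_IWUSR)
    if sock_gid in proc_gids:
        return bool(sock_mode & stat.S_IWGRP)
    return bool(sock_mode & stat.S_IWOTH)


# Socket is srw-rw---- www-data:www-data (uid/gid 33); nginx is uid/gid 101.
print(can_connect(0o660, 33, 33, 101, {101}))      # False: nginx not in www-data
print(can_connect(0o660, 33, 33, 101, {101, 33}))  # True: nginx added to www-data
```

Note that the classes are checked in order: a process that matches the owner is judged on the owner bits only, even if the group bits would be more permissive.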
The Attack Vector / Blast Radius
This is a misconfiguration that causes total service outage, not a direct exploit vector — but the dangerous temptation is the wrong fix: engineers under pressure often run chmod 777 /run/gunicorn/app.sock. That is catastrophic.
With 777 on a Unix socket:
- Any local process (including compromised web scripts, cron jobs, or other tenants on a shared host) can connect directly to your application backend, bypassing Nginx entirely — no rate limiting, no TLS termination, no WAF rules.
- On systems running PHP-FPM or similar, a single RFI/LFI vulnerability can pivot to direct socket communication with your upstream, exfiltrating data or executing arbitrary backend logic.
- In containerized environments where the socket's directory is shared between containers (for example through a bind mount or shared volume), 777 sockets are trivially reachable by sibling containers.
The blast radius of the lazy fix is a full authentication bypass of your reverse proxy layer.
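If a 777 fix may already have been applied somewhere, a quick audit can find it. The following is a rough sketch, not a hardened tool: the function names are hypothetical, and it only inspects the "other" write bit, since write permission is what connect() requires.

```python
import os
import stat


def is_world_accessible_socket(st_mode):
    """True if the mode describes a socket that 'other' users can write to."""
    return stat.S_ISSOCK(st_mode) and bool(st_mode & stat.S_IWOTH)


def find_world_accessible_sockets(root):
    """Walk root and return the paths of sockets any local user could connect to."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:  # sockets are listed among the non-directories
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if is_world_accessible_socket(mode):
                hits.append(path)
    return sorted(hits)


# Example: find_world_accessible_sockets("/run")
```

Expect some legitimate hits under /run, because several system services intentionally expose world-writable sockets. Review the list rather than treating every entry as a finding.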
How to Fix It (The Solution)
Diagnose First
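# Check socket permissions and ownership
ls -la /run/gunicorn/app.sock
# srw-rw---- 1 www-data www-data 0 Jan 15 03:40 /run/gunicorn/app.sock
# Check every directory on the path: the worker needs search (x) permission on each
namei -l /run/gunicorn/app.sock
# Check what user Nginx workers run as (the [n] keeps grep from matching itself)
ps aux | grep '[n]ginx: worker'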
# nginx 18743 0.0 0.1 ... nginx: worker process
id nginx
# uid=101(nginx) gid=101(nginx) groups=101(nginx)
# nginx user is NOT in www-data group — this is your problem.
Basic Fix — Add Nginx User to www-data Group
usermod -aG www-data nginx
# Verify
id nginx
# uid=101(nginx) gid=101(nginx) groups=101(nginx),33(www-data)
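# Ensure socket is group-writable (0660, not 0777)
# Note: this lasts only until the upstream recreates the socket, so persist the mode in the service configuration as well
chmod 0660 /run/gunicorn/app.sock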
# Reload Nginx (workers restart and inherit new group)
nginx -t && systemctl reload nginx
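`id nginx` reads the account database, not the running process, so it cannot confirm that the reloaded workers actually picked up the new group. On Linux, the credentials of a live process are listed in `/proc/<pid>/status`. The sketch below parses that file; the function names are hypothetical, and GID 33 for www-data is taken from the sample output above.

```python
def gids_from_status(status_text):
    """Collect the effective GID and supplementary GIDs from /proc/<pid>/status text."""
    gids = set()
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Gid":
            # Fields are: real, effective, saved, filesystem
            gids.add(int(value.split()[1]))
        elif key == "Groups":
            gids.update(int(g) for g in value.split())
    return gids


def worker_has_gid(pid, gid):
    """True if the running process currently holds the given GID (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return gid in gids_from_status(fh.read())


# Example: worker_has_gid(18743, 33) for the worker PID shown by ps
```

If this returns False after `usermod`, the workers are probably still the old ones. Reload or restart Nginx and check the new worker PIDs.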
Enterprise Best Practice: Align the user Directive in nginx.conf + Systemd Socket Hardening
# /etc/nginx/nginx.conf
- user nginx;
+ user www-data;
worker_processes auto;
pid /run/nginx.pid;
# /etc/systemd/system/gunicorn.socket
[Socket]
ListenStream=/run/gunicorn/app.sock
- SocketMode=0666
+ SocketMode=0660
+ SocketUser=www-data
+ SocketGroup=www-data
# /etc/systemd/system/gunicorn.service
[Service]
- User=deploy
- Group=deploy
+ User=www-data
+ Group=www-data
ExecStart=/usr/local/bin/gunicorn ...
After editing systemd units:
systemctl daemon-reload
systemctl restart gunicorn.socket gunicorn.service
nginx -t && systemctl reload nginx
# Confirm
ls -la /run/gunicorn/app.sock
# srw-rw---- 1 www-data www-data 0 ...
Why this is the correct approach: Nginx workers and the upstream process run as the same user and group (www-data). Socket mode 0660 gives the owner and group rw and everyone else nothing, so no unprivileged process outside the www-data group can connect to this socket.
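The socket unit settings can also be checked before deployment. The following is a small sketch that parses a `.socket` unit with Python's standard `configparser`. The function name `audit_socket_unit` and its messages are hypothetical. It assumes a simple unit file (configparser is not a full systemd parser) and relies on systemd's documented default of 0666 when SocketMode is unset.

```python
import configparser


def audit_socket_unit(unit_text, expected_group="www-data"):
    """Return a list of problems found in a systemd .socket unit's [Socket] section."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case: systemd keys are case-sensitive
    parser.read_string(unit_text)
    if not parser.has_section("Socket"):
        return ["no [Socket] section"]

    problems = []
    section = parser["Socket"]
    mode = section.get("SocketMode", fallback="0666")  # systemd default when unset
    try:
        if int(mode, 8) & 0o007:
            problems.append(f"SocketMode={mode} grants access to all local users")
    except ValueError:
        problems.append(f"SocketMode={mode} is not a valid octal mode")

    group = section.get("SocketGroup", fallback=None)
    if group != expected_group:
        problems.append(f"SocketGroup={group!r}, expected {expected_group!r}")
    return problems


unit = """[Socket]
ListenStream=/run/gunicorn/app.sock
SocketMode=0660
SocketUser=www-data
SocketGroup=www-data
"""
print(audit_socket_unit(unit))  # []
```

Defaulting a missing SocketMode to 0666 is deliberate: a unit that omits the key is as exposed as one that sets 0666 explicitly.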
Prevention in CI/CD
1. Dockerfile / Image Hardening
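# Enforce consistent user at build time
# Create the account only if the base image lacks it (Debian-based images already ship www-data)
RUN getent group www-data >/dev/null || groupadd -r www-data
RUN id -u www-data >/dev/null 2>&1 || useradd -r -g www-data www-data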
USER www-data
2. Checkov Policy (IaC Scanning)
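If you're generating systemd units or Nginx configs via Ansible/Terraform, add a policy check on the socket mode. The function below sketches the check logic using Checkov's CheckResult enum; registering it as a full custom check depends on how your units are rendered and is not shown:
# Check logic for a Checkov-style custom check: fail any world-accessible SocketMode
from checkov.common.models.enums import CheckResult

def check_socket_mode(config):
    # systemd defaults SocketMode to 0666 when the key is absent, so treat "missing" as 0666
    mode = str(config.get('SocketMode', '0666'))
    try:
        world_bits = int(mode, 8) & 0o007
    except ValueError:
        return CheckResult.FAILED  # unparseable mode
    return CheckResult.FAILED if world_bits else CheckResult.PASSED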
3. Ansible Assert Task
- name: Assert Nginx worker user matches socket owner
  assert:
    that:
      - nginx_worker_user == socket_owner_user or nginx_worker_group == socket_owner_group
    fail_msg: "Nginx user/group mismatch with upstream socket owner. Will cause EACCES."
4. Integration Test in Pipeline
#!/bin/bash
# smoke-test.sh — run post-deploy
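SOCKET=/run/gunicorn/app.sock
# Find the first Nginx worker and fail early if none is running
WORKER_PID=$(pgrep -f 'nginx: worker' | head -1)
if [ -z "$WORKER_PID" ]; then
  echo "FATAL: no Nginx worker process found"
  exit 1
fi
NGINX_USER=$(ps -o user= -p "$WORKER_PID")
SOCKET_GROUP=$(stat -c '%G' "$SOCKET")
# Compare group names exactly; a plain grep on `id` output also matches substrings
if ! id -nG "$NGINX_USER" | tr ' ' '\n' | grep -qxF "$SOCKET_GROUP"; then
  echo "FATAL: Nginx worker '$NGINX_USER' not in socket group '$SOCKET_GROUP'"
  exit 1
fi
echo "OK: Permission alignment verified."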
Plug smoke-test.sh into your GitHub Actions post-deploy job. A failed check blocks the release before your monitoring even fires.
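Group membership is only a proxy for the real question, which is whether connect() succeeds. A more direct probe, sketched here in Python, attempts the connection and classifies the result. The function name `probe_socket` and its return labels are hypothetical. To be meaningful it has to run under the Nginx worker's account (for example via `sudo -u`), because root bypasses the permission check.

```python
import socket


def probe_socket(path, timeout=2.0):
    """Try to connect() to a Unix stream socket and classify the outcome."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
        return "ok"
    except PermissionError:
        return "permission-denied"  # EACCES: the failure this article covers
    except FileNotFoundError:
        return "missing"            # socket path does not exist
    except ConnectionRefusedError:
        return "refused"            # socket file exists but nothing is listening
    except OSError as exc:
        return f"error: {exc}"
    finally:
        client.close()


# Example: probe_socket("/run/gunicorn/app.sock")
```

Separating "permission-denied" from "missing" and "refused" keeps the pipeline from reporting a stopped upstream as a permission problem.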