
Fixing Nginx 'upstream failed (13: Permission denied)' on Unix Sockets Owned by www-data

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5–10 mins

TL;DR

  • What broke: Nginx worker processes are running as a different user (e.g., nginx) than the socket owner (www-data), so the kernel blocks the connect() syscall with EACCES.
  • How to fix it: Align the Nginx worker user with the socket owner, or add the Nginx user to the www-data group and set socket permissions to 0660.

The Incident (What Does the Error Mean?)

Raw error from /var/log/nginx/error.log:

2024/01/15 03:42:17 [crit] 18743#18743: *1 connect() to unix:/run/gunicorn/app.sock
failed (13: Permission denied) while connecting to upstream,
client: 10.0.0.1, server: example.com, request: "POST /api/submit HTTP/1.1",
upstream: "http://unix:/run/gunicorn/app.sock:/api/submit"

The Nginx worker process called connect() on the Unix domain socket, and the kernel's DAC (Discretionary Access Control) check failed: the worker's effective UID and groups do not grant write permission on the socket file. On Linux, connect() needs write access to the socket itself and search (execute) access to every directory in its path. Every proxied request returns 502 Bad Gateway, so the application is unreachable through Nginx.
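The permission decision described above can be sketched in a few lines of Python. This is a simplified model, not the kernel's code: it ignores ACLs, SELinux/AppArmor, and capabilities other than root's usual override. The function name `can_connect` is hypothetical, and the numeric IDs (33 for www-data, 101 for nginx) are assumed from the diagnostic output shown later in this article; www-data's UID of 33 is the Debian default and may differ on your host.

```python
import stat

def can_connect(sock_mode, sock_uid, sock_gid, proc_uid, proc_gids):
    """Approximate the DAC check for connect() on a Unix socket file.

    connect() needs the write bit. The kernel consults exactly one
    permission class: owner if the UID matches, else group if any of the
    process's groups matches, else other.
    """
    if proc_uid == 0:
        return True  # root normally bypasses DAC (CAP_DAC_OVERRIDE)
    if proc_uid == sock_uid:
        return bool(sock_mode & stat.S_IWUSR)
    if sock_gid in proc_gids:
        return bool(sock_mode & stat.S_IWGRP)
    return bool(sock_mode & stat.S_IWOTH)

# Socket www-data:www-data (assumed 33:33), mode 0660; worker is nginx (101).
print(can_connect(0o660, 33, 33, proc_uid=101, proc_gids={101}))      # False
print(can_connect(0o660, 33, 33, proc_uid=101, proc_gids={101, 33}))  # True
```

The first call models the failing state (nginx falls into the "other" class, which has no bits set); the second models the state after nginx is added to the www-data group.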


The Attack Vector / Blast Radius

This is a misconfiguration that causes total service outage, not a direct exploit vector — but the dangerous temptation is the wrong fix: engineers under pressure often run chmod 777 /run/gunicorn/app.sock. That is catastrophic.

With 777 on a Unix socket:

  • Any local process (including compromised web scripts, cron jobs, or other tenants on a shared host) can connect directly to your application backend, bypassing Nginx entirely — no rate limiting, no TLS termination, no WAF rules.
  • On systems running PHP-FPM or similar, a single RFI/LFI vulnerability can pivot to direct socket communication with your upstream, exfiltrating data or executing arbitrary backend logic.
  • In containerized environments, a 777 socket on a shared volume or bind mount is reachable by any sibling container that mounts the same path, regardless of which user that container runs as.

The blast radius of the lazy fix is a full bypass of every control your reverse proxy layer enforces.
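To find sockets that have already received the lazy fix, a small audit helper can flag any socket file with permission bits set for "other". This is a minimal sketch using only the standard library; the function name `world_accessible` is hypothetical, and it inspects mode bits only (it does not consider ACLs or the permissions of parent directories).

```python
import os
import stat

def world_accessible(path):
    """Return True if `path` is a Unix socket with any 'other' permission bit set."""
    st = os.stat(path)
    if not stat.S_ISSOCK(st.st_mode):
        raise ValueError(f"{path} is not a Unix socket")
    # S_IRWXO masks the read/write/execute bits for users outside owner and group.
    return bool(stat.S_IMODE(st.st_mode) & stat.S_IRWXO)
```

Calling `world_accessible("/run/gunicorn/app.sock")` on a socket with mode 0660 should return False, and True for 0666 or 0777.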


How to Fix It (The Solution)

Diagnose First

# Check socket permissions and ownership
ls -la /run/gunicorn/app.sock
# srw-rw---- 1 www-data www-data 0 Jan 15 03:40 /run/gunicorn/app.sock

# Check what user Nginx workers run as
ps aux | grep '[n]ginx: worker'   # bracket trick keeps grep from matching itself
# nginx    18743  0.0  0.1  ...  nginx: worker process

id nginx
# uid=101(nginx) gid=101(nginx) groups=101(nginx)
# nginx user is NOT in www-data group — this is your problem.
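If the socket's own mode and group look right and the error persists, a parent directory may be the blocker, since path lookup needs search (execute) permission on every directory leading to the socket. The sketch below walks those directories and reports the first one a given UID and group set could not traverse. The name `first_blocker` is hypothetical, the check models plain mode bits only (no ACLs, no root override), and it should be run by a user who can stat the whole path.

```python
import os
import stat

def first_blocker(path, uid, gids):
    """Return the first directory on the way to `path` that uid/gids cannot
    search (missing execute bit), or None if every directory is traversable."""
    directory = os.path.dirname(os.path.realpath(path))
    current = os.sep
    # Start at "/" itself, then descend one component at a time.
    for part in [""] + [p for p in directory.split(os.sep) if p]:
        current = os.path.join(current, part)
        st = os.stat(current)
        mode = stat.S_IMODE(st.st_mode)
        if uid == st.st_uid:
            allowed = mode & stat.S_IXUSR
        elif st.st_gid in gids:
            allowed = mode & stat.S_IXGRP
        else:
            allowed = mode & stat.S_IXOTH
        if not allowed:
            return current
    return None
```

For example, `first_blocker("/run/gunicorn/app.sock", 101, {101})` would return "/run/gunicorn" if that directory were mode 0750 and owned by www-data:www-data, using the nginx IDs from the output above.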

Basic Fix — Add Nginx User to www-data Group

usermod -aG www-data nginx
# Verify
id nginx
# uid=101(nginx) gid=101(nginx) groups=101(nginx),33(www-data)

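# Ensure socket is group-writable (0660, not 0777).
# This chmod lasts only until the socket is recreated; have the upstream
# create it with the right mode (e.g. gunicorn --umask 007) so the fix persists.
chmod 0660 /run/gunicorn/app.sock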

# Reload Nginx (workers restart and inherit new group)
nginx -t && systemctl reload nginx

Enterprise Best Practice — Align the user Directive in nginx.conf + Systemd Socket Hardening

# /etc/nginx/nginx.conf
- user nginx;
+ user www-data;
  worker_processes auto;
  pid /run/nginx.pid;

# /etc/systemd/system/gunicorn.socket
[Socket]
ListenStream=/run/gunicorn/app.sock
- SocketMode=0666
+ SocketMode=0660
+ SocketUser=www-data
+ SocketGroup=www-data

# /etc/systemd/system/gunicorn.service
[Service]
- User=deploy
- Group=deploy
+ User=www-data
+ Group=www-data
  ExecStart=/usr/local/bin/gunicorn ...

After editing systemd units:

systemctl daemon-reload
systemctl restart gunicorn.socket gunicorn.service
nginx -t && systemctl reload nginx

# Confirm
ls -la /run/gunicorn/app.sock
# srw-rw---- 1 www-data www-data 0 ...

Why this is the correct approach: Nginx workers and the upstream process run as the same user and group (www-data). Socket mode 0660 gives owner and group rw and gives everyone else nothing, so apart from root, no process outside www-data can connect to this socket.



Prevention in CI/CD

1. Dockerfile / Image Hardening

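# Enforce consistent user at build time.
# Debian/Ubuntu base images already ship www-data, so create it only if missing.
RUN (getent group www-data >/dev/null || groupadd -r www-data) && \
    (getent passwd www-data >/dev/null || useradd -r -g www-data www-data)
USER www-data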

2. Checkov Policy (IaC Scanning)

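If you're generating systemd units or Nginx configs via Ansible/Terraform, add a Checkov custom check. The function below is only the decision logic: it assumes config is a dict of the unit's [Socket] keys, and it still has to be wrapped in a custom check class for whichever framework Checkov is scanning.

# Checkov custom check logic: flag any world-accessible SocketMode (0666, 0777, ...)
from checkov.common.models.enums import CheckResult

def check_socket_mode(config):
    # systemd defaults SocketMode to 0666, so a missing key is treated as unsafe
    mode = str(config.get('SocketMode', '0666'))
    try:
        world_bits = int(mode, 8) & 0o007
    except ValueError:
        return CheckResult.UNKNOWN
    return CheckResult.FAILED if world_bits else CheckResult.PASSED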
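A check like this needs the [Socket] keys extracted from the unit file first. One way to do that, sketched here under the assumption that the unit is a simple single-file .socket unit, is Python's configparser. It only approximates systemd's parser (no drop-in directories, no line continuations), and the function name `socket_mode_is_safe` is hypothetical. A missing SocketMode falls back to 0666, which is systemd's documented default.

```python
import configparser

SYSTEMD_DEFAULT_SOCKET_MODE = "0666"  # systemd's default when SocketMode= is absent

def socket_mode_is_safe(unit_text):
    """Return True if the unit's effective SocketMode grants nothing to 'other'."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(unit_text)
    mode = parser.get("Socket", "SocketMode",
                      fallback=SYSTEMD_DEFAULT_SOCKET_MODE)
    return (int(mode, 8) & 0o007) == 0

unit = """\
[Socket]
ListenStream=/run/gunicorn/app.sock
SocketMode=0660
SocketUser=www-data
SocketGroup=www-data
"""
print(socket_mode_is_safe(unit))  # True
```

Testing the world bits rather than matching a fixed list of strings also catches modes such as 0662 or 0664 that a literal comparison would miss.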

3. Ansible Assert Task

- name: Assert Nginx worker user matches socket owner
  assert:
    that:
      - nginx_worker_user == socket_owner_user or nginx_worker_group == socket_owner_group
    fail_msg: "Nginx user/group mismatch with upstream socket owner. Will cause EACCES."

4. Integration Test in Pipeline

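#!/bin/bash
# smoke-test.sh: run post-deploy
set -euo pipefail

SOCKET=/run/gunicorn/app.sock
# set -e aborts here if no Nginx worker is running
NGINX_USER=$(ps -o user= -p "$(pgrep -o -f 'nginx: worker')" | tr -d ' ')
SOCKET_OWNER=$(stat -c '%U' "$SOCKET")
SOCKET_GROUP=$(stat -c '%G' "$SOCKET")
WORKER_GROUPS=" $(id -nG "$NGINX_USER") "

# Pass if the worker owns the socket or is in its group (whole-name match, not substring)
if [ "$NGINX_USER" != "$SOCKET_OWNER" ] && [[ "$WORKER_GROUPS" != *" $SOCKET_GROUP "* ]]; then
  echo "FATAL: Nginx worker '$NGINX_USER' is neither owner '$SOCKET_OWNER' nor in socket group '$SOCKET_GROUP'"
  exit 1
fi
echo "OK: Permission alignment verified."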

Plug smoke-test.sh into your GitHub Actions post-deploy job. A failed check blocks the release before your monitoring even fires.
