
Fixing Nginx 'connect() failed (111: Connection refused)' in Docker Host Network Mode

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 5–15 mins

TL;DR

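  • What broke: Nginx cannot reach the upstream because nothing is listening where it connects. The upstream process isn't running, or it listens on a different port, or it is bound to 127.0.0.1 inside a separate network namespace (e.g. a bridge-networked container). In host network mode, Docker's container-name DNS doesn't apply either.
  • How to fix it: Make sure the upstream is running and bound to an address that covers the host's loopback (127.0.0.1, 0.0.0.0, or the correct host IP). Verify that the port matches nginx.conf, and confirm that --network host (network_mode: host in Compose) is actually applied to the Nginx container.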

The Incident (What Does the Error Mean?)

Raw error from /var/log/nginx/error.log:

2024/01/15 03:42:17 [error] 29#29: *1 connect() failed (111: Connection refused)
while connecting to upstream, client: 10.0.0.5, server: _, request:
"GET /api/health HTTP/1.1", upstream: "http://127.0.0.1:8080/api/health",
host: "prod-gateway.internal"

errno 111 = ECONNREFUSED. The TCP SYN hit the host network stack and got a RST back. Nothing is listening on that socket. Nginx tried, the kernel rejected it. Every request hitting this upstream returns 502 Bad Gateway to your users — right now, in production.
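You can reproduce what Nginx sees without Nginx in the loop. The sketch below uses tcp_probe, a hypothetical helper name. It attempts a bare TCP connect through bash's /dev/tcp redirection, a bash feature that is assumed to be compiled in, as it is on most Linux distributions. It also uses coreutils timeout. A refused connect fails immediately, which is the same ECONNREFUSED that Nginx logs.

```shell
#!/usr/bin/env bash
# tcp_probe HOST PORT: exit 0 if something accepts a TCP connection, non-zero otherwise.
# A refusal (errno 111) fails at once; the 2 s timeout covers filtered/unroutable hosts.
tcp_probe() {
  local host="$1" port="$2"
  # Opening fd 3 on /dev/tcp/HOST/PORT performs the connect(); no HTTP is sent.
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

if tcp_probe 127.0.0.1 8080; then
  echo "127.0.0.1:8080 accepts connections"
else
  echo "127.0.0.1:8080 refused or unreachable"
fi
```

Run it on the host and again inside the Nginx container. If the two results differ, the containers are not sharing a network namespace.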

In host network mode, the Nginx container shares the host's network namespace directly. There are no virtual bridges and no container IPs; localhost inside Nginx is the host's loopback. So if your upstream app is also on host network (or runs directly on the host) and is bound to 127.0.0.1:8080, this should work. It fails only if the app isn't running, listens on a different port, or crashed silently.


The Attack Vector / Blast Radius

This isn't a security exploit — it's a cascading availability failure:

  1. Upstream process died or never started. Nginx came up, upstream didn't. No readiness gate caught it.
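  2. Upstream bound to the wrong interface or port. The app may still run in its own (bridge) network namespace and bind 127.0.0.1. In that case it only accepts connections from inside its own container, and Nginx on the host network connects to the host's 127.0.0.1:8080, where nothing listens. Alternatively, the app was moved to host network but now listens on a different port. Published-port mappings (e.g. ports: "8080:3000") are ignored in host mode, so the app sits on 3000 while nginx.conf still says 8080.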
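  3. Port conflict on the host. Another process grabbed 8080 first, so your app failed to bind (EADDRINUSE) and exited non-zero. Your restart: always policy is now in a crash loop you haven't noticed. If that other process listens on loopback, Nginx reaches it instead and you see wrong responses rather than errno 111. The ss output in Step 1 shows which process owns the port.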
  4. --network host missing from the Nginx container. You applied it to the upstream but not Nginx. Nginx is still on the bridge network trying to reach 127.0.0.1 of its own isolated namespace — which has nothing on port 8080.

Blast radius: 100% of traffic to this upstream returns 502. If Nginx has no fallback (backup server, error_page redirect), your entire service is down.
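The four causes above can be told apart from the host with docker inspect. This is a minimal sketch with hypothetical helper names; pass a container name or ID (with Compose, docker compose ps -q nginx prints the ID). HostConfig.NetworkMode should read host for both containers (cause 4). A climbing RestartCount together with a non-zero exit code points to a crash loop (cause 3).

```shell
#!/usr/bin/env bash
# check_host_mode ID: report the container's network mode; exit 1 if it is not "host".
check_host_mode() {
  local id="$1" mode
  mode="$(docker inspect --format '{{.HostConfig.NetworkMode}}' "$id" 2>/dev/null)" || {
    echo "$id: container not found"
    return 2
  }
  if [ "$mode" = "host" ]; then
    echo "$id: host network"
    return 0
  fi
  # Cause 4: 127.0.0.1 inside this container is its own loopback, not the host's.
  echo "$id: $mode (NOT host network)"
  return 1
}

# restart_info ID: restart count plus last exit code, to spot a silent crash loop (cause 3).
restart_info() {
  local id="$1" out
  out="$(docker inspect --format '{{.RestartCount}} {{.State.ExitCode}}' "$id")" || return 2
  echo "$id: restarted ${out% *} times, last exit code ${out#* }"
}
```

Example: check_host_mode "$(docker compose ps -q nginx)" and restart_info "$(docker compose ps -q app)".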


How to Fix It

Step 1: Confirm What's Actually Listening

# On the host, not inside a container (sudo lets -p show processes owned by other users)
sudo ss -tlnp | grep ':8080'
# or
sudo netstat -tlnp | grep ':8080'

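If that returns nothing, nothing is listening on 8080: the upstream process is not running, or it is listening on a different port. Fix that first. If it does show a listener, check the bind address. 127.0.0.1:8080 or 0.0.0.0:8080 is reachable from a host-network Nginx; a specific non-loopback IP such as 10.0.0.5:8080 is not reachable via 127.0.0.1.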
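The Step 1 check can also be done mechanically. This sketch uses a hypothetical helper, loopback_reachable, which reads ss -tln output on stdin. It succeeds if some listener on the given port would accept a connection to 127.0.0.1. It assumes Linux's default net.ipv6.bindv6only=0, under which a [::] or * listener also accepts IPv4.

```shell
#!/usr/bin/env bash
# loopback_reachable PORT < ss-output
# Succeeds if a listener on PORT is bound to an address that covers 127.0.0.1.
loopback_reachable() {
  local port="$1" addr
  # Column 4 of `ss -tln` is "Local Address:Port"; the header line never matches.
  while read -r _ _ _ addr _; do
    case "$addr" in
      # Quoted patterns match literally ("*" and "[::]" are not globs here).
      "127.0.0.1:$port" | "0.0.0.0:$port" | "*:$port" | "[::]:$port") return 0 ;;
    esac
  done
  return 1
}
```

Usage: ss -tln | loopback_reachable 8080 && echo reachable || echo "no loopback-reachable listener on 8080"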


Basic Fix — nginx.conf upstream address

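In host network mode, use 127.0.0.1 explicitly, not a container hostname. Compose service names resolve only through Docker's embedded DNS on user-defined networks, which host-network containers don't use: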

upstream backend {
-    server app-container:8080;   # WRONG: container DNS doesn't resolve in host network
+    server 127.0.0.1:8080;       # CORRECT: host loopback, shared namespace
}
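A quick heuristic for this mistake is to list upstream server entries in nginx.conf that use a hostname rather than an IP literal, since those depend on DNS that a host-network container may not have. The sketch uses flag_hostname_upstreams, a hypothetical helper. It skips localhost (resolved from /etc/hosts) and unix: sockets. It is line-based awk, not a real nginx config parser.

```shell
#!/usr/bin/env bash
# flag_hostname_upstreams FILE: print upstream "server" targets that are hostnames.
flag_hostname_upstreams() {
  awk '
    # Upstream entries look like "server HOST[:PORT] [params];"; skip "server {" and unix sockets.
    $1 == "server" && $2 != "{" && $2 !~ /^unix:/ {
      target = $2; sub(/;$/, "", target)
      host = target; sub(/:[0-9]+$/, "", host)
      # Accept IPv4 literals, bracketed IPv6 literals and localhost; flag everything else.
      if (host != "localhost" && host !~ /^[0-9.]+$/ && host !~ /^\[/)
        print "hostname upstream: " target
    }' "$1"
}
```

Usage: flag_hostname_upstreams ./nginx.conf (no output means every upstream server is an IP literal or localhost).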

Basic Fix — docker-compose.yml

services:
  nginx:
    image: nginx:1.25-alpine
-   # network_mode not set — defaults to bridge
+   network_mode: host
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro

  app:
    image: myapp:latest
-   # network_mode not set
+   network_mode: host
    environment:
-     - BIND_ADDR=127.0.0.1:8080   # Only reachable within its own namespace if on bridge
+     - BIND_ADDR=0.0.0.0:8080     # All host interfaces; on host network 127.0.0.1:8080 also works and keeps 8080 off external NICs
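To confirm the change actually landed, docker compose config prints the fully resolved file. This sketch uses a hypothetical helper, services_not_on_host, which filters that output and lists services without network_mode: host. It relies on the normalized two-space indentation that docker compose config emits. It is a heuristic, not a YAML parser.

```shell
#!/usr/bin/env bash
# services_not_on_host < resolved-compose-yaml
# Prints each service under "services:" that lacks "network_mode: host".
services_not_on_host() {
  awk '
    # Any top-level key opens or closes the services section.
    /^[^ ]/ { in_services = ($0 == "services:"); next }
    # A two-space-indented key inside services starts a new service.
    in_services && /^  [^ ]/ {
      if (svc != "" && !host) print svc
      svc = $1; sub(/:$/, "", svc); host = 0; next
    }
    in_services && /^    network_mode: host$/ { host = 1 }
    # Report the last service seen.
    END { if (svc != "" && !host) print svc }'
}
```

Usage: docker compose config | services_not_on_host (empty output means every service is on the host network).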

Enterprise Best Practice — Health-Gated Startup + Upstream Keepalive

Don't let Nginx start routing until the upstream is confirmed alive. Add a healthcheck and a depends_on condition, plus keepalive to avoid connection churn. Note that depends_on gates startup only; it won't prevent 502s if the app dies later:

services:
  nginx:
    image: nginx:1.25-alpine
    network_mode: host
+   depends_on:
+     app:
+       condition: service_healthy
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro

  app:
    image: myapp:latest
    network_mode: host
+   healthcheck:
+     test: ["CMD", "curl", "-sf", "http://127.0.0.1:8080/healthz"]
+     interval: 5s
+     timeout: 3s
+     retries: 5
+     start_period: 10s
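The healthcheck above assumes the myapp image ships curl and serves /healthz. If curl is missing, the check fails and Nginx never starts. When curl isn't available, or when the stack runs outside Compose (e.g. under systemd, where a wrapper script can run before Nginx), a bash-only TCP readiness gate is one alternative. wait_for_port is a hypothetical helper. It only proves that the port accepts connections, not that the app is healthy.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [TIMEOUT_SECONDS]: poll until a TCP connect succeeds.
# Only proves the port accepts connections, not that the app is healthy.
wait_for_port() {
  local host="$1" port="$2" deadline=$((SECONDS + ${3:-30}))
  while (( SECONDS < deadline )); do
    # Opening /dev/tcp/HOST/PORT performs a TCP connect (bash feature); 2 s cap per attempt.
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for ${host}:${port}" >&2
  return 1
}
```

For example, a wrapper script could run wait_for_port 127.0.0.1 8080 60 before starting Nginx and exit non-zero if the app never comes up.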

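And in nginx.conf, add upstream keepalive and failure handling. proxy_next_upstream only retries on a different server, so it has no effect until the upstream block lists a second (or backup) server: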

upstream backend {
    server 127.0.0.1:8080;
+   keepalive 32;
}

server {
    location / {
        proxy_pass http://backend;
+       proxy_next_upstream error timeout http_502 http_503;
+       proxy_connect_timeout 2s;
+       proxy_read_timeout 10s;
+       proxy_http_version 1.1;
+       proxy_set_header Connection "";
    }
}
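To check whether these settings actually shrink the 502 window, sample status codes while restarting the app (e.g. docker compose restart app in another terminal). This is a minimal sketch: status_histogram is a hypothetical helper, and the URL you pass is an assumption. curl reports 000 when it cannot connect at all.

```shell
#!/usr/bin/env bash
# status_histogram URL N: send N sequential requests, print "count status" per status code.
status_histogram() {
  local url="$1" n="$2" i
  for ((i = 0; i < n; i++)); do
    # -w prints the status even on failure; connection errors show up as 000.
    curl -s -o /dev/null --max-time 5 -w '%{http_code}\n' "$url" || true
  done | sort | uniq -c | awk '{ print $1, $2 }'
}
```

Usage: status_histogram http://127.0.0.1/api/health 200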



Prevention in CI/CD

1. Lint nginx.conf in your pipeline:

# In your GitHub Actions / GitLab CI step
docker run --rm -v "$(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro" nginx:1.25-alpine nginx -t

This catches syntax errors but not runtime upstream failures. Pair it with an integration test.

2. Integration smoke test after deploy:

# Post-deploy health gate — fail the pipeline if upstream is unreachable
curl --retry 5 --retry-delay 3 --retry-connrefused -sf http://127.0.0.1:80/healthz || exit 1

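3. Checkov for the app's Dockerfile:

checkov -f Dockerfile --check CKV_DOCKER_2  # Fails if the Dockerfile has no HEALTHCHECK instruction

Checkov's Docker checks scan Dockerfiles, not docker-compose.yml; the OPA policy in step 4 covers the compose file.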

4. OPA policy to enforce healthcheck on host-network services:

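# Run with: conftest test docker-compose.yml (policy file under ./policy, package main)
package main

import rego.v1

deny contains msg if {
    some name
    service := input.services[name]
    service.network_mode == "host"
    not service.healthcheck
    msg := sprintf("Service '%v' uses host network but has no healthcheck defined", [name])
}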

5. Run the upstream app under a host process supervisor (systemd, supervisord) with a short restart delay, instead of relying solely on Docker's restart: always. Docker adds an increasing backoff delay (doubling from 100 ms) before each restart, and every second of that delay is a window of 502s.
