
How to Fix Nginx 'upstream timed out (110)' During SSL Handshake with Upstream API

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

  • What broke: Nginx opened a connection to your upstream HTTPS backend, but the backend did not complete the TLS handshake within proxy_connect_timeout or did not send response headers within proxy_read_timeout. Errno 110 is ETIMEDOUT on Linux: the TCP connection was established, the upstream went silent, and Nginx's own timer expired.
  • How to fix it: Tune proxy_connect_timeout, proxy_read_timeout, enforce proxy_ssl_server_name on, verify your upstream TLS cert chain, and enable HTTP keepalives to the backend to eliminate per-request handshake overhead.

The Incident (What Does the Error Mean?)

Raw error from /var/log/nginx/error.log:

2024/01/15 03:42:17 [error] 1234#1234: *58291 upstream timed out (110: Connection timed out)
while reading response header from upstream,
client: 10.0.1.45, server: api-gateway.internal,
request: "POST /v2/payments HTTP/1.1",
upstream: "https://api.example.com/v2/payments",
host: "api-gateway.internal"

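Errno 110 is ETIMEDOUT: not a refused connection, not a DNS failure. The TCP socket to the upstream was opened, but Nginx hit its deadline waiting for the upstream. The phrase after the errno tells you which phase stalled. "while SSL handshaking to upstream" means the TLS handshake did not finish within proxy_connect_timeout. "while reading response header from upstream" (the sample above) means the handshake completed and the backend then failed to send the first byte of the HTTP response header within proxy_read_timeout.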

Immediate consequence: Nginx returns a 504 Gateway Timeout to the client. Every request hitting this upstream fails until the condition is resolved. In a payment or auth flow, this is a P0 incident.


The Attack Vector / Blast Radius

This isn't a security exploit in the traditional sense — but the blast radius is severe and the failure mode is insidious:

Cascading failure chain:

  1. Nginx worker processes accumulate stuck upstream connections, each holding a file descriptor open until proxy_read_timeout expires.
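  2. Under load, this exhausts worker_connections, because every proxied request holds two connections (one to the client, one to the upstream). Nginx logs "worker_connections are not enough", new client connections are closed as soon as they are accepted, and requests that cannot get an upstream connection fail with a 5xx.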
  3. If upstream is a microservice behind an internal load balancer, the timeout storm triggers health check failures, pulling the upstream out of rotation — which concentrates load on remaining nodes, triggering the same timeout on them.
  4. Connection pool starvation: Without keepalives configured, every proxied request pays for a full TLS handshake on a cold connection: one extra round trip with TLS 1.3, two with TLS 1.2, plus asymmetric crypto on both ends. At 500 RPS, you're doing 500 full TLS negotiations per second against the upstream. Any upstream CPU spike causes handshake queuing, which manifests as errno 110.
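
The connection arithmetic in step 2 can be made concrete with a capacity sketch. All values below are illustrative assumptions, not recommendations. Because each proxied request occupies two connection slots in a worker, a worker with 1024 slots carries at most about 512 in-flight proxied requests, and a backlog of stalled requests fills that quickly.

```nginx
# Sketch only; every value here is an assumption to be sized against your own traffic.
worker_processes auto;

# File descriptor ceiling per worker; keep it above worker_connections.
worker_rlimit_nofile 16384;

events {
    # Counts client AND upstream connections, so a proxied request uses two slots.
    # 8192 slots is roughly 4096 concurrent proxied requests per worker.
    worker_connections 8192;
}
```

Extra headroom only buys time during a stall; it does not remove the cause.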

The security angle: If proxy_ssl_verify is off (the Nginx default, and often left that way), certificate validation is disabled on the upstream leg. A network-level attacker on the path between Nginx and the upstream can intercept the connection. Fixing the timeout is the right time to also audit and enforce proxy_ssl_verify on.


How to Fix It (The Solution)

Basic Fix — Tune Timeouts and SSL Directives

 server {
     listen 443 ssl;
     server_name api-gateway.internal;
 
     location /v2/ {
         proxy_pass https://api.example.com;
 
-        # Missing or too-short timeouts
-        proxy_connect_timeout 5s;
-        proxy_read_timeout    10s;
-        proxy_send_timeout    10s;
+        proxy_connect_timeout 10s;
+        proxy_read_timeout    60s;
+        proxy_send_timeout    60s;
 
-        # No SNI is sent to the upstream without this
-        # proxy_ssl_server_name not set
+        proxy_ssl_server_name on;
+        proxy_ssl_protocols   TLSv1.2 TLSv1.3;
+        proxy_ssl_ciphers     HIGH:!aNULL:!MD5;
 
-        # Certificate verification disabled — MITM risk
-        proxy_ssl_verify off;
+        proxy_ssl_verify        on;
+        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
     }
 }

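Why proxy_ssl_server_name on matters: Without it, Nginx sends no SNI extension in the ClientHello. Modern backends and CDNs (Cloudflare, AWS ALB, GCP LB) rely on SNI to select the correct certificate. Without SNI, the backend may reject the handshake or present a default certificate that fails verification; both usually surface as a 502 with an SSL error in the log, not as errno 110. The backend may also drop the ClientHello silently, in which case the handshake stalls until the timeout and you do get errno 110.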
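
As a minimal sketch of how the SNI name is chosen: Nginx takes it from proxy_ssl_name, which defaults to the host part of proxy_pass. When proxy_pass points at an IP address or an upstream group name, that default no longer matches the certificate, so the name has to be set explicitly. The IP address below is a hypothetical placeholder.

```nginx
location /v2/ {
    # Hypothetical upstream address (203.0.113.0/24 is reserved for documentation).
    proxy_pass https://203.0.113.10;

    # Send SNI, and use this name for both SNI and certificate verification.
    proxy_ssl_server_name on;
    proxy_ssl_name        api.example.com;

    # The Host header also defaults to the proxy_pass host, so set it to match.
    proxy_set_header      Host api.example.com;

    proxy_ssl_verify              on;
    proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
}
```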


Enterprise Best Practice — Upstream Keepalives + Connection Pooling

The real fix at scale is eliminating per-request TLS handshakes entirely using upstream keepalive pools.

+# Define upstream block with keepalive pool
+upstream api_backend {
+    server api.example.com:443;
+
+    # Maintain up to 32 idle keepalive connections per worker
+    keepalive 32;
+    keepalive_requests 1000;
+    keepalive_timeout  65s;
+}
 
 server {
     listen 443 ssl;
 
     location /v2/ {
-        proxy_pass https://api.example.com;
+        proxy_pass https://api_backend;
 
+        # Required for keepalive to upstream to function
+        proxy_http_version 1.1;
+        proxy_set_header   Connection "";
 
         proxy_connect_timeout 10s;
         proxy_read_timeout    60s;
         proxy_send_timeout    60s;
 
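         proxy_ssl_server_name on;
+        # With an upstream block, SNI, cert verification, and Host default to "api_backend"
+        proxy_ssl_name        api.example.com;
+        proxy_set_header      Host api.example.com;
         proxy_ssl_verify      on;
         proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
+        proxy_ssl_session_reuse on;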
     }
 }

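proxy_ssl_session_reuse on (the default) lets Nginx resume a previous TLS session when it has to open a new connection to the upstream, replacing the full handshake with an abbreviated one. Requests served over an existing keepalive connection skip the handshake entirely, so the keepalive pool is where most of the savings come from.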

proxy_http_version 1.1 + Connection "" — without these two lines, Nginx defaults to HTTP/1.0 upstream, which forces Connection: close on every request, destroying your keepalive pool silently.


Prevention in CI/CD

1. Lint Nginx configs in your pipeline before deploy:

# In your CI step: validates syntax and referenced files (it does not flag missing directives)
nginx -t -c /path/to/nginx.conf

# gixy: static security analyzer for Nginx
pip install gixy
gixy /etc/nginx/nginx.conf
# Catches: SSRF risks, HTTP splitting, Host header spoofing, add_header redefinition

2. Checkov IaC scan if Nginx is deployed via Terraform/Helm:

# Confirm the check ID exists in your Checkov version with: checkov --list
checkov -d ./helm/nginx --framework helm \
  --check CKV_NGINX_1  # TLS verification enabled

3. Synthetic monitoring with timeout assertions:

// k6 options (Datadog Synthetics can assert the same): p99 response time < 5s
// If p99 creeps toward your proxy_read_timeout, page before errno 110 fires
export const options = {
  thresholds: {
    http_req_duration: ['p(99)<5000'],
  },
};

4. OPA/Conftest policy to block deploys with proxy_ssl_verify off:

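package nginx.ssl

# Assumes the Nginx config was first converted to JSON with a top-level
# proxy_ssl_verify key; run with: conftest test --namespace nginx.ssl <file>
deny[msg] {
    input.proxy_ssl_verify == "off"
    msg := "proxy_ssl_verify must be 'on': upstream TLS verification is mandatory"
}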

5. Track connection setup time using $upstream_connect_time in your Nginx access log format. It covers the TCP connect and, for HTTPS upstreams, the TLS handshake. If this value spikes, you have a backend connection or TLS performance problem; fix it at the source rather than papering over it with longer timeouts.
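
A minimal sketch of such a log format follows. The format name upstream_timing and the log path are assumptions; the variables are standard Nginx variables. $upstream_connect_time covers connection establishment, including the TLS handshake for HTTPS upstreams, while $upstream_header_time runs until the first response header byte arrives.

```nginx
http {
    # Hypothetical format name; fields are key=value for easy parsing.
    log_format upstream_timing '$remote_addr [$time_local] "$request" $status '
                               'upstream=$upstream_addr upstream_status=$upstream_status '
                               'connect=$upstream_connect_time '
                               'header=$upstream_header_time '
                               'response=$upstream_response_time '
                               'total=$request_time';

    # Assumed log path.
    access_log /var/log/nginx/access_timing.log upstream_timing;
}
```

A rising connect value with a steady gap between header and connect points at connection setup or the handshake; the reverse points at slow request processing in the backend.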
