Why does setting 'port: 0' in KubeletConfiguration cause 'connection refused' on 10250?

Setting 'port: 0' in KubeletConfiguration explicitly disables the Kubelet HTTPS serving port. Kubelet still attempts to validate its own service connection at startup against the default address 127.0.0.1:10250, finds nothing listening, and aborts with 'connection refused'. The fix is to explicitly set 'port: 10250' in your KubeletConfiguration YAML.

Can a stale kubelet process from a previous failed start cause port 10250 connection refused?

Yes. A stale kubelet process left over from a failed start can keep port 10250 bound, so the new instance cannot bind it. Leftover iptables rules can also block the port, although a DROP rule produces timeouts rather than 'connection refused'. Stop the service, kill any leftover process, and confirm the port is free before restarting: 'sudo systemctl stop kubelet; sudo pkill -x kubelet; sudo ss -tlnp | grep 10250' (the last command should print nothing).

Is port 10250 a security risk and should it be exposed publicly?

Port 10250 is the Kubelet API and is a critical attack surface. It must NEVER be exposed to the public internet. It allows command execution inside pods via the /exec and /run endpoints. Always restrict inbound access to port 10250 to the control plane node CIDR only via security groups or firewall rules, and enforce 'authorization.mode: Webhook' with 'authentication.anonymous.enabled: false' in KubeletConfiguration.

How to Fix Kubelet 'Failed to Run: dial tcp 127.0.0.1:10250 Connection Refused' Error

Threat/Impact Level: HIGH | Exploitability/Downtime Risk: HIGH | Time to Fix: 10–30 mins

TL;DR

What broke: Kubelet cannot bind or connect to its own HTTPS serving port 10250, so the node never reaches Ready state and the API server cannot scrape metrics or exec into pods.
How to fix it: Identify whether another process owns port 10250, whether the --port / --healthz-port flags are misconfigured, or whether the container runtime socket path is wrong — then correct the kubelet config or systemd unit and restart.

The Incident (What Does the Error Mean?)

Failed to run kubelet: validate service connection:
  dial tcp 127.0.0.1:10250: connect: connection refused

Kubelet starts, attempts to validate its own internal service connection on 127.0.0.1:10250 (the authenticated HTTPS port that serves metrics, logs, and exec; the read-only port is 10255), and gets an immediate connection refused. This means the Kubelet HTTPS server never came up. The node registers as NotReady. Every pod scheduled to this node stays in Pending or Unknown. kubectl exec, kubectl logs, and Prometheus node scraping all fail simultaneously.

Immediate blast: Zero workloads run on this node. If this is a control-plane node that also hosts an etcd member, etcd may lose quorum depending on your HA topology.

The Attack Vector / Blast Radius

Port 10250 is the Kubelet API — it serves:

POST /exec (kubectl exec)
POST /run (direct command execution)
GET /metrics
GET /pods

When Kubelet fails to bind 10250, the node is dead. The secondary risk is a misconfiguration that makes Kubelet bind 10250 on 0.0.0.0 with anonymous auth enabled. That is a critical remote code execution vector, because anyone who can reach the port can run commands inside pods without credentials. Fixing this error is the moment to also audit the auth config.
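As a rough way to audit the auth config from another machine, the sketch below interprets the HTTP status returned by an unauthenticated request to the Kubelet API. The helper name `classify_kubelet_auth` and the `NODE_IP` placeholder are illustrative and not part of any Kubernetes tooling; the status mapping assumes standard Kubelet behaviour (401 when anonymous auth is disabled, 403 when anonymous auth is enabled but authorization denies the request).

```shell
# Hypothetical helper: interpret the HTTP status of an unauthenticated
# request to the kubelet API on port 10250.
classify_kubelet_auth() {
  case "$1" in
    200) echo "OPEN: anonymous requests are served" ;;
    401) echo "OK: anonymous requests are rejected" ;;
    403) echo "WARN: anonymous auth is enabled but authorization denies the request" ;;
    000) echo "DOWN: nothing answered on the port" ;;
    *)   echo "UNKNOWN: unexpected status $1" ;;
  esac
}

# Usage (NODE_IP is a placeholder for the node address):
#   code="$(curl -sk -o /dev/null -w '%{http_code}' "https://${NODE_IP}:10250/pods")"
#   classify_kubelet_auth "$code"
```

curl prints 000 for `%{http_code}` when no HTTP response was received, which is the case during the 'connection refused' incident itself.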

Cascading failure chain:

Node stays NotReady → DaemonSet pods (CNI, logging agents) cannot start on it → the node's network plane never comes up.
In autoscaling clusters, the broken node triggers scale-out, spawning more broken nodes if the AMI/image carries the same misconfiguration.
After node-monitor-grace-period (default 40s), the kube-controller-manager node lifecycle controller marks the node unreachable and taints it. Pods bound to the node are evicted once their toleration for that taint expires (300s by default) and are rescheduled onto the remaining healthy nodes, spiking load there.

How to Fix It

Step 1 — Confirm the port is not already bound

# On the affected node:
sudo ss -tlnp | grep 10250
sudo lsof -i :10250

If another process owns 10250, kill it or reconfigure it. A stale kubelet process from a failed restart is the #1 cause.

sudo systemctl stop kubelet
sudo pkill -x kubelet        # -x matches the process name exactly; -f would also match the sudo wrapper itself
sudo ss -tlnp | grep 10250   # must be empty now
sudo systemctl start kubelet
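To make the ownership check scriptable, here is a minimal sketch that extracts the process name listening on a given port from `ss -tlnp` output. The function name `port_owner` is hypothetical, and the parsing assumes the usual `ss` column layout with a `users:(("name",pid=...,fd=...))` field, which only appears when `ss` runs with sufficient privileges.

```shell
# Hypothetical helper: read `ss -tlnp` output on stdin and print the name of
# each process listening on the given port ("unknown" if ss hid the owner).
port_owner() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") {
      # users:(("kubelet",pid=1234,fd=25)) -> kubelet
      if (match($0, /users:\(\("[^"]+"/)) {
        print substr($0, RSTART + 9, RLENGTH - 10)
      } else {
        print "unknown"
      }
    }'
}

# Usage:
#   sudo ss -tlnp | port_owner 10250
```

An empty result means nothing is listening on the port; any name other than kubelet points to a conflicting process.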

Step 2 — Validate the KubeletConfiguration port binding

Bad config (anonymous auth + wrong port flag):

- apiVersion: kubelet.config.k8s.io/v1beta1
- kind: KubeletConfiguration
- port: 0
- readOnlyPort: 10255
- authentication:
-   anonymous:
-     enabled: true
-   webhook:
-     enabled: false
- authorization:
-   mode: AlwaysAllow

Good config (authenticated, correct port, webhook authz):

+ apiVersion: kubelet.config.k8s.io/v1beta1
+ kind: KubeletConfiguration
+ port: 10250
+ readOnlyPort: 0
+ authentication:
+   anonymous:
+     enabled: false
+   webhook:
+     enabled: true
+   x509:
+     clientCAFile: /etc/kubernetes/pki/ca.crt
+ authorization:
+   mode: Webhook
+ tlsCertFile: /var/lib/kubelet/pki/kubelet.crt
+ tlsPrivateKeyFile: /var/lib/kubelet/pki/kubelet.key

port: 0 disables the serving port entirely — Kubelet starts, tries to validate the connection, and immediately fails. This is the silent killer.
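The differences between the bad and good configs above can be checked mechanically. The following is a minimal sketch, not a YAML parser: it assumes a block-style KubeletConfiguration with two-space indentation exactly like the examples above, and the function name `lint_kubelet_config` is hypothetical.

```shell
# Hypothetical helper: read a KubeletConfiguration on stdin, print one line
# per finding, and return non-zero if any finding was printed.
# Assumes block-style YAML with two-space indentation (no flow style, no anchors).
lint_kubelet_config() {
  awk '
    BEGIN { bad = 0 }
    /^[A-Za-z]/                 { section = $1; in_anon = 0 }   # new top-level key
    /^port:[ \t]*0[ \t]*$/      { print "port is 0"; bad = 1 }
    /^readOnlyPort:[ \t]*[1-9]/ { print "readOnlyPort is enabled"; bad = 1 }
    /^  anonymous:/             { in_anon = (section == "authentication:"); next }
    /^  [A-Za-z]/               { in_anon = 0 }                 # sibling of anonymous
    in_anon && /^    enabled:[ \t]*true/ { print "anonymous auth is enabled"; bad = 1 }
    section == "authorization:" && /^  mode:[ \t]*AlwaysAllow/ {
      print "authorization mode is AlwaysAllow"; bad = 1
    }
    END { exit bad }
  '
}

# Usage:
#   lint_kubelet_config < /etc/kubernetes/kubelet-config.yaml || echo "config needs fixing"
```

For anything beyond this flat layout, a real YAML parser is the safer choice.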

Step 3 — Check the container runtime socket

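A missing or wrong CRI socket causes Kubelet to abort before the HTTPS server initializes. The dockershim socket in particular no longer exists on Kubernetes v1.24 and later, where dockershim was removed: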

- --container-runtime-endpoint=unix:///var/run/dockershim.sock
+ --container-runtime-endpoint=unix:///run/containerd/containerd.sock

Verify the socket exists:

ls -la /run/containerd/containerd.sock
sudo systemctl status containerd
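The socket check can be wrapped so that it takes the same value you pass to --container-runtime-endpoint. This is a minimal sketch; the function names `cri_socket_path` and `check_cri_socket` are hypothetical, and only `unix://` endpoints are handled.

```shell
# Hypothetical helpers: validate a unix:// CRI endpoint as passed to
# --container-runtime-endpoint.

# Strip the unix:// scheme and print the filesystem path.
cri_socket_path() {
  printf '%s\n' "${1#unix://}"
}

# Succeed only if the endpoint path exists and is a socket.
check_cri_socket() {
  path="$(cri_socket_path "$1")"
  if [ -S "$path" ]; then
    echo "ok: $path is a socket"
  else
    echo "missing: $path is not a socket" >&2
    return 1
  fi
}

# Usage:
#   check_cri_socket unix:///run/containerd/containerd.sock
```

A socket file that exists does not prove the runtime answers on it. If crictl is installed, `sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock info` is one way to confirm that the runtime responds.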

Step 4 — Firewall / iptables check

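sudo iptables -L INPUT -n -v | grep 10250
# A REJECT rule yields 'connection refused'; a DROP rule yields timeouts instead.
# If a blocking rule exists for 10250, delete it using that rule's exact spec, for example:
sudo iptables -D INPUT -p tcp --dport 10250 -j DROP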

On cloud nodes (AWS/GCP/Azure), also verify the security group / firewall rule allows inbound 10250 from the control plane CIDR.
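For scripted checks, the rule-spec output of `iptables -S` is easier to filter than the table view. The sketch below prints only the rules that block a given destination port; the function name `blocking_rules` is hypothetical, and rules written with `-m multiport --dports` or port ranges are not covered.

```shell
# Hypothetical helper: read `iptables -S` output on stdin and print the rules
# that DROP or REJECT traffic to the given destination port.
# Limitation: multiport (--dports) and port-range rules are not matched.
blocking_rules() {
  grep -E -- "--dport $1( |\$)" | grep -E -- '-j (DROP|REJECT)' || true
}

# Usage:
#   sudo iptables -S INPUT | blocking_rules 10250
```

Each printed line is a rule spec; replacing the leading -A with -D gives the matching delete command.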

Enterprise Best Practice — Systemd Drop-in Hardening

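# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
  [Service]
- ExecStart=/usr/bin/kubelet
+ # systemd does not perform $(...) command substitution in ExecStart, so resolve
+ # the node IP in a shell first and load it from an environment file
+ # (the file path is an example; $$ is systemd's escape for a literal $).
+ ExecStartPre=/bin/sh -c 'echo "NODE_IP=$$(curl -sf http://169.254.169.254/latest/meta-data/local-ipv4)" > /run/kubelet-node-ip.env'
+ EnvironmentFile=-/run/kubelet-node-ip.env
+ # An empty ExecStart= clears the command inherited from the main unit.
+ ExecStart=
+ ExecStart=/usr/bin/kubelet \
+   --config=/etc/kubernetes/kubelet-config.yaml \
+   --container-runtime-endpoint=unix:///run/containerd/containerd.sock \
+   --node-ip=${NODE_IP} \
+   --rotate-certificates=true \
+   --rotate-server-certificates=true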

Always pass --config pointing to a versioned KubeletConfiguration file tracked in Git. Never rely on bare CLI flags in production — they are invisible to policy engines.


Prevention in CI/CD

1. Validate KubeletConfiguration in your node bootstrap pipeline

# kubelet has no dry-run or validate-only flag, so lint the file statically
if grep -Eq '^port:[[:space:]]*0[[:space:]]*$' /etc/kubernetes/kubelet-config.yaml; then echo "KUBELET_PORT_DISABLED"; exit 1; fi

2. Checkov policy for KubeletConfiguration

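# Checkov custom YAML policy: block port: 0
# (assumes Checkov's Kubernetes scanner picks up KubeletConfiguration documents; verify in your setup)
metadata:
  name: "Kubelet serving port must not be 0"
  id: "CKV2_KUBELET_1"
  category: "NETWORKING"
definition:
  cond_type: attribute
  resource_types:
    - KubeletConfiguration
  attribute: port
  operator: not_equals
  value: 0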

3. OPA (Rego) policy to enforce webhook authz, evaluated with the KubeletConfiguration file as input

package kubelet.auth

violation[{"msg": msg}] {
  input.authorization.mode != "Webhook"
  msg := "Kubelet authorization mode must be Webhook, not AlwaysAllow"
}

violation[{"msg": msg}] {
  input.authentication.anonymous.enabled == true
  msg := "Kubelet anonymous authentication must be disabled"
}

4. Node readiness smoke test in your AMI bake pipeline

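#!/bin/bash
# Run post-boot in Packer or EC2 user-data test phase.
# 10250/healthz answers 401 when anonymous auth is off, so poll healthzPort (default 10248) and check that 10250 listens.
for _ in $(seq 1 30); do
  curl -sf http://127.0.0.1:10248/healthz | grep -q ok && ss -tln | grep -q ':10250 ' && exit 0
  sleep 1
done
echo "KUBELET_HEALTHZ_FAIL"; exit 1

Fail the AMI build if the kubelet does not report healthy and listen on 10250 within 30 seconds. This catches port misconfiguration before the image ever reaches production.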