How to Fix kubectl top pods Showing Empty Metrics: Metrics Server kubelet-insecure-tls Configuration
Threat/Impact Level: MEDIUM | Exploitability/Downtime Risk: LOW (operational blind spot, not direct RCE) | Time to Fix: 5 mins
TL;DR
- What broke: Metrics Server pods are running, but kubectl top pods/nodes returns error: metrics not available yet or hangs with no output, because Metrics Server cannot verify the TLS certificates served by the kubelet /metrics/resource endpoints.
- How to fix it: Add --kubelet-insecure-tls (dev/non-prod) to the Metrics Server Deployment args, or mount the proper CA certificate and pass --kubelet-certificate-authority (production).
The Incident (What does the error mean?)
You run kubectl top pods -n production and get:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
or the HPA silently stops scaling because it has no CPU/memory signal. Metrics Server logs tell the real story:
E0612 14:32:01.783204 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.10:10250/metrics/resource\": x509: certificate signed by unknown authority" node="worker-node-01"
Immediate consequence: HPA controllers go blind. kubectl top is dead. Any autoscaling tied to CPU/memory metrics stops functioning. On kubeadm clusters the kubelet serving certificate is self-signed by default, so Metrics Server has no CA that can validate it and rejects every scrape with the x509 error shown above.
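To confirm that TLS verification (rather than networking) is the cause across many nodes, the failed-scrape lines can be classified mechanically. The following is a minimal sketch, assuming the log layout matches the single sample line above; the function name and the reason labels are hypothetical.

```python
import re

# Layout inferred from the one sample log line in this article (an assumption).
SCRAPE_ERROR = re.compile(
    r'"Failed to scrape node" err="(?P<err>.*)" node="(?P<node>[^"]+)"'
)

def classify_scrape_failure(line):
    """Return (node, reason) for a failed-scrape log line, or None if it does not match."""
    match = SCRAPE_ERROR.search(line)
    if match is None:
        return None
    err = match.group("err")
    if "x509:" in err:
        reason = "tls-verification"   # the case this article is about
    elif "connection refused" in err or "i/o timeout" in err:
        reason = "network"
    else:
        reason = "other"
    return match.group("node"), reason

sample = (
    'E0612 14:32:01.783204 1 scraper.go:140] "Failed to scrape node" '
    'err="Get \\"https://192.168.1.10:10250/metrics/resource\\": '
    'x509: certificate signed by unknown authority" node="worker-node-01"'
)
print(classify_scrape_failure(sample))
```

Feeding it the output of kubectl logs for the Metrics Server pod, line by line, gives a quick per-node view of which kubelets fail certificate verification.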
The Attack Vector / Blast Radius
This is an operational security misconfiguration, not a direct exploit vector — but the blast radius is significant:
- Autoscaler blindness: HPA cannot retrieve metrics.k8s.io API objects. Pods under load will not scale out. You get cascading OOM kills or latency spikes with zero automated remediation.
- The insecure bypass risk: The common "just add --kubelet-insecure-tls" fix disables TLS verification entirely between Metrics Server and every kubelet endpoint. In a multi-tenant cluster or one exposed to a compromised internal network, a MITM attacker on the cluster network could serve fake metrics, causing HPAs to scale down healthy workloads or refuse to scale up under real load. This is a metrics poisoning attack surface.
- Compliance failure: Audits against PCI-DSS, SOC 2, and the CIS Kubernetes Benchmark (cited here as Control 4.2.10; numbering varies between benchmark versions) expect TLS verification on kubelet traffic. Running --kubelet-insecure-tls in production is likely to be raised as an audit finding.
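The metrics poisoning risk is easier to see with numbers. The sketch below applies the ratio documented for the HPA algorithm, ceil(currentReplicas * currentMetricValue / desiredMetricValue), to a hypothetical workload. It is deliberately simplified: the tolerance band, stabilization windows, and maxReplicas are ignored, and all values are made up for illustration.

```python
import math

def desired_replicas(current_replicas, current_value, target_value, min_replicas=1):
    """Core HPA ratio, simplified: no tolerance band, stabilization, or maxReplicas."""
    wanted = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, wanted)

# Hypothetical workload: 10 replicas with a 70% CPU utilization target.
honest = desired_replicas(10, current_value=90, target_value=70)   # real load
spoofed = desired_replicas(10, current_value=7, target_value=70)   # forged reading
print(honest, spoofed)
```

With an honest 90% reading the autoscaler asks for 13 replicas; with a forged 7% reading it asks for 1, which is the scale-down-under-load scenario described above.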
How to Fix It (The Solution)
Basic Fix (Dev / Non-Prod Only)
Add --kubelet-insecure-tls to the Metrics Server container args. This skips x509 verification against kubelet certs.
# metrics-server Deployment spec.containers[0].args
args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
+ - --kubelet-insecure-tls
If deploying via Helm:
# values.yaml
defaultArgs:
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
+ - --kubelet-insecure-tls
Do not use this in production. Full stop.
Enterprise Best Practice (Production)
The correct fix is to provision valid kubelet serving certificates signed by a CA that Metrics Server trusts, then mount that CA into the Metrics Server pod.
Step 1: Ensure kubelets are using properly signed serving certs. With kubeadm, enable server cert rotation:
# kubelet-config.yaml (on each node)
+ serverTLSBootstrap: true
+ rotateCertificates: true
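Approve the pending kubelet-serving CSRs after confirming that each one comes from a node you expect (serving CSRs are not approved automatically):
kubectl get csr | grep kubernetes.io/kubelet-serving | grep Pending | awk '{print $1}' | xargs kubectl certificate approve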
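Piping text through grep is brittle, so a review step can work on the JSON form instead. The following is a minimal sketch that assumes you have saved the output of kubectl get csr -o json and want the pending kubelet-serving requests together with their requesters; the function name and the sample data are hypothetical.

```python
KUBELET_SERVING_SIGNER = "kubernetes.io/kubelet-serving"

def pending_kubelet_serving_csrs(csr_list):
    """Return (name, requester) pairs for kubelet-serving CSRs with no decision yet."""
    pending = []
    for item in csr_list.get("items", []):
        if item.get("spec", {}).get("signerName") != KUBELET_SERVING_SIGNER:
            continue
        conditions = item.get("status", {}).get("conditions") or []
        decided = any(c.get("type") in ("Approved", "Denied") for c in conditions)
        if not decided:
            pending.append((item["metadata"]["name"], item["spec"].get("username")))
    return pending

# Hypothetical, trimmed-down shape of `kubectl get csr -o json`.
example = {
    "items": [
        {"metadata": {"name": "csr-aaa"},
         "spec": {"signerName": KUBELET_SERVING_SIGNER, "username": "system:node:worker-node-01"},
         "status": {}},
        {"metadata": {"name": "csr-bbb"},
         "spec": {"signerName": KUBELET_SERVING_SIGNER, "username": "system:node:worker-node-02"},
         "status": {"conditions": [{"type": "Approved"}]}},
    ]
}
print(pending_kubelet_serving_csrs(example))
```

Listing the requester next to each name makes it practical to compare against your node inventory before running kubectl certificate approve on the names you accept.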
Step 2: Mount the cluster CA into Metrics Server and reference it:
# metrics-server Deployment
  spec:
    template:
      spec:
        containers:
        - name: metrics-server
          args:
          - --cert-dir=/tmp
          - --secure-port=4443
          - --kubelet-preferred-address-types=InternalIP
-         - --kubelet-insecure-tls
+         - --kubelet-certificate-authority=/etc/kubernetes/pki/ca.crt
+         volumeMounts:
+         - name: cluster-ca
+           mountPath: /etc/kubernetes/pki/ca.crt
+           readOnly: true
+       volumes:
+       - name: cluster-ca
+         hostPath:
+           path: /etc/kubernetes/pki/ca.crt
+           type: File
Alternatively, use a projected ConfigMap containing the CA bundle instead of a hostPath mount — hostPath is itself a security finding in hardened clusters.
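The ConfigMap alternative can be sketched as a transformation on the parsed Deployment. This is an illustration under stated assumptions, not a definitive implementation: it uses a plain configMap volume rather than a projected one, the mount path /etc/kubelet-ca and the volume name kubelet-ca are hypothetical, and it assumes the kube-root-ca.crt ConfigMap in the Metrics Server namespace holds the CA that signed your kubelet serving certificates, which you should verify for your cluster.

```python
import copy

INSECURE_FLAG = "--kubelet-insecure-tls"
CA_FLAG = "--kubelet-certificate-authority"
CA_MOUNT_DIR = "/etc/kubelet-ca"      # hypothetical mount path
CA_CONFIGMAP = "kube-root-ca.crt"     # assumption: contains the kubelet-signing CA
VOLUME_NAME = "kubelet-ca"            # hypothetical volume name

def use_configmap_ca(deployment, container_name="metrics-server"):
    """Return a copy of the Deployment dict that verifies kubelets against a ConfigMap CA."""
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"]["spec"]
    for container in pod_spec["containers"]:
        if container["name"] != container_name:
            continue
        # Drop the insecure flag and any earlier CA flag, then add the new CA path.
        args = [a for a in container.get("args", [])
                if a != INSECURE_FLAG and not a.startswith(CA_FLAG + "=")]
        args.append(f"{CA_FLAG}={CA_MOUNT_DIR}/ca.crt")
        container["args"] = args
        mounts = [m for m in container.get("volumeMounts", []) if m["name"] != VOLUME_NAME]
        mounts.append({"name": VOLUME_NAME, "mountPath": CA_MOUNT_DIR, "readOnly": True})
        container["volumeMounts"] = mounts
    # Volumes live at the pod spec level, not on the container.
    volumes = [v for v in pod_spec.get("volumes", []) if v["name"] != VOLUME_NAME]
    volumes.append({"name": VOLUME_NAME, "configMap": {"name": CA_CONFIGMAP}})
    pod_spec["volumes"] = volumes
    return patched
```

The function works on a copy, so the input manifest is left untouched and the result can be diffed against it before anything is applied.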
Prevention in CI/CD
1. Conftest / OPA policy — block insecure-tls in production namespaces:
# policy/deny_insecure_metrics_server.rego
package main
deny[msg] {
    input.kind == "Deployment"
    input.metadata.namespace == "kube-system"
    container := input.spec.template.spec.containers[_]
    contains(container.name, "metrics-server")
    arg := container.args[_]
    arg == "--kubelet-insecure-tls"
    msg := "POLICY VIOLATION: metrics-server --kubelet-insecure-tls is forbidden in production. Use --kubelet-certificate-authority instead."
}
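2. Checkov inline scan in your pipeline. CKV_K8S_28 and CKV_K8S_30 are general container-hardening checks (capabilities and security context) and do not look for --kubelet-insecure-tls, so they complement the policy in item 1 rather than replace it: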
checkov -f metrics-server-deployment.yaml --check CKV_K8S_28,CKV_K8S_30
3. Kube-bench scheduled job: Run kube-bench node as a CronJob targeting control 4.2.10 to continuously assert kubelet serving cert configuration hasn't drifted.
4. Helm chart pinning: Lock your metrics-server chart version in your GitOps repo (fleet.yaml or helmrelease.yaml) and gate upgrades through a PR policy that requires the OPA policy check to pass in CI before merge.
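For pipelines that cannot run Conftest, the rule from item 1 can be approximated in a few lines of Python. This is a minimal sketch that mirrors the Rego conditions on an already-parsed manifest; the function name is hypothetical, and loading the YAML into a dict (for example with a YAML library of your choice) is left out.

```python
INSECURE_FLAG = "--kubelet-insecure-tls"

def insecure_tls_violations(manifest):
    """Mirror the Rego rule: flag metrics-server containers in kube-system using the insecure flag."""
    if manifest.get("kind") != "Deployment":
        return []
    if manifest.get("metadata", {}).get("namespace") != "kube-system":
        return []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    messages = []
    for container in pod_spec.get("containers", []):
        if "metrics-server" not in container.get("name", ""):
            continue
        for arg in container.get("args") or []:
            if arg == INSECURE_FLAG:
                messages.append(
                    f"POLICY VIOLATION: container {container['name']} uses {INSECURE_FLAG}; "
                    "use --kubelet-certificate-authority instead."
                )
    return messages

# Hypothetical manifest, already parsed into a dict.
manifest = {
    "kind": "Deployment",
    "metadata": {"name": "metrics-server", "namespace": "kube-system"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "metrics-server", "args": ["--cert-dir=/tmp", INSECURE_FLAG]},
    ]}}},
}
for message in insecure_tls_violations(manifest):
    print(message)
```

A CI step can fail the build whenever the returned list is non-empty. Like the Rego rule, this matches the bare flag only, so a variant such as --kubelet-insecure-tls=true would need an extra condition.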