Can I renew Kubernetes API server certs without downtime?

Near-zero downtime is possible if you act before expiry. Run 'kubeadm certs renew all', then restart the control plane static pods one node at a time (move their manifests out of /etc/kubernetes/manifests and back). In-flight API requests to that node will drop during the roughly 10-second restart window; behind a load-balanced HA control plane, clients can retry against the other nodes. If certs are already expired, you will have downtime until renewal completes: pods already running on worker nodes keep executing, but no new scheduling or API calls succeed.

Why does kubeadm only issue 1-year certificates?

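This is an intentional upstream default in kubeadm. The rationale is that annual rotation pushes operators to actively maintain clusters rather than running stale, unpatched control planes indefinitely; kubeadm also renews all leaf certs automatically during a control plane upgrade. The CA cert is valid for 10 years. With the v1beta3 config API there is no setting to change the 1-year leaf cert TTL. The v1beta4 API (kubeadm v1.31 and later) adds certificateValidityPeriod and caCertificateValidityPeriod to ClusterConfiguration. On older releases you must either patch the kubeadm source, use an external CA (for example, one managed with cert-manager), or automate annual renewal via cron/CI.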

My cluster CA also expired. Is the renewal process different?

Yes, and it is significantly more complex. CA expiry ('ca.crt' at 10 years) requires rotating the root CA itself, which means re-issuing all leaf certs signed by the old CA, redistributing the new CA bundle to all nodes and any external clients (Helm, CI/CD, monitoring agents), and restarting every component that has the old CA in its trust store. This is a major maintenance operation. Run 'kubeadm certs check-expiration' to confirm whether it's the leaf certs or the CA that expired before proceeding.
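
Whether the CA or only the leaf certs have expired can also be checked directly with openssl. The following is a minimal sketch, assuming the default kubeadm paths; the function name cert_state is illustrative and not part of any tool.

```shell
#!/usr/bin/env bash
# Print "valid", "expired", or "missing" for a PEM certificate file.
# `openssl x509 -checkend 0` exits non-zero once the cert has expired.
cert_state() {
  [ -r "$1" ] || { echo missing; return 1; }
  if openssl x509 -in "$1" -noout -checkend 0 >/dev/null 2>&1; then
    echo valid
  else
    echo expired
  fi
}

# Example (default kubeadm paths, assumed):
#   cert_state /etc/kubernetes/pki/ca.crt          # the cluster root CA
#   cert_state /etc/kubernetes/pki/apiserver.crt   # a leaf cert
```

If the CA reports valid and only leaf certs report expired, the ordinary 'kubeadm certs renew all' path applies.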

How to Fix x509: Certificate Has Expired on Kubernetes API Server (kubeadm Cert Renewal Guide)

Threat/Impact Level: CRITICAL | Exploitability/Downtime Risk: HIGH | Time to Fix: 15–30 mins

TL;DR

What broke: The Kubernetes API server's TLS certificate hit its expiry date. Every client of kube-apiserver (kubectl, kubelet, kube-controller-manager, kube-scheduler) is now throwing x509 handshake failures. Your cluster is effectively down.
How to fix it: Run kubeadm certs renew all on every control plane node, then restart the control plane static pods. Full steps below.

The Incident (What Does the Error Mean?)

Raw error surfacing in kubectl or API server logs:

Unable to connect to the server: x509: certificate has expired or is not yet valid:
current time 2024-11-01T03:22:11Z is after 2024-10-15T12:00:00Z

Or from journalctl -u kubelet:

E1101 03:22:11.443201    1 reflector.go:138] k8s.io/client-go/informers/factory.go:134:
Failed to watch *v1.Node: failed to list *v1.Node: 
x509: certificate has expired or is not yet valid

kubeadm-provisioned clusters issue certificates with a 1-year TTL by default. The API server cert (/etc/kubernetes/pki/apiserver.crt) expires silently unless you have monitoring in place. When it does, the entire control plane mutual-TLS chain collapses simultaneously.

Affected certificate files on a standard kubeadm cluster:

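Certificate	Path	Typical TTL
API Server	`/etc/kubernetes/pki/apiserver.crt`	1 year
API Server kubelet client	`/etc/kubernetes/pki/apiserver-kubelet-client.crt`	1 year
API Server etcd client	`/etc/kubernetes/pki/apiserver-etcd-client.crt`	1 year
Front proxy client	`/etc/kubernetes/pki/front-proxy-client.crt`	1 year
etcd server	`/etc/kubernetes/pki/etcd/server.crt`	1 year
etcd peer	`/etc/kubernetes/pki/etcd/peer.crt`	1 year
Admin kubeconfig	`/etc/kubernetes/admin.conf`	1 year
Controller Manager kubeconfig	`/etc/kubernetes/controller-manager.conf`	1 year
Scheduler kubeconfig	`/etc/kubernetes/scheduler.conf`	1 year
CA (Root)	`/etc/kubernetes/pki/ca.crt`	10 years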
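
The kubeconfig entries in the table carry their client certificate inline as base64 (the client-certificate-data field), so openssl cannot read them with -in directly. A small sketch of how the embedded cert could be extracted, assuming a single inline entry per file; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read a kubeconfig on stdin and print the decoded client certificate (PEM).
# Assumes one inline client-certificate-data entry, as kubeadm writes for
# admin.conf, controller-manager.conf, and scheduler.conf.
kubeconfig_client_cert_pem() {
  awk '/client-certificate-data:/ {print $2; exit}' | base64 -d
}

# Example:
#   kubeconfig_client_cert_pem < /etc/kubernetes/controller-manager.conf \
#     | openssl x509 -noout -enddate -subject
```

In practice 'kubeadm certs check-expiration' already reports these entries; the sketch is mainly useful on hosts where kubeadm is unavailable or for kubeconfigs copied elsewhere.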

The Attack Vector / Blast Radius

This is not just an inconvenience. The blast radius is total control plane failure:

kubectl is dead. Every human operator and CI/CD pipeline loses API access instantly. kubectl get pods returns the x509 error regardless of RBAC permissions.
kubelet stops reconciling. Kubelets on worker nodes can no longer report node status or receive pod specs. Existing running pods survive (kubelet's local state), but no new scheduling occurs.
kube-controller-manager and kube-scheduler lose API connectivity. Deployments stop reconciling. Dead pods are not replaced. HPA cannot scale.
Ingress controllers and operators fail. Any in-cluster controller using a service account token that also validates the API server cert via the cluster CA will begin throwing errors.
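etcd is usually affected too. Its server, peer, and client certs (/etc/kubernetes/pki/etcd/*.crt and apiserver-etcd-client.crt) are issued at the same time as the API server cert with the same 1-year TTL. Even where etcd itself is still healthy, you cannot reach it through the API server.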

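If this is a multi-control-plane HA cluster, expect expiry to hit all control plane nodes at nearly the same time. Each node's leaf certs are issued when that node runs kubeadm init or kubeadm join, so on a cluster that was bootstrapped in one session the expiry dates fall within minutes or hours of each other.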

How to Fix It

Step 0: Verify Expiry First

# Check all kubeadm-managed cert expiry dates
kubeadm certs check-expiration

# Or inspect the raw cert
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates -subject -ext subjectAltName

Basic Fix — Renew All Certs with kubeadm

Run on each control plane node:

# Renew all certificates in one shot
kubeadm certs renew all

# Restart control plane static pods by moving manifests out and back
# (kubelet stops the pods when the manifests disappear and recreates them on return).
# etcd is included because its certs were renewed too; skip it if you run external etcd.
cd /etc/kubernetes/manifests
mv kube-apiserver.yaml /tmp/
mv kube-controller-manager.yaml /tmp/
mv kube-scheduler.yaml /tmp/
mv etcd.yaml /tmp/
sleep 20
mv /tmp/etcd.yaml .
mv /tmp/kube-apiserver.yaml .
mv /tmp/kube-controller-manager.yaml .
mv /tmp/kube-scheduler.yaml .

# Refresh your local kubeconfig from the renewed admin.conf
mkdir -p ~/.kube
cp /etc/kubernetes/admin.conf ~/.kube/config
kubectl get nodes
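
After the restart it is worth confirming that the API server is serving the renewed certificate rather than one it loaded before renewal. A hedged sketch: it assumes the API server listens on 127.0.0.1:6443 (the kubeadm default port), and both function names are illustrative.

```shell
#!/usr/bin/env bash
# Save the certificate currently served on an endpoint to a PEM file.
fetch_served_cert() {
  local endpoint="$1" out="$2"
  openssl s_client -connect "$endpoint" </dev/null 2>/dev/null \
    | openssl x509 -outform PEM > "$out"
}

# Succeed when two PEM certificates have the same SHA-256 fingerprint.
# Returns 2 if either file cannot be parsed.
same_cert() {
  local a b
  a=$(openssl x509 -in "$1" -noout -fingerprint -sha256) || return 2
  b=$(openssl x509 -in "$2" -noout -fingerprint -sha256) || return 2
  [ "$a" = "$b" ]
}

# Example:
#   fetch_served_cert 127.0.0.1:6443 /tmp/served.pem
#   if same_cert /tmp/served.pem /etc/kubernetes/pki/apiserver.crt; then
#     echo "API server is serving the renewed cert"
#   else
#     echo "still serving the old cert; restart kube-apiserver"
#   fi
```

Comparing fingerprints rather than dates avoids false matches when two certs happen to share an expiry timestamp.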

Enterprise Best Practice — Automate Renewal Before Expiry

The real fix is ensuring this never hits production. Three options:

Option A: kubeadm cert renewal via cron (bare-metal/VM clusters)

- # No renewal automation. Certs expire silently after 365 days.
+ # /etc/cron.d/k8s-cert-renewal
+ # Runs at 02:00 on the 1st of every month, so certs always have about 11 months left.
+ # Note: this restarts the control plane containers on every run.
+ # cron does not support backslash line continuation, so the job stays on one line.
+ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+ 0 2 1 * * root kubeadm certs renew all && for c in kube-apiserver kube-controller-manager kube-scheduler etcd; do crictl ps -q --name "$c" | xargs -r crictl rm -f; done; cp /etc/kubernetes/admin.conf /root/.kube/config
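
If restarting the control plane on every cron run is undesirable, the job can first test whether renewal is actually due. A minimal sketch, assuming a 30-day threshold; the function name is hypothetical, and `openssl x509 -checkend` does the date arithmetic.

```shell
#!/usr/bin/env bash
# Succeed when the certificate expires within the next N days.
# Also succeeds when the file cannot be read, so a broken cert gets attention.
expires_within_days() {
  local cert="$1" days="$2"
  ! openssl x509 -in "$cert" -noout -checkend "$(( days * 86400 ))" >/dev/null 2>&1
}

# Example wrapper a cron job could call (the 30-day threshold is an assumption):
#   if expires_within_days /etc/kubernetes/pki/apiserver.crt 30; then
#     kubeadm certs renew all
#     # ...then restart the control plane static pods
#   fi
```

With this guard the cron schedule can run daily without renewing or restarting anything until the threshold is crossed.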

Option B: Declare complete cert SANs at cluster init (kubeadm ClusterConfiguration)

 apiVersion: kubeadm.k8s.io/v1beta3
 kind: ClusterConfiguration
 kubernetesVersion: v1.29.0
 controlPlaneEndpoint: "k8s-api.internal:6443"
+certificatesDir: /etc/kubernetes/pki
 networking:
   podSubnet: "10.244.0.0/16"
   serviceSubnet: "10.96.0.0/12"
 apiServer:
   certSANs:
     - "k8s-api.internal"
     - "10.0.1.10"
+    - "10.0.1.11"   # Add ALL control plane IPs upfront to avoid SAN mismatch after renewal
+    - "10.0.1.12"
+    - "127.0.0.1"
+    - "localhost"

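⚠️ With the v1beta3 API shown above, kubeadm cannot extend the 1-year leaf TTL without patching. kubeadm v1.31 and later accept certificateValidityPeriod and caCertificateValidityPeriod in a v1beta4 ClusterConfiguration. On older releases you must either use a custom CA with cert-manager, or patch the kubeadm binary. The --upload-certs flag (formerly --experimental-upload-certs) does not affect cert TTL.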
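
That limitation applies to the v1beta3 API used in this example. On kubeadm v1.31 or newer, the v1beta4 ClusterConfiguration has validity fields, and a config along these lines could be used. This is a sketch only: the file name, Kubernetes version, and the 2-year value are illustrative assumptions, not recommendations.

```shell
#!/usr/bin/env bash
# Write a kubeadm v1beta4 config that requests longer-lived leaf certs.
# Requires kubeadm v1.31+; file name and durations are illustrative.
cat > kubeadm-config-v1beta4.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: v1.31.0
controlPlaneEndpoint: "k8s-api.internal:6443"
certificateValidityPeriod: 17520h    # leaf certs: 2 years (default is 8760h)
caCertificateValidityPeriod: 87600h  # CA certs: 10 years (the default)
EOF

# Then, on a new cluster:
#   kubeadm init --config kubeadm-config-v1beta4.yaml
```

Longer-lived certs reduce renewal toil but also lengthen the window in which a leaked key stays usable, so expiry monitoring is still worth having.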

Option C: cert-manager with trust-manager for in-cluster workloads (not control plane certs)

- # Manual TLS secret rotation for ingress/internal services
+ apiVersion: cert-manager.io/v1
+ kind: Certificate
+ metadata:
+   name: api-internal-tls
+   namespace: kube-system
+ spec:
+   secretName: api-internal-tls-secret
+   duration: 8760h   # 1 year
+   renewBefore: 720h # Renew 30 days before expiry
+   issuerRef:
+     name: internal-ca-issuer
+     kind: ClusterIssuer
+   dnsNames:
+     - k8s-api.internal


Prevention in CI/CD

1. Prometheus + Alertmanager — Alert 30 Days Before Expiry

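# prometheus-rules.yaml
# Assumes an exporter publishes kubeadm_certs_expiration_seconds as a gauge holding each
# cert's notAfter time in Unix seconds, labelled by name. kubeadm itself exposes no metrics.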
groups:
  - name: kubernetes-cert-expiry
    rules:
      - alert: KubernetesCertExpiryWarning
        expr: kubeadm_certs_expiration_seconds - time() < 2592000  # 30 days
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "K8s cert expiring: {{ $labels.name }}"
          description: "Certificate {{ $labels.name }} expires in {{ $value | humanizeDuration }}"
      - alert: KubernetesCertExpiryCritical
        expr: kubeadm_certs_expiration_seconds - time() < 604800   # 7 days
        for: 10m
        labels:
          severity: critical

2. Datadog / Nagios Check (if not running Prometheus)

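#!/bin/bash
# check_k8s_certs.sh: run via cron or a monitoring agent.
# Exit codes follow the Nagios plugin convention: 0 = OK, 1 = WARNING, 2 = CRITICAL.
# Requires GNU date (for `date -d`).
WARN_DAYS=30
CRIT_DAYS=7
status=0
now_epoch=$(date +%s)

# Include the etcd certs, which live in a subdirectory.
for cert in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do
  [ -f "$cert" ] || continue
  expiry=$(openssl x509 -in "$cert" -noout -enddate 2>/dev/null | cut -d= -f2)
  if [ -z "$expiry" ]; then
    echo "CRITICAL: cannot read $cert"
    status=2
    continue
  fi
  expiry_epoch=$(date -d "$expiry" +%s)
  days_left=$(( (expiry_epoch - now_epoch) / 86400 ))

  # Record the worst state seen instead of exiting on the first hit,
  # so an early WARNING cannot hide a later CRITICAL.
  if [ "$days_left" -lt "$CRIT_DAYS" ]; then
    echo "CRITICAL: $cert expires in $days_left days"
    status=2
  elif [ "$days_left" -lt "$WARN_DAYS" ]; then
    echo "WARNING: $cert expires in $days_left days"
    [ "$status" -lt 1 ] && status=1
  fi
done

[ "$status" -eq 0 ] && echo "OK: All certs valid for more than $WARN_DAYS days"
exit "$status"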

3. Checkov / Trivy in CI — Catch Misconfigured Cert SANs Pre-Deploy

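# In your GitOps pipeline (GitHub Actions / GitLab CI)
# These scanners apply their generic misconfiguration policies; a SAN-specific check needs a custom rule.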
- name: Scan kubeadm config for cert misconfigs
  run: |
    trivy config --severity HIGH,CRITICAL ./kubeadm-config.yaml
    checkov -f kubeadm-config.yaml --framework kubernetes
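
A SAN check specific to your cluster can also be scripted directly. The sketch below compares a list of expected names and IPs against the SAN text that `openssl x509 -noout -ext subjectAltName` prints; the function name and the expected values are illustrative.

```shell
#!/usr/bin/env bash
# Print every expected SAN that does not appear in the given SAN text.
# $1: output of `openssl x509 -noout -ext subjectAltName`
# remaining args: expected DNS names or IP addresses
missing_sans() {
  local san_text="$1" want
  shift
  for want in "$@"; do
    # Entries look like "DNS:k8s-api.internal" or "IP Address:10.0.1.10".
    # Whole-line fixed-string matching keeps 10.0.1.1 from matching 10.0.1.10.
    printf '%s\n' "$san_text" | tr ',' '\n' | sed 's/^ *//' \
      | grep -qxF -e "DNS:$want" -e "IP Address:$want" || echo "$want"
  done
}

# Example:
#   sans=$(openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -ext subjectAltName)
#   missing=$(missing_sans "$sans" k8s-api.internal 10.0.1.10 10.0.1.11 10.0.1.12)
#   [ -z "$missing" ] || { echo "missing SANs: $missing"; exit 1; }
```

Run against the live cert this catches a SAN that was dropped during renewal; run in CI it needs a cert or SAN list exported from the cluster.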

4. Managed Kubernetes — Eliminate the Problem Entirely

If you're running self-managed kubeadm clusters in production and want to avoid this class of problem entirely, consider a managed service. EKS, GKE, and AKS all rotate control plane certificates automatically. The operational overhead of manual cert lifecycle management on kubeadm in production is hard to justify unless you have hard regulatory or air-gap requirements.