Fixing Kubernetes API Server 429 Too Many Requests: Rate Limit Debugging Guide
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–45 mins
TL;DR
- What broke: Concurrent kubectl commands or controllers are exceeding the API server's request rate limits, returning HTTP 429 Too Many Requests and stalling control-plane operations.
- How to fix it: Tune client-go QPS and Burst settings (for example, the --kube-api-qps and --kube-api-burst controller flags), implement API Priority and Fairness (APF) FlowSchema and PriorityLevelConfiguration objects, and serialize or throttle your automation scripts.
The Incident (What Does the Error Mean?)
Raw error output from a throttled client:
Error from server (TooManyRequests): the server has received too many requests and has asked us to try again later
W0612 14:32:01.847321 18345 request.go:697] Waited for 1.237s due to client-side throttling, not priority and fairness, request: GET:https://api.prod-cluster.internal:6443/api/v1/namespaces/default/pods
Error: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Immediate consequence: Every kubectl call — get, apply, rollout status — queues or fails. CI/CD pipelines stall. HPA controllers stop scaling. Operators lose reconciliation loops. The cluster does not go down, but it becomes operationally blind.
Requests pass through two independent rate-limiting layers:
- Client-side throttling (client-go): default QPS=5, Burst=10 per client instance, applied inside the calling process before a request is sent.
- Server-side APF (API Priority and Fairness, GA in 1.29): FlowSchema and PriorityLevelConfiguration objects that bucket and queue requests by user, verb, and resource.
Running 50 parallel kubectl processes in a deploy script sidesteps client-side throttling entirely, because each process has its own limiter, and floods the server-side APF queues.
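Because every process carries its own limiter, the limits add up across processes. A minimal sketch of the worst-case arithmetic, assuming whole-number QPS values and that each process keeps the client-go defaults; `fleet_limits` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# fleet_limits PROCS QPS BURST
# Hypothetical helper: prints the worst-case load that PROCS independent
# client-go limiters can put on the API server. Each process refills its own
# token bucket, so both the instantaneous burst and the sustained rate grow
# linearly with the number of processes.
fleet_limits() {
  local procs=$1 qps=$2 burst=$3
  echo "burst=$(( procs * burst )) sustained=$(( procs * qps ))/s"
}

fleet_limits 1 5 10    # prints: burst=10 sustained=5/s
fleet_limits 50 5 10   # prints: burst=500 sustained=250/s
```

A single kubectl is harmless; fifty of them present the API server with fifty times the burst, which no client-side setting can see or prevent.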
The Attack Vector / Blast Radius
This is a self-inflicted denial-of-service against your own control plane.
Cascading failure chain:
- A CI/CD pipeline runs kubectl apply in a loop across 30 namespaces in parallel.
- Each kubectl process has its own client-go rate limiter, so 30 processes × burst=10 = up to 300 requests can hit the API server in under 1 second.
- The APF priority level those requests land in exhausts its queues (with no dedicated FlowSchema, that is usually global-default for users or workload-low for service accounts, each with queueLengthLimit: 50 per queue). New requests get 429 + Retry-After header.
- kube-controller-manager and kube-scheduler share the same APF bucket as your scripts if the dedicated FlowSchemas that route them to their own priority level are missing (the default configuration creates them, but they can be deleted or overridden). System controllers start missing their reconciliation windows.
- When node traffic loses its isolation the same way, nodes stop reporting heartbeats to the API server on time → NodeNotReady conditions fire → workloads get evicted.
- In multi-tenant clusters, one team's runaway script can starve all other tenants of API access.
The real blast radius: It's not just your deploy. It's every controller, every webhook, every metrics scraper in the cluster.
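Whether system controllers actually share a bucket with your scripts depends on which FlowSchemas exist in the cluster. A minimal sketch that reports missing ones; the default names checked (`kube-controller-manager`, `kube-scheduler`, `system-nodes`) are taken from the default suggested APF configuration and are an assumption to verify against `kubectl get flowschemas` on your version, and `missing_flowschemas` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# missing_flowschemas [NAME...]
# Hypothetical helper: prints every expected FlowSchema that does not exist in
# the current cluster and returns 1 if any are missing (2 if kubectl fails).
# Default names are an assumption based on the default suggested APF config.
missing_flowschemas() {
  local expected=("$@")
  if [ "${#expected[@]}" -eq 0 ]; then
    expected=(kube-controller-manager kube-scheduler system-nodes)
  fi

  local raw present name missing=0
  raw=$(kubectl get flowschemas -o name) || return 2
  # `-o name` prints "<resource>.<group>/<name>"; keep only the name part.
  present=$(sed 's#.*/##' <<<"$raw")

  for name in "${expected[@]}"; do
    if ! grep -qxF -- "$name" <<<"$present"; then
      echo "missing: ${name}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (needs cluster access):
#   missing_flowschemas || echo "system controllers may share an APF bucket with CI traffic"
```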
How to Fix It
Basic Fix — Serialize Your kubectl Calls
Stop running kubectl in parallel shell loops:
- for ns in $(cat namespaces.txt); do
- kubectl apply -f manifests/ -n $ns &
- done
- wait
+ for ns in $(cat namespaces.txt); do
+ kubectl apply -f manifests/ -n $ns
+ sleep 0.5
+ done
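Serializing lowers the peak, but an individual apply can still hit a 429 while the queues drain. kubectl's own client may already retry some 429 responses; a script-level retry covers the case where the command still fails. A minimal sketch; `retry_on_429`, `RETRY_MAX`, and `RETRY_BASE_DELAY` are hypothetical names of this sketch, not kubectl settings:

```bash
#!/usr/bin/env bash
# retry_on_429 CMD [ARGS...]
# Hypothetical helper: runs CMD; if it fails with a TooManyRequests error,
# waits and retries with exponential backoff. Any other failure is returned
# immediately so genuine errors are not masked.
# RETRY_MAX (default 5) and RETRY_BASE_DELAY (whole seconds, default 1) are
# illustrative knobs, not kubectl settings.
retry_on_429() {
  local max=${RETRY_MAX:-5} delay=${RETRY_BASE_DELAY:-1}
  local attempt=1 err rc
  while true; do
    rc=0
    # Capture CMD's stderr in $err while its stdout still reaches our stdout (fd 3).
    { err=$("$@" 2>&1 1>&3 3>&-); } 3>&1 || rc=$?
    if [ "$rc" -eq 0 ]; then
      if [ -n "$err" ]; then printf '%s\n' "$err" >&2; fi
      return 0
    fi
    if [ "$attempt" -ge "$max" ] || ! grep -q 'TooManyRequests' <<<"$err"; then
      printf '%s\n' "$err" >&2
      return "$rc"
    fi
    echo "429 from API server (attempt ${attempt}/${max}); retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Usage inside the serialized loop:
#   retry_on_429 kubectl apply -f manifests/ -n "$ns"
```

Matching on the TooManyRequests text keeps the retry narrow: a NotFound or validation error fails fast instead of being retried five times.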
For scripts that must parallelize, cap concurrency with xargs -P:
- cat namespaces.txt | xargs -I{} kubectl apply -f manifests/ -n {}
+ cat namespaces.txt | xargs -P 4 -I{} kubectl apply -f manifests/ -n {}
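When you cap concurrency with xargs -P, it also helps to keep the cap configurable and to fail the pipeline if any namespace failed. GNU xargs exits with status 123 when any invocation exits with status 1 to 125 (POSIX only guarantees a status in that range). A minimal sketch; `apply_namespaces`, `MAX_PARALLEL`, and the `KUBECTL` override are hypothetical names, the latter existing only for dry runs:

```bash
#!/usr/bin/env bash
# apply_namespaces MANIFEST_DIR < namespaces.txt
# Hypothetical helper: applies MANIFEST_DIR to every namespace read from stdin
# (one per line), running at most MAX_PARALLEL (default 4) kubectl processes at
# once. KUBECTL (default "kubectl") is a hypothetical override for dry runs.
apply_namespaces() {
  local dir=$1 rc=0
  xargs -P "${MAX_PARALLEL:-4}" -I{} "${KUBECTL:-kubectl}" apply -f "$dir" -n {} || rc=$?
  if [ "$rc" -ne 0 ]; then
    # GNU xargs returns 123 when any invocation exited with status 1-125.
    echo "apply failed for at least one namespace (xargs exit ${rc})" >&2
  fi
  return "$rc"
}

# Usage:
#   MAX_PARALLEL=4 apply_namespaces manifests/ < namespaces.txt
#   KUBECTL=echo apply_namespaces manifests/ < namespaces.txt   # print commands only
```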
Enterprise Best Practice — Tune APF and Client QPS
Step 1: Raise QPS/Burst for your automation clients through controller flags or the client-go rest.Config they build; kubeconfig files have no QPS or Burst fields.
# controller-manager or custom operator deployment args (flag names vary by operator)
containers:
  - name: my-operator
    args:
-     - --kube-api-qps=5
-     - --kube-api-burst=10
+     - --kube-api-qps=50
+     - --kube-api-burst=100
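To find controllers still running on the client-go defaults, you can read the container args back from the Deployment and look for the two flags. A minimal sketch; `my-operator` comes from the example above, while the helper name, the `operators` namespace, and the assumption that the operator is the first container are hypothetical:

```bash
#!/usr/bin/env bash
# missing_qps_flags DEPLOYMENT NAMESPACE
# Hypothetical helper: prints each rate-limit flag absent from the first
# container's args and returns 1 if any is missing (2 if kubectl fails).
# Flags supplied through a config file or environment variables are not seen.
missing_qps_flags() {
  local deploy=$1 ns=$2 args flag missing=0
  # jsonpath with [*] prints the args space-separated on one line.
  args=$(kubectl get deployment "$deploy" -n "$ns" \
    -o jsonpath='{.spec.template.spec.containers[0].args[*]}') || return 2
  for flag in --kube-api-qps --kube-api-burst; do
    # Accept "--flag=value" or "--flag value" as a whole word.
    if ! grep -qE -- "(^| )${flag}(=| |\$)" <<<"$args"; then
      echo "missing: ${flag}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   missing_qps_flags my-operator operators
```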
Step 2: Create a dedicated APF FlowSchema and PriorityLevelConfiguration for CI/CD service accounts. This isolates their traffic from system controllers.
+apiVersion: flowcontrol.apiserver.k8s.io/v1
+kind: PriorityLevelConfiguration
+metadata:
+  name: ci-cd-priority
+spec:
+  type: Limited
+  limited:
+    nominalConcurrencyShares: 20
+    limitResponse:
+      type: Queue
+      queuing:
+        queues: 16
+        handSize: 4
+        queueLengthLimit: 100
+---
+apiVersion: flowcontrol.apiserver.k8s.io/v1
+kind: FlowSchema
+metadata:
+  name: ci-cd-flow
+spec:
+  priorityLevelConfiguration:
+    name: ci-cd-priority
+  matchingPrecedence: 900
+  distinguisherMethod:
+    type: ByUser
+  rules:
+    - subjects:
+        - kind: ServiceAccount
+          serviceAccount:
+            name: ci-deployer
+            namespace: ci-system
+      resourceRules:
+        - verbs: ["*"]
+          apiGroups: ["*"]
+          resources: ["*"]
+          namespaces: ["*"]
+          # also match cluster-scoped resources (CRDs, ClusterRoles, ...)
+          clusterScope: true
+      # also match non-resource requests such as API discovery (/api, /apis)
+      nonResourceRules:
+        - verbs: ["*"]
+          nonResourceURLs: ["*"]
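After applying the two objects, the API server records on each FlowSchema whether the priority level it references exists, through a status condition of type `Dangling`. A minimal sketch that checks it; `flowschema_is_bound` and the file name `apf-ci-cd.yaml` are hypothetical:

```bash
#!/usr/bin/env bash
# flowschema_is_bound NAME
# Hypothetical helper: succeeds only if the FlowSchema's "Dangling" status
# condition is "False", i.e. the PriorityLevelConfiguration it points at
# exists. Prints the observed status otherwise (empty right after creation
# until the API server has processed the object).
flowschema_is_bound() {
  local fs=$1 status
  status=$(kubectl get flowschema "$fs" \
    -o jsonpath='{.status.conditions[?(@.type=="Dangling")].status}') || return 2
  if [ "$status" = "False" ]; then
    return 0
  fi
  echo "FlowSchema ${fs}: Dangling=${status:-<unset>}" >&2
  return 1
}

# Usage:
#   kubectl apply -f apf-ci-cd.yaml
#   flowschema_is_bound ci-cd-flow && echo "ci-cd-flow is routed to ci-cd-priority"
```

A typo in spec.priorityLevelConfiguration.name leaves the FlowSchema dangling, and requests then fall through to lower-precedence FlowSchemas, so this check catches a silent misconfiguration.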
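Step 3: Confirm which layer is throttling before changing anything. client-go's request.go logs a wait message only for client-side throttling; server-side APF throttling produces no such line and shows up instead as extra latency while requests sit in APF queues and, when a queue is full or a queued request times out, as a 429 TooManyRequests error:
# Client-side throttle (fix: raise --kube-api-qps / --kube-api-burst)
Waited for Xs due to client-side throttling, not priority and fairness
# Server-side APF throttle (fix: tune FlowSchema/PriorityLevelConfiguration)
Error from server (TooManyRequests): the server has received too many requests and has asked us to try again later
To see which FlowSchema and priority level handled a request, run the kubectl command with -v=8 (which prints response headers) and look for the X-Kubernetes-PF-FlowSchema-UID and X-Kubernetes-PF-PriorityLevel-UID headers, then match the UIDs against kubectl get flowschemas,prioritylevelconfigurations -o custom-columns=NAME:.metadata.name,UID:.metadata.uid.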
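When the log is long, counting both signatures tells you which fix applies. A minimal sketch that reads kubectl or controller log output on stdin; `throttle_layer` is a hypothetical helper, and the patterns are the client-go wait message and the TooManyRequests error shown at the top of this guide:

```bash
#!/usr/bin/env bash
# throttle_layer < LOG
# Hypothetical helper: counts client-go client-side throttling waits and
# server-side TooManyRequests (HTTP 429) errors in log text read from stdin,
# then prints which layer dominates.
throttle_layer() {
  local log client server
  log=$(cat)
  # grep -c prints 0 (and exits 1) when nothing matches; keep the 0.
  client=$(grep -c 'due to client-side throttling' <<<"$log" || true)
  server=$(grep -c 'TooManyRequests' <<<"$log" || true)
  echo "client-side=${client} server-side=${server}"
  if [ "$client" -gt "$server" ]; then
    echo "verdict: raise client QPS/Burst"
  elif [ "$server" -gt 0 ]; then
    echo "verdict: tune FlowSchema/PriorityLevelConfiguration"
  else
    echo "verdict: no throttling signatures found"
  fi
}

# Usage (namespace is a placeholder):
#   kubectl logs deploy/my-operator -n operators | throttle_layer
```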
Step 4: Monitor APF queue depth in real time.
kubectl get --raw /metrics | grep apiserver_flowcontrol_request_queue_length_after_enqueue
kubectl get --raw /metrics | grep apiserver_flowcontrol_rejected_requests_total
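The raw grep output lists one series per label combination, which is hard to scan during an incident. A minimal sketch that totals `apiserver_flowcontrol_rejected_requests_total` per `priority_level` label, assuming the standard Prometheus text format; `rejected_by_priority_level` is a hypothetical helper. Note that `kubectl get --raw /metrics` only reaches whichever API server instance answered the request:

```bash
#!/usr/bin/env bash
# rejected_by_priority_level < METRICS
# Hypothetical helper: totals apiserver_flowcontrol_rejected_requests_total
# across all series, grouped by the priority_level label, from Prometheus text
# exposition format on stdin. Counters are cumulative since the API server
# started, so compare two snapshots to see what is happening now.
rejected_by_priority_level() {
  awk '
    /^apiserver_flowcontrol_rejected_requests_total[{]/ {
      # priority_level=" is 16 characters long; the value follows it.
      if (match($0, /priority_level="[^"]*"/)) {
        pl = substr($0, RSTART + 16, RLENGTH - 17)
        total[pl] += $NF
      }
    }
    END { for (pl in total) printf "%s %d\n", pl, total[pl] }
  ' | sort
}

# Usage:
#   kubectl get --raw /metrics | rejected_by_priority_level
```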
Prevention in CI/CD
1. OPA/Gatekeeper policy — block FlowSchema gaps at admission:
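Enforce that every new ServiceAccount in ci-* namespaces has a corresponding FlowSchema before it can be used in deployments. Write a Rego rule that cross-references FlowSchema.spec.rules[].subjects against newly created service accounts. Because this is a referential check, configure Gatekeeper to replicate FlowSchema objects into its cache (the sync section of its Config resource) so the rule can read them from data.inventory.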
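The same cross-reference can also run as a plain pipeline step while the admission policy is being written. A minimal sketch, not a Gatekeeper policy: it lists ServiceAccounts in `ci-*` namespaces and reports any not named as a ServiceAccount subject in some FlowSchema. The helper name is hypothetical, Group and User subjects are ignored, and the jsonpath expressions are an assumption worth testing against your kubectl version:

```bash
#!/usr/bin/env bash
# uncovered_ci_service_accounts
# Hypothetical helper: prints "<namespace>/<name>" for every ServiceAccount in
# a ci-* namespace that no FlowSchema names as a ServiceAccount subject, and
# returns 1 if any are found (2 if kubectl fails). A subject name of "*"
# covers the whole namespace; Group and User subjects are ignored.
uncovered_ci_service_accounts() {
  local raw subjects accounts sa ns found=0
  raw=$(kubectl get flowschemas -o jsonpath='{range .items[*].spec.rules[*].subjects[*]}{.kind} {.serviceAccount.namespace}/{.serviceAccount.name}{"\n"}{end}') || return 2
  # Keep only "<namespace>/<name>" for ServiceAccount subjects.
  subjects=$(awk '$1 == "ServiceAccount" { print $2 }' <<<"$raw")
  accounts=$(kubectl get serviceaccounts -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}') || return 2

  while IFS= read -r sa; do
    ns=${sa%%/*}
    case $ns in
      ci-*) ;;
      *) continue ;;
    esac
    # Covered by an exact subject or by a "<namespace>/*" wildcard subject.
    if grep -qxF -e "$sa" -e "${ns}/*" <<<"$subjects"; then
      continue
    fi
    echo "$sa"
    found=1
  done <<<"$accounts"
  return "$found"
}

# Usage as a pipeline gate:
#   uncovered_ci_service_accounts && echo "every ci-* ServiceAccount has a FlowSchema"
```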
2. Checkov / kube-linter in your pipeline:
# .checkov.yaml
check:
  - CKV_K8S_FLOWSCHEMA_EXISTS  # custom check: every SA has a FlowSchema
external-checks-dir:
  - checkov-checks/  # hypothetical directory containing the custom check
Add kube-linter to your Helm/kustomize pipeline:
kube-linter lint manifests/ --config .kube-linter.yaml
3. Rate-limit your CI scripts at the pipeline level (GitHub Actions example):
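- name: Apply manifests with concurrency cap
  run: |
    echo "$NAMESPACES" | xargs -P 4 -I{} \
      kubectl apply -f manifests/ -n {}
  env:
    # Assumption: stock kubectl is not documented to read these two variables;
    # they only take effect if your own wrapper passes them to client-go.
    KUBECTL_QPS: "20"
    KUBECTL_BURST: "40"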
4. Alert on APF rejection rate before it becomes an outage:
# Prometheus alerting rule
- alert: KubeAPIFlowControlRejectionHigh
  expr: sum by (priority_level) (rate(apiserver_flowcontrol_rejected_requests_total[5m])) > 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "API server APF rejecting >5 req/s in priority level {{ $labels.priority_level }}; check FlowSchema config"
5. Load-test your control plane before scaling CI parallelism. Use k6 with the Kubernetes API as the target in a staging cluster to find your APF ceiling before it finds you in production.