Fixing DaemonSet Termination Grace Period Exceeded: Force Delete Root Cause and Resolution
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 10–20 mins
TL;DR
- What broke: Kubelet sent SIGTERM to DaemonSet pods during drain/rollout, but the container process did not exit within terminationGracePeriodSeconds, triggering a SIGKILL force delete.
- How to fix it: Increase terminationGracePeriodSeconds to match actual shutdown latency, add a preStop lifecycle hook, and ensure your application handles SIGTERM gracefully.
- Use our Client-Side Sandbox below to paste your DaemonSet YAML and auto-refactor the termination config without sending your manifests to any external server.
The Incident (What Does the Error Mean?)
Raw kubelet log output during a kubectl drain or rolling update:
W0610 03:14:52.001234 1842 pod_workers.go:951] Pod "fluentd-xk9p2" on node "node-03" exceeded grace period 30s, force deleting
E0610 03:14:52.003891 1842 kuberuntime_manager.go:754] killContainer "fluentd" failed: context deadline exceeded
W0610 03:14:52.004100 1842 generic.go:322] Forcing deletion of pod "fluentd-xk9p2"
Immediate consequence: The container received SIGKILL. Any in-flight log buffers, open file handles, or network connections were torn down without cleanup. For log shippers (Fluentd, Filebeat, Promtail), this means log loss. For network plugins (Calico, Cilium node agents), this means stale eBPF/iptables rules left on the node.
This is not a one-off event. Every node drain, every DaemonSet rollout, every node upgrade will reproduce this until you fix the root cause.
The Attack Vector / Blast Radius
DaemonSets run on every node. A misconfigured grace period does not affect one pod — it affects the entire fleet simultaneously during a cluster upgrade or mass drain event.
Cascading failure chain:
- Log shippers force-killed → gaps in observability → security events go unlogged during the exact window of a maintenance operation (worst possible time to lose visibility).
- CNI plugin agents force-killed → stale routing rules → new pods on the node get networking failures until the agent restarts and reconciles.
- Node-local metrics agents force-killed → alerting blind spots → SLO breach goes undetected.
- Storage drivers (e.g., node-plugin DaemonSets) force-killed mid-detach → volume attachment stuck in Terminating → downstream StatefulSet pods cannot reschedule.
The default terminationGracePeriodSeconds is 30 seconds — set by Kubernetes when you don't specify it. Most production log shippers with large buffers, TLS teardown, and flush cycles need 60–120 seconds minimum.
How to Fix It
Basic Fix — Increase the Grace Period
Identify your actual shutdown time first:
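# Time how long the pod takes to exit cleanly under SIGTERM.
# A generous --grace-period keeps the kubelet from force-killing it mid-measurement;
# the DaemonSet controller recreates the pod afterwards.
time kubectl delete pod <pod> -n <namespace> --grace-period=300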
# Check recent force-delete events
kubectl get events --field-selector reason=Killing -A | grep -i exceed
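Shutdown latency can also be measured outside the cluster, before a change ships. The following is a minimal sketch, assuming a POSIX host and an agent that can be started as a local command; measure_sigterm_exit and the stand-in child script are hypothetical names for illustration, not part of any Kubernetes tooling.

```python
import signal
import subprocess
import sys
import time


def measure_sigterm_exit(cmd, timeout_s=30.0, warmup_s=1.0):
    """Start cmd, send SIGTERM, and return (seconds_to_exit, was_killed).

    was_killed is True when the process outlived timeout_s and had to be
    SIGKILLed, mirroring what happens when the grace period runs out.
    """
    proc = subprocess.Popen(cmd)
    time.sleep(warmup_s)  # crude warm-up so the child can install its handlers
    start = time.monotonic()
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout_s)
        killed = False
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL, the same fate as a force delete
        proc.wait()
        killed = True
    return time.monotonic() - start, killed


if __name__ == "__main__":
    # Hypothetical stand-in for an agent that needs about 1s to flush on SIGTERM.
    child = (
        "import signal, sys, time\n"
        "def stop(signum, frame):\n"
        "    time.sleep(1)\n"
        "    sys.exit(0)\n"
        "signal.signal(signal.SIGTERM, stop)\n"
        "time.sleep(60)\n"
    )
    elapsed, killed = measure_sigterm_exit([sys.executable, "-c", child], timeout_s=10)
    print(f"exited after {elapsed:.1f}s, force-killed={killed}")
```

Run it several times under realistic buffer load and size the grace period from the slowest observed exit rather than the average.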
  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: fluentd
  spec:
    template:
      spec:
-       terminationGracePeriodSeconds: 30
+       terminationGracePeriodSeconds: 90
        containers:
          - name: fluentd
            image: fluent/fluentd:v1.16
Enterprise Best Practice — preStop Hook + Signal Handling
Increasing the number alone is not sufficient. The container process must handle SIGTERM. When PID 1 is a shell wrapper (for example, an sh -c entrypoint), the shell typically does not forward SIGTERM to its child, so the agent, whether JVM-based or Go-based, never receives the signal. Add a preStop hook to buy deterministic drain time regardless of signal handling quality.
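To make "handles SIGTERM" concrete, here is a minimal sketch of the pattern for a Python-based agent. The names run_agent, buffer, and flush are hypothetical stand-ins for an agent's main loop, its in-memory queue, and its ship-to-backend function. The handler is installed explicitly because Linux does not apply the default terminate action to PID 1 in a container.

```python
import os
import signal
import threading

stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: only set a flag; the real cleanup happens in the main loop."""
    stop_requested.set()


def run_agent(buffer, flush, poll_interval_s=0.1):
    """Run until SIGTERM arrives, then flush whatever is still buffered.

    Must be called from the main thread, because signal.signal() requires it.
    Returns the number of items flushed during shutdown.
    """
    signal.signal(signal.SIGTERM, request_stop)
    while not stop_requested.is_set():
        stop_requested.wait(poll_interval_s)  # real work would happen here
    flushed = 0
    while buffer:
        flush(buffer.pop(0))
        flushed += 1
    return flushed


if __name__ == "__main__":
    # Demo: deliver SIGTERM to ourselves shortly after startup.
    threading.Timer(0.2, lambda: os.kill(os.getpid(), signal.SIGTERM)).start()
    shipped = []
    count = run_agent(["line-1", "line-2"], shipped.append)
    print(f"flushed {count} buffered items before exit")
```

The time this final flush takes is the shutdown latency that the grace period has to cover. The manifest changes that pair with such a handler follow.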
  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: fluentd
  spec:
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
+       maxUnavailable: 1
    template:
      spec:
-       terminationGracePeriodSeconds: 30
+       terminationGracePeriodSeconds: 120
        containers:
          - name: fluentd
            image: fluent/fluentd:v1.16
+           lifecycle:
+             preStop:
+               exec:
+                 command: ["/bin/sh", "-c", "sleep 5 && kill -TERM $(pgrep -f '[f]luentd') && sleep 10"]
+           resources:
+             requests:
+               cpu: "100m"
+               memory: "200Mi"
+             limits:
+               memory: "400Mi"
        volumes:
          - name: varlog
            hostPath:
              path: /var/log
Why preStop + sleep: The grace period countdown starts as soon as the pod is marked for deletion. The kubelet runs the preStop hook first and sends SIGTERM only after the hook completes, so the hook's duration counts against terminationGracePeriodSeconds. The sleep 5 gives kube-proxy and any load balancer time to remove the pod from endpoints before the process starts shutting down, which is critical for DaemonSet pods that also serve local traffic.
Critical rule: preStop duration + actual shutdown time must be less than terminationGracePeriodSeconds. Leave at least 10 seconds of headroom.
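The rule is simple arithmetic, and a small helper makes it easy to check in review. This is a sketch only: the function names are hypothetical, and the 60-second shutdown figure is an assumed measurement, not a value from the manifest. The 15-second preStop figure is the sleep 5 plus sleep 10 in the hook above.

```python
def minimum_grace_period(prestop_s, shutdown_s, headroom_s=10):
    """Smallest terminationGracePeriodSeconds that satisfies the rule."""
    return prestop_s + shutdown_s + headroom_s


def grace_period_budget(grace_s, prestop_s, shutdown_s, headroom_s=10):
    """Seconds left over after preStop, shutdown, and headroom.

    A negative result means the pod risks SIGKILL before it finishes.
    """
    return grace_s - minimum_grace_period(prestop_s, shutdown_s, headroom_s)


if __name__ == "__main__":
    # preStop = 15s, assumed measured shutdown = 60s
    print(minimum_grace_period(15, 60))      # 85
    print(grace_period_budget(120, 15, 60))  # 35: the 120s setting fits
    print(grace_period_budget(30, 15, 60))   # -55: the 30s default does not
```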
Prevention in CI/CD
OPA / Gatekeeper Policy
Enforce a minimum grace period on all DaemonSet objects at admission time:
package daemonset.termination

violation[{"msg": msg}] {
  input.review.object.kind == "DaemonSet"
  grace := input.review.object.spec.template.spec.terminationGracePeriodSeconds
  grace < 60
  msg := sprintf("DaemonSet '%v' terminationGracePeriodSeconds is %v, minimum is 60", [
    input.review.object.metadata.name, grace
  ])
}

violation[{"msg": msg}] {
  input.review.object.kind == "DaemonSet"
  not input.review.object.spec.template.spec.terminationGracePeriodSeconds
  msg := sprintf("DaemonSet '%v' does not set terminationGracePeriodSeconds (defaults to 30, too low)", [
    input.review.object.metadata.name
  ])
}
Checkov Static Analysis
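# Add to your CI pipeline
checkov -f daemonset.yaml --framework kubernetes --external-checks-dir .checkov/custom_checks --check CKV2_K8S_CUSTOM_001
# Note: in the published check list, CKV_K8S_30 is a container security-context check,
# not a termination check; the grace-period rule comes from the custom check in the next section.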
Checkov Custom Check (if built-in is insufficient)
# .checkov/custom_checks/daemonset_grace_period.py
from checkov.common.models.enums import CheckCategories, CheckResult
from checkov.kubernetes.checks.resource.base_spec_check import BaseK8Check


class DaemonSetGracePeriodCheck(BaseK8Check):
    def __init__(self):
        name = "Ensure DaemonSet terminationGracePeriodSeconds >= 60"
        id = "CKV2_K8S_CUSTOM_001"
        # BaseK8Check also requires a categories argument.
        super().__init__(name=name, id=id, categories=[CheckCategories.KUBERNETES],
                         supported_entities=["DaemonSet"])

    def scan_spec_conf(self, conf):
        # An unset field falls back to the Kubernetes default of 30 seconds.
        grace = conf.get("spec", {}).get("template", {}).get("spec", {}).get("terminationGracePeriodSeconds", 30)
        return CheckResult.PASSED if grace >= 60 else CheckResult.FAILED


# Checkov registers a check when it is instantiated at import time.
check = DaemonSetGracePeriodCheck()
Monitoring Alert (Prometheus)
# Alert fires before the problem causes an outage.
# Treat the expression as a starting point: confirm that both metrics exist in your
# Prometheus and share pod and namespace labels, otherwise the join returns nothing.
- alert: DaemonSetPodForceKilled
  expr: |
    increase(kubelet_pod_worker_duration_seconds_count{
      operation_type="sync",
      pod=~".*"
    }[5m]) > 0
    and on(pod, namespace)
    kube_pod_status_reason{reason="Evicted"} == 1
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: "Pod {{ $labels.pod }} force-killed on {{ $labels.node }}"
    description: "Check terminationGracePeriodSeconds and preStop hooks."
Runbook checklist for every DaemonSet deployment:
- terminationGracePeriodSeconds explicitly set and ≥ 60
- preStop hook defined
- Application process is PID 1 or uses tini/dumb-init as init
- SIGTERM handler tested under load
- OPA/Gatekeeper policy enforced in cluster
- Checkov check passing in CI
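The first two checklist items can be verified mechanically from the manifest. The sketch below assumes the DaemonSet has already been parsed into a plain dict (for example, by a YAML loader). audit_daemonset is a hypothetical helper, and the remaining items still need a human or a runtime test.

```python
def audit_daemonset(manifest, min_grace_s=60):
    """Return a list of findings; an empty list means both static checks passed."""
    findings = []
    if manifest.get("kind") != "DaemonSet":
        return findings  # only DaemonSets are in scope for this checklist
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})

    grace = pod_spec.get("terminationGracePeriodSeconds")
    if grace is None:
        findings.append("terminationGracePeriodSeconds not set (defaults to 30)")
    elif grace < min_grace_s:
        findings.append(f"terminationGracePeriodSeconds is {grace}, below {min_grace_s}")

    for container in pod_spec.get("containers", []):
        if "preStop" not in container.get("lifecycle", {}):
            findings.append(f"container '{container.get('name')}' has no preStop hook")
    return findings


if __name__ == "__main__":
    example = {
        "kind": "DaemonSet",
        "spec": {"template": {"spec": {
            "terminationGracePeriodSeconds": 30,
            "containers": [{"name": "fluentd"}],
        }}},
    }
    for finding in audit_daemonset(example):
        print(finding)
```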