
Fixing DaemonSet Termination Grace Period Exceeded: Force Delete Root Cause and Resolution

Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 10–20 mins

TL;DR

  • What broke: Kubelet sent SIGTERM to DaemonSet pods during drain/rollout, but the container process did not exit within terminationGracePeriodSeconds, triggering a SIGKILL force delete.
  • How to fix it: Increase terminationGracePeriodSeconds to match actual shutdown latency, add a preStop lifecycle hook, and ensure your application handles SIGTERM gracefully.

The Incident (What Does the Error Mean?)

Raw kubelet log output during a kubectl drain or rolling update:

W0610 03:14:52.001234    1842 pod_workers.go:951] Pod "fluentd-xk9p2" on node "node-03" exceeded grace period 30s, force deleting
E0610 03:14:52.003891    1842 kuberuntime_manager.go:754] killContainer "fluentd" failed: context deadline exceeded
W0610 03:14:52.004100    1842 generic.go:322] Forcing deletion of pod "fluentd-xk9p2"

Immediate consequence: The container received SIGKILL. Any in-flight log buffers, open file handles, or network connections were torn down without cleanup. For log shippers (Fluentd, Filebeat, Promtail), this means log loss. For network plugins (Calico, Cilium node agents), this means stale eBPF/iptables rules left on the node.

This is not a one-off event. Every node drain, every DaemonSet rollout, every node upgrade will reproduce this until you fix the root cause.


The Attack Vector / Blast Radius

DaemonSets run on every node. A misconfigured grace period does not affect one pod — it affects the entire fleet simultaneously during a cluster upgrade or mass drain event.

Cascading failure chain:

  1. Log shippers force-killed → gaps in observability → security events go unlogged during the exact window of a maintenance operation (worst possible time to lose visibility).
  2. CNI plugin agents force-killed → stale routing rules → new pods on the node get networking failures until the agent restarts and reconciles.
  3. Node-local metrics agents force-killed → alerting blind spots → SLO breach goes undetected.
  4. Storage drivers (e.g., node-plugin DaemonSets) force-killed mid-detach → volume attachment stuck in Terminating → downstream StatefulSet pods cannot reschedule.

The default terminationGracePeriodSeconds is 30 seconds, applied by Kubernetes when you don't specify it. Production log shippers with large buffers, TLS teardown, and flush cycles often need 60–120 seconds.


How to Fix It

Basic Fix — Increase the Grace Period

Identify your actual shutdown time first:

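# Time a graceful delete with a generous one-off grace period, so the kubelet
# does not SIGKILL the container mid-measurement. kubectl waits until the pod
# is gone; the DaemonSet controller then recreates it. The result includes any
# preStop hook time.
time kubectl delete pod <pod> -n <namespace> --grace-period=300

# Check recent container kill events across all namespaces
kubectl get events -A --field-selector reason=Killing

Then raise the value in the manifest:
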
 apiVersion: apps/v1
 kind: DaemonSet
 metadata:
   name: fluentd
 spec:
   template:
     spec:
-      terminationGracePeriodSeconds: 30
+      terminationGracePeriodSeconds: 90
       containers:
       - name: fluentd
         image: fluent/fluentd:v1.16
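
As a rough sketch of how measured shutdown times could be turned into a value like the 90 above: take the slowest observed shutdown, add any preStop time plus headroom, and round up. The function name, the sample timings, and the rounding step are illustrative assumptions, not part of any Kubernetes tooling.

```python
import math


def recommend_grace_period(shutdown_samples_s, pre_stop_s=0.0,
                           headroom_s=10.0, round_to_s=10):
    """Suggest a terminationGracePeriodSeconds value from measured shutdown times.

    shutdown_samples_s: observed seconds from SIGTERM to process exit.
    pre_stop_s:         expected duration of the preStop hook, if any.
    headroom_s:         safety margin on top of the worst observed case.
    round_to_s:         round the result up to a multiple of this many seconds.
    """
    if not shutdown_samples_s:
        raise ValueError("need at least one measured shutdown time")
    worst_case_s = max(shutdown_samples_s)
    raw_s = pre_stop_s + worst_case_s + headroom_s
    return int(math.ceil(raw_s / round_to_s) * round_to_s)


# Hypothetical timings (seconds) from three graceful deletes of the same pod.
samples = [48.0, 61.5, 73.2]
suggested = recommend_grace_period(samples)
```

Sizing from the slowest sample rather than the average is deliberate here: a grace period that fits only the typical shutdown still force-kills the slow ones.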

Enterprise Best Practice — preStop Hook + Signal Handling

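Increasing the number alone is not sufficient. The container process must actually receive and handle SIGTERM. When PID 1 in the container is a shell wrapper, the shell typically does not forward SIGTERM to its child, so many JVM-based and Go-based agents never see the signal and keep running until SIGKILL. Add a preStop hook to buy deterministic drain time regardless of signal handling quality.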
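
For the signal-handling half, the following is a minimal sketch of the pattern an agent needs: the SIGTERM handler only sets a flag, and the main loop notices the flag and flushes before exiting. `BufferedShipper` and its fields are hypothetical stand-ins for a real log shipper, not code from Fluentd or any other agent.

```python
import signal
import threading


class BufferedShipper:
    """Toy log shipper: buffers records and flushes them on shutdown."""

    def __init__(self):
        self.buffer = []      # records not yet delivered
        self.flushed = []     # records delivered (stand-in for a remote sink)
        self.stop_event = threading.Event()

    def handle_sigterm(self, signum, frame):
        # Keep the handler tiny: just request shutdown.
        self.stop_event.set()

    def flush(self):
        self.flushed.extend(self.buffer)
        self.buffer.clear()

    def run(self, poll_interval_s=0.1):
        # Collect records until shutdown is requested, then flush once.
        while not self.stop_event.is_set():
            self.buffer.append("record")
            self.stop_event.wait(poll_interval_s)
        self.flush()


def install_handler(shipper):
    # Must be called from the main thread of the process that receives SIGTERM.
    signal.signal(signal.SIGTERM, shipper.handle_sigterm)
```

In a real entrypoint you would call `install_handler(shipper)` and then `shipper.run()`, and make sure this process is the one that receives the signal (PID 1, or started by an init such as tini). The manifest change that adds the preStop hook follows.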

 apiVersion: apps/v1
 kind: DaemonSet
 metadata:
   name: fluentd
 spec:
   updateStrategy:
     type: RollingUpdate
     rollingUpdate:
+      maxUnavailable: 1
   template:
     spec:
-      terminationGracePeriodSeconds: 30
+      terminationGracePeriodSeconds: 120
       containers:
       - name: fluentd
         image: fluent/fluentd:v1.16
+        lifecycle:
+          preStop:
+            exec:
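+              # Signal PID 1 directly: pgrep -f fluentd would also match this hook's own shell.
+              # Assumes PID 1 is the agent or a signal-forwarding init such as tini.
+              command: ["/bin/sh", "-c", "sleep 5 && kill -TERM 1 && sleep 10"]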
+        resources:
+          requests:
+            cpu: "100m"
+            memory: "200Mi"
+          limits:
+            memory: "400Mi"
       volumes:
       - name: varlog
         hostPath:
           path: /var/log

Why preStop + sleep: The grace period countdown starts the moment the pod is marked for termination. The preStop hook runs first, and the kubelet sends SIGTERM only after the hook returns, so the hook's duration counts against terminationGracePeriodSeconds. The sleep 5 gives kube-proxy and any load balancer time to remove the pod from endpoints before the process starts shutting down, which matters for DaemonSet pods that also serve local traffic.

Critical rule: preStop duration + actual shutdown time must be less than terminationGracePeriodSeconds. Leave at least 10 seconds of headroom.
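
The rule can be checked with simple arithmetic. The sketch below models the timeline under one simplifying assumption: the preStop hook finishes inside the grace period, so SIGTERM is sent as soon as the hook returns. The names `TerminationTimeline`, `termination_timeline`, and `meets_rule` are hypothetical, and the 60 second shutdown time is a made-up measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminationTimeline:
    sigterm_at_s: float   # when the kubelet sends SIGTERM (after preStop)
    exit_at_s: float      # when the process would exit on its own
    sigkill_at_s: float   # when the grace period expires
    headroom_s: float     # spare time before SIGKILL (negative = force kill)
    force_killed: bool


def termination_timeline(grace_period_s, pre_stop_s, shutdown_s):
    """Model pod termination; all times are seconds since deletion began."""
    exit_at_s = pre_stop_s + shutdown_s
    headroom_s = grace_period_s - exit_at_s
    return TerminationTimeline(
        sigterm_at_s=pre_stop_s,
        exit_at_s=exit_at_s,
        sigkill_at_s=grace_period_s,
        headroom_s=headroom_s,
        force_killed=headroom_s < 0,
    )


def meets_rule(timeline, min_headroom_s=10):
    """True when the pod exits with at least the recommended headroom."""
    return timeline.headroom_s >= min_headroom_s


# preStop of 15 s (sleep 5 + sleep 10) and an assumed 60 s shutdown.
tuned = termination_timeline(grace_period_s=120, pre_stop_s=15, shutdown_s=60)
default = termination_timeline(grace_period_s=30, pre_stop_s=15, shutdown_s=60)
```

With these inputs `tuned` has 45 seconds of headroom, while `default` goes 45 seconds over budget and would be force-killed.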




Prevention in CI/CD

OPA / Gatekeeper Policy

Enforce a minimum grace period on all DaemonSet objects at admission time:

package daemonset.termination

violation[{"msg": msg}] {
  input.review.object.kind == "DaemonSet"
  grace := input.review.object.spec.template.spec.terminationGracePeriodSeconds
  grace < 60
  msg := sprintf("DaemonSet '%v' terminationGracePeriodSeconds is %v, minimum is 60", [
    input.review.object.metadata.name, grace
  ])
}

violation[{"msg": msg}] {
  input.review.object.kind == "DaemonSet"
  not input.review.object.spec.template.spec.terminationGracePeriodSeconds
  msg := sprintf("DaemonSet '%v' does not set terminationGracePeriodSeconds (defaults to 30, too low)", [
    input.review.object.metadata.name
  ])
}

Checkov Static Analysis

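# Add to your CI pipeline
# Note: to our knowledge CKV_K8S_30 checks container securityContext, not
# termination settings, and Checkov ships no built-in grace-period check.
# Load the custom check defined below instead.
checkov -f daemonset.yaml --external-checks-dir .checkov/custom_checks --check CKV2_K8S_CUSTOM_001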

Checkov Custom Check (if built-in is insufficient)

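# .checkov/custom_checks/daemonset_grace_period.py
from checkov.common.models.enums import CheckCategories, CheckResult
from checkov.kubernetes.checks.resource.base_spec_check import BaseK8Check

class DaemonSetGracePeriodCheck(BaseK8Check):
    def __init__(self):
        name = "Ensure DaemonSet terminationGracePeriodSeconds >= 60"
        id = "CKV2_K8S_CUSTOM_001"
        # BaseK8Check also requires a categories list
        super().__init__(name=name, id=id, categories=[CheckCategories.KUBERNETES],
                         supported_entities=['DaemonSet'])

    def scan_spec_conf(self, conf):
        # A missing or null field falls back to the Kubernetes default of 30 seconds
        pod_spec = ((conf.get("spec") or {}).get("template") or {}).get("spec") or {}
        grace = pod_spec.get("terminationGracePeriodSeconds") or 30
        return CheckResult.PASSED if grace >= 60 else CheckResult.FAILED

# Checkov registers the check when the class is instantiated at import time
check = DaemonSetGracePeriodCheck()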

Monitoring Alert (Prometheus)

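# Best-effort heuristic: fires when a DaemonSet pod is still present after its
# deletion deadline. deletionTimestamp already includes the grace period, so a
# positive value means the pod has outlived terminationGracePeriodSeconds.
# A pod that is killed and removed between scrapes will be missed.
# Assumes kube-state-metrics exposes kube_pod_deletion_timestamp and kube_pod_owner.
- alert: DaemonSetPodForceKilled
  expr: |
    (time() - kube_pod_deletion_timestamp) > 0
    and on(namespace, pod)
    kube_pod_owner{owner_kind="DaemonSet"}
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} exceeded its termination grace period"
    description: "Check terminationGracePeriodSeconds and preStop hooks."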

Runbook checklist for every DaemonSet deployment:

  • terminationGracePeriodSeconds explicitly set and ≥ 60
  • preStop hook defined
  • Application process is PID 1 or uses tini/dumb-init as init
  • SIGTERM handler tested under load
  • OPA/Gatekeeper policy enforced in cluster
  • Checkov check passing in CI
