CiliumAgentUnexpectedCount
Overview
This alert fires if a cluster has a mismatch between the number of Cilium agent pods and nodes for more than 5 minutes.
This usually indicates that the Cilium agent DaemonSet is misconfigured or missing.
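To confirm the mismatch by hand, the two counts can be compared directly. The following is a minimal sketch, not the alert's actual expression (which is not shown in this runbook). It assumes the `cilium` namespace and the `app.kubernetes.io/name=cilium-agent` label used in the steps below. The function name `cilium_agent_count_check` is hypothetical, and the sketch counts agent pods in any phase.

```shell
# Hypothetical helper: compares the node count with the Cilium agent pod count.
# Assumes namespace "cilium" and label app.kubernetes.io/name=cilium-agent.
cilium_agent_count_check() {
  # Count nodes; tr strips the padding some wc implementations add.
  nodes=$(kubectl get nodes -o name | wc -l | tr -d ' ')
  # Count agent pods regardless of phase (Running, Pending, ...).
  agents=$(kubectl -n cilium get pods -l app.kubernetes.io/name=cilium-agent -o name | wc -l | tr -d ' ')
  echo "nodes=${nodes} agents=${agents}"
  # Succeed only when both counts match.
  [ "${nodes}" -eq "${agents}" ]
}
```

Calling `cilium_agent_count_check` prints both counts and returns a non-zero exit status when they differ, so it can be used in a condition such as `cilium_agent_count_check || echo "mismatch"`.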
Steps for debugging
- Check the Cilium agent DaemonSet:

      kubectl -n cilium get ds -l app.kubernetes.io/name=cilium-agent

  - If the DaemonSet is missing, check the OLM operator logs (or the Helm deployment status):

        kubectl -n cilium logs deployment/clife-controller-manager

    For Cilium 1.16, use deployment/cilium-ee-olm instead.

  - If the DaemonSet is present, find the nodes that don't have a Cilium agent pod and check them for scheduling issues or untolerated taints:

        for node in $(kubectl get nodes -oname); do
          pod=$(kubectl -oname -n cilium get pods -l app.kubernetes.io/name=cilium-agent --field-selector spec.nodeName="${node#node/}")
          if [ -z "${pod}" ]; then
            echo "Cilium agent missing on node ${node}"
          fi
        done
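Once a node without an agent pod has been identified, its taints can be compared with the tolerations on the agent DaemonSet. The sketch below is one possible way to gather that information, assuming the same `cilium` namespace and agent label as above. The function name `show_agent_scheduling_info` is hypothetical, and the output is raw JSON that still has to be compared by eye.

```shell
# Hypothetical helper: prints scheduling-related details for one node.
# Assumes namespace "cilium" and label app.kubernetes.io/name=cilium-agent.
show_agent_scheduling_info() {
  node="$1"
  if [ -z "${node}" ]; then
    echo "usage: show_agent_scheduling_info NODE_NAME" >&2
    return 1
  fi

  # Taints set on the node (empty output means no taints).
  echo "Taints on ${node}:"
  kubectl get node "${node}" -o jsonpath='{.spec.taints}'
  echo

  # Tolerations declared in the agent DaemonSet pod template.
  echo "Tolerations on the Cilium agent DaemonSet:"
  kubectl -n cilium get ds -l app.kubernetes.io/name=cilium-agent \
    -o jsonpath='{.items[*].spec.template.spec.tolerations}'
  echo

  # Agent pods that exist but have not been scheduled or started yet.
  echo "Pending Cilium agent pods:"
  kubectl -n cilium get pods -l app.kubernetes.io/name=cilium-agent \
    --field-selector status.phase=Pending -o name
}
```

For example, `show_agent_scheduling_info worker-1` (where `worker-1` is a placeholder node name, without the `node/` prefix) prints the three sections in order. A taint on the node with no matching toleration on the DaemonSet would explain a missing agent pod. `kubectl describe node` on the same node shows related events and conditions.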