CiliumBpfMapPressureExtremelyHigh
Please consider opening a PR to improve this runbook if you gain new information about the causes of this alert, or about how to debug or resolve it. Click "Edit this Page" in the top right corner to create a PR directly on GitHub.
Overview
This alert fires if any BPF map on a node is at >=90% utilization for 10 minutes or longer. Depending on the map for which the alert fires, this can indicate many different things. See below for some cases that we’ve encountered so far.
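One way to see which map is affected is to query the metric behind the alert. The sketch below assumes the alert is based on Cilium's `cilium_bpf_map_pressure` metric and that the Prometheus API is reachable at the placeholder URL shown; adjust both to your setup:

```sh
# Sketch: find maps at >=90% utilization via the Prometheus HTTP API.
# cilium_bpf_map_pressure reports per-map utilization as a ratio (0 to 1);
# the Prometheus URL below is a placeholder.
curl -s 'http://prometheus.example:9090/api/v1/query' \
  --data-urlencode 'query=max by (map_name) (cilium_bpf_map_pressure) >= 0.9'
```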
Known maps
Please update this section if you encounter this alert for a map which isn’t listed yet.
cilium_policy_*
This is the eBPF map that contains endpoint policy configurations, which are generated from the network policies in the cluster. If this map fills up completely, or if there’s a high error rate for operations on it, traffic on the cluster can be severely impacted, since endpoints for which the policy map cannot be configured may not work correctly.
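For a first impression of how full individual endpoint policy maps are, the following sketch can help. It assumes you already have a shell in the cilium-agent pod on the affected node (the debugging steps below show how to get there); `<endpoint id>` is a placeholder:

```sh
# Sketch: run inside the cilium-agent pod on the affected node.
cilium-dbg endpoint list                  # find the endpoint IDs present on this node
cilium-dbg bpf policy get <endpoint id>   # dump the policy map entries for one endpoint
```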
Steps for debugging
Prerequisites
- cilium CLI, install from cilium/cilium-cli
- kubectl
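A quick sanity check that both tools are available:

```sh
# Sketch: confirm both CLIs are installed and on $PATH.
cilium version             # also reports cluster-side versions if a cluster is reachable
kubectl version --client
```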
Add a section below if you’re debugging a map for which there’s no info yet.
Investigate cilium_policy_*
- List policies (see the callouts below the snippet):

```sh
NODE=<node name of affected node>   # (1)
AGENT_POD=$(kubectl -n cilium get pods --field-selector=spec.nodeName=$NODE \
  -l app.kubernetes.io/name=cilium-agent -oname)
kubectl -n cilium exec -it $AGENT_POD --as=cluster-admin -- cilium-dbg policy selectors   # (2) (3)
```

1. The node indicated in the alert
2. `--as=cluster-admin` is required on VSHN managed clusters
3. Lists the Cilium policy selectors (including matched endpoint IDs) that need to be deployed on the node
- Check the output for any policies that match a large number of endpoints and investigate whether you can tune the associated network policy to reduce the number of matched endpoints; the sketch below shows one way to find overly broad policies.
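One common culprit is a policy with an empty podSelector, which matches every pod in its namespace. A minimal sketch for spotting such policies, assuming `jq` is available and that plain Kubernetes NetworkPolicies are in use (CiliumNetworkPolicies would need a similar query against their own CRD):

```sh
# Sketch: list NetworkPolicies whose podSelector is empty; these match
# every pod in their namespace and can inflate policy map usage.
kubectl get networkpolicies -A -o json \
  | jq -r '.items[] | select(.spec.podSelector == {})
           | "\(.metadata.namespace)/\(.metadata.name)"'
```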
See the upstream troubleshooting documentation for more details on this map: Cilium OSS — Policy map pressure and overflow.
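Independently of the upstream docs, you can also ask the agent itself for its current map pressure readings. A minimal sketch, reusing `$AGENT_POD` from the step above; the exact metric name may vary between Cilium versions:

```sh
# Sketch: read map pressure as reported by the agent itself.
kubectl -n cilium exec -it $AGENT_POD --as=cluster-admin -- \
  cilium-dbg map list                                      # open BPF maps and their properties
kubectl -n cilium exec -it $AGENT_POD --as=cluster-admin -- \
  sh -c 'cilium-dbg metrics list | grep -i map_pressure'   # current pressure readings
```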