Alert rule: RookCephOperatorScaledDown
This alert fires if the Rook-Ceph operator deployment is scaled to 0 for more than an hour. While the operator is scaled to 0, the Ceph cluster isn’t actively managed and could start degrading.
Steps for debugging
Check if the rook-ceph ArgoCD app is synced and healthy:

$ kubectl -n syn get app rook-ceph
NAME        SYNC STATUS   HEALTH STATUS
rook-ceph   Synced        Healthy
If the output of the kubectl command indicates that the app isn't synced and healthy, check the app in ArgoCD. You can use the argocd CLI or the web interface to do so.
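A minimal sketch of checking and re-syncing the app with the argocd CLI (assumes you are already logged in to the Argo CD instance managing this cluster):

```shell
# Show the app's sync and health status, including any out-of-sync resources
argocd app get rook-ceph

# If the app is out of sync, trigger a sync and wait for it to complete
argocd app sync rook-ceph
argocd app wait rook-ceph --health --timeout 300
```

If the sync fails, the error messages shown by `argocd app get` usually point at the offending resource.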
Check configured replicas in cluster catalog
Verify that the operator deployment manifest in the cluster catalog specifies .spec.replicas=1 by inspecting the cluster catalog.
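You can also compare the configured replica count against the live deployment. This is a sketch; the namespace `rook-ceph` and deployment name `rook-ceph-operator` are assumptions, so adjust them to match your cluster:

```shell
# Print the replica count currently configured on the live deployment
# (namespace and deployment name are assumed; adjust as needed)
kubectl -n rook-ceph get deployment rook-ceph-operator \
  -o jsonpath='{.spec.replicas}{"\n"}'

# If the deployment is scaled to 0 while the catalog specifies 1 replica,
# re-syncing the ArgoCD app should restore it; scaling manually also works:
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=1
</imports>
```

Note that if the app remains out of sync in ArgoCD, a manual scale-up may be reverted or masked on the next sync, so fixing the catalog or app state is the durable resolution.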
The cluster catalog is linked in column "GitRepo URL" on control.vshn.net.
The operator deployment manifests can be found in