Alert rule: CephMonDiskspaceCritical

Please consider opening a PR to improve this runbook if you gain new information about causes of the alert, or how to debug or resolve the alert.

Overview

The free space available to a monitor’s store is critically low (<5% by default). You should increase the space available to the monitor(s). The monitor’s store can be found at /var/lib/rook/<MON_NAME>/data on the host.
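To see how close a filesystem is to the threshold, something like the following sketch can be run on the storage node. It is only an approximation of what Ceph measures: it derives the free percentage from the used percentage that `df -P` reports. The `MON_STORE` variable and the `avail_percent` helper are hypothetical names introduced for this example, and the 5% threshold is the default mentioned above.

```shell
#!/bin/sh
# Print the approximate percentage of free space on the filesystem
# that holds the given path. POSIX `df -P` prints the used percentage
# ("Capacity") in the fifth column of its second line.
avail_percent() {
  used=$(df -P "$1" | awk 'NR==2 { sub("%", "", $5); print $5 }')
  echo $((100 - used))
}

# Hypothetical default; point MON_STORE at the monitor's store on the host.
mon_store=${MON_STORE:-/var/lib/rook}
threshold=5   # percent, the alert's default threshold

# Only report if the directory exists on this machine.
if [ -d "$mon_store" ]; then
  free=$(avail_percent "$mon_store")
  if [ "$free" -lt "$threshold" ]; then
    echo "CRITICAL: about ${free}% free on the filesystem of ${mon_store}"
  else
    echo "OK: about ${free}% free on the filesystem of ${mon_store}"
  fi
fi
```
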

Look in the monitor’s store for old, rotated versions of *.log and MANIFEST* files. IMPORTANT: Don’t touch any *.sst files. Also check the other directories under /var/lib/rook and other directories on the same filesystem; /var/log and /var/tmp are often the culprits.
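A minimal sketch of that search, assuming shell access to the storage node: it prints the size of each candidate directory and lists files that aren’t *.sst files, largest first. Both helpers (`dir_sizes` and `non_sst_files`) are hypothetical names for this example. They only list files and never delete anything, so you still need to decide which entries are old, rotated copies and which are in active use.

```shell
#!/bin/sh
# Print the total size in KiB of each given directory, largest first.
# -x keeps du on a single filesystem, -s prints one total per argument.
dir_sizes() {
  du -xsk "$@" 2>/dev/null | sort -rn
}

# List all files below a directory except *.sst files, with their size
# in KiB, largest first. Nothing is deleted.
non_sst_files() {
  find "$1" -xdev -type f ! -name '*.sst' -exec du -k {} + 2>/dev/null | sort -rn
}

# Report the usual suspects, skipping directories that don't exist here.
for d in /var/lib/rook /var/log /var/tmp; do
  if [ -d "$d" ]; then
    dir_sizes "$d"
  fi
done
```

For example, `non_sst_files /var/lib/rook | head -n 20` would show the 20 largest non-*.sst files under /var/lib/rook.
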

Steps for debugging

Find the node that’s low on disk space

$ ceph_cluster_ns=syn-rook-ceph-cluster
$ ceph_mon_name=a (1)
$ kubectl -n ${ceph_cluster_ns} get deploy -lapp=rook-ceph-mon,mon=${ceph_mon_name}  -ojson | jq '.items[].spec.template.spec.nodeSelector'
{
  "kubernetes.io/hostname": "storage-XXXX" (2)
}

1	The name of the monitor that’s alerting.
2	The node that the monitor is running on.
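If you don’t have shell access to the node, the usage of the monitor’s store can usually be checked from inside the monitor pod instead. The following sketch rests on two assumptions that you should verify for your cluster: that the monitor deployment is named rook-ceph-mon-<MON_NAME>, and that the store is mounted at /var/lib/ceph/mon/ceph-<MON_NAME> inside the pod. The `mon_store_usage` helper is a hypothetical name for this example.

```shell
#!/bin/sh
# Show the disk usage of a monitor's store from inside its pod.
# Arguments: <namespace> <monitor name>
# Assumption: deployment name and mount path follow the pattern below.
mon_store_usage() {
  kubectl -n "$1" exec "deploy/rook-ceph-mon-$2" -- \
    df -h "/var/lib/ceph/mon/ceph-$2"
}

# Example call, using the variables set in the commands above:
#   mon_store_usage "${ceph_cluster_ns}" "${ceph_mon_name}"
```
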

Increase the size of the node’s disk

Increase the size of the node’s disk according to your cloud provider’s documentation.
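After resizing, you may want to confirm that Ceph no longer reports a monitor disk space health check. The sketch below filters health output for the MON_DISK_LOW and MON_DISK_CRIT codes. The `has_mon_disk_check` helper is a hypothetical name, and the example call assumes that a toolbox deployment named rook-ceph-tools exists in the cluster namespace, which may not be the case in your setup.

```shell
#!/bin/sh
# Read health output on stdin and print any lines that mention a monitor
# disk space health check. Exits non-zero if no such line is found.
has_mon_disk_check() {
  grep -E 'MON_DISK_(LOW|CRIT)'
}

# Example call (assumes a toolbox deployment named rook-ceph-tools):
#   kubectl -n "${ceph_cluster_ns}" exec deploy/rook-ceph-tools -- \
#     ceph health detail | has_mon_disk_check
```

No output and a non-zero exit status from the filter mean that neither health check is currently reported.
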

Upstream documentation

docs.ceph.com/en/latest/rados/operations/health-checks#mon-disk-crit