Alert rule: CephPoolNearFull

Please consider opening a PR to improve this runbook if you gain new information about causes of the alert, or how to debug or resolve the alert. Click "Edit this Page" in the top right corner to create a PR directly on GitHub.

Overview

This alert fires when the Ceph cluster utilization is close to 75% of the cluster capacity. Writes may continue, but you are at risk of the pool going read only if more capacity isn’t made available.

To resolve this alert, unused data should be deleted or the cluster size must be increased.

Steps for debugging

First, identify the affected pool(s) by looking at the QUOTA BYTES and STORED values in the output of ceph df detail:

$ ceph_cluster_ns=syn-rook-ceph-cluster
$ kubectl -n ${ceph_cluster_ns} exec -it deploy/rook-ceph-tools -- ceph df detail

Increase the pool’s quota, if a quota configured. You can use ceph osd pool set quota <pool_name> max_bytes <bytes> or a similar command.

Additionally, check whether the balancer is active with ceph balancer status. If the balancer is off, check the cluster’s configuration for any notes on why the balancer is off. If there’s no documented reason for the balancer being off, turn it back on in automatic mode with ceph balancer on.

If the Ceph cluster is running out of space, see the how-to on scaling a PVC-based Ceph cluster for instructions to resize the cluster.