Alert rule: CephOSDTimeoutsClusterNetwork

Please consider opening a PR to improve this runbook if you gain new information about causes of the alert, or how to debug or resolve the alert. Click "Edit this Page" in the top right corner to create a PR directly on GitHub.

Overview

OSD heartbeats on the cluster’s 'cluster' network are running slow. Investigate the network for any latency issues on this subnet.

Steps for debugging

$ ceph_cluster_ns=syn-rook-ceph-cluster
$ kubectl -n ${ceph_cluster_ns} exec -it deploy/rook-ceph-tools -- ceph status
  cluster:
    id:     92716509-0f84-4739-8d04-541d2e7c3e66
    health: HEALTH_WARN (1)
            [ ... detailed information ... ] (2)
            [ ... detailed information ... ] (2)
            [ ... detailed information ... ] (2)

[ ... remaining output omitted ... ]
1 General cluster health status
2 One or more lines of information giving details why the cluster state is degraded. Only available if the cluster health isn’t HEALTH_OK.