Improving SLOs

This page provides high-level ideas on how the SLOs managed by this component can be improved over time. Providing detailed step-by-step instructions is a non-goal of this page.

This component provides a number of SLOs, which may need to be improved or tuned for various reasons. Clusters may have special SLAs that can’t be met with the default SLO configuration. Depending on the specific circumstances, SLOs managed by this component may also generate non-actionable alerts.

Global modifications of the SLOs must be evaluated carefully. Changes to this component require a mandatory review before they can be merged.

Special SLA

If a cluster has a special SLA, the relevant SLO configurations should be updated so that their objective satisfies the SLA. Ideally, the objective configured for the SLO should be higher than the objective documented in the SLA. An SLO that is stricter than the SLA reduces the number of SLA violations, because engineers are notified as soon as the SLO is violated, which happens before the SLA is violated. Because engineers are notified early, the issue may get resolved before the SLA objective is breached.
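The effect of a stricter SLO can be illustrated with a small calculation. The following is a minimal sketch only: the 30-day window and the 99.0% and 99.5% objectives are hypothetical values chosen for illustration, not defaults of this component, and it assumes the simplified model in which an objective is violated once its allowed downtime is used up.

```python
# Hypothetical evaluation window; the real window depends on the SLO configuration.
WINDOW_DAYS = 30


def allowed_downtime_minutes(objective_percent: float, window_days: int = WINDOW_DAYS) -> float:
    """Return the downtime (in minutes) an availability objective tolerates in the window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - objective_percent) / 100.0


sla_budget = allowed_downtime_minutes(99.0)  # hypothetical SLA objective -> 432.0 minutes
slo_budget = allowed_downtime_minutes(99.5)  # hypothetical, stricter SLO objective -> 216.0 minutes

# Time left to resolve the issue after the SLO is violated but before the SLA is.
headroom = sla_budget - slo_budget  # 216.0 minutes
print(f"SLA budget: {sla_budget} min, SLO budget: {slo_budget} min, headroom: {headroom} min")
```

In this example, the stricter SLO leaves 216 minutes between the SLO violation and the SLA violation. The closer the two objectives are, the smaller this headroom becomes.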

See section "Tune" in the runbook of an SLO for detailed instructions on how to tune the configuration for that SLO.

Noisy SLO

SLOs managed by this component may generate a lot of alerts. If those alerts aren’t actionable, determine how the SLO can be adjusted so that it no longer generates them. A common approach is to narrow the scope of the Prometheus queries used by the SLO so that they omit the components which cause the non-actionable alerts.
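As a rough illustration of narrowing a query's scope, the sketch below builds a Prometheus selector with a negative regex matcher (`!~`) that omits a list of noisy components. The helper `exclude_components`, the metric `http_requests_total`, the label `job`, and the component names are all hypothetical; the actual queries and labels are defined in the configuration of each SLO.

```python
import re

# Only plain names are accepted, so no regex or string escaping is needed.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def exclude_components(metric: str, label: str, noisy: list[str]) -> str:
    """Build a selector that omits series whose `label` equals any noisy component."""
    if not noisy:
        return metric  # nothing to exclude, keep the query scope unchanged
    for name in noisy:
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsupported component name: {name!r}")
    # Prometheus regex matchers are fully anchored, so "a|b" excludes exactly a and b.
    pattern = "|".join(noisy)
    return f'{metric}{{{label}!~"{pattern}"}}'


# Hypothetical metric, label, and component names.
selector = exclude_components("http_requests_total", "job", ["canary", "batch_worker"])
print(selector)  # http_requests_total{job!~"canary|batch_worker"}
```

Excluding components this way also removes them from the SLO entirely, so it is only appropriate when alerts for those components are never actionable.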

See section "Steps for debugging" in the runbook of an SLO for detailed information about how to investigate alerts for that SLO.

Other cases

Generally, reading the SLO runbook should provide a good starting point to understand an SLO.

Ideally, the SLO runbooks should be improved and adjusted whenever handling an SLO alert yields new findings. This doesn’t need to happen while the SLO alert is active, but it should be done shortly afterwards, so that small details which would otherwise be forgotten are documented for future alerts.