Parameters
The parent key for all of the following parameters is openshift4_slos.
namespace
| type |
string |
| default |
|
The default namespace for ArgoCD to fall back to.
images.sloth
| type |
dictionary |
Sloth isn’t actually deployed to the cluster, but used to render PrometheusRules.
The entry in images allows Renovate to create version upgrade PRs.
The Sloth version can be overridden by the tag parameter.
secrets
| type |
dictionary |
| default |
|
| example |
|
This parameter allows creating arbitrary Secret resources.
The dictionary keys are used as metadata.name for the resulting Secret resources.
The secrets are created in the namespace indicated by parameter namespace.
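As a rough illustration, a secret entry might look like the following. The secret name and its content are hypothetical. The sketch also assumes that each dictionary value holds regular Secret fields such as stringData, which the description above doesn't state explicitly.

```yaml
secrets:
  # The key becomes metadata.name of the Secret (hypothetical name).
  my-canary-credentials:
    # Assumption: the value holds regular Secret fields.
    stringData:
      token: "replace-me"
```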
slos
| type |
dictionary |
The configuration options for all default SLOs of the APPUiO Managed OpenShift product.
slos.customer-facing.ingress
| type |
dictionary |
| default |
|
The configuration for a customer-facing ingress SLO.
The SLO can be disabled by setting enabled to false.
| Look at the runbook for an explanation of this SLO. |
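A minimal sketch of disabling this SLO, with keys given relative to the openshift4_slos parent key:

```yaml
slos:
  customer-facing:
    ingress:
      # Turns off the customer-facing ingress SLO.
      enabled: false
```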
slos.storage.csi-operations
| type |
dictionary |
| default |
|
The configuration for the csi-operations storage SLO.
The SLO can be disabled by setting enabled to false.
You can configure which volume plugins or storage operations are considered for the SLO by setting _sli.volume_plugin or _sli.operation_name respectively.
The fields can contain an arbitrary PromQL regex label matcher.
Any additional field is added directly to the SLO input for Sloth.
| Look at the runbook for an explanation of this SLO. |
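The following sketch shows how the two matchers and an additional Sloth field could be combined. The regex values are hypothetical and depend on the label values your cluster actually exposes. The objective field is a regular Sloth SLO field and is used here only as an example of an additional field.

```yaml
slos:
  storage:
    csi-operations:
      # Additional field, passed on to the Sloth SLO input (example value).
      objective: 99.5
      _sli:
        # Hypothetical regex: only consider CSI volume plugins.
        volume_plugin: "kubernetes.io/csi.*"
        # Hypothetical regex: only consider these storage operations.
        operation_name: "volume_(attach|mount)"
```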
slos.storage.canary
| type |
dictionary |
| default |
|
| example |
|
The configuration for the canary storage SLO.
The SLO can be disabled by setting enabled to false.
The canary SLO is tested by creating a PVC for every configured storage class and periodically running a pod that writes and deletes a file on the respective PVC.
You can configure which volume plugins are tested with _sli.volume_plugins.
The key is the storage class name and the value is a dictionary which can override the default parameters set in volume_plugins_default_params.
An empty key ("") is used for the default storage class.
The value can be set to null to disable the canary for a specific storage class.
Any additional field is added directly to the SLO input for Sloth.
| Look at the runbook for an explanation of this SLO. |
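A sketch of the three cases described above. The storage class names ssd and bulk are hypothetical, and the empty dictionaries assume that the defaults from volume_plugins_default_params are used unchanged.

```yaml
slos:
  storage:
    canary:
      _sli:
        volume_plugins:
          # The empty key selects the cluster's default storage class.
          "": {}
          # Hypothetical storage class, tested with the default parameters.
          ssd: {}
          # Hypothetical storage class for which the canary is disabled.
          bulk: null
```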
slos.kubernetes_api.requests
| type |
dictionary |
| default |
|
The configuration for the Kubernetes API requests SLO.
The SLO can be disabled by setting enabled to false.
You can configure which API servers are actually considered for the SLO by setting _sli.apiserver.
By default, the SLO only considers the Kubernetes API server and not the OpenShift API server.
The field can contain an arbitrary PromQL regex label matcher.
Any additional field is added directly to the SLO input for Sloth.
| Look at the runbook for an explanation of this SLO. |
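For instance, to include the OpenShift API server as well, the matcher could be widened roughly as follows. The label values shown are assumptions and should be checked against the API server metrics on your cluster.

```yaml
slos:
  kubernetes_api:
    requests:
      _sli:
        # Assumed label values; a PromQL regex matching both API servers.
        apiserver: "kube-apiserver|openshift-apiserver"
```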
slos.kubernetes_api.canary
| type |
dictionary |
| default |
|
The configuration for the Kubernetes API canary SLO.
The SLO can be disabled by setting enabled to false.
You can configure the probe interval and timeout by setting _sli.interval and _sli.probe respectively.
Both parameters are in Go duration format (for example 1m30s).
Any additional field is added directly to the SLO input for Sloth.
| Look at the runbook for an explanation of this SLO. |
slos.workload-schedulability.canary
| type |
dictionary |
| default |
|
The configuration for the canary based workload schedulability SLO.
The SLO can be disabled by setting enabled to false.
You can configure the interval at which canary pods are created (podStartInterval) and the timeout after which a pod is considered stuck (overallPodTimeout).
Both parameters are in Go duration format (for example 1m30s).
Any additional field is added directly to the SLO input for Sloth.
| Look at the runbook for an explanation of this SLO. |
slos.network.canary
| type |
dictionary |
| default |
|
The configuration for the canary based network SLO, measuring packet loss between nodes.
The SLO can be disabled by setting enabled to false.
Any additional field is added directly to the SLO input for Sloth.
| Look at the runbook for an explanation of this SLO. |
alerting
| type |
dictionary |
Common alerting configuration for all deployed SLOs.
alerting.labels
| type |
dictionary |
| default |
|
Labels that are added to all Prometheus alerts generated by this component.
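A small sketch of adding common labels. Both label names and values are hypothetical.

```yaml
alerting:
  labels:
    # Hypothetical labels, added to every alert generated by the component.
    team: platform
    environment: production
```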
specs
| type |
dictionary |
| default |
|
The SLO definitions that are passed to Sloth. The key is used as the name of the resulting PrometheusRule. It must be a valid Kubernetes name.
specs.NAME.metadata
| type |
dictionary |
| example |
|
The metadata applied to the PrometheusRule manifest. The name is derived from the name of the parent dictionary.
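As an illustration, metadata for a spec might be set like this. The spec name is taken from the example on this page, while the label and annotation are hypothetical.

```yaml
specs:
  appuio-ch-http-get-availability:
    metadata:
      # Hypothetical label and annotation for the resulting PrometheusRule.
      labels:
        owner: myteam
      annotations:
        example.com/description: "Availability SLO for appuio.ch"
```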
specs.NAME.sloth_input
| type |
dictionary |
| example |
appuio-ch-http-get-availability:
  sloth_input:
    version: "prometheus/v1"
    service: "appuio-ch"
    labels:
      owner: "myteam"
    _slos:
      # We allow failing (5xx and 429) 1 request every 1000 requests (99.9%).
      appuio-ch-http-get-availability:
        enabled: true # (1)
        objective: 99.9
        description: "SLO based on availability for blackbox HTTP GET request."
        sli:
          raw:
            error_ratio_query: |
              1 - (
                sum_over_time(probe_success{instance="https://www.appuio.ch/"}[{{.window}}])
                /
                count_over_time(up{instance="https://www.appuio.ch/"}[{{.window}}])
              )
        alerting:
          name: AppuioChHttpGetErrorRatio
          labels:
            category: "availability"
          annotations:
            # Overwrite default Sloth SLO alert summary on ticket and page alerts.
            summary: "High error rate on 'appuio.ch' responses"
          page_alert:
            labels:
              severity: warning
          ticket_alert:
            labels:
              severity: warning
              routing_key: myteam
| 1 | enabled is an optional field that allows users to disable certain SLOs through the hierarchy.
The field will default to true if omitted. |
The input from which Sloth generates the PrometheusRule.spec.
See Sloth introduction for more information.
The slos can be passed as either an array or as a dictionary with the key _slos.
This is done to allow easier modification of the SLOs from the Project Syn hierarchy.
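For comparison, the array form might look like the sketch below. It assumes that the array is given under the slos key with a name field per entry, following Sloth's prometheus/v1 specification format. With the _slos dictionary form, a single SLO can be addressed by its key from elsewhere in the hierarchy, whereas an array entry can't be targeted individually.

```yaml
specs:
  appuio-ch-http-get-availability:
    sloth_input:
      version: "prometheus/v1"
      service: "appuio-ch"
      # Assumption: array form as defined by Sloth's prometheus/v1 format.
      slos:
        - name: appuio-ch-http-get-availability
          objective: 99.9
          description: "SLO based on availability for blackbox HTTP GET request."
          sli:
            raw:
              error_ratio_query: |
                1 - (
                  sum_over_time(probe_success{instance="https://www.appuio.ch/"}[{{.window}}])
                  /
                  count_over_time(up{instance="https://www.appuio.ch/"}[{{.window}}])
                )
          alerting:
            name: AppuioChHttpGetErrorRatio
```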
controller_node_affinity
| type |
dictionary |
| default |
|
This parameter is used to configure spec.affinity.nodeAffinity for the blackbox-exporter and scheduler-canary-controller deployments.
We default to scheduling the blackbox-exporter and scheduler-canary-controller on the infra nodes.
To customize the node affinity for those deployments, please use reclass’s overwrite mechanism by using key ~controller_node_affinity, since otherwise your changes will most likely be appended to the component defaults.
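A sketch of such an overwrite, replacing the defaults with a rule that schedules both deployments on worker nodes. The node label is only an example, and the structure assumes that the parameter takes a standard Kubernetes nodeAffinity object.

```yaml
parameters:
  openshift4_slos:
    # The "~" prefix tells reclass to replace the value instead of merging it.
    ~controller_node_affinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # Example node label; adjust to your cluster.
              - key: node-role.kubernetes.io/worker
                operator: Exists
```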
canary_node_affinity
| type |
dictionary |
| default |
|
This parameter can be used to configure spec.affinity.nodeAffinity for the SchedulerCanary custom resource generated by the component.
| We don’t recommend adjusting this parameter unless the component is installed on a cluster that has all-in-one nodes. |
blackbox_exporter
| type |
dictionary |
blackbox_exporter allows setting up an optional Blackbox exporter.
blackbox_exporter.enabled
| type |
boolean |
| default |
|
Controls whether the Blackbox exporter is deployed.
blackbox_exporter.name
| type |
string |
| default |
|
The name of the Blackbox exporter deployment.
blackbox_exporter.namespace
| type |
string |
| default |
|
The namespace of the Blackbox exporter deployment.
blackbox_exporter.deployment.resources
| type |
dictionary |
| default |
The resources to use for the Blackbox exporter deployment.
blackbox_exporter.deployment.affinity
| type |
dictionary |
| default |
|
Affinity rules for the Blackbox exporter deployment.
Schedules replicas on different nodes. This is done to avoid SLO violations when rebooting a worker node.
blackbox_exporter.deployment.replicas
| type |
integer |
| default |
|
The number of replicas for the Blackbox exporter deployment. Defaults to 2 to avoid SLO violations when rebooting a worker node.
blackbox_exporter.deployment.podDisruptionBudget
| type |
dictionary |
| default |
|
The PodDisruptionBudget for the Blackbox exporter deployment. Ensures at least one replica is available at all times.
blackbox_exporter.config
| type |
dictionary |
| default |
The blackbox exporter configuration. See Configuration for more information.
blackbox_exporter.probes
| type |
dictionary |
| default |
|
| example |
|
The Probe definitions that are deployed in the cluster and picked up by the blackbox exporter managed by the component. The key is used as the name of the resulting Probe. It must be a valid Kubernetes name.
The .spec.prober part is automatically filled from the Blackbox exporter configuration and can be omitted.
canary_scheduler_controller
| type |
dictionary |
canary_scheduler_controller allows setting up the canary controller to test workload schedulability.
The manifests are rendered using Kustomize.
canary_scheduler_controller.enabled
| type |
boolean |
| default |
|
Controls whether the controller is deployed.
canary_scheduler_controller.manifests_version
| type |
string |
| default |
|
The Git reference to the canary controller manifests. The default is the tag of the canary controller image.
canary_scheduler_controller.kustomize_input
| type |
dictionary |
| default |
|
The input passed to the Kustomize renderer. See The Kustomization File for all available options.
network_canary
| type |
dictionary |
network_canary allows configuring the network canary used for measuring packet loss for the network SLO.
network_canary.enabled
| type |
boolean |
| default |
|
Whether the canary should be deployed. By default the component will deploy the canary if and only if the network canary SLO is enabled.
network_canary.namespace
| type |
string |
| default |
|
The namespace in which the network canary is deployed.
INFO: This needs to differ from the default SLO namespace so that we can choose different node selectors for the canary.
network_canary.nodeselector
| type |
string |
| default |
|
The nodes on which the canary is deployed. By default, the network canary runs on all worker nodes.
network_canary.resources
| type |
dictionary |
| default |
|
The resource requests and limits for the network canary.
network_canary.tolerations
| type |
dictionary |
| default |
|
The tolerations for the network canary daemonset. The values of the dictionary will be passed as is to the manifest.
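A sketch of a toleration entry. The key infra and the taint it tolerates are hypothetical. Each dictionary value is a regular Kubernetes toleration.

```yaml
network_canary:
  tolerations:
    # Hypothetical entry name; the value is passed to the daemonset as is.
    infra:
      key: node-role.kubernetes.io/infra
      operator: Exists
      effect: NoSchedule
```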
Example
namespace: appuio-openshift4-slos
specs:
  appuio-ch-http-get-availability:
    sloth_input:
      version: "prometheus/v1"
      service: "appuio-ch"
      labels:
        owner: "myteam"
      _slos:
        # We allow failing (5xx and 429) 1 request every 1000 requests (99.9%).
        appuio-ch-http-get-availability:
          objective: 99.9
          description: "SLO based on availability for blackbox HTTP GET request."
          sli:
            raw:
              error_ratio_query: |
                1 - (
                  sum_over_time(probe_success{instance="https://www.appuio.ch/"}[{{.window}}])
                  /
                  count_over_time(up{instance="https://www.appuio.ch/"}[{{.window}}])
                )
          alerting:
            name: AppuioChHttpGetErrorRatio
            labels:
              category: "availability"
            annotations:
              # Overwrite default Sloth SLO alert summary on ticket and page alerts.
              summary: "High error rate on 'appuio.ch' responses"
            page_alert:
              labels:
                severity: warning
            ticket_alert:
              labels:
                severity: warning
                routing_key: myteam
blackbox_exporter:
  probes:
    http-appuio-ch:
      spec:
        jobName: get-http-appuio-ch
        interval: 15s
        module: http_2xx
        targets:
          staticConfig:
            static:
              - https://www.appuio.ch/