Parameters
The parent key for all of the following parameters is openshift4_slos.
namespace
type |
string |
default |
|
The default namespace for ArgoCD to fall back to.
images.sloth
type |
dictionary |
Sloth isn’t deployed to the cluster; it’s only used to render PrometheusRules.
The entry in images allows Renovate to create version upgrade PRs.
The Sloth version can be overridden by the tag parameter.
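As a minimal sketch of the override described above, pinning the Sloth version could look like the following. The tag value is a placeholder, not a recommended version.

```yaml
openshift4_slos:
  images:
    sloth:
      # Placeholder tag; replace with the Sloth version you want to render with.
      tag: vX.Y.Z
```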
secrets
type |
dictionary |
default |
|
example |
|
This parameter allows creating arbitrary Secret resources.
The dictionary keys are used as metadata.name for the resulting Secret resources.
The secrets are created in the namespace indicated by the namespace parameter.
slos
type |
dictionary |
Configuration options for all default SLOs of the APPUiO Managed OpenShift product.
slos.storage.csi-operations
type |
dictionary |
default |
|
The configuration for the csi-operations storage SLO.
The SLO can be disabled by setting enabled to false.
You can configure which volume plugins or storage operations are considered for the SLO by setting _sli.volume_plugin or _sli.operation_name respectively.
Both fields accept an arbitrary PromQL regex label matcher.
Any additional field is added directly to the slo input for Sloth.
Look at the runbook for an explanation of this SLO.
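The options above might be combined as in the following sketch. The regex values are hypothetical and need to match the label values actually present on your cluster; objective stands in for any additional field that is passed through to Sloth.

```yaml
openshift4_slos:
  slos:
    storage:
      csi-operations:
        enabled: true
        # Additional fields such as objective are added to the Sloth slo input.
        objective: 99.5
        _sli:
          # Hypothetical PromQL regex matchers; adjust to your environment.
          volume_plugin: "kubernetes.io/csi.+"
          operation_name: "volume_(attach|mount)"
```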
slos.storage.canary
type |
dictionary |
default |
|
example |
|
The configuration for the canary storage SLO.
The SLO can be disabled by setting enabled to false.
The canary SLO is tested by creating a PVC for every configured storage class and periodically running a pod that writes and deletes a file on the respective PVC.
You can configure which volume plugins are tested with _sli.volume_plugins.
The key is the storage class name and the value is a dictionary which can override the default parameters set in volume_plugins_default_params.
An empty key ("") is used for the default storage class.
The value can be set to null to disable the canary for a specific storage class.
Any additional field is added directly to the slo input for Sloth.
Look at the runbook for an explanation of this SLO.
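A sketch of the key conventions described above. The storage class names are hypothetical, and the empty dictionaries mean that no default parameter is overridden.

```yaml
openshift4_slos:
  slos:
    storage:
      canary:
        enabled: true
        _sli:
          volume_plugins:
            # Empty key: test the cluster's default storage class.
            "": {}
            # Hypothetical storage class, tested with the default parameters.
            my-block-storage: {}
            # null disables the canary for this (hypothetical) storage class.
            my-legacy-storage: null
```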
slos.kubernetes_api.requests
type |
dictionary |
default |
|
The configuration for the Kubernetes API requests SLO.
The SLO can be disabled by setting enabled to false.
You can configure which API servers are considered for the SLO by setting _sli.apiserver.
By default the SLO only considers the Kubernetes API server and not the OpenShift API server.
The field can contain an arbitrary PromQL regex label matcher.
Any additional field is added directly to the slo input for Sloth.
Look at the runbook for an explanation of this SLO.
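For illustration, widening the SLO to both API servers could look like the sketch below. The label values in the regex are assumptions; check the apiserver label on your metrics before using them.

```yaml
openshift4_slos:
  slos:
    kubernetes_api:
      requests:
        _sli:
          # Assumed label values; the regex matches either API server.
          apiserver: "kube-apiserver|openshift-apiserver"
```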
slos.kubernetes_api.canary
type |
dictionary |
default |
|
The configuration for the Kubernetes API canary SLO.
The SLO can be disabled by setting enabled to false.
You can configure the probe interval and timeout by setting _sli.interval and _sli.probe respectively.
Both parameters are in Go duration format (for example 1m30s).
Any additional field is added directly to the slo input for Sloth.
Look at the runbook for an explanation of this SLO.
slos.workload-schedulability.canary
type |
dictionary |
default |
|
The configuration for the canary-based workload schedulability SLO.
The SLO can be disabled by setting enabled to false.
You can configure the interval at which canary pods are created (podStartInterval) and the timeout after which a pod is considered stuck (overallPodTimeout).
Both parameters are in Go duration format (for example 1m30s).
Any additional field is added directly to the slo input for Sloth.
Look at the runbook for an explanation of this SLO.
slos.network.canary
type |
dictionary |
default |
|
The configuration for the canary-based network SLO, measuring packet loss between nodes.
The SLO can be disabled by setting enabled to false.
Any additional field is added directly to the slo input for Sloth.
Look at the runbook for an explanation of this SLO.
alerting
type |
dictionary |
Common alerting configuration for all deployed SLOs.
alerting.labels
type |
dictionary |
default |
|
Labels that are added to all Prometheus alerts generated by this component.
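A short sketch of how such labels might be set. The label names and values are hypothetical.

```yaml
openshift4_slos:
  alerting:
    labels:
      # Hypothetical labels; every alert generated by the component would carry them.
      team: platform
      environment: production
```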
specs
type |
dictionary |
default |
|
The SLO definitions that are passed to Sloth. The key is used as the name of the resulting PrometheusRule. It must be a valid Kubernetes name.
specs.NAME.metadata
type |
dictionary |
example |
|
The metadata applied to the PrometheusRule manifest. The name is derived from the name of the parent dictionary.
specs.NAME.sloth_input
type |
dictionary |
example |
appuio-ch-http-get-availability:
  sloth_input:
    version: "prometheus/v1"
    service: "appuio-ch"
    labels:
      owner: "myteam"
    _slos:
      # We allow failing (5xx and 429) 1 request every 1000 requests (99.9%).
      appuio-ch-http-get-availability:
        enabled: true (1)
        objective: 99.9
        description: "SLO based on availability for blackbox HTTP GET request."
        sli:
          raw:
            error_ratio_query: |
              1 - (
                sum_over_time(probe_success{instance="https://www.appuio.ch/"}[{{.window}}])
                /
                count_over_time(up{instance="https://www.appuio.ch/"}[{{.window}}])
              )
        alerting:
          name: AppuioChHttpGetErrorRatio
          labels:
            category: "availability"
          annotations:
            # Overwrite default Sloth SLO alert summary on ticket and page alerts.
            summary: "High error rate on 'appuio.ch' responses"
          page_alert:
            labels:
              severity: warning
          ticket_alert:
            labels:
              severity: warning
              routing_key: myteam
(1) enabled is an optional field that allows users to disable certain SLOs through the hierarchy.
The field defaults to true if omitted.
The input for Sloth to generate the PrometheusRule.spec.
See Sloth introduction for more information.
The slos can be passed as either an array or as a dictionary with the key _slos.
The dictionary form allows easier modification of the SLOs from the Project Syn hierarchy.
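For comparison with the _slos dictionary form, the array form could look like the following sketch. It assumes the plain Sloth layout, where slos is a list and each entry carries its own name.

```yaml
openshift4_slos:
  specs:
    appuio-ch-http-get-availability:
      sloth_input:
        version: "prometheus/v1"
        service: "appuio-ch"
        # Array form: entries are identified by their name field instead of a dictionary key.
        slos:
          - name: appuio-ch-http-get-availability
            objective: 99.9
            description: "SLO based on availability for blackbox HTTP GET request."
            sli:
              raw:
                error_ratio_query: |
                  1 - (
                    sum_over_time(probe_success{instance="https://www.appuio.ch/"}[{{.window}}])
                    /
                    count_over_time(up{instance="https://www.appuio.ch/"}[{{.window}}])
                  )
            alerting:
              name: AppuioChHttpGetErrorRatio
```

Entries in an array have no key to address them by from elsewhere in the hierarchy, which is why the dictionary form is easier to modify.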
controller_node_affinity
type |
dict |
default |
|
This parameter is used to configure spec.affinity.nodeAffinity for the blackbox-exporter and scheduler-canary-controller deployments.
By default, both deployments are scheduled on the infra nodes.
To customize the node affinity for those deployments, use reclass’s overwrite mechanism with the key ~controller_node_affinity, since otherwise your changes will most likely be appended to the component defaults.
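A sketch of the overwrite mechanism mentioned above. The node label is only an illustration; use a label that exists on your nodes.

```yaml
openshift4_slos:
  # The leading ~ makes reclass replace the default value instead of merging into it.
  ~controller_node_affinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            # Illustrative node label; adjust to your cluster.
            - key: node-role.kubernetes.io/worker
              operator: Exists
```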
canary_node_affinity
type |
dict |
default |
|
This parameter can be used to configure spec.affinity.nodeAffinity for the SchedulerCanary custom resource generated by the component.
We don’t recommend adjusting this parameter unless the component is installed on a cluster that has all-in-one nodes.
blackbox_exporter
type |
dictionary |
blackbox_exporter allows setting up an optional Blackbox exporter.
blackbox_exporter.enabled
type |
boolean |
default |
|
Controls whether the Blackbox exporter is deployed.
blackbox_exporter.name
type |
string |
default |
|
The name of the Blackbox exporter deployment.
blackbox_exporter.namespace
type |
string |
default |
|
The namespace of the Blackbox exporter deployment.
blackbox_exporter.deployment.resources
type |
dictionary |
default |
The resources to use for the Blackbox exporter deployment.
blackbox_exporter.deployment.affinity
type |
dictionary |
default |
|
Affinity rules for the Blackbox exporter deployment.
Schedules replicas on different nodes. This is done to avoid SLO violations when rebooting a worker node.
blackbox_exporter.deployment.replicas
type |
integer |
default |
|
The number of replicas for the Blackbox exporter deployment. Defaults to 2 to avoid SLO violations when rebooting a worker node.
blackbox_exporter.deployment.podDisruptionBudget
type |
dictionary |
default |
|
The PodDisruptionBudget for the Blackbox exporter deployment. Ensures at least one replica is available at all times.
blackbox_exporter.config
type |
dictionary |
default |
The blackbox exporter configuration. See Configuration for more information.
blackbox_exporter.probes
type |
dictionary |
default |
|
example |
|
The Probe definitions that are deployed in the cluster and picked up by the Blackbox exporter managed by the component. The key is used as the name of the resulting Probe. It must be a valid Kubernetes name.
The .spec.prober part is automatically filled from the Blackbox exporter configuration and can be omitted.
canary_scheduler_controller
type |
dictionary |
canary_scheduler_controller allows setting up the canary controller to test workload schedulability.
The manifests are rendered using Kustomize.
canary_scheduler_controller.enabled
type |
boolean |
default |
|
Controls whether the controller is deployed.
canary_scheduler_controller.manifests_version
type |
string |
default |
|
The Git reference to the canary controller manifests. The default is the tag of the canary controller image.
canary_scheduler_controller.kustomize_input
type |
dictionary |
default |
|
The input passed to the Kustomize renderer. See The Kustomization File for all available options.
network_canary
type |
dictionary |
network_canary allows configuring the network canary used for measuring packet loss for the network SLO.
network_canary.enabled
type |
boolean |
default |
|
Whether the canary should be deployed. By default the component will deploy the canary if and only if the network canary SLO is enabled.
network_canary.namespace
type |
string |
default |
|
In which namespace the network canary should be deployed.
INFO: This needs to differ from the default SLO namespace so that we can choose different node selectors for the canary.
network_canary.nodeselector
type |
string |
default |
|
The nodes on which the canary should be deployed. By default the network canary will run on all worker nodes.
network_canary.resources
type |
dictionary |
default |
|
The resource requests and limits for the network canary.
network_canary.tolerations
type |
dictionary |
default |
|
The tolerations for the network canary daemonset. The values of the dictionary will be passed as is to the manifest.
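A sketch of one toleration entry. The dictionary key infra is a hypothetical name, and the taint key is only an illustration; the value is what gets passed to the manifest.

```yaml
openshift4_slos:
  network_canary:
    tolerations:
      # Hypothetical entry name; the value below is a regular Kubernetes toleration.
      infra:
        key: node-role.kubernetes.io/infra
        operator: Exists
        effect: NoSchedule
```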
Example
namespace: appuio-openshift4-slos
specs:
  appuio-ch-http-get-availability:
    sloth_input:
      version: "prometheus/v1"
      service: "appuio-ch"
      labels:
        owner: "myteam"
      _slos:
        # We allow failing (5xx and 429) 1 request every 1000 requests (99.9%).
        appuio-ch-http-get-availability:
          objective: 99.9
          description: "SLO based on availability for blackbox HTTP GET request."
          sli:
            raw:
              error_ratio_query: |
                1 - (
                  sum_over_time(probe_success{instance="https://www.appuio.ch/"}[{{.window}}])
                  /
                  count_over_time(up{instance="https://www.appuio.ch/"}[{{.window}}])
                )
          alerting:
            name: AppuioChHttpGetErrorRatio
            labels:
              category: "availability"
            annotations:
              # Overwrite default Sloth SLO alert summary on ticket and page alerts.
              summary: "High error rate on 'appuio.ch' responses"
            page_alert:
              labels:
                severity: warning
            ticket_alert:
              labels:
                severity: warning
                routing_key: myteam
blackbox_exporter:
  probes:
    http-appuio-ch:
      spec:
        jobName: get-http-appuio-ch
        interval: 15s
        module: http_2xx
        targets:
          staticConfig:
            static:
              - https://www.appuio.ch/