Parameters

The parent key for all of the following parameters is openshift4_slos.

`namespace`

type	string
default	`appuio-openshift4-slos`

The default namespace for ArgoCD to fall back to.

`images`

type	dictionary

The images to use for this component.

`images.sloth`

type	dictionary

Sloth isn’t deployed to the cluster. It’s only used to render the PrometheusRules.

The entry in images allows Renovate to create version upgrade PRs. The Sloth version can be overridden by the tag parameter.
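For example, pinning Sloth to a specific release in the Project Syn hierarchy could look like the following sketch. It assumes the override is set as images.sloth.tag. The version string is a placeholder, not a tested recommendation.

```yaml
parameters:
  openshift4_slos:
    images:
      sloth:
        # Placeholder version: use a released Sloth tag you have verified.
        tag: v0.11.0
```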

`secrets`

type	dictionary
default	`{}`
example

secrets:
  canary-ssd-encrypted-luks-key:
    stringData:
      luksKey: XXXXXX

This parameter allows creating arbitrary Secret resources.

The dictionary keys are used as metadata.name of the resulting Secret resources. The secrets are created in the namespace given by the parameter namespace.

`slos`

type	dictionary

The configuration options for all default SLOs of the APPUiO Managed OpenShift product.

`slos.customer-facing.ingress`

type	dictionary
default	`customer-facing: ingress: enabled: true objective: 99.9`

The configuration for a customer-facing ingress SLO.

The SLO can be disabled by setting enabled to false.

Look at the runbook for an explanation of this SLO.
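As a sketch, relaxing the objective of this SLO from the hierarchy could look like this. The value 99.5 is only an illustration. The same pattern with enabled: false disables the SLO entirely.

```yaml
parameters:
  openshift4_slos:
    slos:
      customer-facing:
        ingress:
          # Illustrative value: relax the objective from the default 99.9 percent.
          objective: 99.5
```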

`slos.storage.csi-operations`

type	dictionary
default

csi-operations:
  enabled: true
  objective: 99.5
  _sli:
    volume_plugin: "kubernetes.io/csi.+"
    operation_name: ".+"

The configuration for the csi-operations storage SLO.

The SLO can be disabled by setting enabled to false.

You can configure which volume plugins or storage operations are considered for the SLO by setting _sli.volume_plugin or _sli.operation_name respectively. The fields can contain an arbitrary PromQL regex label matcher.

Any additional field is added directly to the SLO input for Sloth.

Look at the runbook for an explanation of this SLO.
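A sketch of narrowing the SLO to a single CSI driver and to provisioning operations. The driver name and the operation names below are hypothetical. Check the volume_plugin and operation_name label values actually present in your cluster’s metrics before using them.

```yaml
parameters:
  openshift4_slos:
    slos:
      storage:
        csi-operations:
          _sli:
            # Hypothetical driver: a regex matched against the volume_plugin label.
            volume_plugin: "kubernetes.io/csi:.*rbd.csi.ceph.com"
            # Hypothetical selection: only provisioning and deletion operations.
            operation_name: "volume_provision|volume_delete"
```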

`slos.storage.canary`

type

dictionary

default

canary:
  enabled: true
  objective: 99.0
  _sli:
    volume_plugins_default_params:
      size: 1Gi
      accessMode: ReadWriteOnce
      interval: 1m
      maxPodCompletionTimeout: 3m

    volume_plugins:
      # Empty value for the default plugin
      "": {}

example

canary:
  enabled: true
  objective: 99.0
  _sli:
    volume_plugins:
      # Disable the canary for the default storage class
      "": null
      # Enable the canaries for ssd and bulk storage classes
      ssd: {}
      bulk:
        size: 10Gi

The configuration for the canary storage SLO.

The SLO can be disabled by setting enabled to false.

The canary SLO is measured by creating a PVC for every configured storage class and periodically running a pod that writes a file to the respective PVC and deletes it again. You can configure which storage classes are tested with _sli.volume_plugins. The key is the storage class name and the value is a dictionary that can override the default parameters set in volume_plugins_default_params. The empty key ("") stands for the default storage class. Set a value to null to disable the canary for a specific storage class.

Any additional field is added directly to the SLO input for Sloth.

Look at the runbook for an explanation of this SLO.

`slos.kubernetes_api.requests`

type	dictionary
default

requests:
  enabled: true
  objective: 99.9
  _sli:
    apiserver: "kube-apiserver"

The configuration for the kubernetes API requests SLO.

The SLO can be disabled by setting enabled to false.

You can configure which API servers are considered for the SLO by setting _sli.apiserver. By default, the SLO only considers the Kubernetes API server and not the OpenShift API server. The field can contain an arbitrary PromQL regex label matcher.

Any additional field is added directly to the SLO input for Sloth.

Look at the runbook for an explanation of this SLO.
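To also include the OpenShift API server, a regex alternation could be used, as in this sketch. The label value openshift-apiserver is an assumption. Verify the apiserver label values on your cluster first.

```yaml
parameters:
  openshift4_slos:
    slos:
      kubernetes_api:
        requests:
          _sli:
            # Assumed label values: check the apiserver label on your cluster.
            apiserver: "kube-apiserver|openshift-apiserver"
```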

`slos.kubernetes_api.canary`

type	dictionary
default

canary:
  enabled: true
  objective: 99.9
  _sli:
    interval: 10s
    timeout: 5s

The configuration for the kubernetes API canary SLO.

The SLO can be disabled by setting enabled to false.

You can configure the probe interval and timeout by setting _sli.interval and _sli.timeout respectively. Both parameters use the Go duration format (for example 1m30s).

Any additional field is added directly to the SLO input for Sloth.

Look at the runbook for an explanation of this SLO.
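A sketch of a less aggressive probe configuration. The durations are illustrative values, not recommendations.

```yaml
parameters:
  openshift4_slos:
    slos:
      kubernetes_api:
        canary:
          _sli:
            # Illustrative values: probe every 30 seconds, allow 10 seconds per probe.
            interval: 30s
            timeout: 10s
```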

`slos.workload-schedulability.canary`

type	dictionary
default

workload-schedulability:
  canary:
    enabled: true
    objective: 99.75
    _sli:
      podStartInterval: 1m
      overallPodTimeout: 3m

The configuration for the canary based workload schedulability SLO.

The SLO can be disabled by setting enabled to false.

You can configure the interval at which canary pods are created (podStartInterval) and the timeout after which a pod is considered stuck (overallPodTimeout). Both parameters use the Go duration format (for example 1m30s).

Any additional field is added directly to the SLO input for Sloth.

Look at the runbook for an explanation of this SLO.
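A sketch of tuning both canary parameters. The durations are only illustrations.

```yaml
parameters:
  openshift4_slos:
    slos:
      workload-schedulability:
        canary:
          _sli:
            # Illustrative: start a canary pod every 2 minutes.
            podStartInterval: 2m
            # Illustrative: consider a pod stuck after 5 minutes.
            overallPodTimeout: 5m
```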

`slos.network.canary`

type	dictionary
default	`network: canary: enabled: true objective: 99.95`

The configuration for the canary based network SLO, measuring packet loss between nodes.

The SLO can be disabled by setting enabled to false. Any additional field is added directly to the SLO input for Sloth.

Look at the runbook for an explanation of this SLO.

`alerting`

type	dictionary

Common alerting configuration for all deployed SLOs.

`alerting.labels`

type	dictionary
default	`labels: syn: "true" syn_component: "openshift4-slos"`

Labels that are added to all Prometheus alerts generated by this component.

`alerting.page_labels`

type	dictionary
default	`page_labels: severity: critical`

Labels that are added to all page Prometheus alerts generated by this component. Page alerts are critical alerts for a high burn rate that require immediate attention.

`alerting.ticket_labels`

type	dictionary
default	`ticket_labels: severity: warning`

Labels that are added to all ticket Prometheus alerts generated by this component. Ticket alerts are alerts for an elevated burn rate that might require attention, but aren’t urgent.
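A sketch of adjusting the alert labels. It assumes the usual reclass behavior of merging dictionaries, so the default syn and syn_component labels are kept. The team label is hypothetical.

```yaml
parameters:
  openshift4_slos:
    alerting:
      labels:
        # Hypothetical label added to every SLO alert.
        team: platform
      page_labels:
        # Illustrative: downgrade page alerts, for example on a test cluster.
        severity: warning
```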

`specs`

type	dictionary
default	`{}`

The SLO definitions that are passed to Sloth. The key is used as the name of the resulting PrometheusRule and must be a valid Kubernetes name.

`specs.NAME.metadata`

type	dictionary
example

metadata:
  namespace: my-important-service
  labels:
    prometheus: apps

The metadata applied to the PrometheusRule manifest. The name is derived from the name of the parent dictionary.

`specs.NAME.sloth_input`

type	dictionary
example

appuio-ch-http-get-availability:
  sloth_input:
    version: "prometheus/v1"
    service: "appuio-ch"
    labels:
      owner: "myteam"
    _slos:
      # We allow failing (5xx and 429) 1 request every 1000 requests (99.9%).
      appuio-ch-http-get-availability:
        enabled: true (1)
        objective: 99.9
        description: "SLO based on availability for blackbox HTTP GET request."
        sli:
          raw:
            error_ratio_query: |
              1 - (
                  sum_over_time(probe_success{instance="https://www.appuio.ch/"}[{{.window}}])
                /
                  count_over_time(up{instance="https://www.appuio.ch/"}[{{.window}}])
              )
        alerting:
          name: AppuioChHttpGetErrorRatio
          labels:
            category: "availability"
          annotations:
            # Overwrite default Sloth SLO alert summary on ticket and page alerts.
            summary: "High error rate on 'appuio.ch' responses"
          page_alert:
            labels:
              severity: warning
          ticket_alert:
            labels:
              severity: warning
              routing_key: myteam

1	`enabled` is an optional field that allows users to disable certain SLOs through the hierarchy. The field will default to `true` if omitted.

The input for Sloth to generate the PrometheusRule.spec. See Sloth introduction for more information.

The SLOs can be passed either as an array or as a dictionary under the key _slos. The dictionary form makes it easier to modify individual SLOs from the Project Syn hierarchy, because reclass merges dictionaries by key but can only append to arrays.
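Because the SLOs under _slos are keyed by name, a higher level of the hierarchy (for example a cluster file) can disable a single SLO from the example above without redefining the whole spec. A minimal sketch:

```yaml
parameters:
  openshift4_slos:
    specs:
      appuio-ch-http-get-availability:
        sloth_input:
          _slos:
            appuio-ch-http-get-availability:
              # Disable this SLO without touching the class that defines it.
              enabled: false
```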

`controller_node_affinity`

type	dictionary
default

requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
    - matchExpressions:
        - key: node-role.kubernetes.io/infra
          operator: Exists

This parameter is used to configure spec.affinity.nodeAffinity for the blackbox-exporter and scheduler-canary-controller deployments. We default to scheduling the blackbox-exporter and scheduler-canary-controller on the infra nodes.

To customize the node affinity for those deployments, use reclass’s override mechanism with the key ~controller_node_affinity. Otherwise, your changes will most likely be appended to the component defaults instead of replacing them.
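A sketch of such an override, assuming you want to run these deployments on worker nodes instead (the node role is only an illustration). The ~ prefix makes reclass replace the component default rather than merge into it.

```yaml
parameters:
  openshift4_slos:
    # The ~ prefix replaces the default instead of merging into it.
    ~controller_node_affinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # Illustrative node role: schedule on worker nodes instead of infra nodes.
              - key: node-role.kubernetes.io/worker
                operator: Exists
```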

`canary_node_affinity`

type	dictionary
default

requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
    - matchExpressions:
        - key: node-role.kubernetes.io/app
          operator: Exists

This parameter can be used to configure spec.affinity.nodeAffinity for the SchedulerCanary custom resource generated by the component.

We don’t recommend adjusting this parameter unless the component is installed on a cluster that has all-in-one nodes.
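On a cluster with all-in-one nodes, a sketch could target a role label those nodes carry. The label below is an assumption about your nodes. The ~ prefix is used by analogy with controller_node_affinity, so the default app-role term is replaced instead of extended.

```yaml
parameters:
  openshift4_slos:
    # Assumed to behave like ~controller_node_affinity: replace, don't merge.
    ~canary_node_affinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # Assumption: the all-in-one nodes carry the worker role label.
              - key: node-role.kubernetes.io/worker
                operator: Exists
```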

`blackbox_exporter`

type	dictionary

blackbox_exporter allows setting up an optional Blackbox exporter.

`blackbox_exporter.enabled`

type	boolean
default	`true`

Controls whether the Blackbox exporter is deployed.

`blackbox_exporter.name`

type	string
default	`prometheus-blackbox-exporter`

The name of the Blackbox exporter deployment.

`blackbox_exporter.namespace`

type	string
default	`${openshift4_slos:namespace}`

The namespace of the Blackbox exporter deployment.

`blackbox_exporter.deployment.resources`

type	dictionary
default	see class/defaults.yml

The resources to use for the Blackbox exporter deployment.

`blackbox_exporter.deployment.affinity`

type

dictionary

default

deployment:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution: []
      requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchExpressions:
              - key: name
                operator: In
                values:
                  - ${openshift4_slos:blackbox_exporter:name}

Affinity rules for the Blackbox exporter deployment.

Schedules replicas on different nodes. This is done to avoid SLO violations when rebooting a worker node.

`blackbox_exporter.deployment.replicas`

type	integer
default	`2`

The number of replicas for the Blackbox exporter deployment. Defaults to 2 to avoid SLO violations when rebooting a worker node.

`blackbox_exporter.deployment.podDisruptionBudget`

type	dictionary
default

deployment:
  podDisruptionBudget:
    selector:
      matchLabels:
        name: ${openshift4_slos:blackbox_exporter:name}
    minAvailable: 1

The PodDisruptionBudget for the Blackbox exporter deployment. Ensures at least one replica is available at all times.
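A sketch of running three replicas while allowing at most one to be disrupted at a time. With the default required pod anti-affinity, each replica needs its own node, so this assumes at least three nodes match controller_node_affinity.

```yaml
parameters:
  openshift4_slos:
    blackbox_exporter:
      deployment:
        # Illustrative: needs at least three eligible nodes due to anti-affinity.
        replicas: 3
        podDisruptionBudget:
          # Keep two of three replicas available, so one can be evicted at a time.
          minAvailable: 2
```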

`blackbox_exporter.config`

type	dictionary
default	see class/defaults.yml

The Blackbox exporter configuration. See Configuration for more information.
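A sketch of adding an extra probe module. It assumes this dictionary maps directly to the Blackbox exporter’s configuration file with its top-level modules key. The module name is hypothetical, and a Probe could reference it through spec.module.

```yaml
parameters:
  openshift4_slos:
    blackbox_exporter:
      config:
        modules:
          # Hypothetical module: HTTP probe that only uses IPv4.
          http_2xx_ipv4:
            prober: http
            timeout: 5s
            http:
              preferred_ip_protocol: ip4
```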

`blackbox_exporter.probes`

type	dictionary
default	`{}`
example

probes:
  http-appuio-ch:
    spec:
      jobName: get-http-appuio-ch
      interval: 15s
      module: http_2xx
      targets:
        staticConfig:
          static:
            - https://www.appuio.ch/

The Probe definitions that are deployed in the cluster and picked up by the Blackbox exporter managed by the component. The key is used as the name of the resulting Probe and must be a valid Kubernetes name.

The .spec.prober part is automatically filled from the Blackbox exporter configuration and can be omitted.

`canary_scheduler_controller`

type	dictionary

canary_scheduler_controller allows setting up the canary controller to test workload schedulability. The manifests are rendered using Kustomize.

`canary_scheduler_controller.enabled`

type	boolean
default	`true`

Controls whether the controller is deployed.

`canary_scheduler_controller.manifests_version`

type	string
default	`${openshift4_slos:images:canary_scheduler_controller:tag}`

The Git reference to the canary controller manifests. The default is the tag of the canary controller image.

`canary_scheduler_controller.kustomize_input`

type	dictionary
default	`kustomize_input: namespace: ${openshift4_slos:namespace}`

The input passed to the Kustomize renderer. See The Kustomization File for all available options.
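A sketch of passing an extra Kustomize field. commonAnnotations is a standard kustomization field, and the annotation itself is hypothetical. Reclass merges it with the default namespace entry.

```yaml
parameters:
  openshift4_slos:
    canary_scheduler_controller:
      kustomize_input:
        # Hypothetical annotation added to all rendered resources.
        commonAnnotations:
          example.com/owner: platform
```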

`network_canary`

type	dictionary

network_canary configures the network canary used to measure packet loss for the network SLO.

`network_canary.enabled`

type	boolean
default	`${openshift4_slos:slos:network:canary:enabled}`

Whether the canary should be deployed. By default the component will deploy the canary if and only if the network canary SLO is enabled.
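For example, to drop the network SLO but keep running the canary DaemonSet, a sketch could set both parameters explicitly:

```yaml
parameters:
  openshift4_slos:
    slos:
      network:
        canary:
          # Disable the network SLO ...
          enabled: false
    network_canary:
      # ... but still deploy the canary instead of following the SLO setting.
      enabled: true
```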

`network_canary.namespace`

type	string
default	`appuio-network-canary`

The namespace in which the network canary is deployed.

INFO: This needs to differ from the default SLO namespace so that we can choose different node selectors for the canary.

`network_canary.nodeselector`

type	string
default	`node-role.kubernetes.io/worker=`

The nodes on which the canary is deployed. By default, the network canary runs on all worker nodes.
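A sketch restricting the canary to nodes with an app role label, using the same key=value format as the default. The label choice is only an illustration.

```yaml
parameters:
  openshift4_slos:
    network_canary:
      # Illustrative: only run the canary on nodes with the app role.
      nodeselector: "node-role.kubernetes.io/app="
```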

`network_canary.resources`

type	dictionary
default

resources:
  limits:
    memory: 40Mi
  requests:
    cpu: 1m
    memory: 20Mi

The resource requests and limits for the network canary.

`network_canary.tolerations`

type	dictionary
default

tolerations:
  infrastructure:
    effect: NoSchedule
    key: node-role.kubernetes.io/infra
    operator: Exists
  storage:
    key: 'storagenode'
    operator: 'Exists'

The tolerations for the network canary DaemonSet. The values of the dictionary are passed as-is to the manifest.

See the upstream documentation on taints and tolerations.
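A sketch of adding one more toleration, here for a hypothetical GPU taint. It assumes the dictionary keys only serve to name entries in the hierarchy, since only the values end up in the manifest.

```yaml
parameters:
  openshift4_slos:
    network_canary:
      tolerations:
        # Hypothetical taint; the key "gpu" only names this entry.
        gpu:
          key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
```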

Example

namespace: appuio-openshift4-slos

specs:
  appuio-ch-http-get-availability:
    sloth_input:
      version: "prometheus/v1"
      service: "appuio-ch"
      labels:
        owner: "myteam"
      _slos:
        # We allow failing (5xx and 429) 1 request every 1000 requests (99.9%).
        appuio-ch-http-get-availability:
          objective: 99.9
          description: "SLO based on availability for blackbox HTTP GET request."
          sli:
            raw:
              error_ratio_query: |
                1 - (
                    sum_over_time(probe_success{instance="https://www.appuio.ch/"}[{{.window}}])
                  /
                    count_over_time(up{instance="https://www.appuio.ch/"}[{{.window}}])
                )
          alerting:
            name: AppuioChHttpGetErrorRatio
            labels:
              category: "availability"
            annotations:
              # Overwrite default Sloth SLO alert summary on ticket and page alerts.
              summary: "High error rate on 'appuio.ch' responses"
            page_alert:
              labels:
                severity: warning
            ticket_alert:
              labels:
                severity: warning
                routing_key: myteam

blackbox_exporter:
  probes:
    http-appuio-ch:
      spec:
        jobName: get-http-appuio-ch
        interval: 15s
        module: http_2xx
        targets:
          staticConfig:
            static:
              - https://www.appuio.ch/
