Parameters

The parent key for all of the following parameters is rancher_monitoring.

namespace

type: string
default: syn-rancher-monitoring

The namespace in which to deploy the Syn Rancher monitoring stack.

kube_prometheus_version

type: dict
default:
  '1.15': release-0.3
  '1.16': release-0.4
  '1.17': release-0.4
  '1.18': release-0.6
  '1.19': release-0.7
  '1.20': release-0.8
  '1.21': release-0.8

Map of kube-prometheus versions which are compatible with different Kubernetes versions.

See the kube-prometheus Kubernetes compatibility matrix for updating this map.

cluster_kubernetes_version

type: string
default: 1.18

The target cluster’s Kubernetes version. Used to look up the kube-prometheus version in the kube_prometheus_version map.

Having this parameter as a layer of indirection allows setting the target cluster’s Kubernetes version without needing parameters.rancher_monitoring.kube_prometheus_version to exist.

We’re currently using this approach as a workaround for the fact that Commodore doesn’t support dynamic facts yet. Once dynamic facts are implemented, all clusters will have a uniformly named fact which represents the cluster’s Kubernetes version. That fact can then be used in place of this parameter to select the kube-prometheus version in jsonnetfile_parameters.kube_prometheus_version.
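
For example, pinning this parameter to 1.20 selects release-0.8 from the kube_prometheus_version map above (a minimal sketch; deriving the value from a fact, as shown in the example at the end of this page, is usually preferable):

cluster_kubernetes_version: '1.20'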

federation_target

type: string
default: ${rancher_monitoring:_federation_target_map:${rancher_monitoring:rancher_monitoring_version}}

The service name of the Prometheus instance with which to federate. Usually this is the Prometheus instance managed by Rancher.

By default, this parameter is set based on the value of parameter rancher_monitoring_version. The default configuration ensures that this parameter is set to the default service name for the Rancher-managed Prometheus instance for Rancher monitoring V1 or V2 depending on the value of rancher_monitoring_version.
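
If the Rancher-managed Prometheus is exposed under a different service name, the lookup can be bypassed by overriding the parameter directly. The service name below is a placeholder, not a value shipped with the component:

federation_target: prometheus.example-monitoring.svc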

rancher_monitoring_version

type: string
default: v2

The version of the Rancher monitoring stack which is enabled on the cluster. Valid values are v1 or v2. See the Rancher documentation for more details on the differences between the V1 and V2 Rancher monitoring stacks. This parameter is used to configure an appropriate default value for the federation_target parameter, if that parameter isn’t overwritten.

jsonnetfile_parameters

type: dict
default:
  kube_prometheus_version: ${rancher_monitoring:kube_prometheus_version:1.18}

Map of string values to use as external Jsonnet variables when rendering jsonnetfile.json from the jsonnetfile.jsonnet in the repository.

The intent is that kube_prometheus_version is configured to a kube-prometheus version (Git tree-ish) which is compatible with the target cluster’s Kubernetes version.
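
To pin kube-prometheus to a specific release independently of the version map, the external variable can also be set directly (a sketch; release-0.8 is one of the releases listed in the compatibility map above):

jsonnetfile_parameters:
  kube_prometheus_version: release-0.8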

prometheusInstance

type: string
default: platform

The Prometheus instance name to use. For some uses this will be prefixed with prometheus, for example for the StatefulSet name.

prometheusNamespaceSelector

type: string
default: null

This parameter allows setting a custom namespace selector for the fields podMonitorNamespaceSelector, ruleNamespaceSelector and serviceMonitorNamespaceSelector. If this parameter isn't set, it defaults to:

---
matchLabels: {
  SYNMonitoring: 'main',
},
---
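
Any Kubernetes label selector can be used instead, for example (the label key shown here is purely illustrative, not something the component ships):

prometheusNamespaceSelector:
  matchExpressions:
    - key: monitoring.example.com/enabled
      operator: Exists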

prometheus

type: dict
default:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: fast
        resources:
          requests:
            storage: 10Gi

Prometheus customizations. See the PrometheusSpec documentation for all possible configurations. The value of this parameter is merged into the spec field of the Prometheus object managed by the component.
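
For example, other PrometheusSpec fields such as retention or externalLabels can be set the same way (the values shown are illustrative, not component defaults):

prometheus:
  retention: 30d
  externalLabels:
    cluster: my-cluster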

alertmanagerInstance

type: string
default: platform

The Alertmanager instance name to use. For some uses this will be prefixed with alertmanager, for example for the StatefulSet name.

alertmanager

type: dict
default:
  replicas: 1
  logLevel: info

Alertmanager customization. See the AlertmanagerSpec documentation for all possible configurations. The value of this parameter is merged into the spec field of the Alertmanager object managed by the component.
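
For example, a highly available Alertmanager with a longer data retention could be configured as follows (values are illustrative; retention is a regular AlertmanagerSpec field):

alertmanager:
  replicas: 3
  retention: 240h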

alertmanagerConfig

type: dict
default:
  receivers:
    # blackhole receiver for alerts we don't care about
    - name: devnull
  # If the user doesn't configure a receiver, send everything to
  # devnull. This allows Alertmanager to start even if the user doesn't
  # configure any alert receivers.
  route:
    receiver: devnull

The value of this parameter is deployed verbatim as the Alertmanager configuration in alertmanager.yaml.

See the Alertmanager documentation for possible configuration options.

The default configuration allows Alertmanager to start even if no alert receivers have been configured for the cluster.

federation

type: dict
default:
  interval: 10s
  scrape_timeout: 10s
  extra_metric_relabel_configs: []

Configure the scrape interval and timeout for the Prometheus job which federates metrics from the Rancher Prometheus instance in cattle-prometheus.

Users should ensure that the scrape_timeout doesn't exceed the interval, as there's no validation logic in the component.

extra_metric_relabel_configs allows appending additional metric relabel configs to the federation job.
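
For example, to federate less frequently and drop a group of metrics by name (a sketch which assumes the relabel configs use plain Prometheus scrape-config syntax; the metric name regex is illustrative):

federation:
  interval: 30s
  scrape_timeout: 25s
  extra_metric_relabel_configs:
    - source_labels: [__name__]
      regex: 'container_network_.*'
      action: drop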

alerts.namespaceSelector

type: string
default: namespace=~"default|((kube|syn|cattle).*)"

Namespace selector which is injected into alert rules by kube-prometheus (via kubernetes-mixin).

By default, alerts for namespaced objects are only configured for namespaces which are part of Kubernetes, Rancher, or Project Syn.

To fully remove the selector, set this parameter to null.
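
For example, to additionally cover the namespaces of a specific application (the extra my-app pattern is illustrative):

alerts:
  namespaceSelector: namespace=~"default|((kube|syn|cattle|my-app).*)"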

alerts.ignoreNames

type: list
default: []

A list of alert names which should be completely disabled on the cluster.

Any alerts which match one of the names listed in ignoreNames are dropped from the final set of alert rules.
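
Example (the alert names shown are upstream kube-prometheus alerts; adjust the list to the alerts you actually want to drop):

alerts:
  ignoreNames:
    - Watchdog
    - CPUThrottlingHigh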

alerts.customAnnotations

type: dict
default: {}

Maps alert names to sets of custom annotations. This allows configuring custom annotations for individual alerts.

Example:

customAnnotations:
  Watchdog:
    runbook_url: https://www.google.com/?q=Watchdog

alerts.sharedStorageClass

type: string
default: "" (empty string)

A regular expression that matches the shared storage classes in this cluster. A shared storage class is a storage class for which PVs share the same underlying volume, which causes them to fill up at the same rate. The component configures the alert rules to ensure that only a single alert is produced for storage classes matching this regex.

Users must ensure that the regex only matches storage classes which share a single backing volume. Otherwise volume utilization alerts will be lost.

Example:

sharedStorageClass: "bulk|foo.*"

thanos

type: dict
default: {}

This parameter allows configuring the object storage for the Prometheus Thanos sidecar containers. If this dict doesn't have a key type, the Thanos sidecar container won't be deployed. See the official Thanos documentation for all possible configuration options.

Example:

---
thanos:
  type: S3
  config:
    bucket: my-bucket
    endpoint: my-s3.example.com
    access_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/thanos/access_key}
    secret_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/thanos/secret_key}
---

rules

type: dict
default: {}

This parameter allows users to configure additional Prometheus rules to deploy on the cluster.

Each key-value pair in the dictionary is transformed into a PrometheusRule object by the component.

The component expects that the values are themselves dicts and that the keys in those dicts are prefixed with record: or alert: to indicate whether the rule is a recording or an alerting rule. The component transforms each key into a field of the resulting rule by taking the prefix as the field name and the rest of the key as the field value. For example, the key "record:sum:some:metric:5m" is transformed into record: sum:some:metric:5m, which defines a recording rule with the name sum:some:metric:5m. This field is then merged into the provided value, which should be a valid rule definition.

See the Prometheus docs for supported configurations for recording and alerting rules.

Example:

---
rules:
  generic-rules:
    "alert:ContainerOOMKilled":
      annotations:
        message: A container ({{$labels.container}}) in pod {{ $labels.namespace }}/{{ $labels.pod }} was OOM killed
      expr: |
        kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
      labels:
        source: https://git.vshn.net/swisscompks/syn-tenant-repo/-/blob/master/common.yml
        severity: devnull

Example

parameters:

  rancher_monitoring:
    # Dynamically adjust `kube-prometheus` version (assumes a fact
    # `eks_version` containing the target cluster version as
    # `<major>.<minor>` exists).
    cluster_kubernetes_version: ${facts:eks_version}

    prometheus:
      replicas: 2
      requests:
        memory: 4Gi
        cpu: '2'
      limits:
        cpu: '4'
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: gp2
      thanos:
        resources:
          requests:
            memory: 2Gi
            cpu: '1'
          limits:
            memory: 4Gi
            cpu: '2'

    alertmanager:
      replicas: 3

    alertmanagerConfig:
      receivers:
        - name: my-super-receiver
          webhook_configs:
            - send_resolved: true
              http_config:
                bearer_token: thesecretbearertoken
              url: https://alert-receiver.example.com/alertmanager_webhook
      route:
        routes:
          # Disable KubePodCrashLooping and
          # KubeDeploymentReplicasMismatch in
          # all namespaces ending with `-dev`
          - receiver: devnull
            continue: false
            match_re:
              alertname: '^(KubeDeploymentReplicasMismatch|KubePodCrashLooping)$'
              namespace: '-dev$'
        # Use receiver configured above as default
        receiver: my-super-receiver
    thanos:
      type: S3
      config:
        bucket: my-bucket
        endpoint: my-s3.example.com
        access_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/thanos/access_key}
        secret_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/thanos/secret_key}