Parameters

The parent key for all of the following parameters is rancher_monitoring.

namespace

type: string
default: syn-rancher-monitoring

The namespace in which to deploy the Syn Rancher monitoring stack.

kube_prometheus_version

type: dict
default:
  '1.15': release-0.3
  '1.16': release-0.4
  '1.17': release-0.4
  '1.18': release-0.6
  '1.19': release-0.7
  '1.20': release-0.8
  '1.21': release-0.8

Map of kube-prometheus versions which are compatible with different Kubernetes versions.

See the kube-prometheus Kubernetes compatibility matrix for updating this map.

cluster_kubernetes_version

type: string
default: 1.18

The target cluster’s Kubernetes version. Used to look up the kube-prometheus version in the kube_prometheus_version map.

Having this parameter as a layer of indirection allows setting the target cluster’s Kubernetes version without needing parameters.rancher_monitoring.kube_prometheus_version to exist.

We’re currently using this approach as a workaround for the fact that Commodore doesn’t support dynamic facts yet. Once dynamic facts are implemented, all clusters will have a uniformly named fact which represents the cluster’s Kubernetes version. That fact can then be used in place of this parameter to select the kube-prometheus version in jsonnetfile_parameters.kube_prometheus_version.
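
For example, pinning this parameter to 1.20 selects release-0.8 from the kube_prometheus_version map above (a minimal sketch; deriving the value from a fact, as shown in the example at the end of this page, is usually preferable):

cluster_kubernetes_version: '1.20'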

federation_target

type: string
default: ${rancher_monitoring:_federation_target_map:${rancher_monitoring:rancher_monitoring_version}}

The service name of the Prometheus instance with which to federate. Usually this is the Prometheus instance managed by Rancher.

By default, this parameter is set based on the value of parameter rancher_monitoring_version. The default configuration ensures that this parameter is set to the default service name for the Rancher-managed Prometheus instance for Rancher monitoring V1 or V2 depending on the value of rancher_monitoring_version.
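
If the Rancher-managed Prometheus is exposed under a different service name, the lookup can be bypassed by overriding the parameter directly. The service name below is a placeholder, not a value shipped with the component:

federation_target: prometheus.example-monitoring.svc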

rancher_monitoring_version

type: string
default: v2

The version of the Rancher monitoring stack which is enabled on the cluster. Valid values are v1 or v2. See the Rancher documentation for more details on the differences between the V1 and V2 Rancher monitoring stacks. This parameter is used to configure an appropriate default value for the federation_target parameter, if that parameter isn’t overwritten.

jsonnetfile_parameters

type: dict
default:
  kube_prometheus_version: ${rancher_monitoring:kube_prometheus_version:1.18}

Map of string values to use as external Jsonnet variables when rendering jsonnetfile.json from the jsonnetfile.jsonnet in the repository.

The intent is that kube_prometheus_version is configured to a kube-prometheus version (Git tree-ish) which is compatible with the target cluster’s Kubernetes version.
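
To pin kube-prometheus to a specific release independently of the version map, the external variable can also be set directly (a sketch; release-0.8 is one of the releases listed in the compatibility map above):

jsonnetfile_parameters:
  kube_prometheus_version: release-0.8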

prometheusInstance

type: string
default: platform

The Prometheus instance name to use. For some uses this will be prefixed with prometheus, for example for the StatefulSet name.

prometheusNamespaceSelector

type: string
default: null

This parameter allows setting a custom namespace selector for the fields podMonitorNamespaceSelector, ruleNamespaceSelector and serviceMonitorNamespaceSelector. If this parameter isn't set, it defaults to:

---
matchLabels: {
  SYNMonitoring: 'main',
},
---
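
Any Kubernetes label selector can be used instead, for example (the label key shown here is purely illustrative, not something the component ships):

prometheusNamespaceSelector:
  matchExpressions:
    - key: monitoring.example.com/enabled
      operator: Exists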

prometheus

type: dict
default:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: fast
        resources:
          requests:
            storage: 10Gi

Prometheus customizations. See the PrometheusSpec documentation for all possible configurations. The value of this parameter is merged into the spec field of the Prometheus object managed by the component.
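
For example, other PrometheusSpec fields such as retention or externalLabels can be set the same way (the values shown are illustrative, not component defaults):

prometheus:
  retention: 30d
  externalLabels:
    cluster: my-cluster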

alertmanagerInstance

type: string
default: platform

The Alertmanager instance name to use. For some uses this will be prefixed with alertmanager, for example for the StatefulSet name.

alertmanager

type: dict
default:
  replicas: 1
  logLevel: info

Alertmanager customization. See the AlertmanagerSpec documentation for all possible configurations. The value of this parameter is merged into the spec field of the Alertmanager object managed by the component.
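
For example, a highly available Alertmanager with a longer data retention could be configured as follows (values are illustrative; retention is a regular AlertmanagerSpec field):

alertmanager:
  replicas: 3
  retention: 240h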

alertmanagerConfig

type: dict
default:
  receivers:
    # blackhole receiver for alerts we don't care about
    - name: devnull
  # If the user doesn't configure a receiver, send everything to
  # devnull. This allows Alertmanager to start even if the user doesn't
  # configure any alert receivers.
  route:
    receiver: devnull

The value of this parameter is deployed verbatim as the Alertmanager configuration in alertmanager.yaml.

See the Alertmanager documentation for possible configuration options.

The default configuration allows Alertmanager to start even if no alert receivers have been configured for the cluster.

federation

type: dict
default:
  interval: 10s
  scrape_timeout: 10s
  extra_metric_relabel_configs: []

Configure the scrape interval and timeout for the Prometheus job which federates metrics from the Rancher Prometheus instance in cattle-prometheus.

Users should ensure that the scrape_timeout doesn't exceed the interval, as there's no validation logic in the component.

extra_metric_relabel_configs allows appending additional metric relabel configs to the federation job.
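
For example, to federate less frequently and drop a group of metrics by name (a sketch which assumes the relabel configs use plain Prometheus scrape-config syntax; the metric name regex is illustrative):

federation:
  interval: 30s
  scrape_timeout: 25s
  extra_metric_relabel_configs:
    - source_labels: [__name__]
      regex: 'container_network_.*'
      action: drop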

alerts.namespaceSelector

type: string
default: namespace=~"default|((kube|syn|cattle).*)"

Namespace selector which is injected into alert rules by kube-prometheus (via kubernetes-mixin).

By default, alerts for namespaced objects are only configured for namespaces which are part of Kubernetes, Rancher, or Project Syn.

To fully remove the selector, set this parameter to null.
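
For example, to additionally cover the namespaces of a specific application (the extra my-app pattern is illustrative):

alerts:
  namespaceSelector: namespace=~"default|((kube|syn|cattle|my-app).*)"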

alerts.ignoreNames

type: list
default: []

A list of alert names which should be completely disabled on the cluster.

Any alerts which match one of the names listed in ignoreNames are dropped from the final set of alert rules.
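
Example (the alert names shown are upstream kube-prometheus alerts; adjust the list to the alerts you actually want to drop):

alerts:
  ignoreNames:
    - Watchdog
    - CPUThrottlingHigh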

alerts.customAnnotations

type: dict
default: {}

Maps alert names to sets of custom annotations. This allows configuring custom annotations for individual alerts.

Example:

customAnnotations:
  Watchdog:
    runbook_url: https://www.google.com/?q=Watchdog

alerts.sharedStorageClass

type: string
default: "" (empty string)

A regular expression that matches the shared storage classes in this cluster. A shared storage class is a storage class for which PVs share the same underlying volume, which causes them to fill up at the same rate. The component configures the alert rules to ensure that only a single alert is produced for storage classes matching this regex.

Users must ensure that the regex only matches storage classes which share a single backing volume. Otherwise volume utilization alerts will be lost.

Example:

sharedStorageClass: "bulk|foo.*"

thanos

type: dict
default: {}

This parameter allows configuring the object storage for the Prometheus Thanos sidecar containers. If this dict doesn't have a key type, the Thanos sidecar container won't be deployed. See the official Thanos documentation for all possible configuration options.

Example:

---
thanos:
  type: S3
  config:
    bucket: my-bucket
    endpoint: my-s3.example.com
    access_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/thanos/access_key}
    secret_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/thanos/secret_key}
---

rules

type: dict
default: {}

This parameter allows users to configure additional Prometheus rules to deploy on the cluster.

Each key-value pair in the dictionary is transformed into a PrometheusRule object by the component.

The component expects that the values are themselves dicts and that the keys in those dicts are prefixed with record: or alert: to indicate whether the rule is a recording or an alerting rule. The component transforms each key into a field of the resulting rule by taking the prefix as the field name and the rest of the key as the field value. For example, the key "record:sum:some:metric:5m" is transformed into record: sum:some:metric:5m, which defines a recording rule with the name sum:some:metric:5m. This field is then merged into the provided value, which should be a valid rule definition.

See the Prometheus docs for supported configurations for recording and alerting rules.

Example:

---
rules:
  generic-rules:
    "alert:ContainerOOMKilled":
      annotations:
        message: A container ({{$labels.container}}) in pod {{ $labels.namespace }}/{{ $labels.pod }} was OOM killed
      expr: |
        kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
      labels:
        source: https://git.vshn.net/swisscompks/syn-tenant-repo/-/blob/master/common.yml
        severity: devnull

Example

parameters:

  rancher_monitoring:
    # Dynamically adjust `kube-prometheus` version (assumes a fact
    # `eks_version` containing the target cluster version as
    # `<major>.<minor>` exists).
    cluster_kubernetes_version: ${facts:eks_version}

    prometheus:
      replicas: 2
      requests:
        memory: 4Gi
        cpu: '2'
      limits:
        cpu: '4'
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: gp2
      thanos:
        resources:
          requests:
            memory: 2Gi
            cpu: '1'
          limits:
            memory: 4Gi
            cpu: '2'

    alertmanager:
      replicas: 3

    alertmanagerConfig:
      receivers:
        - name: my-super-receiver
          webhook_configs:
            - send_resolved: true
              http_config:
                bearer_token: thesecretbearertoken
              url: https://alert-receiver.example.com/alertmanager_webhook
      route:
        routes:
          # Disable KubePodCrashLooping and
          # KubeDeploymentReplicasMismatch in
          # all namespaces ending with `-dev`
          - receiver: devnull
            continue: false
            match_re:
              alertname: '^(KubeDeploymentReplicasMismatch|KubePodCrashLooping)$'
              namespace: '-dev$'
        # Use receiver configured above as default
        receiver: my-super-receiver
    thanos:
      type: S3
      config:
        bucket: my-bucket
        endpoint: my-s3.example.com
        access_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/thanos/access_key}
        secret_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/thanos/secret_key}