Parameters
The parent key for all of the following parameters is rancher_monitoring.
namespace
type |
string |
default |
|
The namespace in which to deploy the Syn Rancher monitoring stack
kube_prometheus_version
type |
dict |
default |
|
Map of kube-prometheus versions which are compatible with different Kubernetes versions.
See the kube-prometheus Kubernetes compatibility matrix for updating this map.
cluster_kubernetes_version
type |
string |
default |
|
The target cluster’s Kubernetes version.
Used to look up the kube-prometheus version in the kube_prometheus_version map.
Having this parameter as a layer of indirection allows setting the target cluster’s Kubernetes version without needing parameters.rancher_monitoring.kube_prometheus_version to exist.
We’re currently using this approach as a workaround because Commodore doesn’t support dynamic facts yet.
Once dynamic facts are implemented, all clusters will have a uniformly named fact which represents the cluster’s Kubernetes version.
That fact can then be used in place of this parameter to select the kube-prometheus version.
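As a rough sketch of how the two parameters interact, the following configuration assumes that the keys of kube_prometheus_version are Kubernetes versions in `<major>.<minor>` form. The versions and release branches shown are illustrative only and aren’t the component’s defaults; consult the compatibility matrix for real values.

```yaml
parameters:
  rancher_monitoring:
    # Illustrative map only: keys are assumed to be `<major>.<minor>`
    # Kubernetes versions, values are kube-prometheus Git tree-ishes.
    kube_prometheus_version:
      '1.18': release-0.6
      '1.20': release-0.8
    # Quoted so that YAML keeps the version as a string, not a float.
    cluster_kubernetes_version: '1.18'
```

With this configuration, the lookup would be expected to select release-0.6 for the cluster.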
federation_target
type |
string |
default |
|
The service name of the Prometheus instance with which to federate. Usually this is the Prometheus instance managed by Rancher.
By default, this parameter is derived from the value of rancher_monitoring_version: it’s set to the default service name of the Rancher-managed Prometheus instance for Rancher monitoring V1 or V2, respectively.
rancher_monitoring_version
type |
string |
default |
|
The version of the Rancher monitoring stack which is enabled on the cluster.
Valid values are v1 or v2.
See the Rancher documentation for more details on the differences between the V1 and V2 Rancher monitoring stacks.
This parameter is used to configure an appropriate default value for the federation_target parameter, if that parameter isn’t overridden.
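A minimal sketch of how rancher_monitoring_version and federation_target might be combined. The service name below is a hypothetical placeholder, not a value taken from the component; in most cases the override can be omitted entirely.

```yaml
parameters:
  rancher_monitoring:
    # The cluster runs the V2 Rancher monitoring stack.
    rancher_monitoring_version: v2
    # Optional override. Omit this key to keep the default that is
    # derived from rancher_monitoring_version.
    # `my-prometheus-service` is a hypothetical service name.
    federation_target: my-prometheus-service
```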
jsonnetfile_parameters
type |
dict |
default |
|
Map of string values to use as external Jsonnet variables when rendering jsonnetfile.json from the jsonnetfile.jsonnet in the repository.
The intent is that kube_prometheus_version is configured to a kube-prometheus version (Git tree-ish) which is compatible with the target cluster’s Kubernetes version.
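For illustration, a sketch of pinning the kube-prometheus version directly. It assumes that kube_prometheus_version is a key inside jsonnetfile_parameters, as the description above suggests; the branch name is only an example value.

```yaml
parameters:
  rancher_monitoring:
    jsonnetfile_parameters:
      # Any Git tree-ish (branch, tag, or commit) should be usable here.
      # `release-0.6` is an illustrative value, not a recommendation.
      kube_prometheus_version: release-0.6
```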
prometheusInstance
type |
string |
default |
|
The Prometheus instance name to use.
For some uses this will be prefixed with prometheus, for example for the StatefulSet name.
prometheusNamespaceSelector
type |
string |
default |
null |
This parameter allows setting a custom namespace selector for the fields podMonitorNamespaceSelector, ruleNamespaceSelector and serviceMonitorNamespaceSelector.
If this parameter isn’t set, it defaults to:
---
matchLabels: {
SYNMonitoring: 'main',
},
---
prometheus
type |
dict |
default |
|
Prometheus customizations.
See the PrometheusSpec documentation for all possible configurations.
The value of this parameter is merged into the spec field of the Prometheus object managed by the component.
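Because the value is merged into the spec field, any PrometheusSpec field can be set here. A small sketch using two such fields, retention and externalLabels; the values are illustrative.

```yaml
parameters:
  rancher_monitoring:
    prometheus:
      # Both keys are PrometheusSpec fields; the values are examples.
      retention: 30d
      externalLabels:
        cluster: my-cluster
```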
alertmanagerInstance
type |
string |
default |
|
The Alertmanager instance name to use.
For some uses this will be prefixed with alertmanager, for example for the StatefulSet name.
alertmanager
type |
dict |
default |
|
Alertmanager customization.
See the AlertmanagerSpec documentation for all possible configurations.
The value of this parameter is merged into the spec field of the Alertmanager object managed by the component.
alertmanagerConfig
type |
dict |
default |
|
The value of this parameter is deployed verbatim as the Alertmanager configuration in alertmanager.yaml.
See the Alertmanager documentation for possible configuration options.
The default configuration allows Alertmanager to start even if no alert receivers have been configured for the cluster.
federation
type |
dict |
default |
|
Configure the scrape interval and timeout for the Prometheus job which federates metrics from the Rancher Prometheus instance in cattle-prometheus.
Users should ensure that scrape_timeout is lower than interval, as the component doesn’t validate these values.
extra_metric_relabel_configs allows appending additional relabel configs to the federation job.
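A sketch of a federation configuration using the three keys named above. The durations and the dropped label are illustrative, and the relabel entry assumes the standard Prometheus relabel config syntax.

```yaml
parameters:
  rancher_monitoring:
    federation:
      interval: 30s
      # Kept below `interval`, since the component doesn't check this.
      scrape_timeout: 25s
      extra_metric_relabel_configs:
        # Hypothetical example: drop a label from all federated metrics.
        - action: labeldrop
          regex: some_noisy_label
```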
alerts.namespaceSelector
type |
string |
default |
|
Namespace selector which is injected into alert rules by kube-prometheus (via kubernetes-mixin).
By default, alerts for namespaced objects are only configured for namespaces which are part of Kubernetes, Rancher, or Project Syn.
To fully remove the selector, set this parameter to null.
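Removing the selector, as described above, could look like the following sketch. The nesting under an alerts key is inferred from the dotted parameter name.

```yaml
parameters:
  rancher_monitoring:
    alerts:
      # `null` removes the selector, so alerts for namespaced objects
      # are presumably no longer restricted to specific namespaces.
      namespaceSelector: null
```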
alerts.ignoreNames
type |
list |
default |
|
A list of alert names which should be completely disabled on the cluster.
Any alerts which match one of the names listed in ignoreNames are dropped from the final set of alert rules.
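A short sketch of disabling individual alerts. The two alert names are only examples of names that might appear in the rendered rule set.

```yaml
parameters:
  rancher_monitoring:
    alerts:
      ignoreNames:
        # Each entry is the exact name of an alert to drop.
        - KubePodCrashLooping
        - KubeDeploymentReplicasMismatch
```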
alerts.customAnnotations
type |
dict |
default |
|
Maps alert names to sets of custom annotations. Allows configuring custom annotations for individual alerts.
Example:
customAnnotations:
  Watchdog:
    runbook_url: https://www.google.com/?q=Watchdog
alerts.sharedStorageClass
type |
string |
default |
`` |
A regular expression that matches the shared storage classes in this cluster. A shared storage class is a storage class for which PVs share the same underlying volume, which causes them to fill up at the same rate. The component configures the alert rules to ensure that only a single alert is produced for storage classes matching this regex.
Users must ensure that the regex only matches storage classes which share a single backing volume. Otherwise volume utilization alerts will be lost.
Example:
sharedStorageClass: "bulk|foo.*"
thanos
type |
dict |
default |
|
This parameter configures the object storage for the Prometheus Thanos sidecar containers.
If this dict doesn’t have a key type, the Thanos sidecar container won’t be deployed.
See the official documentation for all possible configuration options.
Example:
---
thanos:
  type: S3
  config:
    bucket: my-bucket
    endpoint: my-s3.example.com
    access_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/thanos/access_key}
    secret_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/thanos/secret_key}
---
rules
type |
dict |
default |
|
This parameter allows users to configure additional Prometheus rules to deploy on the cluster.
Each key-value pair in the dictionary is transformed into a PrometheusRule object by the component.
The component expects that values are dicts themselves and that keys in those dicts are prefixed with record: or alert: to indicate whether the rule is a recording or alerting rule.
The component transforms each key into a field in the resulting rule by taking the prefix as the field name and the rest of the key as the field value.
For example, key "record:sum:some:metric:5m" would be transformed into record: sum:some:metric:5m, which defines a recording rule with name sum:some:metric:5m.
This field is then merged into the provided value which should be a valid rule definition.
Example:
---
rules:
  generic-rules:
    "alert:ContainerOOMKilled":
      annotations:
        message: A container ({{$labels.container}}) in pod {{ $labels.namespace }}/{{ $labels.pod }} was OOM killed
      expr: |
        kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
      labels:
        source: https://git.vshn.net/swisscompks/syn-tenant-repo/-/blob/master/common.yml
        severity: devnull
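To illustrate the key transformation for a recording rule, the sketch below shows a hypothetical input and, as a comment, the rule entry it is expected to produce. The group name, metric, and expression are made up, and the exact layout of the generated PrometheusRule object isn’t shown because it depends on the component.

```yaml
parameters:
  rancher_monitoring:
    rules:
      # Hypothetical group of custom rules.
      my-recording-rules:
        "record:sum:some:metric:5m":
          expr: sum(rate(some_metric_total[5m]))

# Expected rule entry after the component splits the key at the prefix
# and merges the resulting field into the provided value:
#
#   record: sum:some:metric:5m
#   expr: sum(rate(some_metric_total[5m]))
```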
Example
parameters:
  rancher_monitoring:
    # Dynamically adjust the `kube-prometheus` version (assumes that a
    # fact `eks_version` containing the target cluster version as
    # `<major>.<minor>` exists).
    cluster_kubernetes_version: ${facts:eks_version}
    prometheus:
      replicas: 2
      resources:
        requests:
          memory: 4Gi
          cpu: '2'
        limits:
          cpu: '4'
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: gp2
      thanos:
        resources:
          requests:
            memory: 2Gi
            cpu: '1'
          limits:
            memory: 4Gi
            cpu: '2'
    alertmanager:
      replicas: 3
    alertmanagerConfig:
      receivers:
        - name: my-super-receiver
          webhook_configs:
            - send_resolved: true
              http_config:
                bearer_token: thesecretbearertoken
              url: https://alert-receiver.example.com/alertmanager_webhook
      route:
        routes:
          # Disable KubePodCrashLooping and
          # KubeDeploymentReplicasMismatch in
          # all namespaces ending with `-dev`
          - receiver: devnull
            continue: false
            match_re:
              alertname: '^(KubeDeploymentReplicasMismatch|KubePodCrashLooping)$'
              namespace: '-dev$'
        # Use receiver configured above as default
        receiver: my-super-receiver
    thanos:
      type: S3
      config:
        bucket: my-bucket
        endpoint: my-s3.example.com
        access_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/thanos/access_key}
        secret_key: ?{vaultkv:${cluster:tenant}/${cluster:name}/thanos/secret_key}