Monitor other Commodore Components
The Component Prometheus provides the option to deploy Prometheus and monitor core Kubernetes services.
However it also supports picking up metrics and alerts from other workloads deployed by other components through the custer-monitoring
addon and some convenient jsonnet functions.
Enabling cluster-monitoring
To enable monitoring other components we simply need to add the cluster-monitoring
addon.
The addon will deploy a ClusterRole
and a ClusterRoleBinding
which allows the Prometheus instances managed by the component to scrape metrics from all namespaces in the cluster.
Additionally, the ClusterRole
provides RBAC rules for commonly used resources for SubjectAccessReview
queries for metrics endpoints which are secured with oauth-proxy
.
This sets namespace selectors on every Prometheus instance that will result in them picking up all ServiceMonitors
, PodMonitors
, and Probes
in namespaces with the label monitoring.syn.tools/<instance>
.
PrometheusRules
, ServiveMonitors
, PodMonitors
and Probes
are only picked up if they’re in a labeled namespace AND the rule is labeled with monitoring.syn.tools/enabled: "true"
.
This should ensure that rules are added consciously and prevent a massive import of upstream alerts that aren’t actionable or don’t meet our standards.
It’s possible to pick up the rules from any namespace by setting the parameter addon_configs.cluster_monitoring.restrict_selectors
to false
.
You also have the option to disable ServiceMonitors
, PodMonitors
, or Probes
by labeling them with monitoring.syn.tools/enabled: "false"
.
The example below will make the Prometheus instance default-instance
pick up all namespaces with a label monitoring.syn.tools/default-instance: "true"
.
parameters:
prometheus:
kubernetes_version: '1.22'
defaultInstance: 'default-instance' (1)
instances:
default-instance:
common:
namespace: default-instance
prometheus:
enabled: true
nodeExporter:
enabled: true
addons:
- cluster-monitoring (2)
namespaces:
default-instance: {}
1 | The defaultInstance will be used by all components that don’t explicitly request a specific instance |
2 | Enabling the cluster-monitoring addon |
Advertise Metrics from a Component
When writing a component we can advertise metrics and rules to Prometheus by creating ServiceMonitors
or PrometheusRules
and correctly labeling the namespace of the component.
To do this, the component-prometheus provides helper functions as the library lib/prometheus.libsonnet
.
The component namespace can easily be labeled by using the RegisterNamespace
function.
This function takes a namespace and returns the provided namespace with additional necessary labels for Prometheus to pick it up.
The function NetworkPolicy
returns a network policy that allows ingress traffic from the Prometheus namespace.
This means when writing a component you don’t need to know where Prometheus is deployed.
We also provide helper functions to create ServiceMonitors
, PodMonitors
, Probes
, and PrometheusRules
.
The PrometheusRule
helper function already ensures that the necessary enabled
label is set.
If you need to enable an existing PrometheusRule
you can use the Enable()
helper functions to set the label.
local prometheus = import 'lib/prometheus.libsonnet';
...
local namespace = kube.Namespace(params.namespace)
{
'00_namespace': prometheus.RegisterNamespace(namespace),(1)
'01_networkpolicy': prometheus.NetworkPolicy(){ (2)
metadata+: {
namespace: params.namespace,
},
},
'10_servicemonitor': prometheus.ServiceMonitor('foo'){ (3)
...
},
'10_alert': prometheus.Enable(upstreamAlert), (4)
}
1 | Add a label so the default instance will pick up the namespace |
2 | Depending on the cluster distribution you will need to add a NetworkPolicy.
Without it Prometheus won’t be able to scape the targets.
The NetworkPolicy functions will provide a correctly configured NetworkPolicy to allow ingress traffic from the Prometheus instance. |
3 | Create a ServiceMonitor called 'foo' that’s guaranteed to be picked up by Prometheus. |
4 | Assuming there is an existing upstreamAlert rule you can enable it using the Enable helper function. |
Don’t create a NetworkPolicy for permissive clusters without default NetworkPolicies. Doing so will drop any traffic not originating from Prometheus. |