Using PodMonitor for scraping Kafka related metrics (strimzi#3351)
* Removed some useless additional scrape configurations
Fixed usage of ServiceMonitor for scraping Kafka metrics
Fixed Grafana dashboards due to above changes

Signed-off-by: Paolo Patierno <[email protected]>

* Moved to use PodMonitor instead of ServiceMonitor for Kafka, ZooKeeper
related metrics
Reverted back changes on Grafana dashboards
Updated documentation about usage of PodMonitor

Signed-off-by: Paolo Patierno <[email protected]>

* Fix comment

Signed-off-by: Paolo Patierno <[email protected]>

* Fixed Prometheus rules
Fixed PodMonitor wrong matching default namespace

Signed-off-by: Paolo Patierno <[email protected]>
ppatierno authored Jul 22, 2020
1 parent 3152c32 commit 234efbc
Showing 10 changed files with 63 additions and 289 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -55,6 +55,13 @@ We removed the old ones from the Prometheus scraping configuration/alerts and on
 It means that the charts related to memory and CPU usage are not going to work on Kubernetes versions prior to 1.14.
 For more information on what has changed: https://github.com/strimzi/strimzi-kafka-operator/pull/3312

+#### Deprecation of monitoring port on Kafka and ZooKeeper related services
+
+The `PodMonitor` resource is now used instead of the `ServiceMonitor` for scraping metrics from Kafka, ZooKeeper, Kafka Connect and so on.
+For this reason, we are deprecating the monitoring port `tcp-prometheus` (9404) on all the services where it is declared (Kafka bootstrap, ZooKeeper client and so on).
+This port will be removed in the next release.
+Together with it we will also remove the Prometheus annotation from the service.
+
 ## 0.18.0

 * Add possibility to set Java System Properties for User Operator and Topic Operator via `Kafka` CR.
@@ -17,7 +17,6 @@ Additional Prometheus-related configuration is also provided in the following fi
 * `prometheus-additional.yaml`
 * `prometheus-rules.yaml`
 * `strimzi-pod-monitor.yaml`
-* `strimzi-service-monitor.yaml`

 For Prometheus to obtain monitoring data:

@@ -12,7 +12,7 @@ When you apply the Prometheus configuration, the following resources are created
 * A `ServiceAccount` for the Prometheus pods to run under.
 * A `ClusterRoleBinding` which binds the `ClusterRole` to the `ServiceAccount`.
 * A `Deployment` to manage the Prometheus Operator pod.
-* A `ServiceMonitor` to manage the configuration of the Prometheus pod.
+* A `PodMonitor` to manage the configuration of the Prometheus pod.
 * A `Prometheus` to manage the configuration of the Prometheus pod.
 * A `PrometheusRule` to manage alerting rules for the Prometheus pod.
 * A `Secret` to manage additional Prometheus settings.
@@ -31,11 +31,8 @@ On MacOS, use:
 [source,shell,subs="+quotes,attributes"]
 sed -i '' 's/namespace: .*/namespace: _my-namespace_/' prometheus.yaml

-. Edit the `ServiceMonitor` resource in `strimzi-service-monitor.yaml` to define Prometheus jobs that will scrape the metrics data from services.
-`ServiceMonitor` is used to scrape metrics through services and is used for Apache Kafka, ZooKeeper.
-
 . Edit the `PodMonitor` resource in `strimzi-pod-monitor.yaml` to define Prometheus jobs that will scrape the metrics data from pods.
-`PodMonitor` is used to scrape data directly from pods and is used for Operators.
+`PodMonitor` is used to scrape data directly from pods and is used for Apache Kafka, ZooKeeper, Operators, and Kafka Bridge.

 . To use another role:

@@ -49,7 +46,6 @@ kubectl create secret generic additional-scrape-configs --from-file=prometheus-a
 . Deploy the Prometheus resources:
 +
 [source,shell,subs="+quotes,attributes"]
-kubectl apply -f strimzi-service-monitor.yaml
 kubectl apply -f strimzi-pod-monitor.yaml
 kubectl apply -f prometheus-rules.yaml
 kubectl apply -f prometheus.yaml
2 changes: 0 additions & 2 deletions documentation/modules/metrics/ref_metrics-config-files.adoc
@@ -33,7 +33,6 @@ metrics
 ├── prometheus-rules.yaml <9>
 ├── prometheus.yaml <10>
 ├── strimzi-pod-monitor.yaml <11>
-└── strimzi-service-monitor.yaml <12>
 --
 <1> Installation file for the Grafana image
 <2> Grafana dashboards
@@ -46,4 +45,3 @@ metrics
 <9> Alerting rules examples for use with Prometheus Alertmanager (deployed with Prometheus)
 <10> Installation file for the Prometheus image
 <11> Prometheus job definitions to scrape metrics data from pods
-<12> Prometheus job definitions to scrape metrics data from services
@@ -1,6 +1,6 @@
 # To update additional settings create a Secret custom resource by using a command below
 # kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml
-- job_name: "kubernetes-cadvisor"
+- job_name: kubernetes-cadvisor
   honor_labels: true
   scrape_interval: 10s
   scrape_timeout: 10s
@@ -75,67 +75,6 @@
     replacement: $1
     action: drop

-- job_name: kubernetes-pods
-  honor_labels: true
-  scrape_interval: 10s
-  scrape_timeout: 10s
-  metrics_path: /metrics
-  scheme: http
-  kubernetes_sd_configs:
-  - role: pod
-    namespaces:
-      names: []
-  relabel_configs:
-  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
-    separator: ;
-    regex: "true"
-    replacement: $1
-    action: keep
-  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
-    separator: ;
-    regex: (.+)
-    target_label: __metrics_path__
-    replacement: $1
-    action: replace
-  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_name]
-    separator: ;
-    regex: ^(.+;.*)|(;.*metrics)$
-    replacement: $1
-    action: keep
-  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
-    separator: ;
-    regex: (.+):(?:\d+);(\d+)
-    target_label: __address__
-    replacement: ${1}:${2}
-    action: replace
-  - separator: ;
-    regex: __meta_kubernetes_pod_label_(.+)
-    replacement: $1
-    action: labelmap
-  - source_labels: [__meta_kubernetes_namespace]
-    separator: ;
-    regex: (.*)
-    target_label: namespace
-    replacement: $1
-    action: replace
-  - source_labels: [__meta_kubernetes_pod_name]
-    separator: ;
-    regex: (.*)
-    target_label: kubernetes_pod_name
-    replacement: $1
-    action: replace
-  - source_labels: [__meta_kubernetes_pod_node_name]
-    separator: ;
-    regex: (.*)
-    target_label: node_name
-    replacement: $1
-    action: replace
-  - source_labels: [__meta_kubernetes_pod_host_ip]
-    separator: ;
-    regex: (.*)
-    target_label: node_ip
-    replacement: $1
-    action: replace
 - job_name: kubernetes-nodes-kubelet
   scrape_interval: 10s
   scrape_timeout: 10s
@@ -157,93 +96,3 @@
     regex: (.+)
     target_label: __metrics_path__
     replacement: /api/v1/nodes/${1}/proxy/metrics
-
-- job_name: kubernetes-services
-  honor_labels: true
-  scrape_interval: 10s
-  scrape_timeout: 10s
-  metrics_path: /metrics
-  scheme: http
-  kubernetes_sd_configs:
-  - api_server: null
-    role: endpoints
-    namespaces:
-      names: []
-  relabel_configs:
-  - source_labels: [__meta_kubernetes_endpoints_name]
-    separator: ;
-    regex: prometheus-node-exporter
-    replacement: $1
-    action: drop
-  - source_labels: [__meta_kubernetes_endpoints_name]
-    separator: ;
-    regex: prometheus-kube-state-metrics
-    replacement: $1
-    action: drop
-  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
-    separator: ;
-    regex: "true"
-    replacement: $1
-    action: keep
-  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
-    separator: ;
-    regex: (https?)
-    target_label: __scheme__
-    replacement: $1
-    action: replace
-  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
-    separator: ;
-    regex: (.+)
-    target_label: __metrics_path__
-    replacement: $1
-    action: replace
-  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
-    separator: ;
-    regex: (.+)(?::\d+);(\d+)
-    target_label: __address__
-    replacement: $1:$2
-    action: replace
-  - separator: ;
-    regex: __meta_kubernetes_service_label_(.+)
-    replacement: $1
-    action: labelmap
-  - source_labels: [__meta_kubernetes_namespace]
-    separator: ;
-    regex: (.*)
-    target_label: namespace
-    replacement: $1
-    action: replace
-  - source_labels: [__meta_kubernetes_namespace]
-    separator: ;
-    regex: (.*)
-    target_label: kubernetes_namespace
-    replacement: $1
-    action: replace
-  - source_labels: [__meta_kubernetes_service_name]
-    separator: ;
-    regex: (.*)
-    target_label: kubernetes_name
-    replacement: $1
-    action: replace
-  - source_labels: [__meta_kubernetes_pod_node_name]
-    separator: ;
-    regex: (.*)
-    target_label: node_name
-    replacement: $1
-    action: replace
-  - source_labels: [__meta_kubernetes_pod_host_ip]
-    separator: ;
-    regex: (.*)
-    target_label: node_ip
-    replacement: $1
-    action: replace
-  - separator: ;
-    regex: __meta_kubernetes_pod_label_(.+)
-    replacement: $1
-    action: labelmap
-  - source_labels: [__meta_kubernetes_pod_name]
-    separator: ;
-    regex: (.*)
-    target_label: kubernetes_pod_name
-    replacement: $1
-    action: replace
6 changes: 3 additions & 3 deletions examples/metrics/prometheus-install/prometheus-rules.yaml
@@ -10,7 +10,7 @@ spec:
   - name: kafka
     rules:
     - alert: KafkaRunningOutOfSpace
-      expr: kubelet_volume_stats_available_bytes{kubernetes_pod_name=~"([a-z]+-)+kafka-[0-9]+"} < 5368709120
+      expr: kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"data-([0-9]+)?-(.+)-kafka-[0-9]+"} < 5368709120
       for: 10s
       labels:
         severity: warning
@@ -58,7 +58,7 @@ spec:
         summary: 'Kafka offline log directories'
         description: 'There are {{ $value }} offline log directories on {{ $labels.kubernetes_pod_name }}'
     - alert: ScrapeProblem
-      expr: up{job="kubernetes-services",kubernetes_namespace!~"openshift-.+",kubernetes_pod_name=~".+-kafka-[0-9]+"} == 0
+      expr: up{kubernetes_namespace!~"openshift-.+",kubernetes_pod_name=~".+-kafka-[0-9]+"} == 0
       for: 3m
       labels:
         severity: major
@@ -116,7 +116,7 @@ spec:
         summary: 'Zookeeper outstanding requests'
         description: 'There are {{ $value }} outstanding requests on {{ $labels.kubernetes_pod_name }}'
     - alert: ZookeeperRunningOutOfSpace
-      expr: kubelet_volume_stats_available_bytes{kubernetes_pod_name=~"([a-z]+-)+zookeeper-[0-9]+"} < 5368709120
+      expr: kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"data-(.+)-zookeeper-[0-9]+"} < 5368709120
       for: 10s
       labels:
         severity: warning
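
Note on the new `persistentvolumeclaim` matchers: the two `RunningOutOfSpace` alerts now select volumes by claim name rather than by `kubernetes_pod_name`. The following is a minimal Python sketch for checking which names the new patterns would select. It relies on Prometheus label matchers being fully anchored, which `re.fullmatch` mimics. The claim names are hypothetical examples for a cluster called `my-cluster`; they are assumptions for illustration and do not come from this commit. Under those assumptions, the Kafka pattern expects a hyphen immediately after the optional numeric volume ID, so a claim name without a volume ID would not match.

```python
import re

# Patterns copied from the updated alert expressions.
KAFKA_PVC = r"data-([0-9]+)?-(.+)-kafka-[0-9]+"
ZOOKEEPER_PVC = r"data-(.+)-zookeeper-[0-9]+"

# The alert threshold, 5368709120 bytes, is 5 GiB.
THRESHOLD_BYTES = 5 * 1024**3


def label_matches(pattern: str, value: str) -> bool:
    """Mimic a Prometheus `=~` matcher, which must match the whole value."""
    return re.fullmatch(pattern, value) is not None


# Hypothetical claim names, assumed only for illustration.
samples = {
    "data-0-my-cluster-kafka-0": KAFKA_PVC,
    "data-my-cluster-kafka-0": KAFKA_PVC,
    "data-my-cluster-zookeeper-1": ZOOKEEPER_PVC,
}

results = {name: label_matches(pattern, name) for name, pattern in samples.items()}

if __name__ == "__main__":
    for name, matched in results.items():
        print(f"{name}: {'match' if matched else 'no match'}")
```

If your claims follow a different naming scheme, substitute real names from `kubectl get pvc` before relying on the alerts.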
51 changes: 50 additions & 1 deletion examples/metrics/prometheus-install/strimzi-pod-monitor.yaml
@@ -47,4 +47,53 @@ spec:
       - myproject
   podMetricsEndpoints:
   - path: /metrics
-    port: rest-api
+    port: rest-api
+---
+apiVersion: monitoring.coreos.com/v1
+kind: PodMonitor
+metadata:
+  name: kafka-metrics
+  labels:
+    app: strimzi
+spec:
+  selector:
+    matchExpressions:
+      - key: "strimzi.io/kind"
+        operator: In
+        values: ["Kafka", "KafkaConnect"]
+  namespaceSelector:
+    matchNames:
+      - myproject
+  podMetricsEndpoints:
+  - path: /metrics
+    port: tcp-prometheus
+    relabelings:
+    - separator: ;
+      regex: __meta_kubernetes_pod_label_(.+)
+      replacement: $1
+      action: labelmap
+    - sourceLabels: [__meta_kubernetes_namespace]
+      separator: ;
+      regex: (.*)
+      targetLabel: namespace
+      replacement: $1
+      action: replace
+    - sourceLabels: [__meta_kubernetes_pod_name]
+      separator: ;
+      regex: (.*)
+      targetLabel: kubernetes_pod_name
+      replacement: $1
+      action: replace
+    - sourceLabels: [__meta_kubernetes_pod_node_name]
+      separator: ;
+      regex: (.*)
+      targetLabel: node_name
+      replacement: $1
+      action: replace
+    - sourceLabels: [__meta_kubernetes_pod_host_ip]
+      separator: ;
+      regex: (.*)
+      targetLabel: node_ip
+      replacement: $1
+      action: replace
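
Note on the `relabelings` in the new `kafka-metrics` PodMonitor: they reproduce the labels that the removed `kubernetes-pods` and `kubernetes-services` jobs used to attach, which is presumably why the dashboard changes could be reverted. The sketch below is a simplified Python model of the two actions used (`labelmap` and `replace`), not Prometheus's actual implementation. The discovered pod metadata (`my-cluster-kafka-0`, `node-1`, `10.0.0.5`) is invented for illustration.

```python
import re

# The five relabelings from the kafka-metrics PodMonitor, as plain dicts.
RELABELINGS = [
    {"action": "labelmap", "regex": r"__meta_kubernetes_pod_label_(.+)", "replacement": "$1"},
    {"action": "replace", "source_labels": ["__meta_kubernetes_namespace"],
     "regex": r"(.*)", "target_label": "namespace", "replacement": "$1"},
    {"action": "replace", "source_labels": ["__meta_kubernetes_pod_name"],
     "regex": r"(.*)", "target_label": "kubernetes_pod_name", "replacement": "$1"},
    {"action": "replace", "source_labels": ["__meta_kubernetes_pod_node_name"],
     "regex": r"(.*)", "target_label": "node_name", "replacement": "$1"},
    {"action": "replace", "source_labels": ["__meta_kubernetes_pod_host_ip"],
     "regex": r"(.*)", "target_label": "node_ip", "replacement": "$1"},
]


def expand(match: re.Match, replacement: str) -> str:
    """Expand $1 / ${1} style references using the groups of `match`."""
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    return match.expand(template)


def relabel(labels: dict, rules: list, separator: str = ";") -> dict:
    """Apply labelmap/replace rules, then drop internal `__` labels."""
    out = dict(labels)
    for rule in rules:
        pattern = re.compile(rule["regex"])
        if rule["action"] == "labelmap":
            # Copy every label whose NAME matches to a new, renamed label.
            for name, value in list(out.items()):
                m = pattern.fullmatch(name)
                if m:
                    out[expand(m, rule["replacement"])] = value
        elif rule["action"] == "replace":
            # Join the source label values and write the expansion to the target.
            joined = separator.join(out.get(n, "") for n in rule["source_labels"])
            m = pattern.fullmatch(joined)
            if m:
                value = expand(m, rule["replacement"])
                if value:
                    out[rule["target_label"]] = value
                else:
                    out.pop(rule["target_label"], None)
    return {k: v for k, v in out.items() if not k.startswith("__")}


# Invented discovery metadata for a single Kafka pod (assumption).
discovered = {
    "__meta_kubernetes_namespace": "myproject",
    "__meta_kubernetes_pod_name": "my-cluster-kafka-0",
    "__meta_kubernetes_pod_node_name": "node-1",
    "__meta_kubernetes_pod_host_ip": "10.0.0.5",
    "__meta_kubernetes_pod_label_strimzi_io_kind": "Kafka",
    "__meta_kubernetes_pod_label_strimzi_io_cluster": "my-cluster",
}

result = relabel(discovered, RELABELINGS)

if __name__ == "__main__":
    for name in sorted(result):
        print(f"{name}={result[name]}")
```

The `kubernetes_pod_name` label produced here is the one that the `ScrapeProblem` alert and the alert descriptions in `prometheus-rules.yaml` still refer to.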
