Support ClickHouse server monitoring & Support service hierarchy (apa…
CzyerChen authored Mar 7, 2024
1 parent 6d8524f commit 00ccace
Showing 29 changed files with 4,040 additions and 3 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/skywalking.yaml
@@ -648,6 +648,8 @@ jobs:
config: test/e2e-v2/cases/pulsar/e2e.yaml
- name: RocketMQ
config: test/e2e-v2/cases/rocketmq/e2e.yaml
- name: ClickHouse
config: test/e2e-v2/cases/clickhouse/clickhouse-prometheus-endpoint/e2e.yaml

- name: UI Menu BanyanDB
config: test/e2e-v2/cases/menu/banyandb/e2e.yaml
4 changes: 4 additions & 0 deletions docs/en/changes/changes.md
@@ -68,6 +68,10 @@
* Add Service Hierarchy auto-matching layer relationships (upper -> lower) as follows:
- KAFKA -> K8S_SERVICE
- VIRTUAL_MQ -> KAFKA
* Support ClickHouse server monitoring.
* Add Service Hierarchy auto-matching layer relationships (upper -> lower) as follows:
- CLICKHOUSE -> K8S_SERVICE
- VIRTUAL_DATABASE -> CLICKHOUSE

#### UI

19 changes: 19 additions & 0 deletions docs/en/concepts-and-designs/service-hierarchy.md
@@ -31,6 +31,9 @@ If you want to customize it according to your own needs, please refer to [Servic
| VIRTUAL_MQ | RABBITMQ | [VIRTUAL_MQ On RABBITMQ](#virtual_mq-on-rabbitmq) |
| VIRTUAL_MQ | ROCKETMQ | [VIRTUAL_MQ On ROCKETMQ](#virtual_mq-on-rocketmq) |
| VIRTUAL_MQ | KAFKA | [VIRTUAL_MQ On KAFKA](#virtual_mq-on-kafka) |
| CLICKHOUSE | K8S_SERVICE | [CLICKHOUSE On K8S_SERVICE](#clickhouse-on-k8s_service) |
| VIRTUAL_DATABASE | CLICKHOUSE | [VIRTUAL_DATABASE On CLICKHOUSE](#virtual_database-on-clickhouse) |

- The following sections will describe the **default matching rules** in detail and use the `upper-layer On lower-layer` format.
- The example service names are based on the SkyWalking [Showcase](https://github.com/apache/skywalking-showcase) default deployment.
@@ -180,6 +183,22 @@ If you want to customize it according to your own needs, please refer to [Servic
- VIRTUAL_MQ.service.name: `kafka.skywalking-showcase.svc.cluster.local:9092`
- KAFKA.service.name: `kafka::kafka.skywalking-showcase`

#### CLICKHOUSE On K8S_SERVICE
- Rule name: `short-name`
- Groovy script: `{ (u, l) -> u.shortName == l.shortName }`
- Description: CLICKHOUSE.service.shortName == K8S_SERVICE.service.shortName
- Matched Example:
- CLICKHOUSE.service.name: `clickhouse::clickhouse.skywalking-showcase`
- K8S_SERVICE.service.name: `skywalking-showcase::clickhouse.skywalking-showcase`
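
The rule logic can be restated outside Groovy. Below is a minimal Java sketch of the `short-name` comparison, assuming (per the example above) that a service's `shortName` is the segment after the `group::` prefix; the class and helper names are illustrative only, not SkyWalking APIs.

```java
// Illustrative only: a plain-Java restatement of the `short-name` rule.
// Assumption: shortName is the segment after the "group::" prefix when present.
public class ShortNameRuleSketch {
    static String shortName(String serviceName) {
        int idx = serviceName.indexOf("::");
        return idx >= 0 ? serviceName.substring(idx + 2) : serviceName;
    }

    // Equivalent of the Groovy closure { (u, l) -> u.shortName == l.shortName }
    static boolean matches(String upperServiceName, String lowerServiceName) {
        return shortName(upperServiceName).equals(shortName(lowerServiceName));
    }

    public static void main(String[] args) {
        String clickhouse = "clickhouse::clickhouse.skywalking-showcase";          // CLICKHOUSE service name
        String k8sService = "skywalking-showcase::clickhouse.skywalking-showcase"; // K8S_SERVICE service name
        System.out.println(matches(clickhouse, k8sService)); // true
    }
}
```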

#### VIRTUAL_DATABASE On CLICKHOUSE
- Rule name: `lower-short-name-with-fqdn`
- Groovy script: `{ (u, l) -> u.shortName.substring(0, u.shortName.lastIndexOf(':')) == l.shortName.concat('.svc.cluster.local') }`
- Description: VIRTUAL_DATABASE.service.shortName remove port == CLICKHOUSE.service.shortName with fqdn suffix
- Matched Example:
- VIRTUAL_DATABASE.service.name: `clickhouse.skywalking-showcase.svc.cluster.local:8123`
- CLICKHOUSE.service.name: `clickhouse::clickhouse.skywalking-showcase`
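
As a companion sketch, the `lower-short-name-with-fqdn` comparison can be illustrated the same way. The short names below are assumptions read off the matched example (the virtual database name has no group prefix, so its shortName is the full name); this is an illustration of the closure's logic, not SkyWalking code.

```java
// Illustrative only: the `lower-short-name-with-fqdn` comparison in plain Java,
// applied to the short names from the matched example above.
public class FqdnRuleSketch {
    // Equivalent of { (u, l) -> u.shortName.substring(0, u.shortName.lastIndexOf(':'))
    //                           == l.shortName.concat('.svc.cluster.local') }
    static boolean matches(String upperShortName, String lowerShortName) {
        String upperWithoutPort = upperShortName.substring(0, upperShortName.lastIndexOf(':'));
        return upperWithoutPort.equals(lowerShortName + ".svc.cluster.local");
    }

    public static void main(String[] args) {
        // Assumed short names: the VIRTUAL_DATABASE name has no group prefix, so its
        // shortName is the full name; the CLICKHOUSE shortName drops the "clickhouse::" prefix.
        String virtualDatabaseShortName = "clickhouse.skywalking-showcase.svc.cluster.local:8123";
        String clickhouseShortName = "clickhouse.skywalking-showcase";
        System.out.println(matches(virtualDatabaseShortName, clickhouseShortName)); // true
    }
}
```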

### Build Through Specific Agents
Use agent technologies (such as eBPF) and deployment tools (such as the operator and agent injector) to detect the service hierarchy relations.

140 changes: 140 additions & 0 deletions docs/en/setup/backend/backend-clickhouse-monitoring.md

Large diffs are not rendered by default.

134 changes: 134 additions & 0 deletions docs/en/swip/SWIP-5.md

Large diffs are not rendered by default.

4 changes: 3 additions & 1 deletion docs/en/swip/readme.md
@@ -68,9 +68,11 @@ All accepted and proposed SWIPs could be found in [here](https://github.com/apac

## Known SWIPs

Next SWIP Number: 4
Next SWIP Number: 5

### Accepted SWIPs

- [SWIP-5 Support ClickHouse Monitoring](SWIP-5.md)
- [SWIP-4 Support available layers of service in the topology](SWIP-4.md)
- [SWIP-3 Support RocketMQ Monitoring](SWIP-3.md)
- [SWIP-2 Collecting and Gathering Kubernetes Monitoring Data](SWIP-2.md)
@@ -218,7 +218,12 @@ public enum Layer {
/**
* A cloud native messaging and streaming platform, making it simple to build event-driven applications.
*/
ROCKETMQ(35, true);
ROCKETMQ(35, true),

/**
* A high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP).
*/
CLICKHOUSE(36, true);

private final int value;
/**
@@ -73,6 +73,7 @@ public class UITemplateInitializer {
Layer.BOOKKEEPER.name(),
Layer.NGINX.name(),
Layer.ROCKETMQ.name(),
Layer.CLICKHOUSE.name(),
"custom"
};
private final UITemplateManagementService uiTemplateManagementService;
@@ -353,7 +353,7 @@ receiver-otel:
selector: ${SW_OTEL_RECEIVER:default}
default:
enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"otlp-metrics,otlp-logs"}
enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,nginx/*,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*,pulsar/*,bookkeeper/*,rocketmq/*"}
enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,nginx/*,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*,pulsar/*,bookkeeper/*,rocketmq/*,clickhouse/*"}

receiver-zipkin:
selector: ${SW_RECEIVER_ZIPKIN:-}
@@ -54,9 +54,13 @@ hierarchy:
KAFKA:
K8S_SERVICE: short-name

CLICKHOUSE:
K8S_SERVICE: short-name

VIRTUAL_DATABASE:
MYSQL: lower-short-name-with-fqdn
POSTGRESQL: lower-short-name-with-fqdn
CLICKHOUSE: lower-short-name-with-fqdn

VIRTUAL_MQ:
ROCKETMQ: lower-short-name-with-fqdn
@@ -91,6 +95,7 @@ layer-levels:
APISIX: 2
NGINX: 2
ROCKETMQ: 2
CLICKHOUSE: 2
RABBITMQ: 2
KAFKA: 2

@@ -0,0 +1,178 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This will parse a textual representation of a duration. The formats
# accepted are based on the ISO-8601 duration format {@code PnDTnHnMn.nS}
# with days considered to be exactly 24 hours.
# <p>
# Examples:
# <pre>
# "PT20.345S" -- parses as "20.345 seconds"
# "PT15M" -- parses as "15 minutes" (where a minute is 60 seconds)
# "PT10H" -- parses as "10 hours" (where an hour is 3600 seconds)
# "P2D" -- parses as "2 days" (where a day is 24 hours or 86400 seconds)
# "P2DT3H4M" -- parses as "2 days, 3 hours and 4 minutes"
# "P-6H3M" -- parses as "-6 hours and +3 minutes"
# "-P6H3M" -- parses as "-6 hours and -3 minutes"
# "-P-6H+3M" -- parses as "+6 hours and -3 minutes"
# </pre>
filter: "{ tags -> tags.job_name == 'clickhouse-monitoring' }" # The OpenTelemetry job name
expSuffix: tag({tags -> tags.host_name = 'clickhouse::' + tags.host_name}).instance(['host_name'], ['service_instance_id'], Layer.CLICKHOUSE)
metricPrefix: meter_clickhouse
metricsRules:
# Version of the server in a single integer number in base-1000.
- name: instance_version
exp: ClickHouseMetrics_VersionInteger
# CPU time spent seen by OS.
- name: instance_cpu_usage
exp: ClickHouseProfileEvents_OSCPUVirtualTimeMicroseconds.increase('PT1M')/60
# The percentage of memory (bytes) allocated by the server.
- name: instance_memory_usage
exp: ClickHouseMetrics_MemoryTracking / ClickHouseAsyncMetrics_OSMemoryTotal * 100
# The percentage of memory available to be used by programs.
- name: instance_memory_available
exp: ClickHouseAsyncMetrics_OSMemoryAvailable / ClickHouseAsyncMetrics_OSMemoryTotal * 100
# The server uptime in seconds. It includes the time spent for server initialization before accepting connections.
- name: instance_uptime
exp: ClickHouseAsyncMetrics_Uptime
# Number of files opened per minute.
- name: instance_file_open
exp: ClickHouseProfileEvents_FileOpen.increase('PT1M')
# Network
# Number of connections to TCP server.
- name: instance_tcp_connections
exp: ClickHouseMetrics_TCPConnection
# Number of client connections using MySQL protocol.
- name: instance_mysql_connections
exp: ClickHouseMetrics_MySQLConnection
# Number of connections to HTTP server.
- name: instance_http_connections
exp: ClickHouseMetrics_HTTPConnection
# Number of connections from other replicas to fetch parts.
- name: instance_interserver_connections
exp: ClickHouseMetrics_InterserverConnection
# Number of client connections using PostgreSQL protocol.
- name: instance_postgresql_connections
exp: ClickHouseMetrics_PostgreSQLConnection
# Total number of bytes received from network.
- name: instance_network_receive_bytes
exp: ClickHouseProfileEvents_NetworkReceiveBytes.increase('PT1M')
# Total number of bytes sent to the network.
- name: instance_network_send_bytes
exp: ClickHouseProfileEvents_NetworkSendBytes.increase('PT1M')
# Query
# Number of executing queries
- name: instance_query
exp: ClickHouseProfileEvents_Query.increase('PT1M')
# Number of executing queries, but only for SELECT queries.
- name: instance_query_select
exp: ClickHouseProfileEvents_SelectQuery.increase('PT1M')
# Number of executing queries, but only for INSERT queries.
- name: instance_query_insert
exp: ClickHouseProfileEvents_InsertQuery.increase('PT1M')
# Number of SELECT queries per second.
- name: instance_query_select_rate
exp: ClickHouseProfileEvents_SelectQuery.rate('PT1M')
# Number of INSERT queries per second.
- name: instance_query_insert_rate
exp: ClickHouseProfileEvents_InsertQuery.rate('PT1M')
# Total time of all queries
- name: instance_querytime_microseconds
exp: ClickHouseProfileEvents_QueryTimeMicroseconds.increase('PT1M')
# Total time of SELECT queries.
- name: instance_querytime_select_microseconds
exp: ClickHouseProfileEvents_SelectQueryTimeMicroseconds.increase('PT1M')
# Total time of INSERT queries.
- name: instance_querytime_insert_microseconds
exp: ClickHouseProfileEvents_InsertQueryTimeMicroseconds.increase('PT1M')
# Total time of queries that are not SELECT or INSERT.
- name: instance_querytime_other_microseconds
exp: ClickHouseProfileEvents_OtherQueryTimeMicroseconds.increase('PT1M')
# Number of reads from a file that were slow.
- name: instance_query_slow
exp: ClickHouseProfileEvents_SlowRead.rate('PT1M')
# Insertion
# Number of rows INSERTed to all tables.
- name: instance_inserted_rows
exp: ClickHouseProfileEvents_InsertedRows.rate('PT1M')
# Number of bytes INSERTed to all tables.
- name: instance_inserted_bytes
exp: ClickHouseProfileEvents_InsertedBytes.rate('PT1M')
# Number of times the INSERT of a block to a MergeTree table was throttled due to high number of active data parts for partition.
- name: instance_delayed_inserts
exp: ClickHouseProfileEvents_DelayedInserts.rate('PT1M')
# Replicas
# Number of data parts checking for consistency.
- name: instance_replicated_checks
exp: ClickHouseMetrics_ReplicatedChecks
# Number of data parts being fetched from replica.
- name: instance_replicated_fetch
exp: ClickHouseMetrics_ReplicatedFetch
# Number of data parts being sent to replicas.
- name: instance_replicated_send
exp: ClickHouseMetrics_ReplicatedSend
# MergeTree
# Number of executing background merges.
- name: instance_background_merge
exp: ClickHouseMetrics_Merge
# Rows read for background merges. This is the number of rows before merge.
- name: instance_merge_rows
exp: ClickHouseProfileEvents_MergedRows.increase('PT1M')
# Uncompressed bytes (for columns as they are stored in memory) that were read for background merges. This is the number before merge.
- name: instance_merge_uncompressed_bytes
exp: ClickHouseProfileEvents_MergedUncompressedBytes.increase('PT1M')
# Number of currently executing moves.
- name: instance_move
exp: ClickHouseMetrics_Move
# Active data parts, used by current and upcoming SELECTs.
- name: instance_parts_active
exp: ClickHouseMetrics_PartsActive
# Number of mutations (ALTER DELETE/UPDATE).
- name: instance_mutations
exp: ClickHouseMetrics_PartMutation
# Kafka Table Engine
# Number of Kafka messages already processed by ClickHouse.
- name: instance_kafka_messages_read
exp: ClickHouseProfileEvents_KafkaMessagesRead.rate('PT1M')
# Number of writes (inserts) to Kafka tables.
- name: instance_kafka_writes
exp: ClickHouseProfileEvents_KafkaWrites.rate('PT1M')
# Number of active Kafka consumers.
- name: instance_kafka_consumers
exp: ClickHouseMetrics_KafkaConsumers
# Number of active Kafka producers created.
- name: instance_kafka_producers
exp: ClickHouseMetrics_KafkaProducers
# Zookeeper
# Number of sessions (connections) to ZooKeeper. Should be no more than one, because using more than one connection to ZooKeeper may lead to bugs due to lack of linearizability (stale reads) that the ZooKeeper consistency model allows.
- name: instance_zookeeper_session
exp: ClickHouseMetrics_ZooKeeperSession
# Number of watches (event subscriptions) in ZooKeeper.
- name: instance_zookeeper_watch
exp: ClickHouseMetrics_ZooKeeperWatch
# Number of bytes sent over the network while communicating with ZooKeeper.
- name: instance_zookeeper_bytes_sent
exp: ClickHouseProfileEvents_ZooKeeperBytesSent.rate('PT1M')
# Number of bytes received over the network while communicating with ZooKeeper.
- name: instance_zookeeper_bytes_received
exp: ClickHouseProfileEvents_ZooKeeperBytesReceived.rate('PT1M')
# ClickHouse Keeper
# Number of alive connections for embedded ClickHouse Keeper.
- name: instance_keeper_connections_alive
exp: ClickHouseMetrics_KeeperAliveConnections
# Number of outstanding requests for embedded ClickHouse Keeper.
- name: instance_keeper_outstanding_requests
exp: ClickHouseMetrics_KeeperOutstandingRequets
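
A note on the `PT1M` windows used throughout these rules: they follow the ISO-8601 duration format documented in the file header. The sketch below is a minimal, illustrative Java snippet (not SkyWalking's MAL implementation), assuming `increase('PT1M')` means the counter delta over the window and `rate('PT1M')` means that delta divided by the window length in seconds; the sample values are hypothetical.

```java
import java.time.Duration;

// Illustrative only: assumed semantics of increase()/rate() over a PT1M window.
public class CounterWindowSketch {
    static double increase(double previous, double current) {
        return current - previous; // counter delta over the window
    }

    static double rate(double previous, double current, Duration window) {
        return increase(previous, current) / window.getSeconds();
    }

    public static void main(String[] args) {
        Duration window = Duration.parse("PT1M"); // ISO-8601 duration: one minute
        // Hypothetical samples of ClickHouseProfileEvents_SelectQuery, one window apart.
        double previous = 1_200;
        double current = 1_500;
        System.out.println("increase over PT1M: " + increase(previous, current));      // 300.0
        System.out.println("rate per second:    " + rate(previous, current, window));  // 5.0
    }
}
```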
