diff --git a/site2/docs/reference-configuration.md b/site2/docs/reference-configuration.md index a733eacc04ffc..3f31893284d45 100644 --- a/site2/docs/reference-configuration.md +++ b/site2/docs/reference-configuration.md @@ -122,11 +122,13 @@ Pulsar brokers are responsible for handling incoming messages from producers, di |brokerDeduplicationMaxNumberOfProducers| The maximum number of producers for which information will be stored for deduplication purposes. |10000| |brokerDeduplicationEntriesInterval| The number of entries after which a deduplication informational snapshot is taken. A larger interval will lead to fewer snapshots being taken, though this would also lengthen the topic recovery time (the time required for entries published after the snapshot to be replayed). |1000| |brokerDeduplicationProducerInactivityTimeoutMinutes| The time of inactivity (in minutes) after which the broker will discard deduplication information related to a disconnected producer. |360| +|dispatchThrottlingRatePerReplicatorInMsg| The default messages per second dispatch throttling-limit for every replicator in replication. The value of `0` means disabling replication message dispatch-throttling| 0 | +|dispatchThrottlingRatePerReplicatorInByte| The default bytes per second dispatch throttling-limit for every replicator in replication. The value of `0` means disabling replication message-byte dispatch-throttling| 0 | |zooKeeperSessionTimeoutMillis| Zookeeper session timeout in milliseconds |30000| |brokerShutdownTimeoutMs| Time to wait for broker graceful shutdown. After this time elapses, the process will be killed |60000| |backlogQuotaCheckEnabled| Enable backlog quota check. Enforces action on topic when the quota is reached |true| |backlogQuotaCheckIntervalInSeconds| How often to check for topics that have reached the quota |60| -|backlogQuotaDefaultLimitGB| Default per-topic backlog quota limit |10| +|backlogQuotaDefaultLimitGB| The default per-topic backlog quota limit | -1 | |allowAutoTopicCreation| Enable topic auto creation if a new producer or consumer connected |true| |allowAutoTopicCreationType| The topic type (partitioned or non-partitioned) that is allowed to be automatically created. |Partitioned| |defaultNumPartitions| The number of partitioned topics that is allowed to be automatically created if `allowAutoTopicCreationType` is partitioned |1| @@ -225,6 +227,7 @@ Pulsar brokers are responsible for handling incoming messages from producers, di |loadManagerClassName| Name of load manager to use |org.apache.pulsar.broker.loadbalance.impl.SimpleLoadManagerImpl| |managedLedgerOffloadDriver| Driver to use to offload old data to long term storage (Possible values: S3) || |managedLedgerOffloadMaxThreads| Maximum number of thread pool threads for ledger offloading |2| +|managedLedgerUnackedRangesOpenCacheSetEnabled| Use Open Range-Set to cache unacknowledged messages |true| |managedLedgerOffloadDeletionLagMs|Delay between a ledger being successfully offloaded to long term storage and the ledger being deleted from bookkeeper | 14400000| |managedLedgerOffloadAutoTriggerSizeThresholdBytes|The number of bytes before triggering automatic offload to long term storage |-1 (disabled)| |s3ManagedLedgerOffloadRegion| For Amazon S3 ledger offload, AWS region || @@ -361,11 +364,18 @@ The [`pulsar-client`](reference-cli-tools.md#pulsar-client) CLI tool can be used |bookkeeperClientRegionawarePolicyEnabled| |false| |bookkeeperClientReorderReadSequenceEnabled| |false| |bookkeeperClientIsolationGroups||| +|bookkeeperClientSecondaryIsolationGroups| Enable bookie secondary-isolation group if bookkeeperClientIsolationGroups doesn't have enough bookie available. || +|bookkeeperClientMinAvailableBookiesInIsolationGroups| Minimum bookies that should be available as part of bookkeeperClientIsolationGroups else broker will include bookkeeperClientSecondaryIsolationGroups bookies in isolated list. || |managedLedgerDefaultEnsembleSize| |1| |managedLedgerDefaultWriteQuorum| |1| |managedLedgerDefaultAckQuorum| |1| |managedLedgerCacheSizeMB| |1024| +|managedLedgerCacheCopyEntries| Whether we should make a copy of the entry payloads when inserting in cache| false| |managedLedgerCacheEvictionWatermark| |0.9| +|managedLedgerCacheEvictionFrequency| Configure the cache eviction frequency for the managed ledger cache (evictions/sec) | 100.0 | +|managedLedgerCacheEvictionTimeThresholdMillis| All entries that have stayed in cache for more than the configured time, will be evicted | 1000 | +|managedLedgerCursorBackloggedThreshold| Configure the threshold (in number of entries) from where a cursor should be considered 'backlogged' and thus should be set as inactive. | 1000| +|managedLedgerUnackedRangesOpenCacheSetEnabled| Use Open Range-Set to cache unacknowledged messages |true| |managedLedgerDefaultMarkDeleteRateLimit| |0.1| |managedLedgerMaxEntriesPerLedger| |50000| |managedLedgerMinLedgerRolloverTimeMinutes| |10| diff --git a/site2/docs/reference-metrics.md b/site2/docs/reference-metrics.md index 3983dcd275c75..d3e21bd896b7e 100644 --- a/site2/docs/reference-metrics.md +++ b/site2/docs/reference-metrics.md @@ -167,6 +167,7 @@ All the topic metrics are labelled with the following labels: | pulsar_storage_size | Gauge | The total storage size of the topics in this topic owned by this broker (bytes). | | pulsar_storage_backlog_size | Gauge | The total backlog size of the topics of this topic owned by this broker (messages). | | pulsar_storage_offloaded_size | Gauge | The total amount of the data in this topic offloaded to the tiered storage (bytes). | +| pulsar_storage_backlog_quota_limit | Gauge | The total amount of the data in this topic that limit the backlog quota (bytes). | | pulsar_storage_write_rate | Gauge | The total message batches (entries) written to the storage for this topic (message batches / second). | | pulsar_storage_read_rate | Gauge | The total message batches (entries) read from the storage for this topic (message batches / second). | | pulsar_subscription_delayed | Gauge | The total message batches (entries) are delayed for dispatching. | diff --git a/site2/website/versioned_docs/version-2.4.0/reference-configuration.md b/site2/website/versioned_docs/version-2.4.0/reference-configuration.md index 6f8d4be6d84fc..8540700d80fe9 100644 --- a/site2/website/versioned_docs/version-2.4.0/reference-configuration.md +++ b/site2/website/versioned_docs/version-2.4.0/reference-configuration.md @@ -5,6 +5,7 @@ sidebar_label: Pulsar configuration original_id: reference-configuration --- + + +Pulsar exposes metrics in Prometheus format that can be collected and used for monitoring the health of the cluster. + +* [ZooKeeper](#zookeeper) +* [BookKeeper](#bookkeeper) +* [Broker](#broker) + +## Overview + +The metrics exposed by Pulsar are in Prometheus format. The types of metrics are: + +- [Counter](https://prometheus.io/docs/concepts/metric_types/#counter): a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. +- [Gauge](https://prometheus.io/docs/concepts/metric_types/#gauge): a *gauge* is a metric that represents a single numerical value that can arbitrarily go up and down. +- [Histogram](https://prometheus.io/docs/concepts/metric_types/#histogram): a histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. +- [Summary](https://prometheus.io/docs/concepts/metric_types/#summary): similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window. + +## ZooKeeper + +The ZooKeeper metrics are exposed under "/metrics" at port 8000. You can use a different port +by configuring the `stats_server_port` system property. + +### Server metrics + +| Name | Type | Description | +|---|---|---| +| zookeeper_server_znode_count | Gauge | The number of z-nodes stored. | +| zookeeper_server_data_size_bytes | Gauge | The total size of all of z-nodes stored. | +| zookeeper_server_connections | Gauge | The number of currently opened connections. | +| zookeeper_server_watches_count | Gauge | The number of watchers registered. | +| zookeeper_server_ephemerals_count | Gauge | The number of ephemeral z-nodes. | + +### Request metrics + +| Name | Type | Description | +|---|---|---| +| zookeeper_server_requests | Counter | The total number of requests received by a particular server. | +| zookeeper_server_requests_latency_ms | Summary | The requests latency calculated in milliseconds.
Available labels: *type* (write, read).
| + +## BookKeeper + +The BookKeeper metrics are exposed under "/metrics" at port 8000. You can change the port by updating `prometheusStatsHttpPort` +in `bookkeeper.conf` configuration file. + +### Server metrics + +| Name | Type | Description | +|---|---|---| +| bookie_SERVER_STATUS | Gauge | The server status for bookie server.
| +| bookkeeper_server_ADD_ENTRY_count | Counter | The total number of ADD_ENTRY requests received at the bookie. The `success` label is used to distinguish successes and failures. | +| bookkeeper_server_READ_ENTRY_count | Counter | The total number of READ_ENTRY requests received at the bookie. The `success` label is used to distinguish successes and failures. | +| bookie_WRITE_BYTES | Counter | The total number of bytes written to the bookie. | +| bookie_READ_BYTES | Counter | The total number of bytes read from the bookie. | +| bookkeeper_server_ADD_ENTRY_REQUEST | Histogram | The histogram of request latency of ADD_ENTRY requests at the bookie. The `success` label is used to distinguish successes and failures. | +| bookkeeper_server_READ_ENTRY_REQUEST | Histogram | The histogram of request latency of READ_ENTRY requests at the bookie. The `success` label is used to distinguish successes and failures. | + +### Journal metrics + +| Name | Type | Description | +|---|---|---| +| bookie_journal_JOURNAL_SYNC_count | Counter | The total number of journal fsync operations happening at the bookie. The `success` label is used to distinguish successes and failures. | +| bookie_journal_JOURNAL_QUEUE_SIZE | Gauge | The total number of requests pending in the journal queue. | +| bookie_journal_JOURNAL_FORCE_WRITE_QUEUE_SIZE | Gauge | The total number of force write (fsync) requests pending in the force-write queue. | +| bookie_journal_JOURNAL_CB_QUEUE_SIZE | Gauge | The total number of callbacks pending in the callback queue. | +| bookie_journal_JOURNAL_ADD_ENTRY | Histogram | The histogram of request latency of adding entries to the journal. | +| bookie_journal_JOURNAL_SYNC | Histogram | The histogram of fsync latency of syncing data to the journal disk. | + +### Storage metrics + +| Name | Type | Description | +|---|---|---| +| bookie_ledgers_count | Gauge | The total number of ledgers stored in the bookie. | +| bookie_entries_count | Gauge | The total number of entries stored in the bookie. | +| bookie_write_cache_size | Gauge | The bookie write cache size (in bytes). | +| bookie_read_cache_size | Gauge | The bookie read cache size (in bytes). | +| bookie_DELETED_LEDGER_COUNT | Counter | The total number of ledgers deleted since the bookie has started. | +| bookie_ledger_writable_dirs | Gauge | The number of writable directories in the bookie. | + +## Broker + +The broker metrics are exposed under "/metrics" at port 8080. You can change the port by updating `webServicePort` to a different port +in `broker.conf` configuration file. + +All the metrics exposed by a broker are labelled with `cluster=${pulsar_cluster}`. The value of `${pulsar_cluster}` is the pulsar cluster +name you configured in `broker.conf`. + +Broker has the following kinds of metrics: + +* [Namespace metrics](#namespace-metrics) + * [Replication metrics](#replication-metrics) +* [Topic metrics](#topic-metrics) + * [Replication metrics](#replication-metrics-1) +* [Subscription metrics](#subscription-metrics) +* [Consumer metrics](#consumer-metrics) + +### Namespace metrics + +> Namespace metrics are only exposed when `exposeTopicLevelMetricsInPrometheus` is set to `false`. + +All the namespace metrics are labelled with the following labels: + +- *cluster*: `cluster=${pulsar_cluster}`. `${pulsar_cluster}` is the cluster name that you configured in `broker.conf`. +- *namespace*: `namespace=${pulsar_namespace}`. `${pulsar_namespace}` is the namespace name. + +| Name | Type | Description | +|---|---|---| +| pulsar_topics_count | Gauge | The number of Pulsar topics of the namespace owned by this broker. | +| pulsar_subscriptions_count | Gauge | The number of Pulsar subscriptions of the namespace served by this broker. | +| pulsar_producers_count | Gauge | The number of active producers of the namespace connected to this broker. | +| pulsar_consumers_count | Gauge | The number of active consumers of the namespace connected to this broker. | +| pulsar_rate_in | Gauge | The total message rate of the namespace coming into this broker (messages/second). | +| pulsar_rate_out | Gauge | The total message rate of the namespace going out from this broker (messages/second). | +| pulsar_throughput_in | Gauge | The total throughput of the namespace coming into this broker (bytes/second). | +| pulsar_throughput_out | Gauge | The total throughput of the namespace going out from this broker (bytes/second). | +| pulsar_storage_size | Gauge | The total storage size of the topics in this namespace owned by this broker (bytes). | +| pulsar_storage_backlog_size | Gauge | The total backlog size of the topics of this namespace owned by this broker (messages). | +| pulsar_storage_offloaded_size | Gauge | The total amount of the data in this namespace offloaded to the tiered storage (bytes). | +| pulsar_storage_write_rate | Gauge | The total message batches (entries) written to the storage for this namespace (message batches / second). | +| pulsar_storage_read_rate | Gauge | The total message batches (entries) read from the storage for this namespace (message batches / second). | +| pulsar_subscription_delayed | Gauge | The total message batches (entries) are delayed for dispatching. | +| pulsar_storage_write_latency_le_* | Histogram | The entry rate of a namespace that the storage write latency is smaller with a given threshold.
Available thresholds:
| +| pulsar_entry_size_le_* | Histogram | The entry rate of a namespace that the entry size is smaller with a given threshold.
Available thresholds:
| + +#### Replication metrics + +If a namespace is configured to be replicated between multiple Pulsar clusters, the corresponding replication metrics will also be exposed when `replicationMetricsEnabled` is enabled. + +All the replication metrics will also be labelled with `remoteCluster=${pulsar_remote_cluster}`. + +| Name | Type | Description | +|---|---|---| +| pulsar_replication_rate_in | Gauge | The total message rate of the namespace replicating from remote cluster (messages/second). | +| pulsar_replication_rate_out | Gauge | The total message rate of the namespace replicating to remote cluster (messages/second). | +| pulsar_replication_throughput_in | Gauge | The total throughput of the namespace replicating from remote cluster (bytes/second). | +| pulsar_replication_throughput_out | Gauge | The total throughput of the namespace replicating to remote cluster (bytes/second). | +| pulsar_replication_backlog | Gauge | The total backlog of the namespace replicating to remote cluster (messages). | + +### Topic metrics + +> Topic metrics are only exposed when `exposeTopicLevelMetricsInPrometheus` is set to true. + +All the topic metrics are labelled with the following labels: + +- *cluster*: `cluster=${pulsar_cluster}`. `${pulsar_cluster}` is the cluster name that you configured in `broker.conf`. +- *namespace*: `namespace=${pulsar_namespace}`. `${pulsar_namespace}` is the namespace name. +- *topic*: `topic=${pulsar_topic}`. `${pulsar_topic}` is the topic name. + +| Name | Type | Description | +|---|---|---| +| pulsar_subscriptions_count | Gauge | The number of Pulsar subscriptions of the topic served by this broker. | +| pulsar_producers_count | Gauge | The number of active producers of the topic connected to this broker. | +| pulsar_consumers_count | Gauge | The number of active consumers of the topic connected to this broker. | +| pulsar_rate_in | Gauge | The total message rate of the topic coming into this broker (messages/second). | +| pulsar_rate_out | Gauge | The total message rate of the topic going out from this broker (messages/second). | +| pulsar_throughput_in | Gauge | The total throughput of the topic coming into this broker (bytes/second). | +| pulsar_throughput_out | Gauge | The total throughput of the topic going out from this broker (bytes/second). | +| pulsar_storage_size | Gauge | The total storage size of the topics in this topic owned by this broker (bytes). | +| pulsar_storage_backlog_size | Gauge | The total backlog size of the topics of this topic owned by this broker (messages). | +| pulsar_storage_offloaded_size | Gauge | The total amount of the data in this topic offloaded to the tiered storage (bytes). | +| pulsar_storage_backlog_quota_limit | Gauge | The total amount of the data in this topic that limit the backlog quota (bytes). | +| pulsar_storage_write_rate | Gauge | The total message batches (entries) written to the storage for this topic (message batches / second). | +| pulsar_storage_read_rate | Gauge | The total message batches (entries) read from the storage for this topic (message batches / second). | +| pulsar_subscription_delayed | Gauge | The total message batches (entries) are delayed for dispatching. | +| pulsar_storage_write_latency_le_* | Histogram | The entry rate of a topic that the storage write latency is smaller with a given threshold.
Available thresholds:
| +| pulsar_entry_size_le_* | Histogram | The entry rate of a topic that the entry size is smaller with a given threshold.
Available thresholds:
| +| pulsar_in_bytes_total | Counter | The total number of bytes received for this topic | +| pulsar_producers_count | Counter | The total number of messages received for this topic | + +#### Replication metrics + +If a namespace that a topic belongs to is configured to be replicated between multiple Pulsar clusters, the corresponding replication metrics will also be exposed when `replicationMetricsEnabled` is enabled. + +All the replication metrics will also be labelled with `remoteCluster=${pulsar_remote_cluster}`. + +| Name | Type | Description | +|---|---|---| +| pulsar_replication_rate_in | Gauge | The total message rate of the topic replicating from remote cluster (messages/second). | +| pulsar_replication_rate_out | Gauge | The total message rate of the topic replicating to remote cluster (messages/second). | +| pulsar_replication_throughput_in | Gauge | The total throughput of the topic replicating from remote cluster (bytes/second). | +| pulsar_replication_throughput_out | Gauge | The total throughput of the topic replicating to remote cluster (bytes/second). | +| pulsar_replication_backlog | Gauge | The total backlog of the topic replicating to remote cluster (messages). | + + +### Subscription metrics + +> Subscription metrics are only exposed when `exposeTopicLevelMetricsInPrometheus` is set to true. + +All the subscription metrics are labelled with the following labels: + +- *cluster*: `cluster=${pulsar_cluster}`. `${pulsar_cluster}` is the cluster name that you configured in `broker.conf`. +- *namespace*: `namespace=${pulsar_namespace}`. `${pulsar_namespace}` is the namespace name. +- *topic*: `topic=${pulsar_topic}`. `${pulsar_topic}` is the topic name. +- *subscription*: `subscription=${subscription}`. `${subscription}` is the topic subscription name. + +| Name | Type | Description | +|---|---|---| +| pulsar_subscription_back_log | Gauge | The total backlog of a subscription (messages). | +| pulsar_subscription_delayed | Gauge | The total number of messages are delayed to be dispatched for a subscription (messages). | +| pulsar_subscription_msg_rate_redeliver | Gauge | The total message rate for message being redelivered (messages/second). | +| pulsar_subscription_unacked_messages | Gauge | The total number of unacknowledged messages of a subscription (messages). | +| pulsar_subscription_blocked_on_unacked_messages | Gauge | Indicate whether a subscription is blocked on unacknowledged messages or not.
| +| pulsar_subscription_msg_rate_out | Gauge | The total message dispatch rate for a subscription (messages/second). | +| pulsar_subscription_msg_throughput_out | Gauge | The total message dispatch throughput for a subscription (bytes/second). | + +### Consumer metrics + +> Consumer metrics are only exposed when both `exposeTopicLevelMetricsInPrometheus` and `exposeConsumerLevelMetricsInPrometheus` +> are set to true. + +All the consumer metrics are labelled with the following labels: + +- *cluster*: `cluster=${pulsar_cluster}`. `${pulsar_cluster}` is the cluster name that you configured in `broker.conf`. +- *namespace*: `namespace=${pulsar_namespace}`. `${pulsar_namespace}` is the namespace name. +- *topic*: `topic=${pulsar_topic}`. `${pulsar_topic}` is the topic name. +- *subscription*: `subscription=${subscription}`. `${subscription}` is the topic subscription name. +- *consumer_name*: `consumer_name=${consumer_name}`. `${consumer_name}` is the topic consumer name. +- *consumer_id*: `consumer_id=${consumer_id}`. `${consumer_id}` is the topic consumer id. + +| Name | Type | Description | +|---|---|---| +| pulsar_consumer_msg_rate_redeliver | Gauge | The total message rate for message being redelivered (messages/second). | +| pulsar_consumer_unacked_messages | Gauge | The total number of unacknowledged messages of a consumer (messages). | +| pulsar_consumer_blocked_on_unacked_messages | Gauge | Indicate whether a consumer is blocked on unacknowledged messages or not.
| +| pulsar_consumer_msg_rate_out | Gauge | The total message dispatch rate for a consumer (messages/second). | +| pulsar_consumer_msg_throughput_out | Gauge | The total message dispatch throughput for a consumer (bytes/second). | +| pulsar_consumer_available_permits | Gauge | The available permits for for a consumer. | + +## Monitor + +You can [set up a Prometheus instance](https://prometheus.io/) to collect all the metrics exposed at Pulsar components and set up +[Grafana](https://grafana.com/) dashboards to display the metrics and monitor your Pulsar cluster. + +The following are some Grafana dashboards examples: + +- [pulsar-grafana](http://pulsar.apache.org/docs/en/deploy-monitoring/#grafana): A grafana dashboard that displays metrics collected in Prometheus for Pulsar clusters running on Kubernetes. +- [apache-pulsar-grafana-dashboard](https://github.com/streamnative/apache-pulsar-grafana-dashboard): A collection of grafana dashboard templates for different Pulsar components running on both Kubernetes and on-premise machines. diff --git a/site2/website/versioned_docs/version-2.4.1/reference-configuration.md b/site2/website/versioned_docs/version-2.4.1/reference-configuration.md index 7fde456503223..ec0291db59e4b 100644 --- a/site2/website/versioned_docs/version-2.4.1/reference-configuration.md +++ b/site2/website/versioned_docs/version-2.4.1/reference-configuration.md @@ -5,6 +5,7 @@ sidebar_label: Pulsar configuration original_id: reference-configuration --- +