cortex-jsonnet 1.10.0
This release was created long after Cortex 1.10.0 was released. It is branched from commit 629d288, the last grafana/cortex-jsonnet commit before Cortex 1.10.0 was released.
1.10.0 / 2021-12-30
- [CHANGE] `namespace` template variable in dashboards now only selects namespaces for selected clusters. #311
- [CHANGE] Alertmanager: mounted overrides configmap to alertmanager too. #315
- [CHANGE] Memcached: upgraded memcached from `1.5.17` to `1.6.9`. #316
- [CHANGE] `CortexIngesterRestarts` alert severity changed from `critical` to `warning`. #321
- [CHANGE] Store-gateway: increased memory request and limit respectively from 6GB / 6GB to 12GB / 18GB. #322
- [CHANGE] Store-gateway: increased `-blocks-storage.bucket-store.max-chunk-pool-bytes` from 2GB (default) to 12GB. #322
- [CHANGE] Dashboards: added overridable `job_labels` and `cluster_labels` to the configuration object as label lists to uniquely identify jobs and clusters in the metric names and group-by lists in dashboards. #319
- [CHANGE] Dashboards: `alert_aggregation_labels` has been removed from the configuration and overriding this value has been deprecated. Instead the labels are now defined by the `cluster_labels` list, and should be overridden accordingly through that list. #319
- [CHANGE] Ingester/Ruler: set `-server.grpc-max-send-msg-size-bytes` and `-server.grpc-max-recv-msg-size-bytes` to sensible default values (10MB). #326
- [CHANGE] Renamed `CortexCompactorHasNotUploadedBlocksSinceStart` to `CortexCompactorHasNotUploadedBlocks`. #334
- [CHANGE] Renamed `CortexCompactorRunFailed` to `CortexCompactorHasNotSuccessfullyRunCompaction`. #334
- [CHANGE] Renamed `CortexInconsistentConfig` alert to `CortexInconsistentRuntimeConfig` and increased severity to `critical`. #335
- [CHANGE] Increased `CortexBadRuntimeConfig` alert severity to `critical` and removed support for `cortex_overrides_last_reload_successful` metric (was removed in Cortex 1.3.0). #335
- [CHANGE] Grafana 'min step' changed to 15s so dashboards show better detail. #340
- [CHANGE] Replaced `CortexRulerFailedEvaluations` with two new alerts: `CortexRulerTooManyFailedPushes` and `CortexRulerTooManyFailedQueries`. #347
- [CHANGE] Removed `CortexCacheRequestErrors` alert. This alert was not working because the legacy Cortex cache client instrumentation doesn't track errors. #346
- [CHANGE] Removed `CortexQuerierCapacityFull` alert. #342
- [CHANGE] Changed blocks storage alerts to group metrics by the configured `cluster_labels` (supporting the deprecated `alert_aggregation_labels`). #351
- [CHANGE] Increased `CortexIngesterReachingSeriesLimit` critical alert threshold from 80% to 85%. #363
- [ENHANCEMENT] cortex-mixin: Make `cluster_namespace_deployment:kube_pod_container_resource_requests_{cpu_cores,memory_bytes}:sum` backwards compatible with `kube-state-metrics` v2.0.0. #317
- [ENHANCEMENT] cortex-mixin: Include `cortex-gw-internal` naming variation in default `gateway` job names. #328
- [ENHANCEMENT] Ruler dashboard: added object storage metrics. #354
- [ENHANCEMENT] Alertmanager dashboard: added object storage metrics. #354
- [ENHANCEMENT] Added documentation text panels and descriptions to reads and writes dashboards. #324
- [ENHANCEMENT] Dashboards: defined container functions for common resources panels: containerDiskWritesPanel, containerDiskReadsPanel, containerDiskSpaceUtilization. #331
- [ENHANCEMENT] cortex-mixin: Added `alert_excluded_routes` config to exclude specific routes from alerts. #338
- [ENHANCEMENT] Added `CortexMemcachedRequestErrors` alert. #346
- [ENHANCEMENT] Ruler dashboard: added "Per route p99 latency" panel in the "Configuration API" row. #353
- [ENHANCEMENT] Increased the `for` duration of the `CortexIngesterReachingSeriesLimit` warning alert to 3h. #362
- [ENHANCEMENT] Added a new tier (`medium_small_user`) so we have another tier between 100K and 1Mil active series. #364
- [ENHANCEMENT] Extend Alertmanager dashboard: #313
  - "Tenants" stat panel - shows number of discovered tenant configurations.
  - "Replication" row - information about the replication of tenants/alerts/silences over instances.
  - "Tenant Configuration Sync" row - information about the configuration sync procedure.
  - "Sharding Initial State Sync" row - information about the initial state sync procedure when sharding is enabled.
  - "Sharding Runtime State Sync" row - information about various state operations which occur when sharding is enabled (replication, fetch, merge, persist).
- [BUGFIX] Fixed `CortexIngesterHasNotShippedBlocks` alert false positive in case an ingester instance had ingested samples in the past, then no traffic was received for a long period and then it started receiving samples again. #308
- [BUGFIX] Alertmanager: fixed `--alertmanager.cluster.peers` CLI flag passed to alertmanager when HA is enabled. #329
- [BUGFIX] Fixed `CortexInconsistentRuntimeConfig` metric. #335
- [BUGFIX] Fixed scaling dashboard to correctly work when a Cortex service deployment spans across multiple zones (a zone is expected to have the `zone-[a-z]` suffix). #365
- [BUGFIX] Fixed rollout progress dashboard to correctly work when a Cortex service deployment spans across multiple zones (a zone is expected to have the `zone-[a-z]` suffix). #366
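
Several entries above (#319, #338, #351) refer to overridable fields on the mixin configuration object. A minimal sketch of what such an override might look like follows. The import path, the `_config` field name, the list values, and the route name are assumptions for illustration only; check the mixin's own configuration file for the actual defaults and expected value formats.

```jsonnet
// Hypothetical override file. The import path below is an assumption
// and depends on how the mixin is vendored in your repository.
local mixin = import 'cortex-mixin/mixin.libsonnet';

mixin {
  // Assumes the mixin exposes its settings as a hidden `_config` object.
  _config+:: {
    // Labels that uniquely identify a cluster. Per #319 this list
    // replaces the deprecated alert_aggregation_labels setting.
    cluster_labels: ['cluster', 'namespace'],

    // Labels that uniquely identify a job (illustrative values).
    job_labels: ['cluster', 'namespace', 'job'],

    // Routes to exclude from alerts (#338). The route name is made up,
    // and the list-of-strings shape is an assumption.
    alert_excluded_routes: ['example_route'],
  },
}
```

Because `alert_aggregation_labels` is deprecated, overrides that previously set it would, under these assumptions, move to `cluster_labels` instead.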