Pages to /introduction needed redirects
rfraposa committed Apr 4, 2022
1 parent 121ef70 commit 4530d8e
Showing 19 changed files with 432 additions and 759 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -2,6 +2,8 @@ node_modules
.docusaurus
build
docs/en/
docs/ru/
docs/zh/
**/.DS_Store
backup
yarn.lock
2 changes: 1 addition & 1 deletion docs/guides/sre/configuring-ldap.md
@@ -3,7 +3,7 @@ sidebar_label: Configuring LDAP
sidebar_position: 20
---

# Configuring ClickHouse to use LDAP for authentication and role mapping
# Configuring ClickHouse to Use LDAP for Authentication and Role Mapping

ClickHouse can be configured to use LDAP to authenticate ClickHouse database users. This guide provides a simple example of integrating ClickHouse with an LDAP system that authenticates against a publicly available directory.

18 changes: 10 additions & 8 deletions docs/guides/sre/scaling-clusters.md
@@ -1,19 +1,21 @@
---
sidebar_label: Managing Clusters
sidebar_label: Rebalancing Shards
sidebar_position: 20
description: Topics concerning cluster management
description: ClickHouse does not support automatic shard rebalancing, so we provide some best practices for how to rebalance shards.
---

# Managing Clusters

## Rebalancing Data
# Rebalancing Data

ClickHouse does not support automatic shard rebalancing. However, there are several ways to rebalance shards, listed here in order of preference:

1. Adjust the shard weighting for the [distributed table](https://clickhouse.com/docs/en/engines/table-engines/special/distributed/), allowing writes to be biased towards the new shard. This can cause load imbalances and hot spots on the cluster but is viable in most scenarios where write throughput is not extremely high. It does not require the user to change their write target, i.e., it can remain the distributed table. This does not assist with rebalancing existing data.
1. Adjust the shard weighting for the [distributed table](../../en/engines/table-engines/special/distributed/), allowing writes to be biased towards the new shard. This can cause load imbalances and hot spots on the cluster but is viable in most scenarios where write throughput is not extremely high. It does not require the user to change their write target, i.e., it can remain the distributed table. This does not assist with rebalancing existing data.

2. As an alternative to (1), modify the existing cluster and write exclusively to the new shard until the cluster is balanced - manually weighting writes. This has the same limitations as (1).

3. If you need to rebalance existing data and you have partitioned your data, consider detaching partitions and manually relocating them to another node before reattaching them on the new shard (see the sketch after this list). This is more manual than the subsequent techniques but may be faster and less resource-intensive; being a manual operation, it still requires you to plan how the data should be redistributed.
4. Create a new cluster with the new topology and copy the data using [ClickHouse Copier](https://clickhouse.com/docs/en/operations/utilities/clickhouse-copier/). Alternatively, create a new database within the existing cluster and migrate the data using ClickHouse Copier. This can be computationally expensive and may impact your production environment. Building a new cluster on separate hardware and applying this technique is an option to mitigate this, at the expense of cost.
5. Export the data from the source cluster to the new cluster via an [INSERT INTO SELECT](https://clickhouse.com/docs/en/sql-reference/statements/insert-into/#insert_query_insert-select). This will not be performant on very large datasets, will potentially incur significant I/O on the source cluster, and will use considerable network resources. This represents a last resort.

4. Create a new cluster with the new topology and copy the data using [ClickHouse Copier](../../en/operations/utilities/clickhouse-copier.md). Alternatively, create a new database within the existing cluster and migrate the data using ClickHouse Copier. This can be computationally expensive and may impact your production environment. Building a new cluster on separate hardware and applying this technique is an option to mitigate this, at the expense of cost.

5. Export the data from the source cluster to the new cluster via an [INSERT INTO SELECT](../../en/sql-reference/statements/insert-into/#insert_query_insert-select). This will not be performant on very large datasets, will potentially incur significant I/O on the source cluster, and will use considerable network resources. This represents a last resort.
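
As a rough sketch of option (3), the commands below assume a hypothetical table `events` partitioned by `toYYYYMM`, a partition value of `202203`, and default data paths; the copy of the detached part directories between nodes happens outside of ClickHouse (e.g., with `rsync`):

```sql
-- On a node of the source shard: detach the partition so its parts are no longer served.
ALTER TABLE events DETACH PARTITION 202203;

-- Copy the detached part directories to the target node outside of ClickHouse, e.g.:
--   rsync -av /var/lib/clickhouse/data/default/events/detached/ \
--             target-host:/var/lib/clickhouse/data/default/events/detached/

-- On a node of the target shard: attach the copied parts.
ALTER TABLE events ATTACH PARTITION 202203;
```

Once the attach has been verified, the detached parts left on the source node can be removed.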

There is an internal effort to reconsider how rebalancing could be implemented. There is some relevant discussion [here](https://github.com/ClickHouse/ClickHouse/issues/13574).
1 change: 1 addition & 0 deletions docs/integrations/_category_.yml
@@ -5,3 +5,4 @@ collapsed: true
link:
type: generated-index
title: Integrations
slug: /integrations
36 changes: 19 additions & 17 deletions docs/integrations/s3/s3-merge-tree.md
@@ -1,26 +1,28 @@
---
sidebar_label: S3 Backed Merge Tree
sidebar_label: S3 Backed MergeTree
sidebar_position: 4
description: S3 Backed Merge Tree
description: S3 Backed MergeTree
---

# S3 Backed Merge Tree
# S3 Backed MergeTree

*This feature is currently experimental and undergoing improvements and experimentation to understand performance.*
:::note
This feature is currently experimental and undergoing improvements and experimentation to understand performance.
:::

The s3 functions and associated table engine allow us to query data in s3 using familiar ClickHouse syntax. However, they are limited with respect to data management features and performance: there is no support for primary indexes or caching, and file inserts need to be managed by the user.
The `s3` functions and associated table engine allow us to query data in S3 using familiar ClickHouse syntax. However, they are limited with respect to data management features and performance: there is no support for primary indexes or caching, and file inserts need to be managed by the user.

ClickHouse recognizes that S3 represents an attractive storage option, especially where query performance on “colder” data is less critical and users seek to separate storage and compute. To help achieve this, support is provided for using s3 as the storage for a Merge Tree engine. This enables users to exploit the scalability and cost benefits of s3 and the insert and query performance of the Merge Tree engine.
ClickHouse recognizes that S3 represents an attractive storage option, especially where query performance on “colder” data is less critical and users seek to separate storage and compute. To help achieve this, support is provided for using S3 as the storage for a MergeTree engine. This enables users to exploit the scalability and cost benefits of S3 and the insert and query performance of the MergeTree engine.

## Storage Tiers

ClickHouse storage volumes allow physical disks to be abstracted from the Merge Tree table engine. Any single volume can be composed of an ordered set of disks. Whilst principally allowing multiple block devices to be used for data storage, this abstraction also allows other storage types, including s3. ClickHouse data parts can be moved between volumes according to storage policies and fill rates, thus creating the concept of storage tiers.
ClickHouse storage volumes allow physical disks to be abstracted from the MergeTree table engine. Any single volume can be composed of an ordered set of disks. Whilst principally allowing multiple block devices to be used for data storage, this abstraction also allows other storage types, including S3. ClickHouse data parts can be moved between volumes according to storage policies and fill rates, thus creating the concept of storage tiers.

Storage tiers unlock hot-cold architectures where the most recent data, which is typically also the most queried, requires only a small amount of space on high-performing storage, e.g., NVMe SSDs. As the data ages, SLAs for query times increase (slower responses become acceptable) while query frequency drops. This long tail of data can be stored on slower, less performant storage such as HDD or object storage such as s3.
Storage tiers unlock hot-cold architectures where the most recent data, which is typically also the most queried, requires only a small amount of space on high-performing storage, e.g., NVMe SSDs. As the data ages, SLAs for query times increase (slower responses become acceptable) while query frequency drops. This long tail of data can be stored on slower, less performant storage such as HDD or object storage such as S3.
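
As an illustration, a hot-cold table might be declared as follows. This is only a sketch: the `hot_cold` storage policy and its `cold` volume are assumed to be defined in the server configuration, and the schema and TTL interval are illustrative.

```sql
CREATE TABLE hits_tiered
(
    EventDate Date,
    UserID UInt64,
    URL String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(EventDate)
ORDER BY (EventDate, UserID)
-- Move parts older than 90 days to the volume named 'cold' within the policy.
TTL EventDate + INTERVAL 90 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 'hot_cold';
```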

## Creating a Disk

To utilize an s3 bucket as a disk, we must first declare it within the ClickHouse configuration file. Either extend config.xml or preferably provide a new file under conf.d. An example of an s3 disk declaration is shown below:
To utilize an S3 bucket as a disk, we must first declare it within the ClickHouse configuration file. Either extend `config.xml` or, preferably, provide a new file under `conf.d`. An example of an S3 disk declaration is shown below:

```xml
<clickhouse>
@@ -140,35 +142,35 @@ Occasionally users may need to modify the storage policy of a specific table. Wh
ALTER TABLE trips_s3 MODIFY SETTING storage_policy='s3_tiered'
```

Here we reuse the main volume in our new s3_tiered policy and introduce a new hot volume. This uses the default disk, which consists of only one disk configured via the parameter `<path>`. Note that our volume names and disks do not change. New inserts to our table will reside on the default disk until it reaches `move_factor * disk_size`, at which point data will be relocated to s3.
Here we reuse the main volume in our new `s3_tiered` policy and introduce a new hot volume. This uses the default disk, which consists of only one disk configured via the parameter `<path>`. Note that our volume names and disks do not change. New inserts to our table will reside on the default disk until it reaches `move_factor * disk_size`, at which point data will be relocated to S3.
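
If you do not want to wait for the `move_factor` threshold to be crossed, parts can also be relocated manually. The sketch below assumes the `s3_tiered` policy above, with `main` being the S3-backed volume; the partition value is illustrative:

```sql
-- Force a specific partition onto the S3-backed volume immediately.
ALTER TABLE trips_s3 MOVE PARTITION 201903 TO VOLUME 'main';

-- Verify where the active parts of the table now reside.
SELECT name, disk_name
FROM system.parts
WHERE table = 'trips_s3' AND active;
```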

## Handling Replication

For traditional disk-backed tables, we rely on ClickHouse to handle data replication via replicated table engines such as ReplicatedMergeTree. Whilst for s3 this replication is inherently handled at the storage layer, local files are still held for the table on disk. Specifically, ClickHouse stores metadata files on disk (see [Internals](#internals) for further details). These files will be replicated if using a ReplicatedMergeTree in a process known as [Zero Copy Replication](https://clickhouse.com/docs/en/operations/storing-data/#zero-copy). This is enabled by default through the setting `allow_remote_fs_zero_copy_replication`. This is best illustrated below, where the table exists on two ClickHouse nodes:
For traditional disk-backed tables, we rely on ClickHouse to handle data replication via replicated table engines such as ReplicatedMergeTree. Whilst for S3 this replication is inherently handled at the storage layer, local files are still held for the table on disk. Specifically, ClickHouse stores metadata files on disk (see [Internals](#internals) for further details). These files will be replicated if using a ReplicatedMergeTree in a process known as [Zero Copy Replication](https://clickhouse.com/docs/en/operations/storing-data/#zero-copy). This is enabled by default through the setting `allow_remote_fs_zero_copy_replication`. This is best illustrated below, where the table exists on two ClickHouse nodes:


<img src={require('./images/s3_01.png').default} class="image" alt="Replicating S3 backed Merge Tree" style={{width: '80%'}}/>
<img src={require('./images/s3_01.png').default} class="image" alt="Replicating S3 backed MergeTree" style={{width: '80%'}}/>
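
For illustration, a replicated table backed by an S3 storage policy might be created as follows. This is a sketch only: the cluster name, ZooKeeper path, macros, schema, and the `s3_main` policy name are placeholders that must match your own configuration.

```sql
CREATE TABLE trips_s3_replicated ON CLUSTER my_cluster
(
    trip_id UInt64,
    pickup_date Date,
    total_amount Float64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/trips_s3_replicated', '{replica}')
PARTITION BY toYYYYMM(pickup_date)
ORDER BY (pickup_date, trip_id)
SETTINGS storage_policy = 's3_main';
```

With zero-copy replication enabled, both nodes reference the same S3 objects, and only the local metadata files are replicated between them.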

## Internals

## Reads & Writes

The following notes cover the implementation of s3 interactions with ClickHouse. Whilst generally only informative, they may help readers when [Optimizing for Performance](./s3-optimizing-performance):
The following notes cover the implementation of S3 interactions with ClickHouse. Whilst generally only informative, they may help readers when [Optimizing for Performance](./s3-optimizing-performance):

* By default, the maximum number of query processing threads used by any stage of the query processing pipeline is equal to the number of cores. Some stages are more parallelizable than others, so this value provides an upper bound. Multiple query stages may execute at once since data is streamed from the disk. The exact number of threads used for a query may thus exceed this. Modify through the setting [max_threads](https://clickhouse.com/docs/en/operations/settings/settings/#settings-max_threads).
* Reads on s3 are asynchronous by default. This behavior is determined by the setting `remote_filesystem_read_method`, which is set to “threadpool” by default. When serving a request, ClickHouse reads granules in stripes. Each of these stripes potentially contains many columns. A thread will read the columns for its granules one by one. Rather than doing this synchronously, a prefetch is made for all columns before waiting for the data. This offers significant performance improvements over synchronous waits on each column. Users will not need to change this setting in most cases - see [Optimizing for Performance](./s3-optimizing-performance).
* Writes are performed in parallel, with a maximum of 100 concurrent file writing threads. `max_insert_delayed_streams_for_parallel_write`, which has a default value of 1000, controls the number of s3 blobs written in parallel. Since a buffer is required for each file being written (~1MB), this effectively limits the memory consumption of an INSERT. It may be appropriate to lower this value in low server memory scenarios.
* Reads on S3 are asynchronous by default. This behavior is determined by the setting `remote_filesystem_read_method`, which is set to “threadpool” by default. When serving a request, ClickHouse reads granules in stripes. Each of these stripes potentially contains many columns. A thread will read the columns for its granules one by one. Rather than doing this synchronously, a prefetch is made for all columns before waiting for the data. This offers significant performance improvements over synchronous waits on each column. Users will not need to change this setting in most cases - see [Optimizing for Performance](./s3-optimizing-performance).
* Writes are performed in parallel, with a maximum of 100 concurrent file writing threads. `max_insert_delayed_streams_for_parallel_write`, which has a default value of 1000, controls the number of S3 blobs written in parallel. Since a buffer is required for each file being written (~1MB), this effectively limits the memory consumption of an INSERT. It may be appropriate to lower this value in low server memory scenarios.
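
For example, two of the settings mentioned above can be adjusted per query. The statements below are a sketch only: the `trips_s3` table is the one used earlier in this guide, while the column names, source table, and values are illustrative.

```sql
-- Cap the number of query processing threads for this query only.
SELECT passenger_count, avg(total_amount)
FROM trips_s3
GROUP BY passenger_count
SETTINGS max_threads = 8;

-- Lower the number of delayed streams to reduce INSERT memory usage on a small server.
INSERT INTO trips_s3
SETTINGS max_insert_delayed_streams_for_parallel_write = 100
SELECT * FROM trips;
```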


For further information on tuning threads, see [Optimizing for Performance](./s3-optimizing-performance).

Important: as of 22.3.1, there are two settings to enable the cache: `data_cache_enabled` and `cache_enabled`. The former enables the new cache, which supports the eviction of index files. The latter setting enables a legacy cache. As of 22.3.1, we recommend enabling both settings.

To accelerate reads, s3 files are cached on the local filesystem by breaking files into segments. Any contiguous read segments are saved in the cache, with overlapping segments reused. Writes resulting from INSERTs or merges can also optionally be stored in the cache. Where possible, the cache is reused for file reads. ClickHouse’s linear reads lend themselves to this caching strategy. Should a contiguous read result in a cache miss, the segment is downloaded and cached. Eviction occurs on an LRU basis per segment. The removal of a file also causes its removal from the cache.
To accelerate reads, S3 files are cached on the local filesystem by breaking files into segments. Any contiguous read segments are saved in the cache, with overlapping segments reused. Writes resulting from INSERTs or merges can also optionally be stored in the cache. Where possible, the cache is reused for file reads. ClickHouse’s linear reads lend themselves to this caching strategy. Should a contiguous read result in a cache miss, the segment is downloaded and cached. Eviction occurs on an LRU basis per segment. The removal of a file also causes its removal from the cache.

The metadata for the cache (entries and last used time) is held in memory for fast access. On restarts of ClickHouse, this metadata is reconstructed from the files on disk with the loss of the last used time. In this case, the value is set to 0, causing random eviction until the values are fully populated.

The max cache size can be specified in bytes through the setting `max_cache_size`. This defaults to 1GB (subject to change). Index and mark files can be evicted from the cache. The FS page cache can efficiently cache all files.

Finally, merges on data residing in s3 are potentially a performance bottleneck if not performed intelligently. Cached versions of files minimize merges performed directly on the remote storage. By default, this is enabled and can be turned off by setting `read_from_cache_if_exists_otherwise_bypass_cache` to 0.
Finally, merges on data residing in S3 are potentially a performance bottleneck if not performed intelligently. Cached versions of files minimize merges performed directly on the remote storage. By default, this is enabled and can be turned off by setting `read_from_cache_if_exists_otherwise_bypass_cache` to 0.

2 changes: 1 addition & 1 deletion docs/integrations/s3/s3-minio.md
@@ -6,6 +6,6 @@ description: Using MinIO

# Using MinIO

All s3 functions and tables are compatible with [MinIO](https://min.io/). Users may experience superior throughput on self-hosted MinIO stores, especially where network locality is optimal.
All S3 functions and tables are compatible with [MinIO](https://min.io/). Users may experience superior throughput on self-hosted MinIO stores, especially where network locality is optimal.
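
For example, a MinIO bucket can be queried directly with the `s3` table function. This is a sketch only: the endpoint, bucket, credentials, and column list are placeholders.

```sql
-- Query compressed CSV files stored in a MinIO bucket via the s3 table function.
SELECT count()
FROM s3(
    'http://minio.local:9000/warehouse/trips/*.csv.gz',
    'minio-access-key',
    'minio-secret-key',
    'CSVWithNames',
    'trip_id UInt64, pickup_date Date, total_amount Float64'
);
```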

