S3 updates
gingerwizard committed Apr 8, 2022
1 parent dff39c0 commit 0886746
Showing 3 changed files with 10 additions and 8 deletions.
2 changes: 1 addition & 1 deletion docs/integrations/kafka/kafka-connect-http.md
@@ -121,7 +121,7 @@ The instructions for creating an HTTP Sink in Confluent Cloud can be found [here
* `Input messages` - Depends on your source data but in most cases JSON or Avro. We assume JSON in the following settings.
* `Kafka Cluster credentials` - Confluent cloud allows you to generate these for the appropriate topic from which you wish to pull messages.
* HTTP server details - The connection details for ClickHouse. Specifically:
- * `HTTP Url` - This should be of the same format as the self-managed configuration parameter http.api.url i.e. <protocol>://<clickhouse_host>:<clickhouse_port>?query=INSERT%20INTO%20<database>.<table>%20FORMAT%20JSONEachRow
+ * `HTTP Url` - This should be of the same format as the self-managed configuration parameter `http.api.url`, i.e. `<protocol>://<clickhouse_host>:<clickhouse_port>?query=INSERT%20INTO%20<database>.<table>%20FORMAT%20JSONEachRow` - see the example below.
* `HTTP Request Method` - Set to POST
* `HTTP Headers` - “Content-Type: application/json”
* HTTP server batches
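For illustration, the `query` parameter in the `HTTP Url` is just a URL-encoded INSERT statement. A hypothetical decoded example is sketched below; the host, port, database and table names are placeholders, not values from this commit.

```sql
-- Hypothetical HTTP Url value (host, port, database and table are placeholders):
--   https://my-clickhouse-host:8443?query=INSERT%20INTO%20default.kafka_events%20FORMAT%20JSONEachRow
-- The encoded query parameter decodes to the statement below; the connector
-- roughly posts the JSON rows as the request body, one object per record.
INSERT INTO default.kafka_events FORMAT JSONEachRow
{"timestamp": "2022-04-08 12:00:00", "user_id": 42, "message": "example event"}
```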
10 changes: 6 additions & 4 deletions docs/integrations/s3/s3-merge-tree.md
@@ -158,19 +158,21 @@ For traditional disk-backed tables, we rely on ClickHouse to handle data replica
The following notes cover the implementation of S3 interactions with ClickHouse. While generally only informative, they may help readers when [Optimizing for Performance](./s3-optimizing-performance):

* By default, the maximum number of query processing threads used by any stage of the query processing pipeline is equal to the number of cores. Some stages are more parallelizable than others, so this value provides an upper bound. Multiple query stages may execute at once since data is streamed from the disk. The exact number of threads used for a query may thus exceed this. Modify through the setting [max_threads](https://clickhouse.com/docs/en/operations/settings/settings/#settings-max_threads).
- * Reads on S3 are asynchronous by default. This behavior is determined by setting `remote_filesystem_read_method`, set to the value “threadpool” by default. When serving a request, ClickHouse reads granules in stripes. Each of these stripes potentially contain many columns. A thread will read the columns for their granules one by one. Rather than doing this synchronously, a prefetch is made for all columns before waiting for the data. This offers significant performance improvements over synchronous waits on each column. Users will not need to change this setting in most cases - see [Optimizing for Performance](./s3-optimizing-performance).
+ * Reads on S3 are asynchronous by default. This behavior is determined by the setting `remote_filesystem_read_method`, set to the value `threadpool` by default. When serving a request, ClickHouse reads granules in stripes. Each of these stripes potentially contains many columns. A thread reads the columns for its granules one by one. Rather than doing this synchronously, a prefetch is made for all columns before waiting for the data. This offers significant performance improvements over synchronous waits on each column. Users will not need to change this setting in most cases - see [Optimizing for Performance](./s3-optimizing-performance).
* For the s3 function and table, parallel downloading is determined by the values `max_download_threads` and `max_download_buffer_size`. Files will only be downloaded in parallel if their size is greater than the total buffer size combined across all threads. This is only available on versions > 22.3.1.
* Writes are performed in parallel, with a maximum of 100 concurrent file writing threads. `max_insert_delayed_streams_for_parallel_write`, which has a default value of 1000, controls the number of S3 blobs written in parallel. Since a buffer is required for each file being written (~1MB), this effectively limits the memory consumption of an INSERT. It may be appropriate to lower this value in low server memory scenarios.
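As a rough sketch of how these settings can be overridden for an individual query; the bucket path, credentials and values below are placeholders, not recommendations:

```sql
-- Hypothetical example: read Parquet files from an s3 bucket while capping the
-- number of query threads and adjusting the parallel-download settings for this
-- query only. The bucket path and credentials are placeholders.
SELECT count()
FROM s3('https://my-bucket.s3.amazonaws.com/data/*.parquet', 'ACCESS_KEY', 'SECRET_KEY', 'Parquet')
SETTINGS
    max_threads = 8,
    max_download_threads = 4,
    max_download_buffer_size = 10485760; -- 10 MiB buffer per download thread
```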


For further information on tuning threads, see [Optimizing for Performance](./s3-optimizing-performance).

- Important: as of 22.3.1, there are two settings to enable the cache. `data_cache_enabled` and `cache_enabled`. The former enables the new cache, which supports the eviction of index files. The latter setting enables a legacy cache. As of 22.3.1, we recommend enabling both settings.
+ Important: as of 22.3.1, there are two settings to enable the cache: `data_cache_enabled` and `enable_filesystem_cache`. We recommend setting both of these to 1 to enable the new cache behavior described, which supports the eviction of index files. To disable the eviction of index and mark files from the cache, we also recommend setting `cache_enabled=1`.

- To accelerate reads, S3 files are cached on the local filesystem by breaking files into segments. Any contiguous read segments are saved in the cache, with overlapping segments reused. Writes resulting from INSERTs or merges can also optionally be stored in the cache. Where possible, the cache is reused for file reads. ClickHouse’s linear reads lend themselves to this caching strategy. Should a contiguous read result in a cache miss, the segment is downloaded and cached. Eviction occurs on an LRU basis per segment. The removal of a file also causes its removal from the cache.
+ To accelerate reads, s3 files are cached on the local filesystem by breaking files into segments. Any contiguous read segments are saved in the cache, with overlapping segments reused. Later versions will optionally allow writes resulting from INSERTs or merges to be stored in the cache via the option `enable_filesystem_cache_on_write_operations`. Where possible, the cache is reused for file reads. ClickHouse’s linear reads lend themselves to this caching strategy. Should a contiguous read result in a cache miss, the segment is downloaded and cached. Eviction occurs on an LRU basis per segment. The removal of a file also causes its removal from the cache. The setting `read_from_cache_if_exists_otherwise_bypass_cache` can be set to 1 for specific queries that you know are not cache efficient, i.e., queries that are unfriendly to the cache and would otherwise result in heavy evictions.
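Under that setting, a query-level opt-out might look like the following sketch; the table name is a placeholder:

```sql
-- Hypothetical example: a large one-off scan known to be cache unfriendly.
-- With the setting enabled, the query reads cached segments where they already
-- exist but otherwise bypasses the cache, avoiding heavy evictions.
SELECT toStartOfDay(timestamp) AS day, count()
FROM s3_backed_events
GROUP BY day
SETTINGS read_from_cache_if_exists_otherwise_bypass_cache = 1;
```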

The metadata for the cache (entries and last used time) is held in memory for fast access. On restarts of ClickHouse, this metadata is reconstructed from the files on disk with the loss of the last used time. In this case, the value is set to 0, causing random eviction until the values are fully populated.

The max cache size can be specified in bytes through the setting `max_cache_size`. This defaults to 1GB (subject to change). Index and mark files can be evicted from the cache. The FS page cache can efficiently cache all files.

- Finally, merges on data residing in S3 are potentially a performant bottleneck if not performed intelligently. Cached versions of files minimize merges performed directly on the remote storage. By default, this is enabled and can be turned off by setting `read_from_cache_if_exists_orthersize_bypass_cache` to 0.
+ Enabling the cache can speed up even first-time queries for which the data is not yet resident in the cache: if a query needs to re-access data that has been cached as part of its execution, the fs page cache can be utilized, thus avoiding re-reads from s3.

+ Finally, merges on data residing in s3 are potentially a performance bottleneck if not performed intelligently. Cached versions of files minimize merges performed directly on the remote storage.
6 changes: 3 additions & 3 deletions docs/integrations/s3/s3-optimizing-performance.md
@@ -8,7 +8,7 @@ description: Optimizing S3 Performance with ClickHouse

## Measuring Performance

- Before making any changes to improve performance, ensure you measure appropriately. As S3 API calls are sensitive to latency and may impact client timings, use the query log for performance metrics, i.e., system.query_log. For further details on how to analyze query performance, see here.
+ Before making any changes to improve performance, ensure you measure appropriately. As S3 API calls are sensitive to latency and may impact client timings, use the query log for performance metrics, i.e., `system.query_log`.
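For example, a minimal look at recent SELECT timings in the query log might resemble the following sketch; the columns shown are an illustrative subset:

```sql
-- Pull timing and read statistics for recently completed SELECT queries.
SELECT
    event_time,
    query_duration_ms,
    read_rows,
    formatReadableSize(read_bytes) AS read_size,
    formatReadableSize(memory_usage) AS memory
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Select'
ORDER BY event_time DESC
LIMIT 10;
```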

If measuring the performance of SELECT queries, where large volumes of data are returned to the client, either utilize the [null format](https://clickhouse.com/docs/en/interfaces/formats/#null) for queries or direct results to the [Null engine](https://clickhouse.com/docs/en/engines/table-engines/special/null/). This should avoid the client being overwhelmed with data and network saturation.
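As a sketch, with placeholder table and sink names, either discard output with the Null format or write into a Null-engine table:

```sql
-- Benchmark a full read from an s3-backed table without sending rows to the client.
SELECT * FROM s3_backed_events FORMAT Null;

-- Alternatively, create a sink table with the Null engine and insert into it.
CREATE TABLE benchmark_sink AS s3_backed_events ENGINE = Null;
INSERT INTO benchmark_sink SELECT * FROM s3_backed_events;
```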

@@ -39,11 +39,11 @@ ClickHouse can read files stored in s3 buckets in the [supported formats](https:
* Formats such as native or parquet do not typically justify the overhead of compression. Any savings in data size are likely to be minimal since these formats are inherently compact. The time spent compressing and decompressing will rarely offset network transfer times - especially since s3 is globally available with higher network bandwidth.
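Where a compressed text format such as CSV is read from s3, the query might look like the following sketch; the bucket path, credentials and column structure are placeholders, and the compression method is typically inferred from the file extension:

```sql
-- Hypothetical example: reading gzip-compressed CSV from s3 with an explicit
-- format and structure. All names and credentials are placeholders.
SELECT count()
FROM s3(
    'https://my-bucket.s3.amazonaws.com/data/events.csv.gz',
    'ACCESS_KEY',
    'SECRET_KEY',
    'CSVWithNames',
    'timestamp DateTime, user_id UInt64, message String'
);
```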


- Internally the ClickHouse merge tree uses two primary storage formats: [Wide and Compact](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/#mergetree-data-storage). Whilst the current implementation uses the default behavior of ClickHouse - controlled through the settings `min_bytes_for_wide_part` and `min_rows_for_wide_part`; we expect behavior to diverge for s3 in the future releases, e.g., a larger default value of min_bytes_for_wide_part encouraging a more Compact format and thus fewer files. Users may now wish to tune these settings when using exclusively s3 storage.
+ Internally, the ClickHouse merge tree uses two primary storage formats: [Wide and Compact](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/#mergetree-data-storage). While the current implementation uses the default behavior of ClickHouse, controlled through the settings `min_bytes_for_wide_part` and `min_rows_for_wide_part`, we expect behavior to diverge for s3 in future releases, e.g., a larger default value of `min_bytes_for_wide_part` encouraging a more Compact format and thus fewer files. Users may wish to tune these settings now when using exclusively s3 storage.
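A tuned table definition might look like the following sketch; the storage policy name and threshold values are placeholders, not recommendations:

```sql
-- Hypothetical example: a MergeTree table on an s3-backed storage policy, with
-- part-format thresholds raised so that more parts are written in the Compact format.
-- The policy name 's3_main' and the threshold values are placeholders.
CREATE TABLE events
(
    timestamp DateTime,
    user_id   UInt64,
    message   String
)
ENGINE = MergeTree
ORDER BY (user_id, timestamp)
SETTINGS
    storage_policy = 's3_main',
    min_bytes_for_wide_part = 536870912, -- 512 MiB
    min_rows_for_wide_part = 10000000;
```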

## Scaling with Nodes

- Users will have often have more than one node of ClickHouse available. While users can scale vertically, improving s3 throughput linearly with the number of cores, horizontal scaling is often necessary due to hardware availability and cost-efficiency.
+ Users will often have more than one node of ClickHouse available. While users can scale vertically, improving s3 throughput linearly with the number of cores, horizontal scaling is often necessary due to hardware availability and cost-efficiency.

The replication of an s3-backed MergeTree is supported through zero-copy replication.

