Merge branch 'main' into add-replicated-deployment
DanRoscigno authored Apr 4, 2023
2 parents 151a3ab + fa814d4 commit 4d1b69b
Showing 11 changed files with 279 additions and 84 deletions.
2 changes: 1 addition & 1 deletion docs/en/_snippets/_replication-sharding-terminology.md
@@ -3,7 +3,7 @@
A copy of data. ClickHouse always has at least one copy of your data, and so the minimum number of **replicas** is one. This is an important detail: you may not be used to counting the original copy of your data as a replica, but that is the term used in ClickHouse code and documentation. Adding a second replica of your data provides fault tolerance.
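Purely as an illustration (an editor's sketch, not text from this commit): a second replica is typically created with a `ReplicatedMergeTree` table, where the `{shard}` and `{replica}` macros are substituted from each server's configuration. The cluster name `cluster_1S_2R` and the table name are assumptions.

```sql
-- Sketch: a table whose data is kept on two replicas.
-- {shard} and {replica} are substituted from each server's macros file.
CREATE TABLE db1.table1_replicated ON CLUSTER cluster_1S_2R
(
    `id` UInt64,
    `column1` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/table1_replicated', '{replica}')
ORDER BY id;
```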

### Shard
A subset of data. ClickHouse always has at least on shard for your data, so if you do not split the data across multiple servers you have one shard. Sharding data across multiple systems can be used to divide the load if you exceed the capacity of a single server. The destination server is determined by the **sharding key**, which can be random, or it can be determined when you create your table. The deployment examples that use sharding will use `rand()` as the sharding key, and will provide further information on when and how to choose a different sharding key.
A subset of data. ClickHouse always has at least one shard for your data, so if you do not split the data across multiple servers, your data will be stored in one shard. Sharding data across multiple servers can be used to divide the load if you exceed the capacity of a single server. The destination server is determined by the **sharding key**, and is defined when you create the distributed table. The sharding key can be random or the output of a [hash function](https://clickhouse.com/docs/en/sql-reference/functions/hash-functions). The deployment examples involving sharding will use `rand()` as the sharding key, and will provide further information on when and how to choose a different sharding key.
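As a hedged sketch (an editor's illustration, not part of this commit), the sharding key is the last argument of the `Distributed` engine. The cluster and table names below are taken from the deployment example later in this diff, and `cityHash64(id)` stands in for any deterministic hash-based key.

```sql
-- Sketch: random sharding key; inserts are spread across shards unpredictably.
CREATE TABLE db1.table1_dist_rand ON CLUSTER cluster_2S_1R
AS db1.table1
ENGINE = Distributed('cluster_2S_1R', 'db1', 'table1', rand());

-- Sketch: hash-based sharding key; rows with the same id always land on the same shard.
CREATE TABLE db1.table1_dist_hash ON CLUSTER cluster_2S_1R
AS db1.table1
ENGINE = Distributed('cluster_2S_1R', 'db1', 'table1', cityHash64(id));
```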

### Distributed coordination
ClickHouse Keeper provides the coordination system for data replication and distributed DDL queries execution. ClickHouse Keeper is compatible with Apache ZooKeeper.
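As an illustrative check (not part of this commit), a server that has been pointed at ClickHouse Keeper can browse the coordination tree through the `system.zookeeper` table; the root path used here is just an example.

```sql
-- Sketch: list the top-level Keeper/ZooKeeper nodes to confirm coordination is reachable.
SELECT name, value
FROM system.zookeeper
WHERE path = '/';
```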
3 changes: 3 additions & 0 deletions docs/en/about-us/adopters.md
@@ -150,6 +150,7 @@ The following list of companies using ClickHouse and their success stories is as
| [Lawrence Berkeley National Laboratory](https://www.lbl.gov) | Research | Traffic analysis | 5 servers | 55 TiB | [Slides in English, April 2019](https://www.smitasin.com/presentations/2019-04-17_DOE-NSM.pdf) |
| [Lever](https://www.lever.co/) | Talent Management | Recruiting | - | - | [Hacker News post](https://news.ycombinator.com/item?id=29558544) |
| [LifeStreet](https://lifestreet.com/) | Ad network | Main product | 75 servers (3 replicas) | 5.27 PiB | [Blog post in Russian, February 2017](https://habr.com/en/post/322620/) |
| [Little Red Book (Xiaohongshu)](http://www.xiaohongshu.com/) | Social Media | Data warehouse ||| [Presentation, March 2023](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup71/LittleRedBook.pdf) |
| [LogSnag](https://logsnag.com/) | Software & Technology | Realtime Monitoring ||| [Interview, December 2022](https://founderbeats.com/shayan-on-building-and-growing-logsnag-as-a-solo-founder) |
| [Loja Integrada](https://lojaintegrada.com.br/) | E-Commerce |||| [Case Study, March 2023](https://double.cloud/resources/case-studies/lojaintegrada-and-pagali-switch-to-doublecloud-to-make-running-clickhouse-easier) |
| [Lookforsale](https://lookforsale.ru/) | E-Commerce |||| [Job Posting, December 2021](https://telegram.me/javascript_jobs/587318) |
@@ -255,6 +256,7 @@ The following list of companies using ClickHouse and their success stories is as
| [The Guild](https://the-guild.dev/) | API Platform | Monitoring ||| [Blog Post, November 2022](https://clickhouse.com/blog/100x-faster-graphql-hive-migration-from-elasticsearch-to-clickhouse) [Blog](https://the-guild.dev/blog/graphql-hive-and-clickhouse) |
| [Tinybird](https://www.tinybird.co/) | Real-time Data Products | Data processing ||| [Official website](https://www.tinybird.co/) |
| [Traffic Stars](https://trafficstars.com/) | AD network || 300 servers in Europe/US | 1.8 PiB, 700 000 insert rps (as of 2021) | [Slides in Russian, May 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup15/lightning/ninja.pdf) |
| [Trip.com](https://trip.com/) | Travel Services | Logging ||| [Meetup, March 2023](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup71/Trip.com.pdf) |
| [TURBOARD](https://www.turboard.com/) | BI Analytics |||| [Official website](https://www.turboard.com/blogs/clickhouse) |
| [Uber](https://www.uber.com) | Taxi | Logging ||| [Slides, February 2020](https://presentations.clickhouse.com/meetup40/uber.pdf) |
| [Uptrace](https://uptrace.dev/) | Software | Tracing Solution ||| [Official website, March 2021](https://uptrace.dev/open-source/) |
@@ -285,6 +287,7 @@ The following list of companies using ClickHouse and their success stories is as
| [Your Analytics](https://www.your-analytics.org/) | Product Analytics | Main Product || - | [Tweet, November 2021](https://twitter.com/mikenikles/status/1459737241165565953) |
| [YTsaurus](https://ytsaurus.tech/) | Distributed Storage and Processing | Main product | - | - | [Main website](https://ytsaurus.tech/) |
| [Zagrava Trading](https://zagravagames.com/en/) ||||| [Job offer, May 2021](https://twitter.com/datastackjobs/status/1394707267082063874) |
| [Zerodha](https://zerodha.tech/) | Stock Broker | Logging ||| [Blog, March 2023](https://zerodha.tech/blog/logging-at-zerodha/) |
| [ЦВТ](https://htc-cs.ru/) | Software Development | Metrics, Logging ||| [Blog Post, March 2019, in Russian](https://vc.ru/dev/62715-kak-my-stroili-monitoring-na-prometheus-clickhouse-i-elk) |
| [МКБ](https://mkb.ru/) | Bank | Web-system monitoring ||| [Slides in Russian, September 2019](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup28/mkb.pdf) |
| [ЦФТ](https://cft.ru/) | Banking, Financial products, Payments |||| [Meetup in Russian, April 2020](https://team.cft.ru/events/162) |
12 changes: 12 additions & 0 deletions docs/en/about-us/cloud.md
@@ -17,3 +17,15 @@ Experience ClickHouse Cloud by [starting your free trial](https://clickhouse.clo
- **Transparent pricing**: Pay only for what you use, with resource reservations and scaling controls.
- **Total cost of ownership**: Best price / performance ratio and low administrative overhead.
- **Broad ecosystem**: Bring your favorite data connectors, visualization tools, SQL and language clients with you.

<div class='vimeo-container'>
<iframe src="https://player.vimeo.com/video/756877867?h=c58e171729"
width="640"
height="360"
frameborder="0"
allow="autoplay;
fullscreen;
picture-in-picture"
allowfullscreen>
</iframe>
</div>
46 changes: 23 additions & 23 deletions docs/en/deployment-guides/horizontal-scaling.md
@@ -26,7 +26,7 @@ This example architecture is designed to provide scalability. It includes three
|chnode3|Used for ClickHouse Keeper quorum|

:::note
In the more advanced configurations ClickHouse Keeper will be run on separate servers. This basic configuration is running the Keeper functionality within the ClickHouse Server process. As you scale out you may decide to separate the ClickHouse Servers from the Keeper servers. The instructions for deploying ClickHouse Keeper standalone are available in the [installation documentation](/docs/en/getting-started/install.md/#install-standalone-clickhouse-keeper).
In more advanced configurations, ClickHouse Keeper will be running on separate servers. This basic configuration runs the Keeper functionality within the ClickHouse Server process. As you scale out, you may decide to separate the ClickHouse Servers from the Keeper servers. The instructions for deploying ClickHouse Keeper standalone are available in the [installation documentation](/docs/en/getting-started/install.md/#install-standalone-clickhouse-keeper).
:::

## Install
@@ -39,11 +39,11 @@ Install Clickhouse on three servers following the [instructions for your archive

## chnode1 configuration

For chnode1 there are five configuration files. You may choose to combine these files into a single file, but for clarity in the documentation it may be simpler to look at them separately. As you read through the configuration files you will see that most of the configuration is the same between chnode1 and chnode2; the differences will be highlighted.
For chnode1, there are five configuration files. You may choose to combine these files into a single file, but for clarity in the documentation it may be simpler to look at them separately. As you read through the configuration files, you will see that most of the configuration is the same between chnode1 and chnode2; the differences will be highlighted.

### Network and logging configuration

These values can be customized as you wish. This example configuration gives you a debug log that will roll over at 1000M three times. ClickHouse will listen on the IPV4 network on ports 8123 and 9000, and will use port 9009 for interserver communication.
These values can be customized as you wish. This example configuration gives you a debug log that will roll over at 1000M three times. ClickHouse will listen on the IPv4 network on ports 8123 and 9000, and will use port 9009 for interserver communication.

```xml title="network-and-logging.xml on chnode1"
<clickhouse>
@@ -64,7 +64,7 @@ These values can be customized as you wish. This example configuration gives yo

### ClickHouse Keeper configuration

ClickHouse Keeper provides the coordination system for data replication and distributed DDL queries execution. ClickHouse Keeper is compatible with Apache ZooKeeper. This configuration enables ClickHouse Keeper on port 9181. The highlighted line specifies that this instance of Keeper has server_id of 1. This is the only difference in the `enable-keeper.xml` file across the three servers. `chnode2` will have `server_id` set to `2`, and `chnode3` will have `server_id` set to `3`. The raft configuration section is the same on all three servers, it is highlighted below to show you the raltionship between `server_id` and the `server` instance within the raft configuration.
ClickHouse Keeper provides the coordination system for data replication and distributed DDL queries execution. ClickHouse Keeper is compatible with Apache ZooKeeper. This configuration enables ClickHouse Keeper on port 9181. The highlighted line specifies that this instance of Keeper has `server_id` of 1. This is the only difference in the `enable-keeper.xml` file across the three servers. `chnode2` will have `server_id` set to `2`, and `chnode3` will have `server_id` set to `3`. The raft configuration section is the same on all three servers, and it is highlighted below to show you the relationship between `server_id` and the `server` instance within the raft configuration.

```xml title="enable-keeper.xml on chnode1"
<clickhouse>
@@ -86,18 +86,18 @@ ClickHouse Keeper provides the coordination system for data replication and dist
<server>
<id>1</id>
<hostname>chnode1</hostname>
<port>9444</port>
<port>9234</port>
</server>
# highlight-end
<server>
<id>2</id>
<hostname>chnode2</hostname>
<port>9444</port>
<port>9234</port>
</server>
<server>
<id>3</id>
<hostname>chnode3</hostname>
<port>9444</port>
<port>9234</port>
</server>
</raft_configuration>
</keeper_server>
@@ -122,11 +122,11 @@ In this 2 shard 1 replica example, the replica macro is `replica_1` on both chno
### Replication and sharding configuration

Starting from the top:
- The remote_servers section of the XML specifies each of the clusters in the environment. The attribute `replace=true` replaces the sample remote_servers in the default ClickHouse configuration with the remote_server configuration specified in this file. Without this attribute the remote servers in this file would be appended to the list of samples in the default.
- The `remote_servers` section of the XML specifies each of the clusters in the environment. The attribute `replace=true` replaces the sample `remote_servers` in the default ClickHouse configuration with the `remote_servers` configuration specified in this file. Without this attribute, the remote servers in this file would be appended to the list of samples in the default.
- In this example, there is one cluster named `cluster_2S_1R`.
- A secret is created for the cluster named `cluster_2S_1R` with the value `mysecretphrase`. The secret is shared across all of the remote servers in the environment to ensure that the correct servers are joined together.
- The cluster `cluster_2S_1R` has two shards, and each of those shards has one replica. Take a look at the architecture diagram toward the beginning of this document, and compare it with the two `shard` definitions in the XML below. In each of the shard definitions there is one replica. The replica is for that specific shard. The host and port for that replica is specified. The replica for the first shard in the configuration is stored on `chnode1`, and the replica for the second shard in the configuration is stored on `chnode2`.
- Internal replication for the shards is set to true. Each shard can have the internal_replication parameter defined in the config file. If this parameter is set to true, the write operation selects the first healthy replica and writes data to it.
- Internal replication for the shards is set to true. Each shard can have the `internal_replication` parameter defined in the config file. If this parameter is set to true, the write operation selects the first healthy replica and writes data to it.
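Once the file below is loaded, one illustrative way (not part of the guide text) to confirm that the cluster described in the list above was picked up is to query `system.clusters`:

```sql
-- Sketch: show the shards and replicas that make up cluster_2S_1R.
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'cluster_2S_1R';
```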

```xml title="remote-servers.xml on chnode1"
<clickhouse>
@@ -177,7 +177,7 @@ Up above a few files ClickHouse Keeper was configured. This configuration file

## chnode2 configuration

As the configuration is very similar on chnode1 and chnode2 only the differences will be pointed out here.
As the configuration is very similar on chnode1 and chnode2, only the differences will be pointed out here.

### Network and logging configuration

@@ -221,19 +221,19 @@ This file contains one of the two differences between chnode1 and chnode2. In t
<server>
<id>1</id>
<hostname>chnode1</hostname>
<port>9444</port>
<port>9234</port>
</server>
# highlight-start
<server>
<id>2</id>
<hostname>chnode2</hostname>
<port>9444</port>
<port>9234</port>
</server>
# highlight-end
<server>
<id>3</id>
<hostname>chnode3</hostname>
<port>9444</port>
<port>9234</port>
</server>
</raft_configuration>
</keeper_server>
@@ -303,7 +303,7 @@ The macros configuration has one of the differences between chnode1 and chnode2.
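As an editor's illustration (not in the commit), the macro values that each node actually picked up can be checked with `system.macros`; on `chnode2` this should show `shard` set to `2` and `replica` set to `replica_1`, per the macros files described above.

```sql
-- Sketch: list the macros defined on the node you are connected to.
SELECT macro, substitution
FROM system.macros;
```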

## chnode3 configuration

As chnode3 is not storing data and is only used for ClickHouse Keeper to provide the third node in the quorum chnode3 has only two configuration files, one to configure the network and logging, and one to configure ClickHouse Keeper.
As chnode3 is not storing data and is only used for ClickHouse Keeper to provide the third node in the quorum, chnode3 has only two configuration files, one to configure the network and logging, and one to configure ClickHouse Keeper.

### Network and logging configuration

@@ -345,18 +345,18 @@ As chnode3 is not storing data and is only used for ClickHouse Keeper to provide
<server>
<id>1</id>
<hostname>chnode1</hostname>
<port>9444</port>
<port>9234</port>
</server>
<server>
<id>2</id>
<hostname>chnode2</hostname>
<port>9444</port>
<port>9234</port>
</server>
# highlight-start
<server>
<id>3</id>
<hostname>chnode3</hostname>
<port>9444</port>
<port>9234</port>
</server>
# highlight-end
</raft_configuration>
@@ -387,7 +387,7 @@ CREATE DATABASE db1 ON CLUSTER cluster_2S_1R
└─────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
```

2. Create a table with MergeTree table engine on the cluster.
3. Create a table with MergeTree table engine on the cluster.
:::note
We do not need to specify parameters on the table engine since these will be automatically defined based on our macros
:::
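The CREATE TABLE statement itself is collapsed in this diff view; purely as a sketch (the exact statement in the guide may differ), a minimal version consistent with the `ORDER BY id` context shown below would be:

```sql
-- Sketch only: a MergeTree table created on every node of the cluster.
CREATE TABLE db1.table1 ON CLUSTER cluster_2S_1R
(
    `id` UInt64,
    `column1` String
)
ENGINE = MergeTree
ORDER BY id;
```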
@@ -408,18 +408,18 @@ ORDER BY id
└─────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
```

3. Connect to `chnode1` and insert a row
4. Connect to `chnode1` and insert a row
```sql
INSERT INTO db1.table1 (id, column1) VALUES (1, 'abc');
```

4. Connect to `chnode2` and insert a row
5. Connect to `chnode2` and insert a row

```sql
INSERT INTO db1.table1 (id, column1) VALUES (2, 'def');
```

5. Connect to either node, `chnode1` or `chnode2` and you will see only the row that was inserted into that table on that node.
6. Connect to either node, `chnode1` or `chnode2` and you will see only the row that was inserted into that table on that node.
For example, on `chnode2`:
```sql
SELECT * FROM db1.table1;
@@ -431,7 +431,7 @@
```


6. Create a distributed table to query both shards on both nodes.
7. Create a distributed table to query both shards on both nodes.
(In this example, the `rand()` function is set as the sharding key so that it randomly distributes each insert.)
```sql
CREATE TABLE db1.table1_dist ON CLUSTER cluster_2S_1R
@@ -448,7 +448,7 @@ ENGINE = Distributed('cluster_2S_1R', 'db1', 'table1', rand())
└─────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
```

7. Connect to either `chnode1` or `chnode2` and query the distributed table to see both rows.
8. Connect to either `chnode1` or `chnode2` and query the distributed table to see both rows.
```sql
SELECT * FROM db1.table1_dist;
```
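As an extra illustration (not part of the guide), the `_shard_num` virtual column of a Distributed table shows which shard each row came from, which makes the distribution easy to see:

```sql
-- Sketch: _shard_num is a virtual column provided by the Distributed engine.
SELECT _shard_num, id, column1
FROM db1.table1_dist
ORDER BY id;
```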
2 changes: 1 addition & 1 deletion docs/en/deployment-guides/terminology.md
@@ -6,7 +6,7 @@ sidebar_position: 1
---
import ReplicationShardingTerminology from '@site/docs/en/_snippets/_replication-sharding-terminology.md';

These deployment examples are based on the advice provided to ClickHouse users by the ClickHouse Support and Services organization. These are working examples, and we recommend that you try them and then adjust them to suit your needs. You may find an example here that fits your requirements exactly. Alternatively, you may have a requirement that your data is replicated three times instead of two, you should be able to add another replica by following the patterns presented here.
These deployment examples are based on the advice provided to ClickHouse users by the ClickHouse Support and Services organization. These are working examples, and we recommend that you try them and then adjust them to suit your needs. You may find an example here that fits your requirements exactly. Alternatively, should you have a requirement where data is replicated three times instead of two, you should be able to add another replica by following the patterns presented here.

<ReplicationShardingTerminology />
