Merge branch 'main' into keeper-reconfig

DanRoscigno authored Jul 12, 2023
2 parents c95e320 + 58202a7 commit 83a52f4
Showing 47 changed files with 709 additions and 142 deletions.
3 changes: 0 additions & 3 deletions .vscode/settings.json

This file was deleted.

12 changes: 11 additions & 1 deletion contrib-writing-guide.md
@@ -466,6 +466,16 @@ At the moment there’s no easy way to do just that, but you can consider:
- Hitting the “Watch” button at the top of the GitHub web interface, so that you learn about changes as early as possible, even at the pull request stage. An alternative is the `#github-activity` channel of the [public ClickHouse Slack](https://clickhouse.com/slack).
- Subscribing to website change notifications via email; some search engines offer this, and you can opt in for https://clickhouse.com.

## Algolia

The docs are crawled daily. The configuration for the crawler is kept in the docs-private repo because the crawler config contains a key used to manage the Algolia account. If you need to modify the crawler configuration, log in to crawler.algolia.com and edit the configuration in the UI. Once the updated configuration is tested, update the configuration stored in the docs-private repo.

**Note**

Comments added to the config get removed by the Algolia editor :( The best practice is to add your comments to the PR used to update the config in docs-private.

### Doc search tweaks
We use [Docsearch](https://docsearch.algolia.com/) from Algolia; there is not much for you to do to have the docs you write added to the search. The Algolia crawler updates our index daily.

@@ -497,7 +507,7 @@ sidebar_label: FUNCTION
Creates a user defined function from a lambda expression.
```

Note: The docs are crawled each Monday morning. If you make a change and want the docs re-crawled open an issue in clickhouse-docs.
Note: The docs are crawled each morning. If you make a change and want the docs re-crawled sooner, open an issue in clickhouse-docs.

## Tools that you might like

6 changes: 5 additions & 1 deletion docs/en/about-us/adopters.md
@@ -53,6 +53,7 @@ The following list of companies using ClickHouse and their success stories is as
| [Bitquery](https://bitquery.io/) | Software & Technology | Blockchain Data Company ||| [HackerNews, December 2020](https://bitquery.io/blog/blockchain-intelligence-system) |
| [Bloomberg](https://www.bloomberg.com/) | Finance, Media | Monitoring ||| [Meetup Video, December 2022](https://www.youtube.com/watch?v=HmJTIrGyVls&list=PL0Z2YDlm0b3iNDUzpY1S3L_iV4nARda_U&index=9) [Slides, December 2022](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup67/ClickHouse%20for%20Financial%20Analytics%20-%20Bloomberg.pdf) |
| [Bloxy](https://bloxy.info) | Blockchain | Analytics ||| [Slides in Russian, August 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup17/4_bloxy.pptx) |
| [Bonside](https://www.bonside.com/) | FinTech | - ||| [HackerNews, July 2023](https://news.ycombinator.com/item?id=36619722) |
| [Botify](https://www.botify.com/) | SaaS | SEO ||| [Blog Article, September 2022](https://tech.marksblogg.com/billion-taxi-rides-doublecloud-clickhouse.html) |
| [Bytedance](https://www.bytedance.com) | Social platforms |||| [The ClickHouse Meetup East, October 2020](https://www.youtube.com/watch?v=ckChUkC3Pns) |
| [Campaign Deputy](https://campaigndeputy.com/) | SaaS | Analytics, Logs ||| [Tweet, February 2023](https://twitter.com/joshabartley/status/1627669208074014721) |
@@ -131,6 +132,8 @@ The following list of companies using ClickHouse and their success stories is as
| [highlight](https://www.highlight.io/) | Software & Technology | Monitoring ||| [Hacker News, February 2023](https://news.ycombinator.com/item?id=34897645), [GitHub](https://github.com/highlight/highlight/tree/87f7e3882b88e9019d690847a134231e943890fe/backend/clickhouse) |
| [HockeyStack](https://hockeystack.com/) | Analytics platform | OLAP ||| [Blog](https://hockeystack.com/blog/a-new-database/) |
| [hookdeck](https://hookdeck.com/) | Software & Technology | Webhook ||| [Twitter, June 2023](https://twitter.com/mkherlakian/status/1666214460824997889) |
| [Hopsteiner](https://www.hopsteiner.com/) | Agriculture | - ||| [Job post, July 2023](https://www.indeed.com/viewjob?t=Systems+Administrator&c=S+S+STEINER&l=Yakima,+WA&jk=5b9b7336de0577d5&rtk=1h45ruu32j30q800&from=rss) |
| [Hubalz](https://hubalz.com) | Web analytics | Main product ||| [Twitter, July 2023](https://twitter.com/Derinilkcan/status/1676197439152312321) |
| [HUYA](https://www.huya.com/) | Video Streaming | Analytics ||| [Slides in Chinese, October 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup19/7.%20ClickHouse万亿数据分析实践%20李本旺(sundy-li)%20虎牙.pdf) |
| [Hydrolix](https://www.hydrolix.io/) | Cloud data platform | Main product ||| [Documentation](https://docs.hydrolix.io/guide/query) |
| [HyperDx](https://www.hyperdx.io/) | Software & Technology | Open Telemetry ||| [HackerNews, May 2023](https://news.ycombinator.com/item?id=35881942) |
@@ -191,6 +194,7 @@ The following list of companies using ClickHouse and their success stories is as
| [MyScale](https://myscale.com/) | Software & Technology | AI Database ||| [Docs](https://docs.myscale.com/en/overview/) |
| [NANO Corp](https://nanocorp.fr/en/) | Software & Technology | NOC as a Service ||| [Blog Post, July 2022](https://clickhouse.com/blog/from-experimentation-to-production-the-journey-to-supercolumn) |
| [Nationale Databank Wegverkeers](https://www.ndw.nu/) | Software & Technology | Road Traffic Monitoring ||| [Presentation at Foss4G, August 2019](https://av.tib.eu/media/43434) |
| [Nebius](https://nebius.com/il/docs/managed-clickhouse/) | SaaS | Main product ||| [Official website](https://nebius.com/il/docs/managed-clickhouse/) |
| [NetMeta](https://github.com/monogon-dev/NetMeta/blob/main/README.md) | Observability | Main Product ||| [Tweet, December 2022](https://twitter.com/leolukde/status/1605643470239977475) |
| [Netskope](https://www.netskope.com/) | Network Security |||| [Job advertisement, March 2021](https://www.mendeley.com/careers/job/senior-software-developer-backend-developer-1346348) |
| [Nexpath Networks](https://www.nexpath.net/) | Software & Technology | Network Analysis ||| [Slides, September 2021](https://opensips.org/events/Summit-2021Distributed/assets/presentations/2021-jon-abrams-big-telco-data-with-clickhouse.pdf) [Video, September 2021](https://www.youtube.com/watch?v=kyu_wDcO0S4&t=3840s) |
@@ -339,5 +343,5 @@ The following list of companies using ClickHouse and their success stories is as
| [ДомКлик](https://domclick.ru/) | Real Estate |||| [Article in Russian, October 2021](https://habr.com/ru/company/domclick/blog/585936/) |
| [АС "Стрела"](https://magenta-technology.ru/sistema-upravleniya-marshrutami-inkassacii-as-strela/) | Transportation |||| [Job posting, Jan 2022](https://vk.com/topic-111905078_35689124?post=3553) |
| [Deepglint 格灵深瞳](https://www.deepglint.com/) | AI, Computer Vision | OLAP ||| [Official Website](https://www.deepglint.com/) |

| [Kujiale 酷家乐](https://www.kujiale.com/) | VR smart interior design platform | Log monitoring platform | Main cluster is 800+ CPU cores, 4000+ GB RAM | SSD 140+ TB, HDD 280+ TB | [Blog, July 2023](https://juejin.cn/post/7251786922615111740/) |
</div>
2 changes: 1 addition & 1 deletion docs/en/cloud/manage/users-and-roles.md
@@ -22,7 +22,7 @@ When you add additional SQL users for your ClickHouse Cloud service, they will n

```sql
CREATE USER IF NOT EXISTS clickhouse_admin
IDENTIFIED WITH sha256_password BY 'password';
IDENTIFIED WITH sha256_password BY 'P!@ssword42!';
```
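
For orientation, a typical follow-up is to grant privileges to the new user. This is only an illustrative sketch; the role and database names below are assumptions, not commands from this guide:

```sql
-- Illustrative only: role and database names are hypothetical.
CREATE ROLE IF NOT EXISTS db_admin_role;

-- Grant the role full privileges on one database, then assign it to the user.
GRANT ALL ON my_database.* TO db_admin_role;
GRANT db_admin_role TO clickhouse_admin;
```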

2 changes: 1 addition & 1 deletion docs/en/cloud/security/aws-privatelink.md
@@ -14,7 +14,7 @@ You can use [AWS PrivateLink](https://aws.amazon.com/privatelink/) to provide co
This table lists the AWS Regions where ClickHouse Cloud services can be deployed, the associated VPC service name, and Availability Zone IDs. You will need this information to set up AWS PrivateLink to connect to ClickHouse Cloud services.
<AWSRegions/>

If you require two or more AWS Private Links within the same ASWS region, then please note: In ClickHouse, we have a VPC Endpoint service at a regional level. When you setup two or more VPC Endpoints in the same VPC - from the AWS VPC perspective - you are utilizing just a single AWS Private Link. In such a situation where you need two or more AWS Private Links configured within the same region, please just create just one VPC Endpoint in your VPC, and request that ClickHouse configure the same VPC Endpoint ID for all of your ClickHouse services in the same AWS region.
If you require two or more AWS Private Links within the same AWS region, then please note: In ClickHouse, we have a VPC Endpoint service at a regional level. When you set up two or more VPC Endpoints in the same VPC, from the AWS VPC perspective you are utilizing just a single AWS Private Link. In such a situation where you need two or more AWS Private Links configured within the same region, create just one VPC Endpoint in your VPC, and request that ClickHouse configure the same VPC Endpoint ID for all of your ClickHouse services in the same AWS region.

:::note
AWS PrivateLink can be enabled only on ClickHouse Cloud Production services
187 changes: 187 additions & 0 deletions docs/en/integrations/data-ingestion/clickpipes/index.md
@@ -0,0 +1,187 @@
---
sidebar_label: ClickPipes (New)
description: Seamlessly connect your external data sources to ClickHouse Cloud.
slug: /en/integrations/clickpipes
---

import KafkaSVG from "../../images/logos/kafka.svg";
import ConfluentSVG from "../../images/logos/confluent.svg";

# Integrating Kafka with ClickHouse Cloud

## Introduction

[ClickPipes](https://clickhouse.com/cloud/clickpipes) (currently in Beta) is a managed integration platform that makes ingesting data from a diverse set of sources as simple as clicking a few buttons. Designed for the most demanding workloads, ClickPipes's robust and scalable architecture ensures consistent performance and reliability.

![ClickPipes stack](./images/clickpipes_stack.png)

:::note
ClickPipes is a native capability of [ClickHouse Cloud](https://clickhouse.com/cloud) currently under private preview. You can join [our waitlist here](https://clickhouse.com/cloud/clickpipes#joinwaitlist).
:::

## Setup

### 1. Enable ClickPipes for your cloud organization

ClickPipes is currently accessible in private preview. You can join our waitlist by filling out [this form](https://clickhouse.com/cloud/clickpipes#joinwaitlist). Please note that during the Private Preview phase, ClickPipes is available only for services backed by Amazon Web Services, in the `us-east-2` region.

### 2. Creating your first ClickPipe

1. Access the SQL Console for your ClickHouse Cloud Service running in the AWS `us-east-2` region.

![ClickPipes service](./images/cp_service.png)

2. Select the `Imports` button on the left-side menu and click on "Ingest Data From Kafka".

![Select imports](./images/cp_step0.png)

3. Select your data source, either "Confluent Cloud" or "Apache Kafka".

![Select data source type](./images/cp_step1.png)

4. Fill out the form by providing your ClickPipe with a name, a description (optional), your credentials, a consumer group, and the Kafka broker URL.

![Fill out connection details](./images/cp_step2.png)

:::note
Support for Confluent Cloud Schema Registry is coming soon
:::

5. Select your data format (we currently support `JSON`) and your Kafka topic. The UI will display a sample document from the selected Kafka topic.

![Set data format and topic](./images/cp_step3.png)

6. In the next step, you can select whether you want to ingest data into a new ClickHouse table or reuse an existing one. Follow the instructions on the screen to modify your table name, schema, and settings. You can see a real-time preview of your changes in the sample table at the top.

![Set table, schema, and settings](./images/cp_step4a.png)

You can also customize the advanced settings using the controls provided.

![Set advanced controls](./images/cp_step4a3.png)

7. Alternatively, you can decide to ingest your data into an existing ClickHouse table. In that case, the UI will allow you to map fields from Kafka to the ClickHouse fields in the selected destination table.

![Use an existing table](./images/cp_step4b.png)

8. Finally, you can decide to enable the error logging table. When enabled, ClickPipes will create a table next to your destination table with the suffix `_clickpipes_error`. This table will contain any errors from the operation of your ClickPipe (network, connectivity, etc.) as well as any data that doesn't conform to the schema specified in the previous screen. The error table has a [TTL](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-ttl) of 7 days. An illustrative query against this table is sketched after the screenshot below.

![enable error logging table](./images/cp_step5.png)
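
A minimal sketch of inspecting the error table, assuming a hypothetical destination table named `my_destination_table` (ClickPipes names the error table for you by appending the `_clickpipes_error` suffix):

```sql
-- `my_destination_table` is a placeholder; substitute your own destination table.
-- Rows in the error table expire automatically after 7 days via its TTL.
SELECT *
FROM my_destination_table_clickpipes_error
LIMIT 10;
```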

9. By clicking on "Complete Setup", the system will register your ClickPipe, and you'll be able to see it listed in the summary table.

![Success notice](./images/cp_success.png)

![Remove notice](./images/cp_remove.png)

The summary table provides controls to display sample data from the Kafka broker or the destination table in ClickHouse.

![View source](./images/cp_source.png)

![View destination](./images/cp_destination.png)

It also provides controls to remove the ClickPipe and display a summary of the ingest job.

![View overview](./images/cp_overview.png)

10. **Congratulations!** You have successfully set up your first ClickPipe. This job will run continuously, ingesting data in real time from your remote data source. A quick verification query is sketched below.
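
A minimal sketch of confirming that rows are arriving, assuming a hypothetical destination table name:

```sql
-- `my_destination_table` is a placeholder for the destination table you configured.
-- Re-run this query; the count should grow as the ClickPipe ingests new messages.
SELECT count()
FROM my_destination_table;
```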

## Supported Data Sources

|Name|Logo|Type|Description|
|------|----|----------------|------------------|
|Confluent Cloud|<ConfluentSVG style={{width: '3rem'}} />|Streaming|Unlock the combined power of Confluent and ClickHouse Cloud through our direct integration.|
|Apache Kafka|<KafkaSVG style={{width: '3rem', 'height': '3rem'}} />|Streaming|Configure ClickPipes and start ingesting streaming data from Apache Kafka into ClickHouse Cloud.|

More connectors will be added to ClickPipes; you can find out more by [contacting us](https://clickhouse.com/company/contact?loc=clickpipes).

## Supported data formats

The supported formats are:

| Format | Support |
|-------------------------------------------------------------------------------------------|-------------|
| [JSON](../../../interfaces/formats.md/#json) |*Supported*|
| [AvroConfluent](../../../interfaces/formats.md/#data-format-avro-confluent) |*Coming Soon*|
| [TabSeparated](../../../interfaces/formats.md/#tabseparated) |*Coming Soon*|
| [CSV](../../../interfaces/formats.md/#csv) |*Coming Soon*|

## Supported data types

The following ClickHouse types are currently supported by the transform package (with standard JSON as the source):

- Base numeric types
- Int8
- Int16
- Int32
- Int64
- UInt8
- UInt16
- UInt32
- UInt64
- Float32
- Float64
- Boolean
- String
- Date
- DateTime
- DateTime64
- Enum8/Enum16
- LowCardinality(String)
- Map with keys and values using any of the above types (including Nullables)
- Array with elements using any of the above types (including Nullables, one level depth only)

:::note
Nullable versions of the above are also supported with these exceptions:

- Nullable Enums are **not** supported
- LowCardinality(Nullable(String)) is **not** supported

:::
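
To make the list above concrete, here is a minimal sketch of a destination table using several of the supported types. The table and column names are assumptions for illustration; ClickPipes does not create this table for you:

```sql
-- Illustrative destination table; all names here are hypothetical.
CREATE TABLE kafka_orders
(
    `order_id`   UInt64,
    `amount`     Float64,
    `currency`   LowCardinality(String),
    `status`     Enum8('new' = 1, 'paid' = 2, 'shipped' = 3),
    `created_at` DateTime64(3),
    `tags`       Array(Nullable(String)),
    `attributes` Map(String, String)
)
ENGINE = MergeTree
ORDER BY (created_at, order_id);
```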

## Current Limitations

- During the Private Preview phase, ClickPipes is available only on the services backed by Amazon Web Services, in the `us-east-2` region.
- Private Link support isn't currently available for ClickPipes but will be released in the near future.
- Once ClickPipes is enabled for your cloud organization, you need to start a new ClickHouse service in order to access it via the SQL Console.

## FAQ

- **What is ClickPipes?**

ClickPipes is a ClickHouse Cloud feature that makes it easy for users to connect their ClickHouse services to external data sources, specifically Kafka. With ClickPipes for Kafka, users can easily and continuously load data into ClickHouse, making it available for real-time analytics.

- **What types of data sources does ClickPipes support?**

Currently, ClickPipes supports Confluent Cloud and Apache Kafka as data sources. However, we are committed to expanding our support for more data sources in the future. Don't hesitate to [contact us](https://clickhouse.com/company/contact?loc=clickpipes) if you want to know more.

- **How does ClickPipes for Kafka work?**

ClickPipes uses a dedicated architecture running the Kafka Consumer API to read data from a specified topic and then inserts the data into a ClickHouse table on a specific ClickHouse Cloud service.

- **What are the requirements for using ClickPipes for Kafka?**

In order to use ClickPipes for Kafka, you will need a running Kafka broker and a ClickHouse Cloud service with ClickPipes enabled. You will also need to ensure that ClickHouse Cloud can access your Kafka broker. This can be achieved by allowing remote connections on the Kafka side and whitelisting the [ClickHouse Cloud Egress IP addresses](https://clickhouse.com/docs/en/manage/security/cloud-endpoints-api) in your Kafka setup. Support for AWS Private Link is coming soon.

- **Can I use ClickPipes for Kafka to write data to a Kafka topic?**

No, ClickPipes for Kafka is designed for reading data from Kafka topics, not writing data to them. To write data to a Kafka topic, you will need to use a dedicated Kafka producer.

- **What data formats are supported by ClickPipes for Kafka?**

The list of supported data formats is [displayed above](#supported-data-formats).

- **Does ClickPipes support data transformation?**

Yes, ClickPipes supports basic data transformation by exposing the DDL creation. You can then apply more advanced transformations to the data as it is loaded into its destination table in a ClickHouse Cloud service by leveraging ClickHouse's [materialized views feature](https://clickhouse.com/docs/en/guides/developer/cascading-materialized-views). A minimal sketch of this pattern follows.
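
A minimal sketch of the materialized-view pattern, assuming hypothetical table and column names (only the raw destination table would be fed by a ClickPipe, and its schema here is invented for illustration):

```sql
-- Hypothetical raw table fed by a ClickPipe.
CREATE TABLE kafka_events_raw
(
    `timestamp` DateTime,
    `user_id`   UInt64,
    `url`       String
)
ENGINE = MergeTree
ORDER BY timestamp;

-- Aggregated target table for the transformed data.
CREATE TABLE daily_hits
(
    `day`  Date,
    `url`  String,
    `hits` UInt64
)
ENGINE = SummingMergeTree
ORDER BY (day, url);

-- The materialized view transforms and re-routes rows as they are inserted.
CREATE MATERIALIZED VIEW daily_hits_mv TO daily_hits AS
SELECT
    toDate(timestamp) AS day,
    url,
    count() AS hits
FROM kafka_events_raw
GROUP BY day, url;
```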

- **What delivery semantics does ClickPipes for Kafka support?**

ClickPipes for Kafka provides `at-least-once` delivery semantics (as one of the most commonly used approaches). We'd love to hear your feedback on delivery semantics (contact form). If you need exactly-once semantics, we recommend using our official [`clickhouse-kafka-connect`](https://clickhouse.com/blog/real-time-event-streaming-with-kafka-connect-confluent-cloud-clickhouse) sink. A common ClickHouse-side deduplication pattern is sketched below.
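
As a general ClickHouse pattern (not a ClickPipes feature), occasional duplicates produced by at-least-once delivery can be collapsed with a `ReplacingMergeTree`, assuming your messages carry a unique identifier; the names below are hypothetical:

```sql
-- Hypothetical table; `message_id` is assumed to uniquely identify a message.
CREATE TABLE events_dedup
(
    `message_id`  String,
    `payload`     String,
    `ingested_at` DateTime
)
ENGINE = ReplacingMergeTree(ingested_at)
ORDER BY message_id;

-- FINAL deduplicates at read time, before background merges catch up.
SELECT count()
FROM events_dedup
FINAL;
```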

- **Is there a way to handle errors or failures when using ClickPipes for Kafka?**

Yes, ClickPipes for Kafka will automatically retry in case of failures when consuming data from Kafka. ClickPipes also supports enabling a dedicated error table that will hold errors and malformed data for 7 days.

- **Does using ClickPipes incur an additional cost?**

ClickPipes is not billed separately. Running ClickPipes might generate an indirect compute and storage cost on the destination ClickHouse Cloud service, like any ingest workload.