Merge pull request ClickHouse#100 from ClickHouse/fix-scripts
Fix references to kafka code
rfraposa authored May 12, 2022
2 parents adce620 + 0aed264 commit 6ea56db
Showing 2 changed files with 6 additions and 6 deletions.
2 changes: 1 addition & 1 deletion docs/en/integrations/kafka/kafka-connect-http.md
@@ -52,7 +52,7 @@ The following additional parameters are relevant to using the HTTP Sink with Cli

A full list of settings, including how to configure a proxy, retries, and advanced SSL, can be found [here](https://docs.confluent.io/kafka-connect-http/current/connector_config.html).

- Example configuration files for the Github sample data can be found [here](https://github.com/ClickHouse/clickhouse-docs/tree/main/docs/integrations/kafka/code/connectors/http_sink), assuming Connect is run in standalone mode and Kafka is hosted in Confluent Cloud.
+ Example configuration files for the Github sample data can be found [here](https://github.com/ClickHouse/clickhouse-docs/tree/main/docs/en/integrations/kafka/code/connectors/http_sink), assuming Connect is run in standalone mode and Kafka is hosted in Confluent Cloud.
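For orientation, a minimal sketch of such a standalone HTTP Sink properties file follows. The property names are assumed from the Confluent HTTP Sink connector, and the ClickHouse URL, topic name, and credentials are placeholders; defer to the sample files and the Confluent documentation linked above.

```properties
# Hypothetical HTTP Sink config for standalone Connect - property names assumed
# from the Confluent HTTP Sink connector; verify against the sample files above.
name=github-http-sink
connector.class=io.confluent.connect.http.HttpSinkConnector
tasks.max=1
topics=github
# ClickHouse HTTP interface; the INSERT query is URL-encoded into the endpoint.
http.api.url=http://localhost:8123/?query=INSERT%20INTO%20github%20FORMAT%20JSONEachRow
request.method=POST
auth.type=BASIC
connection.user=default
connection.password=<clickhouse-password>
batch.max.size=1000
```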

#### 3. Create the ClickHouse table

10 changes: 5 additions & 5 deletions docs/en/integrations/kafka/kafka-connect-jdbc.md
@@ -10,7 +10,7 @@ description: Using JDBC Connector Sink with Kafka Connect and ClickHouse

For our examples, we utilize the Confluent distribution of Kafka Connect.

- Below we describe a simple installation, pulling messages from a single Kafka topic and inserting rows into a ClickHouse table. We recommend Confluent Cloud, which offers a generous free tier for those who do not have a Kafka environment. Either adapt the following examples to your own dataset or utilize the [sample GitHub](https://datasets-documentation.s3.eu-west-3.amazonaws.com/kafka/github_all_columns.ndjson) dataset with the accompanying [insertion script](https://github.com/ClickHouse/clickhouse-docs/tree/main/docs/integrations/kafka/code/producer).
+ Below we describe a simple installation, pulling messages from a single Kafka topic and inserting rows into a ClickHouse table. We recommend Confluent Cloud, which offers a generous free tier for those who do not have a Kafka environment. Either adapt the following examples to your own dataset or utilize the [sample GitHub](https://datasets-documentation.s3.eu-west-3.amazonaws.com/kafka/github_all_columns.ndjson) dataset with the accompanying [insertion script](https://github.com/ClickHouse/clickhouse-docs/tree/main/docs/en/integrations/kafka/code/producer).

Note that a schema is required for the JDBC Connector (you cannot use plain JSON or CSV with the JDBC connector). Whilst the schema can be encoded in each message, it is [strongly advised to use the Confluent schema registry](https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained/#json-schemas) to avoid the associated overhead. The insertion script provided automatically infers a schema from the messages and inserts it into the registry - this script can thus be reused for other datasets. Kafka's keys are assumed to be Strings. Further details on Kafka schemas can be found [here](https://docs.confluent.io/platform/current/schema-registry/index.html).

@@ -65,7 +65,7 @@ If using our sample dataset for testing, ensure the following are set:
* `value.converter` - Set to “io.confluent.connect.json.JsonSchemaConverter”.
* `value.converter.schema.registry.url` - Set to the schema server URL, along with the credentials for the schema server via the parameter `value.converter.schema.registry.basic.auth.user.info`.

- Example configuration files for the Github sample data can be found [here](https://github.com/ClickHouse/clickhouse-docs/tree/main/docs/integrations/kafka/code/connectors/jdbc_sink), assuming Connect is run in standalone mode and Kafka is hosted in Confluent Cloud.
+ Example configuration files for the Github sample data can be found [here](https://github.com/ClickHouse/clickhouse-docs/tree/main/docs/en/integrations/kafka/code/connectors/jdbc_sink), assuming Connect is run in standalone mode and Kafka is hosted in Confluent Cloud.
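As a rough sketch (not the actual sample file), a standalone JDBC Sink properties file for this setup might combine the converter settings above with the ClickHouse connection details, assuming the ClickHouse JDBC driver is available on the Connect plugin path; host, credentials, and registry endpoint below are placeholders.

```properties
# Hypothetical JDBC Sink config for standalone Connect - adapt names and values
# to your environment; the real sample file is linked above.
name=github-jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=github
# ClickHouse JDBC endpoint (HTTP port 8123 by default).
connection.url=jdbc:clickhouse://<clickhouse-host>:8123/default
connection.user=default
connection.password=<clickhouse-password>
insert.mode=insert
auto.create=false
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.json.JsonSchemaConverter
value.converter.schema.registry.url=https://<schema-registry-endpoint>
value.converter.schema.registry.basic.auth.user.info=<api-key>:<api-secret>
```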


### 4. Create the ClickHouse table
@@ -103,7 +103,7 @@ CREATE TABLE github
### 5. Start Kafka Connect


- Start Kafka Connect in either [standalone](https://docs.confluent.io/cloud/current/cp-component/connect-cloud-config.html#standalone-cluster) or [distributed](https://docs.confluent.io/cloud/current/cp-component/connect-cloud-config.html#distributed-cluster) mode. For standalone mode, using the [sample configurations](https://github.com/ClickHouse/clickhouse-docs/tree/main/docs/integrations/kafka/code/connectors), this is as simple as:
+ Start Kafka Connect in either [standalone](https://docs.confluent.io/cloud/current/cp-component/connect-cloud-config.html#standalone-cluster) or [distributed](https://docs.confluent.io/cloud/current/cp-component/connect-cloud-config.html#distributed-cluster) mode. For standalone mode, using the [sample configurations](https://github.com/ClickHouse/clickhouse-docs/tree/main/docs/en/integrations/kafka/code/connectors), this is as simple as:

```bash
./bin/connect-standalone connect.properties.ini github-jdbc-sink.properties.ini
```
@@ -112,13 +112,13 @@
### 6. Add data to Kafka


- Insert messages to Kafka using the [script and config](https://github.com/ClickHouse/clickhouse-docs/tree/main/docs/integrations/kafka/code/producer) provided. You will need to modify github.config to include your Kafka credentials. The script is currently configured for use with Confluent Cloud.
+ Insert messages to Kafka using the [script and config](https://github.com/ClickHouse/clickhouse-docs/tree/main/docs/en/integrations/kafka/code/producer) provided. You will need to modify github.config to include your Kafka credentials. The script is currently configured for use with Confluent Cloud.

```bash
python producer.py -c github.config
```

- This script can be used to insert any ndjson file into a Kafka topic. This will attempt to infer a schema for you automatically. The sample config provided will only insert 10k messages - modify [here](https://github.com/ClickHouse/clickhouse-docs/tree/main/docs/integrations/kafka/code/producer/github.config#L25) if required. This configuration also removes any incompatible Array fields from the dataset during insertion to Kafka.
+ This script can be used to insert any ndjson file into a Kafka topic. This will attempt to infer a schema for you automatically. The sample config provided will only insert 10k messages - modify [here](https://github.com/ClickHouse/clickhouse-docs/tree/main/docs/en/integrations/kafka/code/producer/github.config#L25) if required. This configuration also removes any incompatible Array fields from the dataset during insertion to Kafka.

The schema is required for the JDBC connector to convert messages to INSERT statements. If you are using your own data, ensure you either insert a schema with every message (setting _value.converter.schemas.enable_ to true) or ensure your client publishes messages referencing a schema to the registry.
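For reference, the embedded-schema alternative mentioned above corresponds to the stock Kafka Connect JSON converter settings, roughly:

```properties
# Embed a schema in every JSON message instead of using the schema registry
# (larger messages; the registry approach described above is generally preferred).
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
```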

