Merge pull request cockroachdb#7429 from cockroachdb/misc-cdc-edits
Misc. CDC edits
Lauren Hirata Singh authored Jun 16, 2020
2 parents 2d6978b + 033b6db commit c8f06de
Showing 17 changed files with 80 additions and 42 deletions.
1 change: 0 additions & 1 deletion _includes/v19.1/cdc/core-url.md

This file was deleted.

2 changes: 0 additions & 2 deletions _includes/v19.1/cdc/create-core-changefeed-avro.md
@@ -25,8 +25,6 @@ In this example, you'll set up a core changefeed for a single-node cluster that
$ cockroach sql --url="postgresql://root@127.0.0.1:26257?sslmode=disable" --format=csv
~~~

{% include {{ page.version.version }}/cdc/core-url.md %}

{% include {{ page.version.version }}/cdc/core-csv.md %}

5. Enable the `kv.rangefeed.enabled` [cluster setting](cluster-settings.html):
2 changes: 0 additions & 2 deletions _includes/v19.1/cdc/create-core-changefeed.md
@@ -19,8 +19,6 @@ In this example, you'll set up a core changefeed for a single-node cluster.
--format=csv
~~~

{% include {{ page.version.version }}/cdc/core-url.md %}

{% include {{ page.version.version }}/cdc/core-csv.md %}

3. Enable the `kv.rangefeed.enabled` [cluster setting](cluster-settings.html):
3 changes: 0 additions & 3 deletions _includes/v19.2/cdc/core-url.md

This file was deleted.

2 changes: 0 additions & 2 deletions _includes/v19.2/cdc/create-core-changefeed-avro.md
@@ -25,8 +25,6 @@ In this example, you'll set up a core changefeed for a single-node cluster that
$ cockroach sql --url="postgresql://root@127.0.0.1:26257?sslmode=disable" --format=csv
~~~

{% include {{ page.version.version }}/cdc/core-url.md %}

{% include {{ page.version.version }}/cdc/core-csv.md %}

5. Enable the `kv.rangefeed.enabled` [cluster setting](cluster-settings.html):
2 changes: 0 additions & 2 deletions _includes/v19.2/cdc/create-core-changefeed.md
@@ -19,8 +19,6 @@ In this example, you'll set up a core changefeed for a single-node cluster.
--format=csv
~~~

{% include {{ page.version.version }}/cdc/core-url.md %}

{% include {{ page.version.version }}/cdc/core-csv.md %}

3. Enable the `kv.rangefeed.enabled` [cluster setting](cluster-settings.html):
3 changes: 0 additions & 3 deletions _includes/v20.1/cdc/core-url.md

This file was deleted.

13 changes: 9 additions & 4 deletions _includes/v20.1/cdc/create-core-changefeed-avro.md
@@ -25,11 +25,11 @@ In this example, you'll set up a core changefeed for a single-node cluster that

{% include copy-clipboard.html %}
~~~ shell
$ cockroach sql --url="postgresql://root@127.0.0.1:26257?sslmode=disable" --format=csv
$ cockroach sql \
--format=csv \
--insecure
~~~

{% include {{ page.version.version }}/cdc/core-url.md %}

{% include {{ page.version.version }}/cdc/core-csv.md %}

5. Enable the `kv.rangefeed.enabled` [cluster setting](cluster-settings.html):
@@ -57,14 +57,18 @@ In this example, you'll set up a core changefeed for a single-node cluster that

{% include copy-clipboard.html %}
~~~ sql
> EXPERIMENTAL CHANGEFEED FOR bar WITH format = experimental_avro, confluent_schema_registry = 'http://localhost:8081';
> EXPERIMENTAL CHANGEFEED FOR bar \
WITH format = experimental_avro, confluent_schema_registry = 'http://localhost:8081', resolved = '10s';
~~~

~~~
table,key,value
bar,\000\000\000\000\001\002\000,\000\000\000\000\002\002\002\000
NULL,NULL,\000\000\000\000\003\002<1590612821682559000.0000000000
~~~

This changefeed will emit [`resolved` timestamps](changefeed-for.html#options) every 10 seconds. Depending on how quickly you insert into your watched table, the output could look different than what is shown here.

9. In a new terminal, add another row:

{% include copy-clipboard.html %}
@@ -76,6 +80,7 @@ In this example, you'll set up a core changefeed for a single-node cluster that

~~~
bar,\000\000\000\000\001\002\002,\000\000\000\000\002\002\002\002
NULL,NULL,\000\000\000\000\003\002<1590612831891317000.0000000000
~~~

Note that records may take a couple of seconds to display in the core changefeed.
20 changes: 13 additions & 7 deletions _includes/v20.1/cdc/create-core-changefeed.md
@@ -1,10 +1,10 @@
In this example, you'll set up a core changefeed for a single-node cluster.

1. In a terminal window, start `cockroach`:
1. Use the [`cockroach start-single-node`](cockroach-start-single-node.html) command to start a single-node cluster:

{% include copy-clipboard.html %}
~~~ shell
$ cockroach start \
$ cockroach start-single-node \
--insecure \
--listen-addr=localhost \
--background
@@ -15,12 +15,10 @@ In this example, you'll set up a core changefeed for a single-node cluster.
{% include copy-clipboard.html %}
~~~ shell
$ cockroach sql \
--url="postgresql://root@127.0.0.1:26257?sslmode=disable" \
--format=csv
--format=csv \
--insecure
~~~

{% include {{ page.version.version }}/cdc/core-url.md %}

{% include {{ page.version.version }}/cdc/core-csv.md %}

3. Enable the `kv.rangefeed.enabled` [cluster setting](cluster-settings.html):
@@ -48,13 +46,17 @@ In this example, you'll set up a core changefeed for a single-node cluster.

{% include copy-clipboard.html %}
~~~ sql
> EXPERIMENTAL CHANGEFEED FOR foo;
> EXPERIMENTAL CHANGEFEED FOR foo
WITH resolved = '10s';
~~~
~~~
table,key,value
foo,[0],"{""after"": {""a"": 0}}"
NULL,NULL,"{""resolved"":""1590611959605806000.0000000000""}"
~~~

This changefeed will emit [`resolved` timestamps](changefeed-for.html#options) every 10 seconds. Depending on how quickly you insert into your watched table, the output could look different than what is shown here.

7. In a new terminal, add another row:

{% include copy-clipboard.html %}
@@ -65,7 +67,11 @@ In this example, you'll set up a core changefeed for a single-node cluster.
8. Back in the terminal where the core changefeed is streaming, the following output has appeared:

~~~
table,key,value
foo,[0],"{""after"": {""a"": 0}}"
NULL,NULL,"{""resolved"":""1590611959605806000.0000000000""}"
foo,[1],"{""after"": {""a"": 1}}"
NULL,NULL,"{""resolved"":""1590611970141415000.0000000000""}"
~~~

Note that records may take a couple of seconds to display in the core changefeed.
8 changes: 6 additions & 2 deletions v19.1/changefeed-for.md
@@ -10,8 +10,6 @@ toc: true

<span class="version-tag">New in v19.1:</span> The `EXPERIMENTAL CHANGEFEED FOR` [statement](sql-statements.html) creates a new core changefeed, which streams row-level changes to the client indefinitely until the underlying connection is closed or the changefeed is canceled.

{% include {{ page.version.version }}/cdc/core-url.md %}

For more information, see [Change Data Capture](change-data-capture.html).

{% include {{ page.version.version }}/misc/experimental-warning.md %}
@@ -20,6 +18,12 @@ For more information, see [Change Data Capture](change-data-capture.html).

Changefeeds can only be created by superusers, i.e., [members of the `admin` role](create-and-manage-users.html). The admin role exists by default with `root` as the member.

## Considerations

Because core changefeeds return results differently than other SQL statements, they require a dedicated database connection with specific settings around result buffering. In normal operation, CockroachDB improves performance by buffering results server-side before returning them to a client; however, result buffering is automatically turned off for core changefeeds. Core changefeeds also have different cancelation behavior than other queries: they can only be canceled by closing the underlying connection or issuing a [`CANCEL QUERY`](cancel-query.html) statement on a separate connection. Combined, these attributes of changefeeds mean that applications should explicitly create dedicated connections to consume changefeed data, instead of using a connection pool as most client drivers do by default.

This cancelation behavior also extends to client driver usage, in particular when a client driver calls `Rows.Close()` after encountering errors for a stream of rows. The pgwire protocol requires that the rows be consumed before the connection is usable again, but in the case of a core changefeed, the rows are never consumed. It is therefore critical that you close the connection; otherwise, the application will block forever on `Rows.Close()`.
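
A minimal Go sketch of the dedicated-connection pattern described above, assuming the `pgx` driver, a local insecure cluster, and a watched table named `foo` (these are illustrative assumptions, not requirements; adjust to your environment):

~~~ go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/jackc/pgx/v4"
)

func main() {
	ctx := context.Background()

	// Open a dedicated connection for the changefeed; do not borrow one from a pool.
	conn, err := pgx.Connect(ctx, "postgresql://root@127.0.0.1:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	rows, err := conn.Query(ctx, "EXPERIMENTAL CHANGEFEED FOR foo")
	if err != nil {
		log.Fatal(err)
	}

	// The row stream never ends on its own; it runs until the connection is
	// closed or the changefeed is canceled.
	for rows.Next() {
		var table, key, value []byte
		if err := rows.Scan(&table, &key, &value); err != nil {
			break
		}
		fmt.Printf("%s %s %s\n", table, key, value)
	}

	// On error, close the underlying connection rather than waiting for the
	// stream to drain -- it never will.
	if err := rows.Err(); err != nil {
		log.Println("changefeed error:", err)
	}
	_ = conn.Close(ctx)
}
~~~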

## Synopsis

~~~
2 changes: 1 addition & 1 deletion v19.2/change-data-capture.md
@@ -4,7 +4,7 @@ summary: Change data capture (CDC) provides efficient, distributed, row-level ch
toc: true
---

Change data capture (CDC) provides efficient, distributed, row-level change feeds into a configurable sink for downstream processing such as reporting, caching, or full-text indexing.
Change data capture (CDC) provides efficient, distributed, row-level change feeds into a configurable sink for downstream processing such as reporting, caching, or full-text indexing. Change data capture is used for high-latency data exports from CockroachDB to a data warehouse. It is not a low-latency publish-subscribe mechanism.

## What is change data capture?

9 changes: 7 additions & 2 deletions v19.2/changefeed-for.md
@@ -10,8 +10,6 @@ toc: true

The `EXPERIMENTAL CHANGEFEED FOR` [statement](sql-statements.html) creates a new core changefeed, which streams row-level changes to the client indefinitely until the underlying connection is closed or the changefeed is canceled.

{% include {{ page.version.version }}/cdc/core-url.md %}

For more information, see [Change Data Capture](change-data-capture.html).

{% include {{ page.version.version }}/misc/experimental-warning.md %}
@@ -20,6 +18,13 @@ For more information, see [Change Data Capture](change-data-capture.html).

Changefeeds can only be created by superusers, i.e., [members of the `admin` role](authorization.html#create-and-manage-roles). The admin role exists by default with `root` as the member.

## Considerations

- Because core changefeeds return results differently than other SQL statements, they require a dedicated database connection with specific settings around result buffering. In normal operation, CockroachDB improves performance by buffering results server-side before returning them to a client; however, result buffering is automatically turned off for core changefeeds. Core changefeeds also have different cancelation behavior than other queries: they can only be canceled by closing the underlying connection or issuing a [`CANCEL QUERY`](cancel-query.html) statement on a separate connection. Combined, these attributes of changefeeds mean that applications should explicitly create dedicated connections to consume changefeed data, instead of using a connection pool as most client drivers do by default.

This cancelation behavior (i.e., closing the underlying connection to cancel the changefeed) also extends to client driver usage, in particular when a client driver calls `Rows.Close()` after encountering errors for a stream of rows. The pgwire protocol requires that the rows be consumed before the connection is usable again, but in the case of a core changefeed, the rows are never consumed. It is therefore critical that you close the connection; otherwise, the application will block forever on `Rows.Close()`. (A sketch of canceling a changefeed from a second connection appears after this list.)

- In most cases, each version of a row is emitted once. However, infrequent conditions (e.g., node failures, network partitions) can cause a version to be emitted more than once, which gives changefeeds an at-least-once delivery guarantee. For more information, see [Change Data Capture - Ordering Guarantees](change-data-capture.html#ordering-guarantees).
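
For example, a core changefeed running on one connection can be canceled from a second connection by looking up its query ID; a sketch (the query ID shown is a placeholder):

~~~ sql
> SHOW QUERIES; -- find the query ID of the EXPERIMENTAL CHANGEFEED FOR statement
> CANCEL QUERY '15f92b745e4a8c320000000000000001'; -- placeholder query ID from SHOW QUERIES
~~~
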
## Synopsis

~~~
4 changes: 4 additions & 0 deletions v19.2/create-changefeed.md
@@ -16,6 +16,10 @@ For more information, see [Change Data Capture](change-data-capture.html).

Changefeeds can only be created by superusers, i.e., [members of the `admin` role](authorization.html#create-and-manage-roles). The admin role exists by default with `root` as the member.

## Considerations

- In most cases, each version of a row is emitted once. However, infrequent conditions (e.g., node failures, network partitions) can cause a version to be emitted more than once, which gives changefeeds an at-least-once delivery guarantee. For more information, see [Change Data Capture - Ordering Guarantees](change-data-capture.html#ordering-guarantees).
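
As an illustration of consuming under at-least-once delivery (this is downstream consumer code, not a CockroachDB API, and it assumes the changefeed was created `WITH updated` so each message carries an `updated` timestamp), a minimal Go sketch that drops duplicate emissions:

~~~ go
package main

import "fmt"

// event is a simplified downstream view of one changefeed message.
type event struct {
	Key     string // primary key of the changed row, e.g. "[1]"
	Updated string // HLC timestamp from the "updated" field
	Value   string // JSON payload
}

// dedupe drops repeated (key, updated) pairs; duplicates produced by retries
// under the at-least-once guarantee reuse the same key and timestamp.
func dedupe(in []event) []event {
	seen := make(map[string]bool)
	var out []event
	for _, e := range in {
		id := e.Key + "@" + e.Updated
		if seen[id] {
			continue // this version of the row was already processed
		}
		seen[id] = true
		out = append(out, e)
	}
	return out
}

func main() {
	msgs := []event{
		{Key: "[1]", Updated: "1590613881923330000.0000000000", Value: `{"after": {"a": 1}}`},
		{Key: "[1]", Updated: "1590613881923330000.0000000000", Value: `{"after": {"a": 1}}`}, // duplicate emission
	}
	fmt.Println(len(dedupe(msgs))) // 1
}
~~~

In practice, the set of seen pairs would be pruned (for example, using `resolved` timestamps) rather than kept unbounded.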

## Synopsis

<div>
29 changes: 20 additions & 9 deletions v20.1/change-data-capture.md
@@ -4,7 +4,7 @@ summary: Change data capture (CDC) provides efficient, distributed, row-level ch
toc: true
---

Change data capture (CDC) provides efficient, distributed, row-level change feeds into a configurable sink for downstream processing such as reporting, caching, or full-text indexing.
Change data capture (CDC) provides efficient, distributed, row-level change feeds into a configurable sink for downstream processing such as reporting, caching, or full-text indexing. Change data capture is used for high-latency data exports from CockroachDB to a data warehouse. It is not a low-latency publish-subscribe mechanism.

## What is change data capture?

@@ -409,7 +409,9 @@ In this example, you'll set up a changefeed for a single-node cluster that is co

{% include copy-clipboard.html %}
~~~ sql
> CREATE CHANGEFEED FOR TABLE office_dogs, employees INTO 'kafka://localhost:9092';
> CREATE CHANGEFEED FOR TABLE office_dogs, employees \
INTO 'kafka://localhost:9092' \
WITH resolved = '10s';
~~~
~~~

@@ -419,7 +421,7 @@ In this example, you'll set up a changefeed for a single-node cluster that is co
(1 row)
~~~

This will start up the changefeed in the background and return the `job_id`. The changefeed writes to Kafka.
This will start up the changefeed in the background and return the `job_id`. The changefeed writes to Kafka and emits [`resolved` timestamps](create-changefeed.html#options) every 10 seconds. Depending on how quickly you insert into your watched tables, the output could look different than what is shown here.

14. In a new terminal, move into the extracted `confluent-<version>` directory and start watching the Kafka topics:

@@ -436,6 +438,8 @@ In this example, you'll set up a changefeed for a single-node cluster that is co
{"after": {"id": 2, "name": "Carl"}}
{"after": {"id": 1, "name": "Lauren", "rowid": 528514320239329281}}
{"after": {"id": 2, "name": "Spencer", "rowid": 528514320239362049}}
{"resolved":"1590613881923330000.0000000000"}
{"resolved":"1590613881923330000.0000000000"}
~~~

The initial scan displays the state of the tables as of when the changefeed started (therefore, the initial value of `"Petee"` is omitted).
@@ -591,7 +595,7 @@ In this example, you'll set up a changefeed for a single-node cluster that is co
{% include copy-clipboard.html %}
~~~ sql
> CREATE TABLE employees (
dog_id INT REFERENCES office_dogs_avro (id),
dog_id INT REFERENCES office_dogs (id),
employee_name STRING);
~~~

@@ -606,7 +610,9 @@ In this example, you'll set up a changefeed for a single-node cluster that is co

{% include copy-clipboard.html %}
~~~ sql
> CREATE CHANGEFEED FOR TABLE office_dogs, employees INTO 'kafka://localhost:9092' WITH format = experimental_avro, confluent_schema_registry = 'http://localhost:8081';
> CREATE CHANGEFEED FOR TABLE office_dogs, employees \
INTO 'kafka://localhost:9092' \
WITH format = experimental_avro, confluent_schema_registry = 'http://localhost:8081', resolved = '10s';
~~~

~~~
@@ -616,7 +622,7 @@ In this example, you'll set up a changefeed for a single-node cluster that is co
(1 row)
~~~

This will start up the changefeed in the background and return the `job_id`. The changefeed writes to Kafka.
This will start up the changefeed in the background and return the `job_id`. The changefeed writes to Kafka and emits [`resolved` timestamps](create-changefeed.html#options) every 10 seconds. Depending on how quickly you insert into your watched tables, the output could look different than what is shown here.

14. In a new terminal, move into the extracted `confluent-<version>` directory and start watching the Kafka topics:

@@ -631,8 +637,11 @@ In this example, you'll set up a changefeed for a single-node cluster that is co
~~~ shell
{"after":{"office_dogs":{"id":{"long":1},"name":{"string":"Petee H"}}}}
{"after":{"office_dogs":{"id":{"long":2},"name":{"string":"Carl"}}}}
{"after":{"employees":{"dog_id":{"long":1},"employee_name":{"string":"Lauren"},"rowid":{"long":528537452042682369}}}}
{"after":{"employees":{"dog_id":{"long":2},"employee_name":{"string":"Spencer"},"rowid":{"long":528537452042747905}}}}
{"resolved":{"string":"1590613448530328000.0000000000"}}
{"after":{"employees":{"dog_id":{"long":1},"employee_name":{"string":"Lauren"},"rowid":{"long":558835089014325249}}}}
{"after":{"employees":{"dog_id":{"long":2},"employee_name":{"string":"Spencer"},"rowid":{"long":558835089014390785}}}}
{"resolved":{"string":"1590613448530328000.0000000000"}}
{"resolved":{"string":"1590613458969189000.0000000000"}}
~~~

The initial scan displays the state of the table as of when the changefeed started (therefore, the initial value of `"Petee"` is omitted).
@@ -768,7 +777,9 @@ In this example, you'll set up a changefeed for a single-node cluster that is co

{% include copy-clipboard.html %}
~~~ sql
> CREATE CHANGEFEED FOR TABLE office_dogs, employees INTO 'experimental-s3://example-bucket-name/test?AWS_ACCESS_KEY_ID=enter_key-here&AWS_SECRET_ACCESS_KEY=enter_key_here' with updated, resolved='10s';
> CREATE CHANGEFEED FOR TABLE office_dogs, employees \
INTO 'experimental-s3://example-bucket-name/test?AWS_ACCESS_KEY_ID=enter_key-here&AWS_SECRET_ACCESS_KEY=enter_key_here' \
WITH updated, resolved='10s';
~~~

~~~
