
Commit 204dc55

[Improve][Doc] Add doc for mysql-cdc schema evolution (apache#7626)
1 parent 154e866 commit 204dc55


4 files changed (+162 -3 lines)


docs/en/concept/schema-evolution.md

+58
@@ -0,0 +1,58 @@
# Schema evolution

Schema evolution means that the schema of a data table can change, and the data synchronization task automatically adapts to the new table structure without any manual intervention.

Currently, only the `add column`, `drop column`, `rename column`, and `modify column` operations on tables in a CDC source are supported. This feature is currently supported only by the Zeta engine.
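The following minimal Python sketch shows what adapting to the new table structure means for a plain column list. It is not SeaTunnel's implementation. The `Column` type, the event dictionaries, and their `kind`/`name`/`new_name`/`type` keys are hypothetical. As a simplification, added columns are always appended at the end, so MySQL's `FIRST`/`AFTER` placement is ignored.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    """Hypothetical column descriptor: a name and a SQL type string."""
    name: str
    type: str


def apply_schema_change(columns: List[Column], event: Dict[str, str]) -> List[Column]:
    """Return a new column list with one (hypothetical) schema change event applied."""
    kind = event["kind"]
    if kind == "add column":
        # Sketch simplification: new columns always go at the end.
        return columns + [Column(event["name"], event["type"])]
    if kind == "drop column":
        return [c for c in columns if c.name != event["name"]]
    if kind == "rename column":
        return [replace(c, name=event["new_name"]) if c.name == event["name"] else c
                for c in columns]
    if kind == "modify column":
        return [replace(c, type=event["type"]) if c.name == event["name"] else c
                for c in columns]
    # Anything outside the four supported operations is rejected.
    raise ValueError(f"unsupported schema change: {kind}")


if __name__ == "__main__":
    schema = [Column("id", "INT"), Column("name", "VARCHAR(255)")]
    for change in [
        {"kind": "add column", "name": "weight", "type": "FLOAT"},
        {"kind": "rename column", "name": "weight", "new_name": "weight_kg"},
        {"kind": "modify column", "name": "weight_kg", "type": "DOUBLE"},
        {"kind": "drop column", "name": "name"},
    ]:
        schema = apply_schema_change(schema, change)
    print([(c.name, c.type) for c in schema])  # [('id', 'INT'), ('weight_kg', 'DOUBLE')]
```

Each call returns a new list rather than mutating the old one. The schema in effect before a change therefore stays available, for example to rows that were read before the change.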

## Supported connectors

### Source

[MySQL-CDC](https://github.com/apache/seatunnel/blob/dev/docs/en/connector-v2/source/MySQL-CDC.md)

### Sink

[Jdbc-MySQL](https://github.com/apache/seatunnel/blob/dev/docs/en/connector-v2/sink/Jdbc.md)

Note: Schema evolution does not currently support transforms.
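You can check these constraints before submitting a job: a MySQL-CDC source, a JDBC sink writing to MySQL, and no transforms. The sketch below assumes the job config has already been parsed into nested Python dictionaries, keyed by section (`source`, `transform`, `sink`) and then by plugin name. The functions `schema_changes_enabled` and `schema_evolution_problems` are hypothetical and are not part of SeaTunnel. The check cannot tell which engine will run the job.

```python
from typing import Any, Dict, List

# Connectors this document lists as supported (compared case-insensitively).
SUPPORTED_SOURCES = {"mysql-cdc"}
SUPPORTED_SINKS = {"jdbc"}


def schema_changes_enabled(source_conf: Dict[str, Any]) -> bool:
    """True if `debezium.include.schema.changes = true` is set in a source block."""
    node: Any = source_conf
    for key in ("debezium", "include", "schema", "changes"):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return node is True


def schema_evolution_problems(job: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    sources: Dict[str, Any] = job.get("source", {})
    if not any(schema_changes_enabled(conf) for conf in sources.values()):
        return []  # schema evolution is not enabled, so nothing to check
    problems: List[str] = []
    for name in sources:
        if name.lower() not in SUPPORTED_SOURCES:
            problems.append(f"source {name!r} does not support schema evolution")
    for name, conf in job.get("sink", {}).items():
        url = str(conf.get("url", ""))
        if name.lower() not in SUPPORTED_SINKS or not url.startswith("jdbc:mysql:"):
            problems.append(f"sink {name!r} is not a MySQL JDBC sink")
    if job.get("transform"):
        problems.append("schema evolution does not support transforms")
    return problems


job = {
    "source": {"MySQL-CDC": {"debezium": {"include": {"schema": {"changes": True}}}}},
    "transform": {"Sql": {}},
    "sink": {"jdbc": {"url": "jdbc:mysql://mysql_cdc_e2e:3306/shop"}},
}
print(schema_evolution_problems(job))  # ['schema evolution does not support transforms']
```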

## Enable schema evolution

Schema evolution is disabled by default in CDC sources. To enable it, set `debezium.include.schema.changes = true` in the source configuration. This option is currently supported only by the MySQL-CDC source.
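In the job config, this option is written as a nested block: `debezium = { include.schema.changes = true }`. In HOCON, a dotted key inside a block is a path, so the block and the flat key `debezium.include.schema.changes` name the same setting. The sketch below assumes the config has already been parsed into nested Python dictionaries; SeaTunnel itself parses HOCON in Java. It flattens that structure into dotted keys to show the equivalence. The `flatten` helper is hypothetical.

```python
from typing import Any, Dict


def flatten(conf: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in conf.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# A MySQL-CDC source block after HOCON has expanded the dotted key
# `include.schema.changes` into nested objects.
mysql_cdc = {
    "table-names": ["shop.products"],
    "debezium": {"include": {"schema": {"changes": True}}},
}

print(flatten(mysql_cdc)["debezium.include.schema.changes"])  # True
# The keys under `debezium`, as they would be passed through to Debezium:
print(flatten(mysql_cdc["debezium"]))  # {'include.schema.changes': True}
```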

## Examples

### MySQL-CDC -> Jdbc-MySQL

```
env {
  # You can set engine configuration here
  parallelism = 5
  job.mode = "STREAMING"
  # Checkpoint interval in milliseconds
  checkpoint.interval = 5000
  read_limit.bytes_per_second = 7000000
  read_limit.rows_per_second = 400
}

source {
  MySQL-CDC {
    server-id = 5652-5657
    username = "st_user_source"
    password = "mysqlpw"
    table-names = ["shop.products"]
    base-url = "jdbc:mysql://mysql_cdc_e2e:3306/shop"
    # Enables schema evolution (disabled by default)
    debezium = {
      include.schema.changes = true
    }
  }
}

sink {
  jdbc {
    url = "jdbc:mysql://mysql_cdc_e2e:3306/shop"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "st_user_sink"
    password = "mysqlpw"
    # Generate the write SQL from `database`/`table` instead of a hand-written query
    generate_sink_sql = true
    database = shop
    table = mysql_cdc_e2e_sink_table_with_schema_change_exactly_once
    primary_keys = ["id"]
    # Exactly-once writes using MySQL XA transactions
    is_exactly_once = true
    xa_data_source_class_name = "com.mysql.cj.jdbc.MysqlXADataSource"
  }
}
```
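When a schema change reaches the JDBC MySQL sink in the example above, the sink table has to be altered to match. The sketch below shows one MySQL `ALTER TABLE` statement that would perform each supported change kind. It is an illustration only, not the SQL that SeaTunnel actually generates. The event dictionary shape and the `to_mysql_ddl` helper are hypothetical. Identifiers are backtick-quoted without escaping, and `RENAME COLUMN` requires MySQL 8.0 or later.

```python
from typing import Dict


def quote(identifier: str) -> str:
    """Backtick-quote a MySQL identifier (sketch: assumes no backticks inside)."""
    return f"`{identifier}`"


def to_mysql_ddl(database: str, table: str, event: Dict[str, str]) -> str:
    """Render one hypothetical schema change event as a MySQL ALTER TABLE statement."""
    target = f"{quote(database)}.{quote(table)}"
    column = quote(event["name"])
    kind = event["kind"]
    if kind == "add column":
        return f"ALTER TABLE {target} ADD COLUMN {column} {event['type']}"
    if kind == "drop column":
        return f"ALTER TABLE {target} DROP COLUMN {column}"
    if kind == "rename column":
        # RENAME COLUMN exists from MySQL 8.0; older servers need CHANGE COLUMN.
        return f"ALTER TABLE {target} RENAME COLUMN {column} TO {quote(event['new_name'])}"
    if kind == "modify column":
        return f"ALTER TABLE {target} MODIFY COLUMN {column} {event['type']}"
    raise ValueError(f"unsupported schema change: {kind}")


print(to_mysql_ddl("shop", "mysql_cdc_e2e_sink_table_with_schema_change_exactly_once",
                   {"kind": "add column", "name": "weight", "type": "FLOAT"}))
```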

docs/en/connector-v2/source/MySQL-CDC.md

+43 -2
@@ -169,7 +169,7 @@ When an initial consistent snapshot is made for large databases, your establishe
169169

170170
## Source Options
171171

172-
| Name | Type | Required | Default | Description |
172+
| Name | Type | Required | Default | Description |
173173
|------------------------------------------------|----------|----------|---------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
174174
| base-url | String | Yes | - | The URL of the JDBC connection. Refer to a case: `jdbc:mysql://localhost:3306:3306/test`. |
175175
| username | String | Yes | - | Name of the database to use when connecting to the database server. |
```diff
@@ -196,7 +196,7 @@ When an initial consistent snapshot is made for large databases, your establishe
 | inverse-sampling.rate | Integer | No | 1000 | The inverse of the sampling rate used in the sample sharding strategy. For example, if this value is set to 1000, it means a 1/1000 sampling rate is applied during the sampling process. This option provides flexibility in controlling the granularity of the sampling, thus affecting the final number of shards. It's especially useful when dealing with very large datasets where a lower sampling rate is preferred. The default value is 1000. |
 | exactly_once | Boolean | No | false | Enable exactly-once semantics. |
 | format | Enum | No | DEFAULT | Optional output format for MySQL CDC, valid enumerations are `DEFAULT`, `COMPATIBLE_DEBEZIUM_JSON`. |
-| debezium | Config | No | - | Pass-through [Debezium's properties](https://github.com/debezium/debezium/blob/v1.9.8.Final/documentation/modules/ROOT/pages/connectors/mysql.adoc#connector-properties) to Debezium Embedded Engine which is used to capture data changes from MySQL server. |
+| debezium | Config | No | - | Pass-through [Debezium's properties](https://github.com/debezium/debezium/blob/v1.9.8.Final/documentation/modules/ROOT/pages/connectors/mysql.adoc#connector-properties) to the Debezium Embedded Engine, which is used to capture data changes from the MySQL server. Schema evolution is disabled by default; to enable it, set `debezium.include.schema.changes = true`. Currently only `add column`, `drop column`, `rename column`, and `modify column` are supported. |
 | common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details |
 
 ## Task Example
```
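The four change kinds named in the `debezium` option above roughly correspond to these MySQL `ALTER TABLE` clauses: `ADD COLUMN`, `DROP COLUMN`, `RENAME COLUMN`, and `MODIFY [COLUMN]`. The hypothetical helper below sorts a DDL statement into one of those kinds. It is a rough illustration under stated assumptions, not how Debezium or SeaTunnel parse DDL:

- It examines only the first clause after the table name.
- The table name must contain no spaces.
- The optional `COLUMN` keyword is required for `ADD`/`DROP`, so that `ADD INDEX` and similar clauses are not misread.

Anything the helper cannot place, including `CHANGE COLUMN`, returns `None`.

```python
import re
from typing import Optional

# "ALTER TABLE <name> <rest>"; the name may be db-qualified or backtick-quoted.
_ALTER_TABLE = re.compile(r"^\s*ALTER\s+TABLE\s+\S+\s+(.+?)\s*;?\s*$",
                          re.IGNORECASE | re.DOTALL)

# Clause prefixes checked in order; the first match wins.
_CLAUSES = [
    ("add column", re.compile(r"ADD\s+COLUMN\b", re.IGNORECASE)),
    ("drop column", re.compile(r"DROP\s+COLUMN\b", re.IGNORECASE)),
    ("rename column", re.compile(r"RENAME\s+COLUMN\b", re.IGNORECASE)),
    ("modify column", re.compile(r"MODIFY(\s+COLUMN)?\b", re.IGNORECASE)),
]


def classify_alter(sql: str) -> Optional[str]:
    """Return the supported change kind of the first ALTER TABLE clause, else None."""
    match = _ALTER_TABLE.match(sql)
    if match is None:
        return None
    clause = match.group(1)
    for kind, pattern in _CLAUSES:
        if pattern.match(clause):
            return kind
    return None


for ddl in ["ALTER TABLE shop.products ADD COLUMN weight FLOAT;",
            "ALTER TABLE shop.products ADD INDEX idx_weight (weight)"]:
    print(ddl, "->", classify_alter(ddl))
```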
````diff
@@ -263,6 +263,47 @@ sink {
   }
 }
 ```
+### Support schema evolution
+```
+env {
+  # You can set engine configuration here
+  parallelism = 5
+  job.mode = "STREAMING"
+  checkpoint.interval = 5000
+  read_limit.bytes_per_second=7000000
+  read_limit.rows_per_second=400
+}
+
+source {
+  MySQL-CDC {
+    server-id = 5652-5657
+    username = "st_user_source"
+    password = "mysqlpw"
+    table-names = ["shop.products"]
+    base-url = "jdbc:mysql://mysql_cdc_e2e:3306/shop"
+    debezium = {
+      include.schema.changes = true
+    }
+  }
+}
+
+sink {
+  jdbc {
+    url = "jdbc:mysql://mysql_cdc_e2e:3306/shop"
+    driver = "com.mysql.cj.jdbc.Driver"
+    user = "st_user_sink"
+    password = "mysqlpw"
+    generate_sink_sql = true
+    database = shop
+    table = mysql_cdc_e2e_sink_table_with_schema_change_exactly_once
+    primary_keys = ["id"]
+    is_exactly_once = true
+    xa_data_source_class_name = "com.mysql.cj.jdbc.MysqlXADataSource"
+  }
+}
+
+```
+
 
 ## Changelog
 
````
docs/sidebars.js

+2 -1
```diff
@@ -93,7 +93,8 @@ const sidebars = {
         'concept/sink-options-placeholders',
         'concept/sql-config',
         'concept/speed-limit',
-        'concept/event-listener'
+        'concept/event-listener',
+        'concept/schema-evolution'
       ]
     },
     {
```

docs/zh/concept/schema-evolution.md

+59
@@ -0,0 +1,59 @@
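# Schema evolution

Schema evolution means that the schema of a data table can change, and the data synchronization task can automatically adapt to the new table structure without any other operations.

Currently, only the `add column`, `drop column`, `rename column`, and `modify column` operations on tables in a CDC source are supported. This feature is currently supported only by the Zeta engine.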

## Supported connectors

### Source

[MySQL-CDC](https://github.com/apache/seatunnel/blob/dev/docs/en/connector-v2/source/MySQL-CDC.md)

### Target

[Jdbc-MySQL](https://github.com/apache/seatunnel/blob/dev/docs/zh/connector-v2/sink/Jdbc.md)

Note: Schema evolution does not currently support transforms.

## Enable schema evolution

Schema evolution is disabled by default in CDC source connectors. To enable it, set `debezium.include.schema.changes = true` in the CDC connector configuration.

## Examples

### MySQL-CDC -> Jdbc-MySQL

```
env {
  # You can set engine configuration here
  parallelism = 5
  job.mode = "STREAMING"
  checkpoint.interval = 5000
  read_limit.bytes_per_second = 7000000
  read_limit.rows_per_second = 400
}

source {
  MySQL-CDC {
    server-id = 5652-5657
    username = "st_user_source"
    password = "mysqlpw"
    table-names = ["shop.products"]
    base-url = "jdbc:mysql://mysql_cdc_e2e:3306/shop"
    # Enables schema evolution (disabled by default)
    debezium = {
      include.schema.changes = true
    }
  }
}

sink {
  jdbc {
    url = "jdbc:mysql://mysql_cdc_e2e:3306/shop"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "st_user_sink"
    password = "mysqlpw"
    generate_sink_sql = true
    database = shop
    table = mysql_cdc_e2e_sink_table_with_schema_change_exactly_once
    primary_keys = ["id"]
    is_exactly_once = true
    xa_data_source_class_name = "com.mysql.cj.jdbc.MysqlXADataSource"
  }
}
```
