Commit dc271dc

[Feature][Connector-V2] [Hudi]Add hudi sink connector (apache#4405)
1 parent d663398 commit dc271dc

28 files changed, +2021 -762 lines changed

docs/en/Connector-v2-release-state.md

-1
@@ -38,7 +38,6 @@ SeaTunnel uses a grading system for connectors to help you understand what to ex
 | [Hive](connector-v2/source/Hive.md) | Source | GA | 2.2.0-beta |
 | [Http](connector-v2/sink/Http.md) | Sink | Beta | 2.2.0-beta |
 | [Http](connector-v2/source/Http.md) | Source | Beta | 2.2.0-beta |
-| [Hudi](connector-v2/source/Hudi.md) | Source | Beta | 2.2.0-beta |
 | [Iceberg](connector-v2/source/Iceberg.md) | Source | Beta | 2.2.0-beta |
 | [InfluxDB](connector-v2/sink/InfluxDB.md) | Sink | Beta | 2.3.0 |
 | [InfluxDB](connector-v2/source/InfluxDB.md) | Source | Beta | 2.3.0-beta |

docs/en/connector-v2/sink/Hudi.md

+98
@@ -0,0 +1,98 @@
+# Hudi
+
+> Hudi sink connector
+
+## Description
+
+Used to write data to Hudi.
+
+## Key features
+
+- [x] [exactly-once](../../concept/connector-v2-features.md)
+- [x] [cdc](../../concept/connector-v2-features.md)
+
+## Options
+
+| name                       | type   | required | default value |
+|----------------------------|--------|----------|---------------|
+| table_name                 | string | yes      | -             |
+| table_dfs_path             | string | yes      | -             |
+| conf_files_path            | string | no       | -             |
+| record_key_fields          | string | no       | -             |
+| partition_fields           | string | no       | -             |
+| table_type                 | enum   | no       | copy_on_write |
+| op_type                    | enum   | no       | insert        |
+| batch_interval_ms          | Int    | no       | 1000          |
+| insert_shuffle_parallelism | Int    | no       | 2             |
+| upsert_shuffle_parallelism | Int    | no       | 2             |
+| min_commits_to_keep        | Int    | no       | 20            |
+| max_commits_to_keep        | Int    | no       | 30            |
+| common-options             | config | no       | -             |
+
+### table_name [string]
+
+`table_name` The name of the Hudi table.
+
+### table_dfs_path [string]
+
+`table_dfs_path` The DFS root path of the Hudi table, such as 'hdfs://nameservice/data/hudi/hudi_table/'.
+
+### table_type [enum]
+
+`table_type` The type of the Hudi table. The value is 'copy_on_write' or 'merge_on_read'.
+
+### conf_files_path [string]
+
+`conf_files_path` A list of local paths to environment configuration files, used to initialize the HDFS client that reads Hudi table files. For example: '/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml'.
+
+### op_type [enum]
+
+`op_type` The operation type for the Hudi table. The value is 'insert', 'upsert', or 'bulk_insert'.
+
+### batch_interval_ms [Int]
+
+`batch_interval_ms` The interval, in milliseconds, between batch writes to the Hudi table.
+
+### insert_shuffle_parallelism [Int]
+
+`insert_shuffle_parallelism` The parallelism used when inserting data into the Hudi table.
+
+### upsert_shuffle_parallelism [Int]
+
+`upsert_shuffle_parallelism` The parallelism used when upserting data into the Hudi table.
+
+### min_commits_to_keep [Int]
+
+`min_commits_to_keep` The minimum number of commits to keep for the Hudi table.
+
+### max_commits_to_keep [Int]
+
+`max_commits_to_keep` The maximum number of commits to keep for the Hudi table.
+
+### common options
+
+Source plugin common parameters. Please refer to [Source Common Options](common-options.md) for details.
+
+## Examples
+
+```hocon
+source {
+
+  Hudi {
+    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/"
+    table_type = "copy_on_write"
+    conf_files_path = "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml"
+    use.kerberos = true
+    kerberos.principal = "test_user@xxx"
+    kerberos.principal.file = "/home/test/test_user.keytab"
+  }
+
+}
+```
+
+## Changelog
+
+### 2.2.0-beta 2022-09-26
+
+- Add Hudi Source Connector
+
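
The option table in the added sink page marks `table_name` and `table_dfs_path` as required and documents `table_type`, `op_type`, and `batch_interval_ms` with their defaults. A minimal sketch of a sink block that uses only those documented options might look like the following. The table name and path are placeholder values, and the enclosing `sink { ... }` wrapper is assumed from the usual SeaTunnel job layout rather than taken from this commit.

```hocon
sink {
  Hudi {
    # Required options according to the option table (placeholder values).
    table_name = "hudi_table"
    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/"

    # Optional; the values shown are the documented defaults.
    table_type = "copy_on_write"
    op_type = "insert"
    batch_interval_ms = 1000
  }
}
```

Spelling out the defaults is redundant in practice; they are shown here only to make the documented values visible in one place.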

docs/en/connector-v2/source/Hudi.md

-90
This file was deleted.

docs/zh/Connector-v2-release-state.md

-1
@@ -38,7 +38,6 @@ SeaTunnel uses a connector grading system to help you understand what to expect from a connector:
 | [Hive](../en/connector-v2/source/Hive.md) | Source | GA | 2.2.0-beta |
 | [Http](connector-v2/sink/Http.md) | Sink | Beta | 2.2.0-beta |
 | [Http](../en/connector-v2/source/Http.md) | Source | Beta | 2.2.0-beta |
-| [Hudi](../en/connector-v2/source/Hudi.md) | Source | Beta | 2.2.0-beta |
 | [Iceberg](../en/connector-v2/source/Iceberg.md) | Source | Beta | 2.2.0-beta |
 | [InfluxDB](../en/connector-v2/sink/InfluxDB.md) | Sink | Beta | 2.3.0 |
 | [InfluxDB](../en/connector-v2/source/InfluxDB.md) | Source | Beta | 2.3.0-beta |

docs/zh/connector-v2/sink/Hudi.md

+92
@@ -0,0 +1,92 @@
+# Hudi
+
+> Hudi sink connector
+
+## Description
+
+Used to write data to Hudi.
+
+## Key features
+
+- [x] [exactly-once](../../concept/connector-v2-features.md)
+- [x] [cdc](../../concept/connector-v2-features.md)
+
+## Options
+
+| name                       | type   | required | default value |
+|----------------------------|--------|----------|---------------|
+| table_name                 | string | yes      | -             |
+| table_dfs_path             | string | yes      | -             |
+| conf_files_path            | string | no       | -             |
+| record_key_fields          | string | no       | -             |
+| partition_fields           | string | no       | -             |
+| table_type                 | enum   | no       | copy_on_write |
+| op_type                    | enum   | no       | insert        |
+| batch_interval_ms          | Int    | no       | 1000          |
+| insert_shuffle_parallelism | Int    | no       | 2             |
+| upsert_shuffle_parallelism | Int    | no       | 2             |
+| min_commits_to_keep        | Int    | no       | 20            |
+| max_commits_to_keep        | Int    | no       | 30            |
+| common-options             | config | no       | -             |
+
+### table_name [string]
+
+`table_name` The name of the Hudi table.
+
+### table_dfs_path [string]
+
+`table_dfs_path` The DFS root path of the Hudi table, for example "hdfs://nameservice/data/hudi/hudi_table/".
+
+### table_type [enum]
+
+`table_type` The type of the Hudi table.
+
+### conf_files_path [string]
+
+`conf_files_path` A list of environment configuration file paths (local paths), used to initialize the HDFS client that reads Hudi table files. Example: "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml".
+
+### op_type [enum]
+
+`op_type` The operation type for the Hudi table. The value can be 'insert', 'upsert', or 'bulk_insert'.
+
+### batch_interval_ms [Int]
+
+`batch_interval_ms` The time interval between batch writes to the Hudi table.
+
+### insert_shuffle_parallelism [Int]
+
+`insert_shuffle_parallelism` The parallelism used when inserting data into the Hudi table.
+
+### upsert_shuffle_parallelism [Int]
+
+`upsert_shuffle_parallelism` The parallelism used when upserting data into the Hudi table.
+
+### min_commits_to_keep [Int]
+
+`min_commits_to_keep` The minimum number of commits to keep for the Hudi table.
+
+### max_commits_to_keep [Int]
+
+`max_commits_to_keep` The maximum number of commits to keep for the Hudi table.
+
+### Common options
+
+Source plugin common parameters. Please refer to [Source Common Options](common-options.md) for details.
+
+## Examples
+
+```hocon
+source {
+
+  Hudi {
+    table_dfs_path = "hdfs://nameservice/data/hudi/hudi_table/"
+    table_type = "cow"
+    conf_files_path = "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml"
+    use.kerberos = true
+    kerberos.principal = "test_user@xxx"
+    kerberos.principal.file = "/home/test/test_user.keytab"
+  }
+
+}
+```
+

plugin-mapping.properties

+1-1
@@ -52,7 +52,6 @@ seatunnel.sink.OssJindoFile = connector-file-jindo-oss
 seatunnel.source.CosFile = connector-file-cos
 seatunnel.sink.CosFile = connector-file-cos
 seatunnel.source.Pulsar = connector-pulsar
-seatunnel.source.Hudi = connector-hudi
 seatunnel.sink.DingTalk = connector-dingtalk
 seatunnel.source.Elasticsearch = connector-elasticsearch
 seatunnel.sink.Elasticsearch = connector-elasticsearch
@@ -119,6 +118,7 @@ seatunnel.source.AmazonSqs = connector-amazonsqs
 seatunnel.sink.AmazonSqs = connector-amazonsqs
 seatunnel.source.Paimon = connector-paimon
 seatunnel.sink.Paimon = connector-paimon
+seatunnel.sink.hudi = connector-hudi
 seatunnel.sink.Druid = connector-druid
 seatunnel.source.Easysearch = connector-easysearch
 seatunnel.sink.Easysearch = connector-easysearch
