|
| 1 | +# Debezium Format |
| 2 | + |
| 3 | +Changelog-Data-Capture Format, Format: Serialization Schema, Format: Deserialization Schema |
| 4 | + |
| 5 | +Debezium is a set of distributed services to capture changes in your databases so that your applications can see those changes and respond to them. Debezium records all row-level changes within each database table in a *change event stream*, and applications simply read these streams to see the change events in the same order in which they occurred. |
| 6 | + |
| 7 | +SeaTunnel can interpret Debezium JSON messages as INSERT/UPDATE/DELETE messages in the SeaTunnel system. This is useful in many cases, such as: |
| 8 | + |
| 9 | +- synchronizing incremental data from databases to other systems |
| 10 | +- auditing logs |
| 11 | +- real-time materialized views on databases |
| 12 | +- temporal joins on the changing history of a database table, and so on |
| 13 | + |
| 14 | +SeaTunnel can also encode its INSERT/UPDATE/DELETE messages as Debezium JSON messages and emit them to storage systems such as Kafka. |
| 15 | + |
| 16 | +# Format Options |
| 17 | + |
| 18 | +| Option | Default | Required | Description | |
| 19 | +|-----------------------------------|---------|----------|------------------------------------------------------------------------------------------------------| |
| 20 | +| format | (none) | yes | Specify what format to use, here should be 'debezium_json'. | |
| 21 | +| debezium-json.ignore-parse-errors | false | no | Skip fields and rows with parse errors instead of failing. Fields are set to null in case of errors. | |
| 22 | + |
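
The effect of `debezium-json.ignore-parse-errors` can be illustrated with a minimal Python sketch. This is not SeaTunnel's implementation: `parse_messages` and its `ignore_parse_errors` argument are hypothetical names, and the sketch covers only the row-level behaviour (skipping an unparseable message), not the field-level behaviour of setting individual fields to null.

```python
import json


def parse_messages(raw_messages, ignore_parse_errors=False):
    """Parse raw JSON strings into event dicts.

    Hypothetical helper: with ignore_parse_errors=False the first malformed
    message raises; with True, malformed messages are skipped.
    """
    events = []
    for raw in raw_messages:
        try:
            events.append(json.loads(raw))
        except json.JSONDecodeError:
            if not ignore_parse_errors:
                raise
            # Skip the unparseable row instead of failing the whole run.
            continue
    return events


messages = ['{"op": "c", "after": {"id": 1}}', '{not valid json']
lenient = parse_messages(messages, ignore_parse_errors=True)
print(len(lenient))  # only the well-formed message survives
```

With the default (`false`), the same input would raise on the second message, which is the safer choice when silent data loss is unacceptable.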
| 23 | +# How to use Debezium format |
| 24 | + |
| 25 | +## Kafka usage example |
| 26 | + |
| 27 | +Debezium provides a unified format for changelogs. Here is a simple example of an update operation captured from a MySQL products table: |
| 28 | + |
| 29 | +```json |
| 30 | +{ |
| 31 | + "before": { |
| 32 | + "id": 111, |
| 33 | + "name": "scooter", |
| 34 | + "description": "Big 2-wheel scooter ", |
| 35 | + "weight": 5.18 |
| 36 | + }, |
| 37 | + "after": { |
| 38 | + "id": 111, |
| 39 | + "name": "scooter", |
| 40 | + "description": "Big 2-wheel scooter ", |
| 41 | + "weight": 5.17 |
| 42 | + }, |
| 43 | + "source": { |
| 44 | + "version": "1.1.1.Final", |
| 45 | + "connector": "mysql", |
| 46 | + "name": "dbserver1", |
| 47 | + "ts_ms": 1589362330000, |
| 48 | + "snapshot": "false", |
| 49 | + "db": "inventory", |
| 50 | + "table": "products", |
| 51 | + "server_id": 223344, |
| 52 | + "gtid": null, |
| 53 | + "file": "mysql-bin.000003", |
| 54 | + "pos": 2090, |
| 55 | + "row": 0, |
| 56 | + "thread": 2, |
| 57 | + "query": null |
| 58 | + }, |
| 59 | + "op": "u", |
| 60 | + "ts_ms": 1589362330904, |
| 61 | + "transaction": null |
| 62 | +} |
| 63 | +``` |
| 64 | + |
| 65 | +Note: refer to the Debezium documentation for the meaning of each field. |
| 66 | + |
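
To make the sample event concrete, the following Python sketch shows one way a Debezium envelope could be translated into changelog rows. It is an illustration, not SeaTunnel's code: `to_changelog` is a hypothetical helper, and splitting an update into `UPDATE_BEFORE` and `UPDATE_AFTER` rows is an assumption based on the common changelog convention. The sample is a trimmed copy of the event above (the `source` block is omitted).

```python
import json


def to_changelog(event):
    """Translate one Debezium envelope into (row_kind, row) pairs."""
    op = event["op"]
    if op in ("c", "r"):  # create, or snapshot read
        return [("INSERT", event["after"])]
    if op == "u":  # update carries both the old and the new row image
        return [("UPDATE_BEFORE", event["before"]),
                ("UPDATE_AFTER", event["after"])]
    if op == "d":  # delete carries only the old row image
        return [("DELETE", event["before"])]
    raise ValueError(f"unsupported op: {op!r}")


sample = json.loads("""
{
  "before": {"id": 111, "name": "scooter",
             "description": "Big 2-wheel scooter ", "weight": 5.18},
  "after": {"id": 111, "name": "scooter",
            "description": "Big 2-wheel scooter ", "weight": 5.17},
  "op": "u",
  "ts_ms": 1589362330904
}
""")

for row_kind, row in to_changelog(sample):
    print(row_kind, row["id"], row["weight"])
```

Only `before`, `after`, and `op` are needed for this translation; the `source` metadata describes where the change came from.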
| 67 | +The MySQL products table has 4 columns (id, name, description and weight). |
| 68 | +The JSON message above is an update change event on the products table, in which the weight of the row with id = 111 changes from 5.18 to 5.17. |
| 69 | +Assuming the messages have been synchronized to the Kafka topic products_binlog, we can use the following SeaTunnel configuration to consume this topic and interpret the change events using the Debezium format. |
| 70 | + |
| 71 | +```bash |
| 72 | +env { |
| 73 | + execution.parallelism = 1 |
| 74 | + job.mode = "BATCH" |
| 75 | +} |
| 76 | + |
| 77 | +source { |
| 78 | + Kafka { |
| 79 | + bootstrap.servers = "kafkaCluster:9092" |
| 80 | + topic = "products_binlog" |
| 81 | + result_table_name = "kafka_name" |
| 82 | + start_mode = earliest |
| 83 | + schema = { |
| 84 | + fields { |
| 85 | + id = "int" |
| 86 | + name = "string" |
| 87 | + description = "string" |
| 88 | + weight = "string" |
| 89 | + } |
| 90 | + } |
| 91 | + format = debezium_json |
| 92 | + } |
| 93 | + |
| 94 | +} |
| 95 | + |
| 96 | +transform { |
| 97 | +} |
| 98 | + |
| 99 | +sink { |
| 100 | + Kafka { |
| 101 | + bootstrap.servers = "kafkaCluster:9092" |
| 102 | + topic = "consume-binlog" |
| 103 | + format = debezium_json |
| 104 | + } |
| 105 | +} |
| 106 | +``` |
| 107 | + |