[enhencement](sample) add delta and kudu samples (apache#40067)
BePPPower authored Sep 4, 2024
1 parent 0baa936 commit b273849
Showing 17 changed files with 864 additions and 0 deletions.
147 changes: 147 additions & 0 deletions samples/datalake/deltalake_and_kudu/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Doris + Delta Lake + Kudu + MinIO Environments
Launch a Spark / Doris / Hive / Delta Lake / Kudu / MinIO test environment, with examples of querying Delta Lake and Kudu tables from Doris.

## Launch Docker Compose
**Create Network**
```shell
sudo docker network create -d bridge trinoconnector-net
```
**Launch all components in docker**
```shell
sudo sh start-trinoconnector-compose.sh
```
**Log in to Spark**
```shell
sudo sh login-spark.sh
```
**Log in to Doris**
```shell
sudo sh login-doris.sh
```
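
Re-running the network step fails once `trinoconnector-net` already exists. A minimal sketch of an idempotent variant, assuming only the standard `docker network inspect` and `docker network create` subcommands; the helper name `ensure_network` is hypothetical and not part of this sample:

```shell
# Hypothetical helper (not one of the sample scripts): create the bridge
# network only when it does not exist yet, so the step can be repeated safely.
ensure_network() {
    net_name="$1"
    if docker network inspect "${net_name}" >/dev/null 2>&1; then
        echo "network ${net_name} already exists"
    else
        docker network create -d bridge "${net_name}" >/dev/null
        echo "network ${net_name} created"
    fi
}

# Usage (run the script with sudo if your Docker setup requires it):
#   ensure_network trinoconnector-net
```

The check uses `inspect` rather than parsing `docker network ls` output, so it does not depend on the listing format.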

## Prepare DeltaLake Data
A Delta Lake table named `customer` already exists in the `default` database.

## Create Catalog
The Doris cluster already contains two catalogs, `delta_lake` and `kudu_catalog`. After logging in to Doris, you can list them with `SHOW CATALOGS` or inspect one with `SHOW CREATE CATALOG ${catalog_name}`. For reference, these are the statements that created them:

```sql
-- The catalog has been created, and no further action is required.
create catalog delta_lake properties (
"type"="trino-connector",
"trino.connector.name"="delta_lake",
"trino.hive.metastore.uri"="thrift://hive-metastore:9083",
"trino.hive.s3.endpoint"="http://minio:9000",
"trino.hive.s3.region"="us-east-1",
"trino.hive.s3.aws-access-key"="minio",
"trino.hive.s3.aws-secret-key"="minio123",
"trino.hive.s3.path-style-access"="true"
);

-- The catalog has been created, and no further action is required.
CREATE CATALOG `kudu_catalog` PROPERTIES (
"type" = "trino-connector",
"trino.connector.name" = "kudu",
"trino.kudu.authentication.type" = "NONE",
"trino.kudu.client.master-addresses" = "kudu-master-1:7051,kudu-master-2:7151,kudu-master-3:7251"
);
```
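
To confirm the catalogs without opening an interactive session, the client invocation from `login-doris.sh` can be given a statement through the `mysql` client's `-e` option. This is a sketch under that assumption; `doris_sql` is a hypothetical helper name, while the container name, host and port are the ones used by `login-doris.sh`:

```shell
# Hypothetical helper: run a single SQL statement against Doris through the
# mysql client inside the spark-hive container (same host and port as
# login-doris.sh), without allocating an interactive terminal.
doris_sql() {
    docker exec spark-hive mysql -u root -h doris-env -P 9030 -e "$1"
}

# Usage:
#   doris_sql 'SHOW CATALOGS'
#   doris_sql 'SHOW CREATE CATALOG delta_lake'
```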

## Query Catalog Data
The Delta Lake and Kudu data are already prepared, so you can query them directly from Doris.

- Query Delta Lake data

```sql
mysql> switch delta_lake;
Query OK, 0 rows affected (0.00 sec)

mysql> use default;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from customer limit 10;
+-----------+--------------------+------------------------------------+-------------+-----------------+-----------+--------------+---------------------------------------------------------------------------------------------------------------+
| c_custkey | c_name | c_address | c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment |
+-----------+--------------------+------------------------------------+-------------+-----------------+-----------+--------------+---------------------------------------------------------------------------------------------------------------+
| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak | 13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely ironic theodolites integrate boldly: caref |
| 34 | Customer#000000034 | Q6G9wZ6dnczmtOx509xgE,M2KV | 15 | 25-344-968-5422 | 8589.70 | HOUSEHOLD | nder against the even, pending accounts. even |
| 66 | Customer#000000066 | XbsEqXH1ETbJYYtA1A | 22 | 32-213-373-5094 | 242.77 | HOUSEHOLD | le slyly accounts. carefully silent packages benea |
| 98 | Customer#000000098 | 7yiheXNSpuEAwbswDW | 12 | 22-885-845-6889 | -551.37 | BUILDING | ages. furiously pending accounts are quickly carefully final foxes: busily pe |
| 130 | Customer#000000130 | RKPx2OfZy0Vn 8wGWZ7F2EAvmMORl1k8iH | 9 | 19-190-993-9281 | 5073.58 | HOUSEHOLD | ix slowly. express packages along the furiously ironic requests integrate daringly deposits. fur |
| 162 | Customer#000000162 | JE398sXZt2QuKXfJd7poNpyQFLFtth | 8 | 18-131-101-2267 | 6268.99 | MACHINERY | accounts along the doggedly special asymptotes boost blithely during the quickly regular theodolites. slyly |
| 194 | Customer#000000194 | mksKhdWuQ1pjbc4yffHp8rRmLOMcJ | 16 | 26-597-636-3003 | 6696.49 | HOUSEHOLD | quickly across the fluffily dogged requests. regular platelets around the ironic, even requests cajole quickl |
| 226 | Customer#000000226 | ToEmqB90fM TkLqyEgX8MJ8T8NkK | 3 | 13-452-318-7709 | 9008.61 | AUTOMOBILE | ic packages. ideas cajole furiously slyly special theodolites: carefully express pinto beans acco |
| 258 | Customer#000000258 | 7VbADek8qYezQYotxNUmnNI | 12 | 22-278-425-9944 | 6022.27 | MACHINERY | about the regular, bold accounts; pending packages use furiously stealthy warhorses. bold accounts sleep fur |
| 290 | Customer#000000290 | 8OlPT9G 8UqVXmVZNbmxVTPO8 | 4 | 14-458-625-5633 | 1811.35 | MACHINERY | sts. blithely pending requests sleep fluffily on the regular excuses. carefully expre |
+-----------+--------------------+------------------------------------+-------------+-----------------+-----------+--------------+---------------------------------------------------------------------------------------------------------------+
10 rows in set (0.12 sec)
```

- Query Kudu data

```sql
mysql> switch kudu_catalog;
Query OK, 0 rows affected (0.00 sec)

mysql> use default;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed

mysql> select * from test_table limit 10;
+------+----------+--------+
| key  | value    | added  |
+------+----------+--------+
|    0 | NULL     | 12.345 |
|    4 | NULL     | 12.345 |
|   20 | NULL     | 12.345 |
|   26 | NULL     | 12.345 |
|   29 | value 29 | 12.345 |
|   42 | NULL     | 12.345 |
|   50 | NULL     | 12.345 |
|   56 | NULL     | 12.345 |
|   66 | NULL     | 12.345 |
|   74 | NULL     | 12.345 |
+------+----------+--------+
10 rows in set (1.49 sec)
```

- Federated query across both catalogs

```sql
mysql> select * from delta_lake.`default`.customer c join kudu_catalog.`default`.test_table t on c.c_custkey = t.`key` where c.c_custkey < 50;
+-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+------+----------+--------+
| c_custkey | c_name | c_address | c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment | key | value | added |
+-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+------+----------+--------+
| 1 | Customer#000000001 | IVhzIApeRb ot,c,E | 15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular platelets. regular, ironic epitaphs nag e | 1 | value 1 | 12.345 |
| 33 | Customer#000000033 | qFSlMuLucBmx9xnn5ib2csWUweg D | 17 | 27-375-391-1280 | -78.56 | AUTOMOBILE | s. slyly regular accounts are furiously. carefully pending requests | 33 | value 33 | 12.345 |
| 3 | Customer#000000003 | MG9kdTD2WBHm | 1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic, even instructions. express foxes detect slyly. blithely even accounts abov | 3 | value 3 | 12.345 |
| 35 | Customer#000000035 | TEjWGE4nBzJL2 | 17 | 27-566-888-7431 | 1228.24 | HOUSEHOLD | requests. special, express requests nag slyly furiousl | 35 | value 35 | 12.345 |
| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak | 13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely ironic theodolites integrate boldly: caref | 2 | NULL | 12.345 |
| 34 | Customer#000000034 | Q6G9wZ6dnczmtOx509xgE,M2KV | 15 | 25-344-968-5422 | 8589.70 | HOUSEHOLD | nder against the even, pending accounts. even | 34 | NULL | 12.345 |
| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tlp2iQ6ZcO3J | 15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final, furious requests across the e | 32 | NULL | 12.345 |
+-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+------+----------+--------+
7 rows in set (0.13 sec)
```
@@ -0,0 +1,7 @@
{"commitInfo":{"timestamp":1724747485883,"operation":"WRITE","operationParameters":{"mode":"ErrorIfExists","partitionBy":"[]"},"isolationLevel":"Serializable","isBlindAppend":true,"operationMetrics":{"numFiles":"4","numOutputRows":"18751","numOutputBytes":"1564827"},"engineInfo":"Apache-Spark/3.4.2 Delta-Lake/2.4.0","txnId":"1646d68c-f6f2-4da5-a9bf-56318b2b7216"}}
{"protocol":{"minReaderVersion":1,"minWriterVersion":2}}
{"metaData":{"id":"421eb35b-e9ec-44ed-92fd-25e0fda91036","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"c_custkey\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c_address\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c_nationkey\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c_phone\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c_acctbal\",\"type\":\"decimal(12,2)\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c_mktsegment\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"c_comment\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1724747483772}}
{"add":{"path":"part-00000-44ff362c-110d-44ca-aed8-93ed65c19492-c000.snappy.parquet","partitionValues":{},"size":392744,"modificationTime":1724747485000,"dataChange":true,"stats":"{\"numRecords\":4688,\"minValues\":{\"c_custkey\":2,\"c_name\":\"Customer#000000002\",\"c_address\":\" UfkcgKnrSL0VRSDuuXjXW,\",\"c_nationkey\":0,\"c_phone\":\"10-103-318-6809\",\"c_acctbal\":-998.90,\"c_mktsegment\":\"AUTOMOBILE\",\"c_comment\":\" Tiresias detect always about \"},\"maxValues\":{\"c_custkey\":149986,\"c_name\":\"Customer#000149986\",\"c_address\":\"zyTHzirSOvDeqwIs4R7qn76825FPYr8Y�\",\"c_nationkey\":24,\"c_phone\":\"34-999-195-7029\",\"c_acctbal\":9994.63,\"c_mktsegment\":\"MACHINERY\",\"c_comment\":\"ze! special, even deposits nag q�\"},\"nullCount\":{\"c_custkey\":0,\"c_name\":0,\"c_address\":0,\"c_nationkey\":0,\"c_phone\":0,\"c_acctbal\":0,\"c_mktsegment\":0,\"c_comment\":0}}"}}
{"add":{"path":"part-00001-749ded2d-a84b-4e2c-9e6f-5ac6a59ee91d-c000.snappy.parquet","partitionValues":{},"size":392284,"modificationTime":1724747485000,"dataChange":true,"stats":"{\"numRecords\":4687,\"minValues\":{\"c_custkey\":32,\"c_name\":\"Customer#000000032\",\"c_address\":\" FjVZqAg2Pd9jhTN8pVD4DkvmxlCxMm\",\"c_nationkey\":0,\"c_phone\":\"10-105-777-9167\",\"c_acctbal\":-994.43,\"c_mktsegment\":\"AUTOMOBILE\",\"c_comment\":\" about the fluffily bold ideas. \"},\"maxValues\":{\"c_custkey\":149984,\"c_name\":\"Customer#000149984\",\"c_address\":\"zz5LSqGU2QoyQTcMzkOxVqWrHedmhqQ6�\",\"c_nationkey\":24,\"c_phone\":\"34-997-204-5897\",\"c_acctbal\":9998.01,\"c_mktsegment\":\"MACHINERY\",\"c_comment\":\"zzle quickly bold packages. sile�\"},\"nullCount\":{\"c_custkey\":0,\"c_name\":0,\"c_address\":0,\"c_nationkey\":0,\"c_phone\":0,\"c_acctbal\":0,\"c_mktsegment\":0,\"c_comment\":0}}"}}
{"add":{"path":"part-00002-137a1b68-bafd-46a7-b231-400a174b520c-c000.snappy.parquet","partitionValues":{},"size":390594,"modificationTime":1724747485000,"dataChange":true,"stats":"{\"numRecords\":4688,\"minValues\":{\"c_custkey\":1,\"c_name\":\"Customer#000000001\",\"c_address\":\" NUi8asf651zG096JTGeXdh\",\"c_nationkey\":0,\"c_phone\":\"10-100-220-4520\",\"c_acctbal\":-999.55,\"c_mktsegment\":\"AUTOMOBILE\",\"c_comment\":\" Tiresias detect slyly according\"},\"maxValues\":{\"c_custkey\":149985,\"c_name\":\"Customer#000149985\",\"c_address\":\"zzbUlYAy9rhCprBVHlzA\",\"c_nationkey\":24,\"c_phone\":\"34-999-363-7145\",\"c_acctbal\":9997.80,\"c_mktsegment\":\"MACHINERY\",\"c_comment\":\"zzle blithely against the carefu�\"},\"nullCount\":{\"c_custkey\":0,\"c_name\":0,\"c_address\":0,\"c_nationkey\":0,\"c_phone\":0,\"c_acctbal\":0,\"c_mktsegment\":0,\"c_comment\":0}}"}}
{"add":{"path":"part-00003-75203d54-ef95-4fbe-95c9-2012fd9dbaed-c000.snappy.parquet","partitionValues":{},"size":389205,"modificationTime":1724747485000,"dataChange":true,"stats":"{\"numRecords\":4688,\"minValues\":{\"c_custkey\":3,\"c_name\":\"Customer#000000003\",\"c_address\":\" 821GWWou3sOyp,\",\"c_nationkey\":0,\"c_phone\":\"10-105-204-5643\",\"c_acctbal\":-999.95,\"c_mktsegment\":\"AUTOMOBILE\",\"c_comment\":\" about the fluffily regular asym\"},\"maxValues\":{\"c_custkey\":149987,\"c_name\":\"Customer#000149987\",\"c_address\":\"zwrbKxxY yL Go\",\"c_nationkey\":24,\"c_phone\":\"34-999-283-6448\",\"c_acctbal\":9997.73,\"c_mktsegment\":\"MACHINERY\",\"c_comment\":\"ymptotes. unusual theodolites ab�\"},\"nullCount\":{\"c_custkey\":0,\"c_name\":0,\"c_address\":0,\"c_nationkey\":0,\"c_phone\":0,\"c_acctbal\":0,\"c_mktsegment\":0,\"c_comment\":0}}"}}
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
20 changes: 20 additions & 0 deletions samples/datalake/deltalake_and_kudu/login-doris.sh
@@ -0,0 +1,20 @@
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

docker exec -it spark-hive mysql -u root -h doris-env -P 9030
20 changes: 20 additions & 0 deletions samples/datalake/deltalake_and_kudu/login-spark.sh
@@ -0,0 +1,20 @@
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

docker exec -it spark-hive /opt/scripts/spark-delta.sh
37 changes: 37 additions & 0 deletions samples/datalake/deltalake_and_kudu/scripts/create-delta-table.sh
@@ -0,0 +1,37 @@
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

export SPARK_HOME=/opt/spark
export HIVE_HOME=/opt/apache-hive-3.1.2-bin
export HADOOP_HOME=/opt/hadoop-3.3.1

if [[ ! -d "${SPARK_HOME}" ]]; then
    cp -r /opt/spark-3.4.2-bin-hadoop3 "${SPARK_HOME}"
fi

cp "${HIVE_HOME}"/conf/hive-site.xml "${SPARK_HOME}"/conf/
cp "${HIVE_HOME}"/lib/postgresql-jdbc.jar "${SPARK_HOME}"/jars/
cp "${HADOOP_HOME}"/etc/hadoop/core-site.xml "${SPARK_HOME}"/conf/

"${SPARK_HOME}"/bin/spark-sql \
--master local[*] \
--name "spark-delta-sql" \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
-f /opt/scripts/spark-delta.sql
20 changes: 20 additions & 0 deletions samples/datalake/deltalake_and_kudu/scripts/doris-sql.sql
@@ -0,0 +1,20 @@
create catalog delta_lake properties (
"type"="trino-connector",
"trino.connector.name"="delta_lake",
"trino.hive.metastore.uri"="thrift://hive-metastore:9083",
"trino.hive.s3.endpoint"="http://minio:9000",
"trino.hive.s3.region"="us-east-1",
"trino.hive.s3.aws-access-key"="minio",
"trino.hive.s3.aws-secret-key"="minio123",
"trino.hive.s3.path-style-access"="true"
);


CREATE CATALOG `kudu_catalog` PROPERTIES (
"type" = "trino-connector",
"trino.connector.name" = "kudu",
"trino.kudu.authentication.type" = "NONE",
"trino.kudu.client.master-addresses" = "kudu-master-1:7051,kudu-master-2:7151,kudu-master-3:7251"
);

ALTER SYSTEM ADD BACKEND 'doris-env:9050';