FuseQuery is a distributed SQL query engine for the cloud, built to run at scale.
It is a cloud-native, distributed take on ClickHouse, written from scratch in Rust.
Thanks to ClickHouse and Arrow.
- High Performance
  - Everything is Parallelism
- High Scalability
  - Everything is Distributed
- High Reliability
  - True Separation of Storage and Compute
Crate | Description | Status |
---|---|---|
distributed | Distributed scheduler and executor for planner | WIP |
optimizers | Optimizer for Distributed&Local plan | WIP |
datablocks | Vectorized data processing unit | WIP |
datastreams | Async streaming iterators | WIP |
datasources | Interface to the data sources (system.numbers for performance, Fuse-Store) | WIP |
executors | Executor(EXPLAIN/SELECT) for the Pipeline | WIP |
functions | Scalar and Aggregation Functions | WIP |
processors | Dataflow Streaming Processor | WIP |
planners | Distributed&Local planners for building processor pipelines | WIP |
servers | Server handler(MySQL/HTTP) | MySQL |
transforms | Data Stream Transform(Source/Filter/Projection/AggregatorPartial/AggregatorFinal/Limit) | WIP |
- Projection
- Filter
- Limit
- Aggregate
- Functions
- Filter Push-Down
- Projection Push-Down (TODO)
- Distributed Query (WIP)
- Sorting (TODO)
- Joins (TODO)
- SubQueries (TODO)
- Measures in-memory SIMD vector processing performance only
- Dataset: 100,000,000,000 (100 billion) rows
- Hardware: AMD Ryzen 7 PRO 4750U, 8 CPU Cores, 16 Threads
- Rust: rustc 1.49.0 (e1884a8e3 2020-12-29)
- Built with link-time optimization and CPU-specific instructions
- ClickHouse server version 21.2.1 revision 54447
Query | FuseQuery (v0.1) | ClickHouse (v21.2.1) |
---|---|---|
SELECT avg(number) FROM system.numbers_mt | (3.11 s.) | ×3.14 slow, (9.77 s.) 10.24 billion rows/s., 81.92 GB/s. |
SELECT sum(number) FROM system.numbers_mt | (2.96 s.) | ×2.02 slow, (5.97 s.) 16.75 billion rows/s., 133.97 GB/s. |
SELECT min(number) FROM system.numbers_mt | (3.57 s.) | ×3.90 slow, (13.93 s.) 7.18 billion rows/s., 57.44 GB/s. |
SELECT max(number) FROM system.numbers_mt | (3.59 s.) | ×4.09 slow, (14.70 s.) 6.80 billion rows/s., 54.44 GB/s. |
SELECT count(number) FROM system.numbers_mt | (1.76 s.) | ×2.22 slow, (3.91 s.) 25.58 billion rows/s., 204.65 GB/s. |
SELECT sum(number+number+number) FROM system.numbers_mt | (23.14 s.) | ×5.47 slow, (126.67 s.) 789.47 million rows/s., 6.32 GB/s. |
SELECT sum(number) / count(number) FROM system.numbers_mt | (3.09 s.) | ×1.96 slow, (6.07 s.) 16.48 billion rows/s., 131.88 GB/s. |
SELECT sum(number) / count(number), max(number), min(number) FROM system.numbers_mt | (6.73 s.) | ×4.10 slow, (27.59 s.) 3.62 billion rows/s., 28.99 GB/s. |
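The figures in the ClickHouse column can be cross-checked from the table's own numbers. The sketch below is only an illustration, not part of FuseQuery: `Derived` and `derive` are hypothetical names, and the 8 bytes per row is an assumption based on `number` being a UInt64. It recomputes the slowdown factor and throughput for the `avg(number)` row; small differences from the table come from rounding.

```rust
/// Figures derived from one benchmark row (hypothetical helper type).
struct Derived {
    slowdown: f64,           // ClickHouse time divided by FuseQuery time
    billion_rows_per_s: f64, // ClickHouse throughput in billions of rows per second
    gb_per_s: f64,           // ClickHouse throughput in GB/s (1 GB = 1e9 bytes)
}

/// `rows` is the dataset size; `bytes_per_row` is assumed to be 8 (one UInt64).
fn derive(rows: f64, bytes_per_row: f64, fuse_secs: f64, clickhouse_secs: f64) -> Derived {
    let rows_per_s = rows / clickhouse_secs;
    Derived {
        slowdown: clickhouse_secs / fuse_secs,
        billion_rows_per_s: rows_per_s / 1e9,
        gb_per_s: rows_per_s * bytes_per_row / 1e9,
    }
}

fn main() {
    // avg(number) row: FuseQuery 3.11 s, ClickHouse 9.77 s, 100 billion rows.
    let d = derive(100e9, 8.0, 3.11, 9.77);
    println!(
        "x{:.2} slow, {:.2} billion rows/s, {:.2} GB/s",
        d.slowdown, d.billion_rows_per_s, d.gb_per_s
    );
}
```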
Note:
- ClickHouse processes system.numbers_mt with 16-way parallelism
- FuseQuery processes system.numbers_mt with 16-way parallelism
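To make "16-way parallelism" concrete, here is a minimal sketch of a partial/final aggregation over a number range, in the spirit of the AggregatorPartial and AggregatorFinal transforms. It is not FuseQuery's implementation: `Partial` and `parallel_sum_count` are hypothetical names, plain OS threads stand in for the engine's processors, and the range is scanned value by value rather than in vectorized blocks.

```rust
use std::thread;

/// Partial aggregation state for sum/count over one partition (hypothetical type).
#[derive(Clone, Copy, Default)]
struct Partial {
    sum: u128,
    count: u64,
}

/// Aggregates the numbers `0..total` with `workers` threads.
/// Each thread builds a partial state for its own slice of the range
/// (the "partial" step); the states are then merged (the "final" step).
fn parallel_sum_count(total: u64, workers: u64) -> Partial {
    assert!(workers > 0, "at least one worker is required");
    let chunk = (total + workers - 1) / workers; // ceiling division

    let handles: Vec<_> = (0..workers)
        .map(|w| {
            thread::spawn(move || {
                let start = (w * chunk).min(total);
                let end = (start + chunk).min(total);
                let mut p = Partial::default();
                for n in start..end {
                    p.sum += n as u128;
                    p.count += 1;
                }
                p
            })
        })
        .collect();

    // Final step: merge the partial states into one result.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .fold(Partial::default(), |acc, p| Partial {
            sum: acc.sum + p.sum,
            count: acc.count + p.count,
        })
}

fn main() {
    let r = parallel_sum_count(10_000_000, 16);
    println!("sum = {}, count = {}, avg = {}", r.sum, r.count, r.sum as f64 / r.count as f64);
}
```

Because the merge only adds sums and counts, `avg` can be computed from the merged state without a second pass over the data.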
Run from source:
$ make run
12:46:15 [ INFO] Options { log_level: "debug", num_cpus: 8, mysql_handler_port: 3307 }
12:46:15 [ INFO] Fuse-Query Cloud Compute Starts...
12:46:15 [ INFO] Usage: mysql -h127.0.0.1 -P3307
Or run with Docker (recommended):
$ docker pull datafusedev/fuse-query
...
$ docker run --init --rm -p 3307:3307 datafusedev/fuse-query
05:12:36 [ INFO] Options { log_level: "debug", num_cpus: 6, mysql_handler_port: 3307 }
05:12:36 [ INFO] Fuse-Query Cloud Compute Starts...
05:12:36 [ INFO] Usage: mysql -h127.0.0.1 -P3307
Or download the release binary here:
https://github.com/datafusedev/fuse-query/releases
$ mysql -h127.0.0.1 -P3307
mysql> explain select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| explain |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Limit: 3
  Projection: (number + 1) as c1:UInt64, (number / 2) as c2:UInt64
    Filter: (((c1 + c2) + 1) < 100)
      ReadDataSource: scan parts [8](Read from system.numbers_mt table, Read Rows:10000000, Read Bytes:80000000) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> explain pipeline select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| explain |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
  └─ LimitTransform × 1 processor
    └─ Merge (LimitTransform × 8 processors) to (MergeProcessor × 1)
      └─ LimitTransform × 8 processors
        └─ ProjectionTransform × 8 processors
          └─ FilterTransform × 8 processors
            └─ SourceTransform × 8 processors |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+------+------+
| c1 | c2 |
+------+------+
| 1 | 0 |
| 2 | 0 |
| 3 | 1 |
+------+------+
3 rows in set (0.06 sec)
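The pipeline printed by `explain pipeline` can be read as: eight parallel Source → Filter → Projection → Limit chains, merged into one stream, then a final Limit. The sketch below mimics that shape with plain iterators. It is an illustration under stated assumptions, not FuseQuery code: `run_partition` and `run_query` are hypothetical names, partitions are processed one after another in order (so the output is deterministic), and rows are handled one at a time instead of in data blocks.

```rust
/// One output row of the query: (c1, c2).
type Row = (u64, u64);

/// Source -> Filter -> Projection -> Limit for one partition of the number range.
fn run_partition(start: u64, end: u64, limit: usize) -> Vec<Row> {
    (start..end)                                  // SourceTransform
        .filter(|&n| (n + 1) + (n / 2) + 1 < 100) // FilterTransform: (c1 + c2 + 1) < 100
        .map(|n| (n + 1, n / 2))                  // ProjectionTransform: c1, c2
        .take(limit)                              // LimitTransform (per partition)
        .collect()
}

/// Splits `0..total` into `parts` partitions, runs each chain, merges, and limits again.
fn run_query(total: u64, parts: u64, limit: usize) -> Vec<Row> {
    assert!(parts > 0, "at least one partition is required");
    let chunk = (total + parts - 1) / parts; // ceiling division
    (0..parts)
        .flat_map(|p| {
            let start = (p * chunk).min(total);
            let end = (start + chunk).min(total);
            run_partition(start, end, limit)
        })           // Merge: concatenate partition outputs in order
        .take(limit) // final LimitTransform
        .collect()
}

fn main() {
    // Same shape as: system.numbers_mt(10000000), 8 parts, limit 3.
    for (c1, c2) in run_query(10_000_000, 8, 3) {
        println!("{} {}", c1, c2);
    }
}
```

Applying the limit inside each partition before the merge keeps every chain from producing more rows than the query can return; the final limit is still needed because each of the eight chains may contribute up to three rows.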
$ make test
- 0.1 support aggregation select
- 0.2 support distributed query (WIP)
- 0.3 support group by, order by
- 0.4 support join
- 0.5 support sub queries
- 0.6 support TPC-H benchmark