FuseQuery is a Cloud-Native SQL Query Engine at scale

jinyaqia/fuse-query

FuseQuery

FuseQuery is a Cloud Distributed SQL Query Engine at scale.

A cloud-native, distributed ClickHouse-style engine written from scratch in Rust.

Thanks to ClickHouse and Arrow.

Features

  • High Performance

    • Everything is Parallel
  • High Scalability

    • Everything is Distributed
  • High Reliability

    • True Separation of Storage and Compute

Architecture

DataFuse Architecture

Crates

| Crate | Description | Status |
|---|---|---|
| distributed | Distributed scheduler and executor for planner | WIP |
| optimizers | Optimizer for Distributed & Local plan | WIP |
| datablocks | Vectorized data processing unit | WIP |
| datastreams | Async streaming iterators | WIP |
| datasources | Interface to the datasource (system.numbers for performance / Fuse-Store) | WIP |
| executors | Executor (EXPLAIN/SELECT) for the Pipeline | WIP |
| functions | Scalar and Aggregation Functions | WIP |
| processors | Dataflow Streaming Processor | WIP |
| planners | Distributed & Local planners for building processor pipelines | WIP |
| servers | Server handler (MySQL/HTTP) | MySQL |
| transforms | Data Stream Transform (Source/Filter/Projection/AggregatorPartial/AggregatorFinal/Limit) | WIP |
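
The AggregatorPartial and AggregatorFinal transforms named above suggest a two-phase aggregation: each parallel stream reduces its own blocks to a small state, and a final step merges those states. The following is only a minimal sketch of that idea for `avg`, not FuseQuery's actual code; `AvgState`, `aggregate_partial`, and `aggregate_final` are hypothetical names introduced for illustration.

```rust
/// Hypothetical partial state for `avg`: a running sum and a row count.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AvgState {
    sum: u64,
    count: u64,
}

/// Partial step: reduce one block of values to a state.
/// Would run once per block on each parallel stream.
/// Note: this sketch does not guard against u64 overflow in the sum.
fn aggregate_partial(block: &[u64]) -> AvgState {
    AvgState {
        sum: block.iter().sum(),
        count: block.len() as u64,
    }
}

/// Final step: merge all partial states and produce the average.
/// Returns `None` when no rows were seen.
fn aggregate_final(states: &[AvgState]) -> Option<f64> {
    let sum: u64 = states.iter().map(|s| s.sum).sum();
    let count: u64 = states.iter().map(|s| s.count).sum();
    if count == 0 {
        None
    } else {
        Some(sum as f64 / count as f64)
    }
}
```

Keeping the sum and count separate until the final step is what lets the partial results be merged in any order; averaging per-partition averages would give the wrong answer when partitions differ in size.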

Status

SQL Support

  • Projection
  • Filter
  • Limit
  • Aggregate
  • Functions
  • Filter Push-Down
  • Projection Push-Down (TODO)
  • Distributed Query (WIP)
  • Sorting (TODO)
  • Joins (TODO)
  • SubQueries (TODO)

Performance

  • In-memory SIMD vector processing performance only
  • Dataset: 100,000,000,000 (100 billion) rows
  • Hardware: AMD Ryzen 7 PRO 4750U, 8 CPU Cores, 16 Threads
  • Rust: rustc 1.49.0 (e1884a8e3 2020-12-29)
  • Build with Link-time Optimization and Using CPU Specific Instructions
  • ClickHouse server version 21.2.1 revision 54447

| Query | FuseQuery (v0.1) | ClickHouse (v21.2.1) |
|---|---|---|
| SELECT avg(number) FROM system.numbers_mt | (3.11 s.) | ×3.14 slow, (9.77 s.), 10.24 billion rows/s., 81.92 GB/s. |
| SELECT sum(number) FROM system.numbers_mt | (2.96 s.) | ×2.02 slow, (5.97 s.), 16.75 billion rows/s., 133.97 GB/s. |
| SELECT min(number) FROM system.numbers_mt | (3.57 s.) | ×3.90 slow, (13.93 s.), 7.18 billion rows/s., 57.44 GB/s. |
| SELECT max(number) FROM system.numbers_mt | (3.59 s.) | ×4.09 slow, (14.70 s.), 6.80 billion rows/s., 54.44 GB/s. |
| SELECT count(number) FROM system.numbers_mt | (1.76 s.) | ×2.22 slow, (3.91 s.), 25.58 billion rows/s., 204.65 GB/s. |
| SELECT sum(number+number+number) FROM numbers_mt | (23.14 s.) | ×5.47 slow, (126.67 s.), 789.47 million rows/s., 6.32 GB/s. |
| SELECT sum(number) / count(number) FROM system.numbers_mt | (3.09 s.) | ×1.96 slow, (6.07 s.), 16.48 billion rows/s., 131.88 GB/s. |
| SELECT sum(number) / count(number), max(number), min(number) FROM system.numbers_mt | (6.73 s.) | ×4.01 slow, (27.59 s.), 3.62 billion rows/s., 28.99 GB/s. |

Note:

  • ClickHouse processes system.numbers_mt with 16-way parallelism
  • FuseQuery processes system.numbers_mt with 16-way parallelism
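
The figures in the benchmark appear to follow from simple arithmetic: rows per second is the dataset size divided by elapsed time, bytes per second assumes 8 bytes per UInt64 value, and the "× slow" factor is the ratio of the two elapsed times. Below is a small sketch of that arithmetic; the function names are made up for illustration, and the 8-byte row width is an assumption based on the UInt64 type of `number`.

```rust
/// Assumed width of one `number` value (UInt64) in bytes.
const BYTES_PER_ROW: f64 = 8.0;

/// Rows processed per second.
fn rows_per_sec(rows: f64, secs: f64) -> f64 {
    rows / secs
}

/// Throughput in GB/s (decimal gigabytes), assuming 8-byte rows.
fn gb_per_sec(rows: f64, secs: f64) -> f64 {
    rows_per_sec(rows, secs) * BYTES_PER_ROW / 1e9
}

/// How many times slower ClickHouse was than FuseQuery for one query.
fn slowdown(clickhouse_secs: f64, fusequery_secs: f64) -> f64 {
    clickhouse_secs / fusequery_secs
}
```

For the `avg(number)` row, `rows_per_sec(100e9, 9.77)` comes to roughly 10.2 billion rows/s and `slowdown(9.77, 3.11)` to roughly 3.14, in line with the reported values up to rounding.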

How to Run?

Fuse-Query Server

Run from source

$ make run

12:46:15 [ INFO] Options { log_level: "debug", num_cpus: 8, mysql_handler_port: 3307 }
12:46:15 [ INFO] Fuse-Query Cloud Compute Starts...
12:46:15 [ INFO] Usage: mysql -h127.0.0.1 -P3307

Or run with Docker (recommended):

$ docker pull datafusedev/fuse-query
...

$ docker run --init --rm -p 3307:3307 datafusedev/fuse-query
05:12:36 [ INFO] Options { log_level: "debug", num_cpus: 6, mysql_handler_port: 3307 }
05:12:36 [ INFO] Fuse-Query Cloud Compute Starts...
05:12:36 [ INFO] Usage: mysql -h127.0.0.1 -P3307

Or download the release binary from:

https://github.com/datafusedev/fuse-query/releases
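
Whichever way the server is started, the log reports a MySQL handler on 127.0.0.1 port 3307. If you script the startup, a quick TCP probe can tell you when the port is accepting connections. This is only a sketch using the Rust standard library; `is_listening` and `default_handler_addr` are hypothetical helpers, and the probe checks that the port accepts TCP connections, not that it speaks the MySQL protocol.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if something accepts a TCP connection at `addr`
/// within `timeout`. The timeout must be non-zero.
fn is_listening(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

/// Address taken from the startup log: 127.0.0.1, port 3307.
fn default_handler_addr() -> SocketAddr {
    SocketAddr::from(([127, 0, 0, 1], 3307))
}
```

A caller might poll `is_listening(default_handler_addr(), Duration::from_secs(1))` in a loop before launching the MySQL client.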

Query with MySQL client

Connect
$ mysql -h127.0.0.1 -P3307
Explain Plan
mysql> explain select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| explain                                                                                                                                                                                                                          |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Limit: 3
  Projection: (number + 1) as c1:UInt64, (number / 2) as c2:UInt64
    Filter: (((c1 + c2) + 1) < 100)
      ReadDataSource: scan parts [8](Read from system.numbers_mt table, Read Rows:10000000, Read Bytes:80000000) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
Explain Pipeline
mysql> explain pipeline select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| explain                                                                                                                                                                                                                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 
  └─ LimitTransform × 1 processor
    └─ Merge (LimitTransform × 8 processors) to (MergeProcessor × 1)
      └─ LimitTransform × 8 processors
        └─ ProjectionTransform × 8 processors
          └─ FilterTransform × 8 processors
            └─ SourceTransform × 8 processors                                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Select
mysql> select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+------+------+
| c1   | c2   |
+------+------+
|    1 |    0 |
|    2 |    0 |
|    3 |    1 |
+------+------+
3 rows in set (0.06 sec)
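
The pipeline shown by `explain pipeline` can be read bottom-up: eight parallel streams each run Source, Filter, Projection, and Limit, their outputs are merged, and a final Limit trims the merged result. The sketch below mimics that shape with plain threads for the example query. It is an illustration under assumptions, not FuseQuery's implementation: `run_partition` and `run_query` are hypothetical names, the expressions are evaluated before filtering because the filter refers to the aliases `c1` and `c2`, and partitions are merged in order so the result is deterministic.

```rust
use std::thread;

/// One output row: (c1, c2).
type Row = (u64, u64);

/// Per-stream work over the half-open range [start, end):
/// generate numbers, evaluate c1 and c2, filter, then apply the limit.
fn run_partition(start: u64, end: u64, limit: usize) -> Vec<Row> {
    (start..end)
        .map(|n| (n + 1, n / 2)) // c1 = number + 1, c2 = number / 2
        .filter(|&(c1, c2)| c1 + c2 + 1 < 100) // where (c1 + c2 + 1) < 100
        .take(limit) // per-stream limit
        .collect()
}

/// Split `total_rows` across `workers` threads, merge, apply the final limit.
fn run_query(total_rows: u64, workers: u64, limit: usize) -> Vec<Row> {
    assert!(workers > 0, "need at least one worker");
    let chunk = (total_rows + workers - 1) / workers; // ceiling division
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            let start = (w * chunk).min(total_rows);
            let end = ((w + 1) * chunk).min(total_rows);
            thread::spawn(move || run_partition(start, end, limit))
        })
        .collect();

    // Merge step: concatenate per-stream results in partition order.
    let mut merged = Vec::new();
    for handle in handles {
        merged.extend(handle.join().expect("worker thread panicked"));
    }
    // Final limit on the merged stream.
    merged.truncate(limit);
    merged
}
```

Calling `run_query(10_000_000, 8, 3)` yields the rows (1, 0), (2, 0), (3, 1), matching the select output above. Applying the limit inside each stream as well as after the merge means no stream produces more rows than the query could ever return.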

How to Test?

$ make test

Roadmap

  • 0.1 support aggregation select
  • 0.2 support distributed query (WIP)
  • 0.3 support group by, order by
  • 0.4 support join
  • 0.5 support sub queries
  • 0.6 support TPC-H benchmark
