A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, written in Rust

zhaox1n/datafuse

This branch is 23435 commits behind databendlabs/databend:main.

Databend Logo

The Open Source Serverless Data Warehouse for Everyone


What is Databend?

Databend aims to be an open source, elastic, and reliable serverless data warehouse. It offers fast queries, combines the elasticity, simplicity, and low cost of the cloud, and is built to make the Data Cloud easy.

Databend design principles:

  1. Elastic - In Databend, storage and compute resources can be scaled on demand.
  2. Serverless - In Databend, you don't have to think about servers; you pay only for what you actually use.
  3. User-friendly - Databend is an ANSI SQL compliant cloud warehouse that is easy for data scientists and engineers to use.
  4. Secure - All data files and network traffic in Databend are encrypted end-to-end, and Role Based Access Control is provided at the SQL level.

Design Overview

Databend Architecture

Databend is inspired by ClickHouse, and its computing model is based on Apache Arrow.

Databend consists of three components: the meta service layer and the decoupled compute and storage layers.

Meta Service Layer

The meta service is a layer that serves multiple tenants. In the current implementation, it has the following components:

  • Metadata - Manages all metadata of databases, tables, clusters, transactions, etc.
  • Administration - Stores user info, user management, access control information, usage statistics, etc.
  • Security - Performs authorization and authentication to protect the privacy of users' data.
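
As a rough illustration of the authorization step, role based access control can be sketched as a lookup of (privilege, object) grants through a user's roles. This is a minimal sketch, not Databend's implementation: the `Role`, `User`, and `is_authorized` names, and the `sales.orders` object, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """A named set of (privilege, object) grants. Hypothetical model."""
    name: str
    grants: set = field(default_factory=set)

    def grant(self, privilege: str, obj: str) -> None:
        # Privileges are stored upper-cased so lookups are case-insensitive.
        self.grants.add((privilege.upper(), obj))


@dataclass
class User:
    name: str
    roles: list = field(default_factory=list)


def is_authorized(user: User, privilege: str, obj: str) -> bool:
    """Return True if any of the user's roles holds the privilege on obj."""
    wanted = (privilege.upper(), obj)
    return any(wanted in role.grants for role in user.roles)


analyst = Role("analyst")
analyst.grant("select", "sales.orders")  # hypothetical table name

alice = User("alice", roles=[analyst])

print(is_authorized(alice, "SELECT", "sales.orders"))  # True
print(is_authorized(alice, "INSERT", "sales.orders"))  # False
```

Granting privileges to roles rather than to users directly keeps the check to a single set lookup per role.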

Compute Layer

The compute layer consists of the clusters that run computing workloads. Each cluster has many nodes, and each node has the following components:

  • Planner - Builds an execution plan from the user's SQL statement.
  • Optimizer - Applies optimization rules such as predicate pushdown and pruning of unused columns.
  • Processors - A vectorized execution engine, built from the planner's instructions.
  • Cache - Caches data and indexes based on the version.
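
The two optimizer rules named above can be sketched on a toy plan tree. This is only an illustration under simplifying assumptions: the `Scan`, `Filter`, and `Project` node types and both rule functions are hypothetical and do not mirror Databend's planner.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

Predicate = Tuple[str, str, int]  # (column, operator, value)


@dataclass(frozen=True)
class Scan:
    table: str
    columns: Tuple[str, ...]               # columns read from storage
    predicate: Optional[Predicate] = None  # evaluated while scanning


@dataclass(frozen=True)
class Filter:
    predicate: Predicate
    child: object


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: object


def push_down_predicate(plan):
    """Rewrite Filter(Scan) into a Scan that filters rows at the source."""
    if isinstance(plan, Project):
        return replace(plan, child=push_down_predicate(plan.child))
    if (isinstance(plan, Filter) and isinstance(plan.child, Scan)
            and plan.child.predicate is None):
        return replace(plan.child, predicate=plan.predicate)
    return plan


def prune_columns(plan):
    """Make Project(Scan) read only the projected and predicate columns."""
    if isinstance(plan, Project) and isinstance(plan.child, Scan):
        scan = plan.child
        needed = set(plan.columns)
        if scan.predicate is not None:
            needed.add(scan.predicate[0])
        kept = tuple(c for c in scan.columns if c in needed)
        return replace(plan, child=replace(scan, columns=kept))
    return plan


# Roughly: SELECT id FROM orders WHERE price > 100
plan = Project(("id",), Filter(("price", ">", 100),
                               Scan("orders", ("id", "price", "note"))))
optimized = prune_columns(push_down_predicate(plan))
print(optimized)
```

After both rules run, the scan carries the predicate itself and no longer reads the unused `note` column.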

Many clusters can attach to the same database, so they can serve queries from different users in parallel.
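
The version-based cache listed among the node components can be pictured as a map keyed by both an object name and its version, so that a reader asking for a newer version never receives a stale entry. The following is a minimal sketch under that assumption; the `VersionedCache` class and the key names are hypothetical, and the source does not describe how Databend actually versions cached data.

```python
from typing import Dict, Optional, Tuple


class VersionedCache:
    """Cache entries are addressed by (key, version). Hypothetical sketch."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, int], bytes] = {}

    def put(self, key: str, version: int, value: bytes) -> None:
        self._entries[(key, version)] = value

    def get(self, key: str, version: int) -> Optional[bytes]:
        # A lookup for a version that was never cached is a miss (None).
        return self._entries.get((key, version))


cache = VersionedCache()
cache.put("orders/part-0", 1, b"cached block")

print(cache.get("orders/part-0", 1))  # b'cached block'
print(cache.get("orders/part-0", 2))  # None
```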

Storage Layer

Databend stores data in an efficient columnar format as Parquet files. To prune efficiently, Databend also creates indexes for each Parquet file, which speeds up queries.
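
One common form of per-file index is a min/max summary of a column, which lets a query skip files whose value range cannot match. The source does not say which index types Databend builds, so the sketch below assumes min/max indexes; `FileIndex`, `prune_files`, and the file names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileIndex:
    """Min/max summary of one column in one Parquet file (assumed index type)."""
    path: str
    min_value: int
    max_value: int


def prune_files(indexes: List[FileIndex], lower: int, upper: int) -> List[str]:
    """Keep files whose [min, max] range overlaps the query range [lower, upper]."""
    return [
        ix.path
        for ix in indexes
        if ix.max_value >= lower and ix.min_value <= upper
    ]


indexes = [
    FileIndex("part-0.parquet", 1, 100),
    FileIndex("part-1.parquet", 101, 200),
    FileIndex("part-2.parquet", 201, 300),
]

# Roughly: WHERE value BETWEEN 150 AND 220
print(prune_files(indexes, 150, 220))  # ['part-1.parquet', 'part-2.parquet']
```

Only the files that survive pruning need to be opened and scanned; `part-0.parquet` is skipped without being read.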

Getting Started

Roadmap

Databend is currently in Alpha and is not ready for production use. See Roadmap 2022.

License

Databend is licensed under Apache 2.0.

Acknowledgement

Document Hosting

Languages

  • Rust 98.0%
  • Python 0.8%
  • Shell 0.7%
  • SCSS 0.2%
  • JavaScript 0.1%
  • Makefile 0.1%
  • Other 0.1%