ChubaoFS (储宝文件系统 in Chinese) is a distributed file system and object storage service for cloud native applications. It is hosted by the Cloud Native Computing Foundation (CNCF) as a sandbox project.
ChubaoFS is commonly used as the underlying storage infrastructure for online applications, database and data processing services, and machine learning jobs orchestrated by Kubernetes. One advantage of this setup is that storage is separated from compute: each can be scaled up or down independently of the other, so resources can be matched to the storage and compute capacity actually required at any given time.
Some key features of ChubaoFS include:
- Scale-out metadata management
- Strong replication consistency
- Specific performance optimizations for large/small files and sequential/random writes
- Multi-tenancy
- POSIX-compatible and mountable
- S3-compatible object storage interface
We are committed to making ChubaoFS better and more mature. Please stay tuned.
English documentation: https://chubaofs.readthedocs.io/en/latest/
Chinese documentation: https://chubaofs.readthedocs.io/zh_CN/latest/
Small-file operation performance and scalability, benchmarked with mdtest:
File Size (KB) | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
---|---|---|---|---|---|---|---|---|
Creation (TPS) | 70383 | 70383 | 73738 | 74617 | 69479 | 67435 | 47540 | 27147 |
Read (TPS) | 108600 | 118193 | 118346 | 122975 | 116374 | 110795 | 90462 | 62082 |
Removal (TPS) | 87648 | 84651 | 83532 | 79279 | 85498 | 86523 | 80946 | 84441 |
Stat (TPS) | 231961 | 263270 | 264207 | 252309 | 240244 | 244906 | 273576 | 242930 |
Refer to chubaofs.readthedocs.io for the performance and scalability of IO and Metadata.
$ git clone http://github.com/chubaofs/chubaofs.git
$ cd chubaofs
$ make
The RPM package, together with its dependencies, can be installed with:
$ yum install http://storage.jd.com/chubaofsrpm/latest/cfs-install-latest-el7.x86_64.rpm
$ cd /cfs/install
$ tree -L 2
.
├── install_cfs.yml
├── install.sh
├── iplist
├── src
└── template
    ├── client.json.j2
    ├── create_vol.sh.j2
    ├── datanode.json.j2
    ├── grafana
    ├── master.json.j2
    └── metanode.json.j2
Set the parameters of the ChubaoFS cluster in iplist.
- Set the IP addresses in the [master], [datanode], [metanode], [monitor] and [client] fields.
- Set datanode_disks in the #datanode config field. Make sure the path exists on each DataNode and has at least 30GB of space.
- Use the same username and password on every node, and set them in the [cfs:vars] field.
[master]
10.196.0.1
10.196.0.2
10.196.0.3
[datanode]
...
[cfs:vars]
ansible_ssh_port=22
ansible_ssh_user=root
ansible_ssh_pass="uu"
...
#datanode config
...
datanode_disks = '"/data0:10737418240","/data1:10737418240"'
...
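Before starting any role, it can help to confirm which hosts each section of iplist actually contains. The following is a minimal sketch, assuming iplist keeps the INI-like layout shown above (a [section] header followed by one host per line); the function name list_hosts is hypothetical and not part of the ChubaoFS tooling.

```shell
# Print the entries listed under one section of an INI-style inventory file.
# Usage: list_hosts <file> <section>
# Assumption: section headers appear alone on a line as "[name]".
list_hosts() {
    awk -v want="[$2]" '
        /^\[/ { in_section = ($0 == want); next }       # a new section header starts
        in_section && NF && $0 !~ /^#/ { print $1 }     # skip blank lines and comments
    ' "$1"
}
```

For example, running list_hosts iplist master against the sample above would print the three Master addresses, one per line.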
Start the resources of the ChubaoFS cluster with the script install.sh. Make sure the Master is started first.
$ bash install.sh -h
Usage: install.sh [-r --role datanode or metanode or master or monitor or client or all ] [-v --version 1.5.1 or latest]
$ bash install.sh -r master
$ bash install.sh -r metanode
$ bash install.sh -r datanode
$ bash install.sh -r monitor
$ bash install.sh -r client
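The same sequence can be wrapped in a small loop that stops at the first failing role. This is only a sketch: the source requires that the Master starts first, and the order of the remaining roles simply follows the listing above. The INSTALL_CMD variable is a hypothetical override, added so the loop can be dry-run without touching a cluster.

```shell
# Roles in start order; master must come first.
ROLES="master metanode datanode monitor client"

# Run install.sh once per role and stop at the first failure.
# INSTALL_CMD (hypothetical) replaces "bash install.sh", e.g. for a dry run.
start_cluster() {
    for role in $ROLES; do
        ${INSTALL_CMD:-bash install.sh} -r "$role" || return 1
    done
}
```

Setting INSTALL_CMD='echo run' and calling start_cluster prints the five commands instead of executing them.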
Check the mount point at /cfs/mountpoint on the client node defined in iplist.
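One way to script this check is to look for the directory in the output of mount. The sketch below assumes the usual Linux mount output format ("<source> on <dir> type <fstype> ..."); the function name mounted_on is hypothetical.

```shell
# Succeed when the mount table read from stdin has an entry mounted on <dir>.
# Usage: mount | mounted_on <dir>
mounted_on() {
    awk -v dir="$1" '$2 == "on" && $3 == dir { found = 1 } END { exit !found }'
}
```

On the client node, mount | mounted_on /cfs/mountpoint returns exit status 0 when the volume is mounted and 1 otherwise.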
Open http://10.196.0.1:8500 in a browser to view the monitoring system (the IP address of the monitoring system is defined in iplist).
A helper tool called run_docker.sh (under the docker directory) is provided to run ChubaoFS with docker-compose.
$ docker/run_docker.sh -r -d /data/disk
Note that /data/disk can be any directory, but make sure it has at least 10G of available space.
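The free-space requirement can be checked before launching the containers. The following sketch relies on the POSIX output format of df -Pk and treats "10G" as 10 GiB, which is an assumption; the function names free_gib and has_free_gib are hypothetical.

```shell
# Print the free space of the filesystem holding <dir>, in whole GiB.
free_gib() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed when <dir> exists and has at least <min_gib> GiB available.
# Usage: has_free_gib <dir> <min_gib>
has_free_gib() {
    [ -d "$1" ] && [ "$(free_gib "$1")" -ge "$2" ]
}
```

For example, has_free_gib /data/disk 10 && docker/run_docker.sh -r -d /data/disk only starts the cluster when the directory has enough room. The same function could be used for the 30GB requirement on DataNode disk paths.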
To check the mount status, use the mount command in the client docker shell:
$ mount | grep chubaofs
To view the Grafana monitoring metrics, open http://127.0.0.1:3000 in a browser and log in with admin/123456.
To run server and client separately, use the following commands:
$ docker/run_docker.sh -b
$ docker/run_docker.sh -s -d /data/disk
$ docker/run_docker.sh -c
$ docker/run_docker.sh -m
For more usage:
$ docker/run_docker.sh -h
The chubaofs-helm repository can help you quickly deploy a ChubaoFS cluster in containers orchestrated by Kubernetes. Kubernetes 1.16+ and Helm 3 are required.
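To confirm which Helm major version is installed, the output of helm version --short can be parsed. This is a sketch that assumes the version string contains a token of the form vMAJOR.MINOR.PATCH; the function name helm_major is hypothetical.

```shell
# Extract the major version number from a Helm version string on stdin.
# Usage: helm version --short | helm_major
helm_major() {
    sed -n 's/.*v\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}
```

A guard such as [ "$(helm version --short | helm_major)" = "3" ] can then be placed in front of the install commands.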
Initialize Helm (only needed with Helm 2; the helm init command was removed in Helm 3):
$ helm init
Add the repository to download the Helm chart of ChubaoFS:
$ helm repo add chubaofs https://chubaofs.github.io/chubaofs-charts
$ helm repo update
Create the configuration file chubaofs.yaml and put it in a user-defined path.
$ cat ~/chubaofs.yaml
path:
  data: /chubaofs/data
  log: /chubaofs/log
datanode:
  disks:
    - disk: "/data0:21474836480"
    - disk: "/data1:21474836480"
metanode:
  total_mem: "2147483648"
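The numeric values in this file look like byte counts: 21474836480 equals 20 GiB and 2147483648 equals 2 GiB. The sketch below converts GiB to bytes and builds a disk entry in the "path:size" form used above. The function names are hypothetical, and the exact meaning of the size after the colon is not stated in this section, so check values.yaml before relying on it.

```shell
# Convert a whole number of GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Build a "path:bytes" disk entry.
# Usage: disk_entry <path> <size_gib>
disk_entry() {
    printf '%s:%s\n' "$1" "$(gib_to_bytes "$2")"
}
```

For example, disk_entry /data0 20 prints /data0:21474836480, which matches the first disk entry in the file above.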
Note that chubaofs-helm/chubaofs/values.yaml includes all the parameters of ChubaoFS. The parameters path.data and path.log define the paths that store the data and logs of the ChubaoFS server, respectively.
Label the nodes in Kubernetes according to their roles (master/metanode/datanode):
$ kubectl label node <nodename> chuabaofs-master=enabled
$ kubectl label node <nodename> chuabaofs-metanode=enabled
$ kubectl label node <nodename> chuabaofs-datanode=enabled
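When one node carries several roles, the labels can be applied in a loop. This sketch reuses the label keys exactly as written above; the function name label_node and the KUBECTL override variable are hypothetical, the latter added only so the commands can be printed instead of executed.

```shell
# Apply the role labels shown above to one node.
# Usage: label_node <nodename> <role>...   (roles: master, metanode, datanode)
# KUBECTL (hypothetical) replaces the kubectl binary, e.g. for a dry run.
label_node() {
    node=$1
    shift
    for role in "$@"; do
        ${KUBECTL:-kubectl} label node "$node" "chuabaofs-${role}=enabled" || return 1
    done
}
```

For example, label_node node1 master metanode labels node1 for both the master and metanode roles.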
Install the ChubaoFS cluster with Helm:
$ helm install chubaofs chubaofs/chubaofs --version 1.5.0 -f ~/chubaofs.yaml
Delete the ChubaoFS cluster:
$ helm delete chubaofs
Refer to chubaofs-helm for deployment with Helm 2 and for deployment of the monitoring system.
ChubaoFS is licensed under the Apache License, Version 2.0. For details, see LICENSE and NOTICE.
Haifeng Liu, et al., CFS: A Distributed File System for Large Scale Container Platforms. SIGMOD '19, June 30-July 5, 2019, Amsterdam, Netherlands.
For more information, please refer to https://dl.acm.org/citation.cfm?doid=3299869.3314046 and https://arxiv.org/abs/1911.03001
- Twitter: @ChubaoFS
- Mailing list: [email protected]
- Slack: chubaofs.slack.com