Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud. (Project under CNCF)

zhang-x-z/fluid

What is Fluid?

Fluid is an open source Kubernetes-native Distributed Dataset Orchestrator and Accelerator for data-intensive applications, such as big data and AI applications. It is hosted by the Cloud Native Computing Foundation (CNCF) as a sandbox project.

What is NEW!
Apr. 27th, 2021. Fluid accepted by CNCF! The Fluid project was accepted as an official CNCF Sandbox Project by the CNCF Technical Oversight Committee (TOC) with a majority vote after the review process. A new beginning for Fluid!
Mar. 16th, 2021. Fluid v0.5.0 is RELEASED! It provides various new features, such as on-the-fly dataset scale out/in, metadata backup, and support for Fuse global mode. Please check the CHANGELOG for details.
Nov. 6th, 2020. Fluid v0.4.0 is RELEASED! It provides various features and bug fixes, such as automatically prefetching a Dataset before it is used. Please check the CHANGELOG for details.
Oct. 1st, 2020. Fluid v0.3.0 is RELEASED! It provides various features and bug fixes, such as data access acceleration for Persistent Volumes and hostPath mode in K8s. Please check the CHANGELOG for details.

Features

  • Native Support for DataSet Abstraction

    Implements the basic capabilities that data-intensive applications require to achieve efficient data access and reduce the cost of multidimensional management.

  • Cloud Data Warming up and Accessing Acceleration

    Fluid provides data warm-up and acceleration for cloud applications by using a distributed cache engine (Alluxio) in Kubernetes, with observability, portability, and horizontal scalability.

  • Co-Orchestration for Data and Application

    During application scheduling and data placement in the cloud, Fluid takes both the application's characteristics and the data location into consideration to improve performance.

  • Support Multiple Namespaces Management

    Users can create and manage datasets in multiple namespaces.

  • Support Heterogeneous Data Source Management

    Unifies data access for OSS, HDFS, Ceph, and other underlying storage systems.
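
The co-orchestration feature above (placing an application with its data location in mind) can be illustrated with a small Go sketch. It is not Fluid's scheduler: the `Node` type, the `pickNode` function, and the locality bonus of 1000 are all hypothetical, chosen only to show a placement decision that weighs both resource needs and cached data.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a cluster node.
type Node struct {
	Name     string
	FreeCPU  int             // spare CPU cores on the node
	CachedDS map[string]bool // names of datasets cached on this node
}

// pickNode returns the name of the node best suited for an application that
// needs `cpu` cores and reads `dataset`. Nodes that already cache the dataset
// get a large (arbitrary) bonus; otherwise more free CPU wins.
// It returns "" when no node has enough CPU.
func pickNode(nodes []Node, dataset string, cpu int) string {
	best, bestScore := "", -1
	for _, n := range nodes {
		if n.FreeCPU < cpu {
			continue // the application does not fit on this node
		}
		score := n.FreeCPU
		if n.CachedDS[dataset] {
			score += 1000 // assumed locality bonus, not a Fluid constant
		}
		if score > bestScore {
			best, bestScore = n.Name, score
		}
	}
	return best
}

func main() {
	nodes := []Node{
		{Name: "node-a", FreeCPU: 8},
		{Name: "node-b", FreeCPU: 2, CachedDS: map[string]bool{"imagenet": true}},
	}
	fmt.Println(pickNode(nodes, "imagenet", 2))
	fmt.Println(pickNode(nodes, "imagenet", 4))
}
```

In this sketch the cached node wins whenever the application fits on it, and the choice falls back to the node with the most free CPU otherwise.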

Key Concepts

Dataset: A DataSet is a set of logically related data that can be used by computing engines, such as Spark for big data analytics and TensorFlow for AI applications. Leveraging data intelligently often creates core industry value. Managing DataSets may require features in different dimensions, such as security, version management, and data acceleration. We hope to start with data acceleration to support the management of datasets.

Runtime: The execution engine that enforces dataset security and provides version management and data acceleration capabilities. The Runtime defines a set of interfaces to manage DataSets throughout their life cycle, so the management and acceleration of datasets can be implemented behind these interfaces.

AlluxioRuntime: Based on open-source Alluxio, Fluid can manage and schedule the AlluxioRuntime to achieve dataset visibility, elastic scaling, and data migration. It is one engine that supports data management and caching of Datasets.
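
To make the relationship between Dataset and Runtime more concrete, here is a minimal Go sketch of a lifecycle interface with one toy implementation behind it. The `Dataset` struct, the `Runtime` interface, its method names, and `memoryRuntime` are all hypothetical and are not Fluid's actual API; the example URL is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
)

// Dataset is a hypothetical, simplified stand-in for a logical dataset.
type Dataset struct {
	Name       string
	MountPoint string // location of the underlying storage (placeholder)
}

// Runtime is a hypothetical lifecycle interface, not Fluid's real one.
type Runtime interface {
	Setup(ds Dataset) error     // prepare the engine for a dataset
	Ready(name string) bool     // report whether the dataset is usable
	Shutdown(name string) error // release resources held for the dataset
}

// memoryRuntime is a toy engine that only tracks bound datasets in memory.
type memoryRuntime struct {
	bound map[string]Dataset
}

func newMemoryRuntime() *memoryRuntime {
	return &memoryRuntime{bound: make(map[string]Dataset)}
}

func (r *memoryRuntime) Setup(ds Dataset) error {
	if _, exists := r.bound[ds.Name]; exists {
		return errors.New("dataset already bound: " + ds.Name)
	}
	r.bound[ds.Name] = ds
	return nil
}

func (r *memoryRuntime) Ready(name string) bool {
	_, ok := r.bound[name]
	return ok
}

func (r *memoryRuntime) Shutdown(name string) error {
	if _, ok := r.bound[name]; !ok {
		return errors.New("dataset not bound: " + name)
	}
	delete(r.bound, name)
	return nil
}

func main() {
	// Callers depend only on the interface, so another engine could be swapped in.
	var rt Runtime = newMemoryRuntime()
	ds := Dataset{Name: "demo", MountPoint: "https://example.com/data"}
	fmt.Println(rt.Setup(ds), rt.Ready("demo"))
	fmt.Println(rt.Shutdown("demo"), rt.Ready("demo"))
}
```

The point of the sketch is the separation: code that manages datasets talks to the interface, while a concrete engine supplies the behavior.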

Prerequisites

  • Kubernetes version > 1.14, with CSI support
  • Golang 1.12+
  • Helm 3
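
As a rough aid for the Kubernetes requirement above, the following Go sketch checks whether a version string is newer than a major.minor baseline. The `newerThan` helper is hypothetical, and it assumes "> 1.14" is meant at the major.minor level (so any 1.14.x release does not qualify); adjust if your reading of the requirement differs.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// newerThan reports whether a version string such as "v1.18.3" is strictly
// newer than the given major.minor baseline. The patch level is ignored.
func newerThan(version string, major, minor int) (bool, error) {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unrecognized version %q", version)
	}
	gotMajor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major in %q: %w", version, err)
	}
	gotMinor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("bad minor in %q: %w", version, err)
	}
	if gotMajor != major {
		return gotMajor > major, nil
	}
	return gotMinor > minor, nil
}

func main() {
	for _, v := range []string{"v1.14.10", "v1.18.3"} {
		ok, err := newerThan(v, 1, 14)
		fmt.Println(v, ok, err)
	}
}
```

The version string itself would typically come from the cluster, for example the server version reported by `kubectl version`.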

Quick Start

You can follow our Get Started guide to quickly start a testing Kubernetes cluster.

Documentation

See our documentation at docs for more in-depth installation and production instructions.

Quick Demo

Demo 1: Accelerate Remote File Accessing with Fluid

Demo 2: Machine Learning with Fluid

Demo 3: Accelerate PVC with Fluid

Demo 4: Preload dataset with Fluid

Demo 5: On-the-fly dataset cache scaling

Roadmap

See ROADMAP.md for the roadmap details. It may be updated from time to time.

Community

Feel free to reach out if you have any questions. The maintainers of this project are reachable via:

DingTalk:

Contributing

Contributions are highly welcomed and greatly appreciated. See CONTRIBUTING.md for details on submitting patches and the contribution workflow.

Adopters

If you are interested in Fluid and would like to share your experiences with others, you are warmly welcome to add your information to the ADOPTERS.md page. We will continuously discuss new requirements and feature designs with you in advance.

Open Source License

Fluid is under the Apache 2.0 license. See the LICENSE file for details. It is vendor-neutral.

Code of Conduct

Fluid adopts the CNCF Code of Conduct.
