
Yggdrasil Decision Forests (YDF) is a collection of state-of-the-art algorithms for the training, serving, and interpretation of Decision Forest models. The library is developed in C++ and is available in C++, as a CLI (command-line interface, i.e. shell commands), and in TensorFlow under the name TensorFlow Decision Forests (TF-DF).

Developing models in TF-DF and productionizing them (possibly including re-training) in C++ with YDF combines flexible and fast development with efficient and safe serving.

Usage example

Train, evaluate and benchmark the speed of a model in a few shell lines with the CLI interface:

# Training configuration
echo 'label:"my_label" learner:"RANDOM_FOREST" ' > config.pbtxt
# Scan the dataset
infer_dataspec --dataset="csv:train.csv" --output="spec.pbtxt"
# Train a model
train --dataset="csv:train.csv" --dataspec="spec.pbtxt" --config="config.pbtxt" --output="my_model"
# Evaluate the model
evaluate --dataset="csv:test.csv" --model="my_model" > evaluation.txt
# Benchmark the speed of the model
benchmark_inference --dataset="csv:test.csv" --model="my_model" > benchmark.txt

(see examples/beginner.sh for more details)

or use the C++ interface:

auto dataset_path = "csv:/train@10";
// Training configuration
TrainingConfig train_config;
train_config.set_learner("RANDOM_FOREST");
train_config.set_task(Task::CLASSIFICATION);
train_config.set_label("my_label");
// Scan the dataset
DataSpecification spec;
CreateDataSpec(dataset_path, false, {}, &spec);
// Train a model
std::unique_ptr<AbstractLearner> learner;
GetLearner(train_config, &learner);
auto model = learner->Train(dataset_path, spec);
// Export the model
SaveModel("my_model", model.get());

(see examples/beginner.cc for more details)

or use the Keras/Python interface of TensorFlow Decision Forests:

import tensorflow_decision_forests as tfdf
import pandas as pd
# Load the dataset in a Pandas dataframe.
train_df = pd.read_csv("project/train.csv")
# Convert the dataset into a TensorFlow dataset.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="my_label")
# Train the model
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)
# Export a SavedModel.
model.save("project/model")

(see TensorFlow Decision Forests for more details)
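
Once a model has been exported with any of the interfaces above, it can be loaded back in C++ and compiled into an engine for fast production serving. The following is a minimal sketch of that path, written in the same simplified style as the C++ example above (namespaces and includes omitted); the exact helper names and signatures should be checked against examples/beginner.cc and the serving documentation:

// Load the model back from disk.
std::unique_ptr<AbstractModel> model;
CHECK_OK(LoadModel("my_model", &model));
// Compile the model into an engine optimized for fast inference.
auto engine = model->BuildFastEngine().value();
// Allocate a batch of examples and fill in their feature values
// (see the example-set API of the serving library).
auto examples = engine->AllocateExamples(/*num_examples=*/1);
// ... set the feature values of `examples` here ...
// Run the predictions.
std::vector<float> predictions;
engine->Predict(*examples, /*num_examples=*/1, &predictions);

Compiling the model once and reusing the engine across prediction calls is what makes the serving path fast; see the serving documentation for details on batching and thread-safety.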

Documentation & Resources

The user manual, the developer manual, examples, and other resources are available in the project documentation.

Installation from pre-compiled binaries

Download one of the build releases, and then run examples/beginner.{sh,bat}.

Installation from Source

On Linux, install Bazel and run:

git clone https://github.com/google/yggdrasil-decision-forests.git
cd yggdrasil-decision-forests
bazel build //yggdrasil_decision_forests/cli:all --config=linux_cpp17 --config=linux_avx2

# Then, run the example:
examples/beginner.sh

See the installation page for more details, troubleshooting and alternative installation solutions.

Yggdrasil was successfully compiled and run on:

  • Linux Debian 5
  • Windows 10
  • macOS 10
  • Raspberry Pi 4 Rev 2

Inference of Yggdrasil models is also available on:

  • [Experimental; No support] Arduino Uno R3 (see project)

Note: Tell us if you were able to compile and run Yggdrasil on any other architecture :).

Long-term support commitments

Inference and serving

  • The serving code is isolated from the rest of the framework (i.e., training, evaluation) and has minimal dependencies.
  • Changes to serving-related code are guaranteed to be backward compatible.
  • Model inference is deterministic: the same example is guaranteed to yield the same prediction.
  • Learners and models are extensively tested, including integration tests on real datasets. No execution path in the serving code crashes as a result of an error: in case of failure (e.g., a malformed input example), the inference code returns a util::Status instead (see the sketch below).
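
As an illustration of this status-based error handling, here is a minimal sketch in the same simplified style as the C++ example above (depending on the version, the status type may be absl::Status rather than util::Status):

std::unique_ptr<AbstractModel> model;
// Loading a missing or corrupted model does not crash; it returns an error status.
const auto status = LoadModel("/path/to/maybe/corrupted/model", &model);
if (!status.ok()) {
  // Handle the failure gracefully, e.g. keep serving the previous model.
  LOG(ERROR) << "Cannot load model: " << status.message();
}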

Training

  • The semantics of hyper-parameters are never modified.
  • The default values of hyper-parameters are never modified.
  • The default value of a newly-introduced hyper-parameter is always set so that the hyper-parameter is effectively disabled (see the sketch below for how hyper-parameters are set explicitly).
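
For illustration, hyper-parameters are set explicitly on the training configuration, and anything left unset keeps its default value. A minimal sketch, assuming the Random Forest learner exposes its hyper-parameters through a proto extension (the exact extension symbol, written here as random_forest::proto::random_forest_config, should be checked in the user manual):

TrainingConfig train_config;
train_config.set_learner("RANDOM_FOREST");
train_config.set_task(Task::CLASSIFICATION);
train_config.set_label("my_label");
// Override one hyper-parameter explicitly; all hyper-parameters left unset
// keep their defaults, which the library commits never to change.
// Note: the extension symbol below is an assumption; see the user manual.
auto* rf_config = train_config.MutableExtension(random_forest::proto::random_forest_config);
rf_config->set_num_trees(500);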

Quality Assurance

The following mechanisms will be put in place to ensure the quality of the library:

  • Peer-reviewing.
  • Unit testing.
  • Training benchmarks with ranges of acceptable evaluation metrics.
  • Sanitizers.

Contributing

Contributions to TensorFlow Decision Forests and Yggdrasil Decision Forests are welcome. If you want to contribute, make sure to review the user manual, developer manual and contribution guidelines.

Credits

Yggdrasil Decision Forests and TensorFlow Decision Forests are developed by:

  • Mathieu Guillame-Bert (gbm AT google DOT com)
  • Jan Pfeifer (janpf AT google DOT com)
  • Sebastian Bruch (sebastian AT bruch DOT io)
  • Arvind Srinivasan (arvnd AT google DOT com)

License

Apache License 2.0
