A Julia implementation of boosted trees with CPU and GPU support. Efficient histogram-based algorithms with support for multiple loss functions, including multi-target objectives such as maximum-likelihood methods.
Latest:

```julia
julia> Pkg.add(url="https://github.com/Evovest/EvoTrees.jl")
```

From General Registry:

```julia
julia> Pkg.add("EvoTrees")
```
Data consists of randomly generated `Float32` matrices. Training is performed over 200 iterations. Code to reproduce is here.
- EvoTrees: v0.15.0, XGBoost: v2.3.0, Julia: v1.9.1
- CPU: 12 threads on AMD Ryzen 5900X; GPU: NVIDIA RTX A4000
Training:

Dimensions / Algo | XGBoost Hist | EvoTrees | EvoTrees GPU |
---|---|---|---|
100K x 100 | 2.38s | 1.03s | 2.72s |
500K x 100 | 11.1s | 3.23s | 3.52s |
1M x 100 | 21.4s | 6.56s | 4.60s |
5M x 100 | 111s | 36.4s | 13.4s |
10M x 100 | 222s | 75.0s | 22.8s |
Inference:

Dimensions / Algo | XGBoost Hist | EvoTrees | EvoTrees GPU |
---|---|---|---|
100K x 100 | 0.132s | 0.053s | 0.036s |
500K x 100 | 0.569s | 0.283s | 0.169s |
1M x 100 | 1.06s | 0.569s | 0.336s |
5M x 100 | 5.24s | 2.85s | 1.66s |
10M x 100 | 10.9s | 6.06s | 3.32s |
See official project page for more info.
A model configuration must first be defined, using one of the model constructors:

- `EvoTreeRegressor`
- `EvoTreeClassifier`
- `EvoTreeCount`
- `EvoTreeMLE`
Model training is performed using `fit_evotree`. It supports additional arguments allowing to track an out-of-sample metric and perform early stopping. See the docs for more details on the available hyper-parameters for each of the above constructors and other training options.
```julia
using EvoTrees

config = EvoTreeRegressor(
    loss=:linear,
    nrounds=100,
    max_depth=6,
    nbins=32,
    eta=0.1)

x_train, y_train = rand(1_000, 10), rand(1_000)
m = fit_evotree(config; x_train, y_train)
preds = m(x_train)
```
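The out-of-sample tracking and early stopping mentioned above can be sketched as follows. This is a minimal sketch: the `x_eval`, `y_eval`, `metric`, `early_stopping_rounds`, and `print_every_n` keyword names are assumptions to be checked against the docs for your installed version.

```julia
using EvoTrees

config = EvoTreeRegressor(nrounds=500, eta=0.05, max_depth=6)

# Hold out part of the data as an evaluation set
x_train, y_train = rand(800, 10), rand(800)
x_eval, y_eval = rand(200, 10), rand(200)

# Track a metric on the eval set and stop once it stops improving
m = fit_evotree(config;
    x_train, y_train,
    x_eval, y_eval,
    metric=:mse,                # metric tracked out of sample (assumed name)
    early_stopping_rounds=20,   # stop after 20 rounds without improvement
    print_every_n=10)           # log the tracked metric every 10 rounds
```

With early stopping, `nrounds` acts as an upper bound: training ends as soon as the tracked metric plateaus on the evaluation set.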
When using a `DataFrame` as input, features with element types `Real` (incl. `Bool`) and `Categorical` are automatically recognized as input features. Alternatively, the `fnames` kwarg can be used.
`Categorical` features are treated accordingly by the algorithm: ordered variables are treated as numerical features, using a `≤` split rule, while unordered variables use `==`. Support is currently limited to a maximum of 255 levels. `Bool` variables are treated as unordered, 2-level categorical variables.
```julia
using DataFrames

dtrain = DataFrame(x_train, :auto)
dtrain.y .= y_train
m = fit_evotree(config, dtrain; target_name="y");
m = fit_evotree(config, dtrain; target_name="y", fnames=["x1", "x3"]);
```
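A mixed-type table illustrating the categorical handling described above could be set up like this. This is a sketch: it assumes `CategoricalArrays.categorical` with `ordered=true` is how a column is marked as ordered, which determines whether the `≤` or `==` split rule applies.

```julia
using DataFrames, CategoricalArrays, EvoTrees

df = DataFrame(
    x_num  = rand(1_000),                                # numeric: ≤ splits
    x_bool = rand(Bool, 1_000),                          # treated as a 2-level unordered cat
    x_cat  = categorical(rand(["a", "b", "c"], 1_000)),  # unordered cat: == splits
    x_ord  = categorical(rand(["low", "mid", "high"], 1_000);
                         ordered=true),                  # ordered cat: ≤ splits
    y      = rand(1_000))

config = EvoTreeRegressor(nrounds=100)
m = fit_evotree(config, df; target_name="y")
```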
`EvoTrees.importance` returns the normalized gain by feature:

```julia
features_gain = EvoTrees.importance(m)
```
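The gains can then be inspected directly. This usage sketch assumes `EvoTrees.importance` returns a collection of feature-name => gain pairs sorted by decreasing gain; verify the return type against the docs.

```julia
features_gain = EvoTrees.importance(m)

# Print each feature with its normalized gain
# (assumes name => gain pairs; gains sum to 1)
for (fname, gain) in features_gain
    println(rpad(fname, 10), round(gain; digits=3))
end
```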
Plot a given tree of the model:

```julia
plot(m, 2)
```

Note that the first tree is used to set the bias, so the first actual learned tree is #2.
```julia
EvoTrees.save(m, "data/model.bson")
m = EvoTrees.load("data/model.bson");
```
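A quick round-trip check of the save/load functions shown above (a sketch; assumes `m` is a fitted model, `x_train` is its training matrix, and the target directory exists):

```julia
using EvoTrees

# Persist the fitted model and reload it under a new name
EvoTrees.save(m, "model.bson")
m2 = EvoTrees.load("model.bson")

# Predictions from the reloaded model should match the original
@assert m2(x_train) ≈ m(x_train)
```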