# EvoTrees


A Julia implementation of boosted trees with CPU and GPU support. Efficient histogram-based algorithms with support for multiple loss functions, notably multi-target objectives such as maximum likelihood methods.

R binding available.

## Installation

Latest:

```julia
julia> Pkg.add(url="https://github.com/Evovest/EvoTrees.jl")
```

From General Registry:

```julia
julia> Pkg.add("EvoTrees")
```

## Performance

Data consists of randomly generated `Float32` matrices. Training is performed over 200 iterations. Code to reproduce the benchmark is available in the repository.

- EvoTrees: v0.15.0, XGBoost: v2.3.0, Julia: v1.9.1
- CPU: AMD Ryzen 5900X (12 threads), GPU: NVIDIA RTX A4000

Training:

| Dimensions | XGBoost Hist | EvoTrees | EvoTrees GPU |
|-----------:|-------------:|---------:|-------------:|
| 100K x 100 | 2.38s        | 1.03s    | 2.72s        |
| 500K x 100 | 11.1s        | 3.23s    | 3.52s        |
| 1M x 100   | 21.4s        | 6.56s    | 4.60s        |
| 5M x 100   | 111s         | 36.4s    | 13.4s        |
| 10M x 100  | 222s         | 75.0s    | 22.8s        |

Inference:

| Dimensions | XGBoost Hist | EvoTrees | EvoTrees GPU |
|-----------:|-------------:|---------:|-------------:|
| 100K x 100 | 0.132s       | 0.053s   | 0.036s       |
| 500K x 100 | 0.569s       | 0.283s   | 0.169s       |
| 1M x 100   | 1.06s        | 0.569s   | 0.336s       |
| 5M x 100   | 5.24s        | 2.85s    | 1.66s        |
| 10M x 100  | 10.9s        | 6.06s    | 3.32s        |

## MLJ Integration

See the official project page for more info.

## Quick start with internal API

A model configuration must first be defined, using one of the model constructors:

- `EvoTreeRegressor`
- `EvoTreeClassifier`
- `EvoTreeCount`
- `EvoTreeMLE`

Model training is performed using `fit_evotree`. It supports additional keyword arguments that allow tracking an out-of-sample metric and performing early stopping. See the docs for details on the hyper-parameters available for each of the above constructors and other training options.
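As an illustrative sketch of out-of-sample tracking with early stopping (the `x_eval`/`y_eval`, `metric`, `early_stopping_rounds` and `print_every_n` keywords are assumed here; the data and split are made up):

```julia
using EvoTrees

config = EvoTreeRegressor(nrounds=500, eta=0.05, max_depth=5)

# hypothetical train/eval split on random data
x_train, y_train = rand(8_000, 10), rand(8_000)
x_eval, y_eval = rand(2_000, 10), rand(2_000)

# track :mse on the eval set and stop if it fails to improve for 20 rounds
m = fit_evotree(config;
    x_train, y_train,
    x_eval, y_eval,
    metric=:mse,
    early_stopping_rounds=20,
    print_every_n=50)
```

Early stopping returns the model at the best iteration on the eval metric, which typically yields an effective number of trees well below `nrounds`.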

### Matrix features input

```julia
using EvoTrees

config = EvoTreeRegressor(
    loss=:linear,
    nrounds=100,
    max_depth=6,
    nbins=32,
    eta=0.1)

x_train, y_train = rand(1_000, 10), rand(1_000)
m = fit_evotree(config; x_train, y_train)
preds = m(x_train)
```

### DataFrames input

When using a DataFrame as input, features with element types `Real` (including `Bool`) and `Categorical` are automatically recognized as input features. Alternatively, the `fnames` keyword can be used to specify the features explicitly.

Categorical features are handled natively by the algorithm. Ordered variables are treated like numerical features, using the `<` split rule, while unordered variables are split using `==`. Support is currently limited to a maximum of 255 levels. `Bool` variables are treated as unordered, 2-level categorical variables.

```julia
using DataFrames

dtrain = DataFrame(x_train, :auto)
dtrain.y .= y_train
m = fit_evotree(config, dtrain; target_name="y");
m = fit_evotree(config, dtrain; target_name="y", fnames=["x1", "x3"]);
```
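A sketch of the ordered vs. unordered distinction described above, using CategoricalArrays to build the columns (the column names and data are illustrative):

```julia
using DataFrames, CategoricalArrays
using EvoTrees

df = DataFrame(
    x_num = rand(1_000),
    x_cat = categorical(rand(["a", "b", "c"], 1_000)),                       # unordered: split with ==
    x_ord = categorical(rand(["low", "mid", "high"], 1_000); ordered=true),  # ordered: split with <
    y = rand(1_000))

config = EvoTreeRegressor(nrounds=50)
m = fit_evotree(config, df; target_name="y")
```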

## Feature importance

Returns the normalized gain by feature.

```julia
features_gain = EvoTrees.importance(m)
```
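An illustrative way to inspect the result (assuming it iterates as `feature => gain` pairs):

```julia
# print each feature alongside its normalized gain
for (fname, gain) in EvoTrees.importance(m)
    println(rpad(fname, 10), round(gain, digits=4))
end
```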

## Plot

Plot a given tree of the model (a plot recipe is provided, so a plotting backend such as Plots must be loaded):

```julia
using Plots
plot(m, 2)
```

Note that the 1st tree is used to set the bias, so the first real tree is #2.

## Save/Load

```julia
EvoTrees.save(m, "data/model.bson")
m = EvoTrees.load("data/model.bson");
```