SIBILA

SIBILA Server combines high-performance computing (HPC) with machine learning and deep learning (ML/DL) to provide users with a powerful predictive tool. Several ML models are available, and a large set of parameters lets users configure each task. In addition, the server applies explainable artificial intelligence (XAI) to present results in a form that users can understand. A collection of interpretability approaches is implemented to identify the features that were most relevant to the model's prediction.

Installation (choose one)

git clone https://github.com/bio-hpc/sibila.git
git clone git@github.com:bio-hpc/sibila.git
gh repo clone bio-hpc/sibila
Download the .zip archive and unzip it on each supercomputing cluster you are going to use

Download singularity image

The image is needed to ensure compatibility across all clusters.

cd sibila/Tools/Singularity

wget --no-check-certificate -r "https://drive.usercontent.google.com/download?id=1eVI6RpUPvmrOi6Z8p0AeA-dpefdL4UVu&confirm=t" -O sibila.sif

chmod u+x sibila.sif
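
Before launching jobs, it can help to confirm that the download produced a usable file. The following is a minimal sketch, assuming the image was saved as sibila.sif by the commands above; check_image is a hypothetical helper and is not part of SIBILA. It only checks that the file exists, is non-empty, and is executable. It does not validate the image contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SIBILA): verifies that the image file
# exists, is non-empty, and is executable by the current user.
check_image() {
    local image="$1"
    if [ ! -s "$image" ]; then
        echo "missing or empty: $image" >&2
        return 1
    fi
    if [ ! -x "$image" ]; then
        echo "not executable: $image (try chmod u+x)" >&2
        return 1
    fi
    echo "ok: $image"
}
```

From the repository root, this could be called as check_image Tools/Singularity/sibila.sif.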

Available ML/DL Models and Algorithms

DT (Decision Tree)
RF (Random Forest)
SVM (Support Vector Machines)
XGBOOST (eXtreme Gradient BOOSTing)
ANN (Artificial Neural Networks)
KNN (K-Nearest Neighbours)
RLF (RuleFit)
RP (RIPPERk)
LR (Linear/Logistic Regression)
BAG (Bagging)

Available Interpretability Methods

Permutation Feature Importance
RF-based Permutation Feature Importance
Local Interpretable Model-agnostic Explanations (LIME)
Integrated Gradients
Shapley values
Diverse Counterfactual Explanations (DICE)
Partial Dependence Plots (PDP)
Accumulated Local Effects (ALE)
Anchors

Scripts

The Scripts directory contains scripts for creating random datasets, running a manual grid search, and joining results into a single output file.

We recommend running these scripts with the SIBILA Singularity image, Tools/Singularity/sibila.sif. For instance:

singularity exec Tools/Singularity/sibila.sif python3 Scripts/ResultAnalyzer.py -d folder_containing_results -o myfile.xlsx
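
When several result folders need to be analyzed, the same command can be wrapped in a loop. The sketch below is illustrative only: analyze_folders and the DRY_RUN variable are hypothetical names, not part of SIBILA, and it assumes that ResultAnalyzer.py takes one results folder per call via -d, as in the example above. Each spreadsheet is named after its folder.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of SIBILA): runs ResultAnalyzer.py once per
# results folder and writes one .xlsx file named after each folder.
# Set DRY_RUN=1 to print the commands instead of executing them.
analyze_folders() {
    local image="Tools/Singularity/sibila.sif"
    local dir out
    for dir in "$@"; do
        out="$(basename "$dir").xlsx"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo singularity exec "$image" python3 Scripts/ResultAnalyzer.py -d "$dir" -o "$out"
        else
            singularity exec "$image" python3 Scripts/ResultAnalyzer.py -d "$dir" -o "$out" || return 1
        fi
    done
}
```

For example, running DRY_RUN=1 analyze_folders results/run1 results/run2 from the repository root prints the two commands that would be executed; the folder names here are placeholders.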

CHANGELOG

v1.2.2 (in progress)

Implemented BayesianOptimizer as a method for hyperparameter search.
Implemented downsampling option.
KNN creates a new plot to help interpretability.
Only training data is balanced when using -b option.
Plotted anchor rules with precision and coverage.
Removed error bars from global interpretability plots.
Implemented MAPE as regression metric.

v1.2.1 (04/03/2024)

Added bagging (BAG) model.
Support for multiclass classification.
Support for grid and random search with RuleFit model.
The plot of global attributions displays the 10 most attributed features for readability.
Implemented consensus via scoring functions (arithmetic mean, harmonic mean, custom function...).
Corrections to ResultAnalyzer.py.

v1.2.0 (04/02/2023)

Added new parameter: --skip-dataset-analysis.
Use of environment variables in Python code.
Pass environment variables dynamically to the jobs when parallelizing interpretability.
Renamed h5 and sif files to use the standard notation.
Added new parameter: --skip-interpretability.
Always save execution status in a pickle file.
Added new parameter: -e, --explanation. Useful when explaining previously trained models.
Implemented GPU support through Singularity.
Fixed RandomOversample: set sampling_strategy=auto.
Incorporated extra datasets.
Bind Singularity for executions from outside /home.
Interpretability algorithms dump the attributions of all variables into CSV files.
Reworked explainers.
Added anchors and RF-based permutation importance as explainers.
Implemented RIPPERk model's grid search.
Save the probability of being classified as class X into csv files.
Extra ranges (10-step) in class probability plot.
The last column has to be removed from the dataset when using prediction mode (-m).
Allow text IDs in the first column.
Included Linear/Logistic Regression.

v1.1.0 (03/09/2022)

Grid and random search on Artificial Neural Networks.
Uploaded synthetic datasets for testing.
Parallelization of interpretability tasks.
Plot times on a logarithmic scale for better readability.
Exclusion of Keras Tuner and jobs dir from the compressed file.
Fixed cross-validation with Artificial Neural Networks, which did not work properly.
Renamed and standardized metric keys.
Added minimum number of layers in Artificial Neural Networks.
Corrected 2-unit layers and dropout layers, which were being added after every hidden layer.
Added callbacks to speed up Artificial Neural Networks training.
Collect and plot loss through epochs manually.

v1.0.0 (30/06/2022)

Initial version.
