SIBILA Server takes advantage of high-performance computing (HPC) and machine/deep learning (ML/DL) to provide users with a powerful predictive tool. Several ML models are available, and a large set of configuration parameters makes it easy to tailor each task. In addition, the server applies explainable artificial intelligence (XAI) to present results in a way that users can understand. A collection of interpretability approaches is implemented to identify the features that were most relevant to the model's predictions.
- git clone https://github.com/bio-hpc/sibila.git
- git clone git@github.com:bio-hpc/sibila.git
- gh repo clone bio-hpc/sibila
- Download the .zip and unzip it on the supercomputing cluster you are going to use
The Singularity image is needed to ensure compatibility with all clusters.
cd sibila/Tools/Singularity
wget --no-check-certificate -r "https://drive.usercontent.google.com/download?id=1eVI6RpUPvmrOi6Z8p0AeA-dpefdL4UVu&confirm=t" -O sibila.sif
chmod u+x sibila.sif
- DT (Decision Tree)
- RF (Random Forest)
- SVM (Support Vector Machines)
- XGBOOST (eXtreme Gradient BOOSTing)
- ANN (Artificial Neural Networks)
- KNN (K-Nearest Neighbours)
- RLF (RuLEFit)
- RP (RIPPERk)
- LR (Linear/Logistic Regression)
- BAG (Bagging)
- Permutation Feature Importance
- RF-based Permutation Feature Importance
- Local Interpretable Model-agnostic Explanations (LIME)
- Integrated Gradients
- Shapley values
- Diverse Counterfactual Explanations (DICE)
- Partial Dependence Plots (PDP)
- Accumulated Local Effects (ALE)
- Anchors
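To make the first approach in this list concrete, permutation feature importance can be sketched in plain Python. This is a minimal illustration and not SIBILA's implementation: the toy model `predict`, the helper `permutation_importance` and the data are all hypothetical. The idea is to shuffle one feature column at a time and measure how much the model's error grows.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def predict(rows):
    # Hypothetical "trained" model: uses feature 0 and ignores feature 1.
    return [3.0 * row[0] for row in rows]


def permutation_importance(predict_fn, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, predict_fn(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only; every other column stays untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mean_squared_error(y, predict_fn(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic data: the target depends on feature 0 only.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [3.0 * row[0] for row in X]

importances = permutation_importance(predict, X, y)
print(importances)
```

Shuffling the feature the model relies on increases the error, while shuffling the ignored feature leaves it unchanged, so its importance is zero.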
The Scripts directory contains scripts for creating random datasets, running manual grid search and joining results into a single output file.
It is recommended to run these scripts with the SIBILA Singularity image "Tools/Singularity/sibila.sif". For instance:
singularity exec Tools/Singularity/sibila.sif python3 Scripts/ResultAnalyzer.py -d folder_containing_results -o myfile.xlsx
v1.2.2 (in progress)
- Implemented BayesianOptimizer as a method for hyperparameter searching.
- Implemented downsampling option.
- KNN creates a new plot to help interpretability.
- Only training data is balanced when using the -b option.
- Plotted anchor rules with precision and coverage.
- Removed error bars from global interpretability plots.
- Implemented MAPE as regression metric.
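For readers unfamiliar with the MAPE metric mentioned in the list above, here is a minimal sketch of the standard definition (mean absolute percentage error). The function name `mape` and the sample values are hypothetical, and reporting the result as a percentage rather than a fraction is an assumption; SIBILA's own convention may differ.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, expressed in percent (assumed convention)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        # The relative error is undefined when a true value is zero.
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Relative errors are 10 %, 10 % and 0 %, so the mean is about 6.67 %.
score = mape([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
print(round(score, 2))
```

Because each error is divided by the true value, MAPE is scale-independent, but it cannot be computed when any true value is zero.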
v1.2.1 (04/03/2024)
- Added bagging (BAG) model.
- Support for multiclass classification.
- Support for grid and random search with RuleFit model.
- The plot of global attributions displays the 10 most attributed features for readability.
- Implemented consensus via scoring functions (arithmetic mean, harmonic mean, custom function...).
- Corrections of ResultAnalyzer.py.
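The consensus entry in the list above can be illustrated with a short sketch. It assumes that consensus means combining, for each feature, the attribution scores produced by several explainers into a single value; the dictionary `attributions`, the function `consensus` and the numbers are hypothetical and do not reflect SIBILA's internal data structures.

```python
from statistics import harmonic_mean, mean

# Hypothetical normalized attributions (0..1) per feature from three explainers.
attributions = {
    "feature_a": [0.8, 0.6, 0.7],
    "feature_b": [0.9, 0.1, 0.2],
}


def consensus(scores, scoring=mean):
    """Reduce each feature's list of attributions with the given scoring function."""
    return {feature: scoring(values) for feature, values in scores.items()}


arithmetic = consensus(attributions, mean)
harmonic = consensus(attributions, harmonic_mean)
custom = consensus(attributions, scoring=max)  # any user-supplied callable works

print(arithmetic)
print(harmonic)
print(custom)
```

The harmonic mean penalizes disagreement between explainers: a feature that only one method rates highly (`feature_b` here) scores much lower than under the arithmetic mean.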
v1.2.0 (04/02/2023)
- Added new parameter: --skip-dataset-analysis.
- Use of environment variables in Python code.
- Pass environment variables dynamically to the jobs when parallelizing interpretability.
- Renamed h5 and sif files to use the standard notation.
- Added new parameter: --skip-interpretability.
- Always save execution status in a pickle file.
- Added new parameter: -e, --explanation. Useful when explaining previously trained models.
- Implemented GPU support through Singularity.
- Fix on RandomOversample. Set sampling_strategy=auto.
- Incorporated extra datasets.
- Bind Singularity for executions from outside /home.
- Interpretability algorithms dump the attributions of all variables into csv files.
- Reworked explainers.
- Added anchors and RF-based permutation importance as explainers.
- Implemented RIPPERk model's grid search.
- Save the probability of being classified as class X into csv files.
- Extra ranges (10-step) in class probability plot.
- The last column has to be removed from the dataset when using prediction mode (-m).
- Allow text IDs in the first column.
- Included Linear/Logistic Regression.
v1.1.0 (03/09/2022)
- Grid and random search on Artificial Neural Networks.
- Uploaded synthetic datasets for testing.
- Parallelization of interpretability tasks.
- Plot times on a logarithmic scale for better readability.
- Exclusion of Keras Tuner and jobs dir from the compressed file.
- Fixed cross-validation with Artificial Neural Networks, which did not work properly.
- Renamed and standardized metric keys.
- Added minimum number of layers in Artificial Neural Networks.
- Corrected 2-unit layers and dropout layers. They were added after every hidden layer.
- Added callbacks to speed up Artificial Neural Networks training.
- Collect and plot loss through epochs manually.
v1.0.0 (30/06/2022) Initial version