This repository has been archived by the owner on Dec 8, 2023. It is now read-only.

This repository contains the ZairaChem models built on the ADMET datasets from Therapeutics Data Commons


ersilia-os/zaira-chem-tdc-benchmark


zaira-chem-tdc-benchmark

This repository contains the validation of ZairaChem v0 on the ADMET datasets from the Therapeutics Data Commons (TDC).

Note: this repository uses an old version of ZairaChem (v0). If you wish to use it, please clone the ZairaChem release v0.0.2. The validation of ZairaChem v1 can be found at https://github.com/ersilia-os/zaira-chem-tdc

ZairaChem

ZairaChem is an automated pipeline for building ML-based (Q)SAR models. Detailed installation instructions can be found in Ersilia's GitBook.

In short, to use ZairaChem:

git clone https://github.com/ersilia-os/zaira-chem.git
cd zaira-chem
bash install_script.sh

Model training and prediction:

conda activate zairachem
zairachem fit -i <train_data.csv> -m <model_folder>
zairachem predict -i <test_data.csv> -m <model_folder> -o <pred_folder>
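
When training and prediction need to be scripted over several datasets, the two commands above can be assembled programmatically. The following is a minimal sketch rather than part of ZairaChem: the helper name `zairachem_commands` and the file and folder names are hypothetical, and only the `fit`/`predict` subcommands and the `-i`, `-m` and `-o` flags shown above are used.

```python
import shlex


def zairachem_commands(train_csv, test_csv, model_dir, pred_dir):
    """Build the fit and predict command lines as argument lists.

    Hypothetical helper: it only mirrors the two CLI calls shown above
    and does not run anything itself.
    """
    fit = ["zairachem", "fit", "-i", train_csv, "-m", model_dir]
    predict = ["zairachem", "predict", "-i", test_csv, "-m", model_dir, "-o", pred_dir]
    return fit, predict


# Placeholder paths, used here for illustration only.
fit_cmd, predict_cmd = zairachem_commands(
    "train_data.csv", "test_data.csv", "model_folder", "pred_folder"
)

# Print the commands so they can be inspected before being executed.
print(shlex.join(fit_cmd))
print(shlex.join(predict_cmd))
```

To actually execute the commands, each list could be passed to `subprocess.run(cmd, check=True)` from a shell in which the `zairachem` conda environment is already active.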

Classification tasks

We have benchmarked ZairaChem on the TDC ADMET Leaderboard. At this stage, we have focused only on classification tasks.

The admet_classifications notebook shows the code to reproduce the model training and evaluation. For simplicity, the automated reports and raw data of the 8-fold evaluations are provided in the /predictions folder. An example model for each dataset is also available in the /models folder.

Results

Dataset                         Metric  Score
Bioavailability_Ma              AUROC   0.706 ± 0.031
HIA_Hou                         AUROC   0.948 ± 0.018
Pgp_Broccatelli                 AUROC   0.935 ± 0.006
BBB_Martins                     AUROC   0.91 ± 0.024
CYP2C9_Veith                    AUPRC   0.786 ± 0.004
CYP2D6_Veith                    AUPRC   0.644 ± 0.085
CYP3A4_Veith                    AUPRC   0.875 ± 0.002
CYP2C9_Substrate_CarbonMangels  AUPRC   0.441 ± 0.033
CYP2D6_Substrate_CarbonMangels  AUPRC   0.685 ± 0.029
CYP3A4_Substrate_CarbonMangels  AUPRC   0.63 ± 0.008
hERG                            AUROC   0.856 ± 0.009
AMES                            AUROC   0.871 ± 0.002
DILI                            AUROC   0.925 ± 0.005
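
For readers unfamiliar with the two metrics, the sketch below shows one way AUROC and AUPRC can be computed for a single fold and then summarised across folds. It is an illustration under stated assumptions, not the code used to produce the table: it assumes that the ± value is the sample standard deviation across folds (the text here does not say so), it estimates AUPRC by average precision, and the function names and toy data are invented for the example.

```python
from statistics import mean, stdev


def auroc(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Average precision, a step-wise estimate of AUPRC.

    Assumes no tied scores; ties would make the result order-dependent.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def summarize(fold_scores):
    """Format per-fold scores as 'mean ± sample standard deviation'."""
    return f"{mean(fold_scores):.3f} ± {stdev(fold_scores):.3f}"


# Toy labels and scores for one fold (illustrative, not benchmark data).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, y_score))              # 0.75
print(average_precision(y_true, y_score))  # approximately 0.833

# Toy per-fold scores (illustrative, not benchmark data).
print(summarize([0.70, 0.72, 0.74]))       # 0.720 ± 0.020
```

The pairwise AUROC is quadratic in the number of samples, which is acceptable for a sketch; an established implementation such as scikit-learn's `roc_auc_score` and `average_precision_score` would be the usual choice for real evaluations.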

Cite us

If you use our work, please cite us:

ZairaChem Software

Preprint

About us

The Ersilia Open Source Initiative is a non-profit organization whose mission is to equip labs, universities and clinics in low- and middle-income countries (LMIC) with AI/ML tools for infectious disease research.

Help us achieve our mission!
