This repository contains the validation of ZairaChem v0 using the ADMET datasets from the Therapeutics Data Commons (TDC).
Note: this repository uses an old version of ZairaChem. If you wish to use it, please clone the ZairaChem release v0.0.2. The validation of ZairaChem v1 can be found at https://github.com/ersilia-os/zaira-chem-tdc
ZairaChem is an automated pipeline for building ML-based (Q)SAR models. Detailed installation instructions can be found in Ersilia's GitBook.
In short, to use ZairaChem:
git clone https://github.com/ersilia-os/zaira-chem.git
cd zaira-chem
bash install_script.sh
Model training and prediction:
conda activate zairachem
zairachem fit -i <train_data.csv> -m <model_folder>
zairachem predict -i <test_data.csv> -m <model_folder> -o <pred_folder>
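The two commands above can also be driven from Python, for example when looping over several datasets. The following is a minimal sketch, not part of ZairaChem itself: it only assembles the `fit` and `predict` invocations with the flags shown above and, if asked, runs them through `subprocess`. The function names and the file paths are hypothetical, and the sketch assumes it is executed inside the activated `zairachem` conda environment.

```python
import shlex
import subprocess


def build_fit_command(train_csv, model_dir):
    """Return the argv list for `zairachem fit`, using only the flags shown above."""
    return ["zairachem", "fit", "-i", str(train_csv), "-m", str(model_dir)]


def build_predict_command(test_csv, model_dir, pred_dir):
    """Return the argv list for `zairachem predict`, using only the flags shown above."""
    return [
        "zairachem", "predict",
        "-i", str(test_csv),
        "-m", str(model_dir),
        "-o", str(pred_dir),
    ]


def fit_and_predict(train_csv, test_csv, model_dir, pred_dir, dry_run=True):
    """Build the fit and predict commands; run them only when dry_run is False."""
    commands = [
        build_fit_command(train_csv, model_dir),
        build_predict_command(test_csv, model_dir, pred_dir),
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if the CLI exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    planned = fit_and_predict(
        "train_data.csv", "test_data.csv", "model_folder", "pred_folder"
    )
    for cmd in planned:
        print(shlex.join(cmd))
```

With the default `dry_run=True` nothing is executed, which makes it easy to inspect the planned commands before starting a long training run.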
We have benchmarked ZairaChem on the ADMET TDC Leaderboard. At this stage, we have focused only on classification tasks.
The admet_classifications notebook shows the code to reproduce the model training and evaluation. For simplicity, the automated reports and raw data of the 8-fold evaluations are provided in the /predictions folder. An example model for each dataset is also available in the /models folder.
Dataset | Metric | Score |
---|---|---|
Bioavailability_Ma | AUROC | 0.706 ± 0.031 |
HIA_Hou | AUROC | 0.948 ± 0.018 |
Pgp_Broccatelli | AUROC | 0.935 ± 0.006 |
BBB_Martins | AUROC | 0.91 ± 0.024 |
CYP2C9_Veith | AUPRC | 0.786 ± 0.004 |
CYP2D6_Veith | AUPRC | 0.644 ± 0.085 |
CYP3A4_Veith | AUPRC | 0.875 ± 0.002 |
CYP2C9_Substrate_CarbonMangels | AUPRC | 0.441 ± 0.033 |
CYP2D6_Substrate_CarbonMangels | AUPRC | 0.685 ± 0.029 |
CYP3A4_Substrate_CarbonMangels | AUPRC | 0.63 ± 0.008 |
hERG | AUROC | 0.856 ± 0.009 |
AMES | AUROC | 0.871 ± 0.002 |
DILI | AUROC | 0.925 ± 0.005 |
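Each score in the table has the form "mean ± standard deviation" over repeated evaluations. As a rough illustration of how such a figure can be derived, the sketch below computes AUROC from labels and predicted scores with a rank-based formula and then summarizes several runs. It is not the TDC evaluator or the code in the notebook: the function names are hypothetical, the toy labels and scores are invented, and the use of the population standard deviation (`pstdev`) is an assumption.

```python
from statistics import mean, pstdev


def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic, using average ranks for tied scores."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the group of items sharing the same score.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def summarize(scores, digits=3):
    """Format per-run scores as 'mean ± std' (population std, an assumption)."""
    return f"{mean(scores):.{digits}f} ± {pstdev(scores):.{digits}f}"


if __name__ == "__main__":
    # Invented toy runs: (labels, predicted probabilities) per evaluation.
    runs = [
        ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
        ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.6]),
        ([1, 0, 1, 0], [0.7, 0.5, 0.4, 0.1]),
    ]
    per_run = [auroc(y, s) for y, s in runs]
    print(summarize(per_run))
```

The same `summarize` helper applies to AUPRC values; only the per-run metric changes.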
If you use our work, please cite us:
The Ersilia Open Source Initiative is a non-profit organization whose mission is to equip labs, universities and clinics in low- and middle-income countries (LMIC) with AI/ML tools for infectious disease research.
Help us achieve our mission!