
benchmark datasets of low-sample size narrow scaffold inhibitors

bidd-group/MPCD

MPC

Benchmark datasets for molecular property cliffs (MPC), as used in the ACANet paper


Overview of the MPC benchmark datasets

1. The 9 datasets of low sample size and narrow scaffolds (LSSNS) for molecular activity prediction

| Idx | dataset | target | target_type | size | reference |
|---|---|---|---|---|---|
| 1 | usp7 | Ubiquitin carboxyl-terminal hydrolase 7 | Protease | 45 | CHEMBL4251701 |
| 2 | rip2 | Serine/threonine-protein kinase RIPK2 | Kinase | 46 | CHEMBL4266012; CHEMBL4130524 |
| 3 | pkci | Protein kinase C iota | Kinase | 48 | CHEMBL4184321 |
| 4 | phgdh | D-3-phosphoglycerate dehydrogenase | Other Enzyme | 51 | CHEMBL4373702 |
| 5 | plk1 | Serine/threonine-protein kinase PLK1 | Kinase | 73 | CHEMBL4406868; CHEMBL4138231 |
| 6 | ido1 | Indoleamine 2,3-dioxygenase | Other Enzyme | 78 | CHEMBL4364294 |
| 7 | rxfp1 | Relaxin receptor 1 | GPCR | 117 | CHEMBL3714716 |
| 8 | braf | Serine/threonine-protein kinase B-raf | Kinase | 128 | CHEMBL3638563 |
| 9 | mglur2 | Metabotropic glutamate receptor 2 | GPCR | 244 | CHEMBL3886984 |
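As a quick sanity check on the table above, the following sketch tallies the nine LSSNS datasets by target type. The names `LSSNS`, `summarize_by_target_type`, `lssns_summary`, and `total_compounds` are illustrative and not part of this repository; the tuples are copied by hand from the table rather than read from any file.

```python
from collections import defaultdict

# (dataset, target_type, size) copied by hand from the LSSNS table.
# This list is an illustration, not a file shipped with the repository.
LSSNS = [
    ("usp7", "Protease", 45),
    ("rip2", "Kinase", 46),
    ("pkci", "Kinase", 48),
    ("phgdh", "Other Enzyme", 51),
    ("plk1", "Kinase", 73),
    ("ido1", "Other Enzyme", 78),
    ("rxfp1", "GPCR", 117),
    ("braf", "Kinase", 128),
    ("mglur2", "GPCR", 244),
]


def summarize_by_target_type(rows):
    """Return {target_type: (number_of_datasets, total_compounds)}."""
    summary = defaultdict(lambda: [0, 0])
    for _name, target_type, size in rows:
        summary[target_type][0] += 1
        summary[target_type][1] += size
    return {key: tuple(value) for key, value in summary.items()}


lssns_summary = summarize_by_target_type(LSSNS)
total_compounds = sum(size for _name, _type, size in LSSNS)

if __name__ == "__main__":
    for target_type, (n_datasets, n_compounds) in sorted(lssns_summary.items()):
        print(f"{target_type}: {n_datasets} datasets, {n_compounds} compounds")
    print(f"all: {len(LSSNS)} datasets, {total_compounds} compounds")
```

Every dataset here has fewer than 250 compounds, which is the sense in which the collection is "low sample size".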

2. The 30 datasets of high sample size and mixed scaffolds (HSSMS) for molecular activity prediction. These are the molecular activity prediction benchmark datasets from MoleculeACE.

| Idx | Dataset | Code | Target | Type | Target_type | Compounds | Cliffs |
|---|---|---|---|---|---|---|---|
| 1 | CHEMBL4792_Ki | OX2R | Orexin receptor 2 | Ki | GPCR | 1471 | 763 |
| 2 | CHEMBL4616_EC50 | GHSR | Ghrelin receptor | EC50 | GPCR | 682 | 330 |
| 3 | CHEMBL244_Ki | FX | Coagulation factor X | Ki | Protease | 3097 | 1350 |
| 4 | CHEMBL237_EC50 | KOR | Kappa opioid receptor | EC50 | GPCR | 955 | 400 |
| 5 | CHEMBL3979_EC50 | PPARd | Peroxisome proliferator-activated receptor delta | EC50 | NR | 1125 | 467 |
| 6 | CHEMBL239_EC50 | PPARa | Peroxisome proliferator-activated receptor alpha | EC50 | NR | 1721 | 709 |
| 7 | CHEMBL234_Ki | D3R | Dopamine D3 receptor | Ki | GPCR | 3657 | 1441 |
| 8 | CHEMBL2047_EC50 | FXR | Farnesoid X receptor | EC50 | NR | 631 | 245 |
| 9 | CHEMBL219_Ki | D4R | Dopamine D4 receptor | Ki | GPCR | 1859 | 715 |
| 10 | CHEMBL264_Ki | HRH3 | Histamine H3 receptor | Ki | GPCR | 2862 | 1084 |
| 11 | CHEMBL235_EC50 | PPARy | Peroxisome proliferator-activated receptor gamma | EC50 | NR | 2349 | 881 |
| 12 | CHEMBL236_Ki | DOR | Delta opioid receptor | Ki | GPCR | 2598 | 965 |
| 13 | CHEMBL4005_Ki | PIK3CA | PI3-kinase p110-alpha subunit | Ki | Transferase | 960 | 351 |
| 14 | CHEMBL218_EC50 | CB1 | Cannabinoid receptor 1 | EC50 | GPCR | 1031 | 367 |
| 15 | CHEMBL237_Ki | KOR | Kappa opioid receptor | Ki | GPCR | 2602 | 941 |
| 16 | CHEMBL204_Ki | F2 | Thrombin | Ki | Protease | 2754 | 989 |
| 17 | CHEMBL214_Ki | 5-HT1A | Serotonin 1a receptor | Ki | GPCR | 3317 | 1147 |
| 18 | CHEMBL228_Ki | SERT | Serotonin transporter | Ki | Other | 1704 | 599 |
| 19 | CHEMBL287_Ki | SOR | Sigma opioid receptor | Ki | Other | 1328 | 464 |
| 20 | CHEMBL233_Ki | MOR | μ-opioid receptor | Ki | GPCR | 3142 | 1111 |
| 21 | CHEMBL2147_Ki | PIM1 | Serine/threonine-protein kinase PIM1 | Ki | Kinase | 1456 | 485 |
| 22 | CHEMBL1862_Ki | ABL1 | Tyrosine-protein kinase ABL1 | Ki | Kinase | 794 | 253 |
| 23 | CHEMBL2034_Ki | GR | Glucocorticoid receptor | Ki | NR | 750 | 230 |
| 24 | CHEMBL238_Ki | DAT | Dopamine transporter | Ki | Other | 1052 | 263 |
| 25 | CHEMBL1871_Ki | AR | Androgen Receptor | Ki | NR | 659 | 157 |
| 26 | CHEMBL231_Ki | HRH1 | Histamine H1 receptor | Ki | GPCR | 973 | 224 |
| 27 | CHEMBL262_Ki | GSK3 | Glycogen synthase kinase-3 beta | Ki | Kinase | 856 | 158 |
| 28 | CHEMBL2971_Ki | JAK2 | Janus kinase 2 | Ki | Kinase | 976 | 120 |
| 29 | CHEMBL4203_Ki | CLK4 | Dual specificity protein kinase CLK4 | Ki | Kinase | 731 | 64 |
| 30 | CHEMBL2835_Ki | JAK1 | Janus kinase 1 | Ki | Kinase | 615 | 46 |
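The Compounds and Cliffs columns can be combined into a per-dataset cliff fraction, which makes the spread across targets easier to compare. The sketch below assumes that Cliffs counts the compounds flagged as activity cliffs within each dataset; only three rows are copied from the table, and the names `HSSMS_SUBSET`, `cliff_fraction`, and `cliff_fractions` are illustrative.

```python
# (dataset, compounds, cliffs) for three rows copied by hand from the HSSMS table.
# Assumption: "Cliffs" is the number of compounds flagged as activity cliffs.
HSSMS_SUBSET = [
    ("CHEMBL4792_Ki", 1471, 763),
    ("CHEMBL204_Ki", 2754, 989),
    ("CHEMBL2835_Ki", 615, 46),
]


def cliff_fraction(compounds, cliffs):
    """Fraction of compounds in a dataset that are counted as cliffs."""
    if compounds <= 0:
        raise ValueError("compounds must be positive")
    return cliffs / compounds


cliff_fractions = {
    name: cliff_fraction(compounds, cliffs)
    for name, compounds, cliffs in HSSMS_SUBSET
}

if __name__ == "__main__":
    # Print the subset ordered from most to least cliff-rich.
    for name, fraction in sorted(cliff_fractions.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {fraction:.3f}")
```

Among these three rows the fraction ranges from roughly one half (CHEMBL4792_Ki) down to under one tenth (CHEMBL2835_Ki).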

3. The 3 matched molecular pair (MMP) datasets for activity cliff classification. The datasets are from ACGCN.

| Dataset | Total Compounds | Total MMPs | MMP-Cliffs | MMP-nonCliffs |
|---|---|---|---|---|
| Thrombin | 3171 | 5751 | 317 | 4408 |
| Mu opioid receptor | 3625 | 8725 | 219 | 7097 |
| Melanocortin receptor 4 | 1858 | 7169 | 111 | 5750 |
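The cliff and non-cliff counts imply a strongly imbalanced classification task. The sketch below computes the share of cliffs among the labeled pairs (MMP-Cliffs plus MMP-nonCliffs) for each dataset; the names `MMP_COUNTS`, `cliff_share`, and `mmp_cliff_share` are illustrative, and the counts are copied by hand from the table.

```python
# dataset -> (MMP-Cliffs, MMP-nonCliffs), copied by hand from the MMP table.
MMP_COUNTS = {
    "Thrombin": (317, 4408),
    "Mu opioid receptor": (219, 7097),
    "Melanocortin receptor 4": (111, 5750),
}


def cliff_share(cliffs, non_cliffs):
    """Share of cliff pairs among the pairs labeled cliff or non-cliff."""
    labeled = cliffs + non_cliffs
    if labeled <= 0:
        raise ValueError("no labeled pairs")
    return cliffs / labeled


mmp_cliff_share = {
    name: cliff_share(cliffs, non_cliffs)
    for name, (cliffs, non_cliffs) in MMP_COUNTS.items()
}

if __name__ == "__main__":
    for name, share in mmp_cliff_share.items():
        cliffs, non_cliffs = MMP_COUNTS[name]
        print(f"{name}: {share:.2%} cliffs "
              f"({non_cliffs / cliffs:.1f} non-cliffs per cliff)")
```

In all three datasets cliffs make up well under 10% of the labeled pairs, so accuracy alone is likely to be a weak metric here.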

4. The 10 ADMET property datasets for delta prediction. The datasets are from DeepDelta.

| Dataset | Property | Size | Units |
|---|---|---|---|
| Free Solvation | Absorption | 642 | Experimental Hydration Free Energy in Water |
| Caco2 | Absorption | 910 | Log(Papp) |
| Aqueous Solubility | Absorption | 1128 | LogS |
| Fraction Unbound, Brain | Distribution | 253 | Log(fu,brain) |
| Volume of Distribution at Steady State | Distribution | 1130 | Log(Body/Blood Concentration in L/kg) |
| Renal Clearance | Excretion | 636 | Log(CLr) |
| Microsomal Clearance | Metabolism | 731 | Log(mL/min/kg cleared) |
| Hepatic Clearance | Metabolism | 881 | Log(mL/min/kg cleared) |
| Half-Life | Metabolism | 1321 | Log(Half-Life in Hours) |
| Hemolytic Toxicity | Toxicity | 828 | Log(HD50) |
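In delta prediction the target is the difference in a property between two molecules rather than the property of a single molecule. The following is a minimal sketch of one way to build such pairs, assuming ordered pairs of distinct molecules with the label defined as the second value minus the first; DeepDelta's own pairing scheme may differ (for example, by including self-pairs). The function `make_delta_pairs` and the records in `toy_records` are hypothetical placeholders, not data from these datasets.

```python
from itertools import permutations


def make_delta_pairs(records):
    """Build ordered pairs (id_a, id_b, value_b - value_a) from (id, value) records.

    Assumption: pairs are ordered and exclude self-pairs, so n records
    yield n * (n - 1) pairs.
    """
    return [
        (id_a, id_b, value_b - value_a)
        for (id_a, value_a), (id_b, value_b) in permutations(records, 2)
    ]


# Placeholder identifiers and values for illustration only.
toy_records = [("mol_1", 1.0), ("mol_2", 2.5), ("mol_3", -0.5)]
delta_pairs = make_delta_pairs(toy_records)

if __name__ == "__main__":
    for id_a, id_b, delta in delta_pairs:
        print(f"{id_a} -> {id_b}: delta = {delta:+.1f}")
```

Because the number of pairs grows quadratically with dataset size, pairs would normally be formed within each split (training pairs from training molecules only) to avoid leaking test molecules into training.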