Benchmark datasets for molecular property cliffs (MPC) in the ACANet paper
1. The 9 datasets of low sample size and narrow scaffolds (LSSNS) for molecular activity prediction
Idx | Dataset | Target | Target_type | Size | Reference |
---|---|---|---|---|---|
1 | usp7 | Ubiquitin carboxyl-terminal hydrolase 7 | Protease | 45 | CHEMBL4251701 |
2 | rip2 | Serine/threonine-protein kinase RIPK2 | Kinase | 46 | CHEMBL4266012; CHEMBL4130524 |
3 | pkci | Protein kinase C iota | Kinase | 48 | CHEMBL4184321 |
4 | phgdh | D-3-phosphoglycerate dehydrogenase | Other Enzyme | 51 | CHEMBL4373702 |
5 | plk1 | Serine/threonine-protein kinase PLK1 | Kinase | 73 | CHEMBL4406868; CHEMBL4138231 |
6 | ido1 | Indoleamine 2,3-dioxygenase | Other Enzyme | 78 | CHEMBL4364294 |
7 | rxfp1 | Relaxin receptor 1 | GPCR | 117 | CHEMBL3714716 |
8 | braf | Serine/threonine-protein kinase B-raf | Kinase | 128 | CHEMBL3638563 |
9 | mglur2 | Metabotropic glutamate receptor 2 | GPCR | 244 | CHEMBL3886984 |
2. The 30 datasets of high sample size and mixed scaffolds (HSSMS) for molecular activity prediction. These are the molecular activity prediction benchmark datasets from MoleculeACE.
Idx | Dataset | Code | Target | Type | Target_type | Compounds | Cliffs |
---|---|---|---|---|---|---|---|
1 | CHEMBL4792_Ki | OX2R | Orexin receptor 2 | Ki | GPCR | 1471 | 763 |
2 | CHEMBL4616_EC50 | GHSR | Ghrelin receptor | EC50 | GPCR | 682 | 330 |
3 | CHEMBL244_Ki | FX | Coagulation factor X | Ki | Protease | 3097 | 1350 |
4 | CHEMBL237_EC50 | KOR | Kappa opioid receptor | EC50 | GPCR | 955 | 400 |
5 | CHEMBL3979_EC50 | PPARd | Peroxisome proliferator-activated receptor delta | EC50 | NR | 1125 | 467 |
6 | CHEMBL239_EC50 | PPARa | Peroxisome proliferator-activated receptor alpha | EC50 | NR | 1721 | 709 |
7 | CHEMBL234_Ki | D3R | Dopamine D3 receptor | Ki | GPCR | 3657 | 1441 |
8 | CHEMBL2047_EC50 | FXR | Farnesoid X receptor | EC50 | NR | 631 | 245 |
9 | CHEMBL219_Ki | D4R | Dopamine D4 receptor | Ki | GPCR | 1859 | 715 |
10 | CHEMBL264_Ki | HRH3 | Histamine H3 receptor | Ki | GPCR | 2862 | 1084 |
11 | CHEMBL235_EC50 | PPARy | Peroxisome proliferator-activated receptor gamma | EC50 | NR | 2349 | 881 |
12 | CHEMBL236_Ki | DOR | Delta opioid receptor | Ki | GPCR | 2598 | 965 |
13 | CHEMBL4005_Ki | PIK3CA | PI3-kinase p110-alpha subunit | Ki | Transferase | 960 | 351 |
14 | CHEMBL218_EC50 | CB1 | Cannabinoid receptor 1 | EC50 | GPCR | 1031 | 367 |
15 | CHEMBL237_Ki | KOR | Kappa opioid receptor | Ki | GPCR | 2602 | 941 |
16 | CHEMBL204_Ki | F2 | Thrombin | Ki | Protease | 2754 | 989 |
17 | CHEMBL214_Ki | 5-HT1A | Serotonin 1a receptor | Ki | GPCR | 3317 | 1147 |
18 | CHEMBL228_Ki | SERT | Serotonin transporter | Ki | Other | 1704 | 599 |
19 | CHEMBL287_Ki | SOR | Sigma opioid receptor | Ki | Other | 1328 | 464 |
20 | CHEMBL233_Ki | MOR | μ-opioid receptor | Ki | GPCR | 3142 | 1111 |
21 | CHEMBL2147_Ki | PIM1 | Serine/threonine-protein kinase PIM1 | Ki | Kinase | 1456 | 485 |
22 | CHEMBL1862_Ki | ABL1 | Tyrosine-protein kinase ABL1 | Ki | Kinase | 794 | 253 |
23 | CHEMBL2034_Ki | GR | Glucocorticoid receptor | Ki | NR | 750 | 230 |
24 | CHEMBL238_Ki | DAT | Dopamine transporter | Ki | Other | 1052 | 263 |
25 | CHEMBL1871_Ki | AR | Androgen Receptor | Ki | NR | 659 | 157 |
26 | CHEMBL231_Ki | HRH1 | Histamine H1 receptor | Ki | GPCR | 973 | 224 |
27 | CHEMBL262_Ki | GSK3 | Glycogen synthase kinase-3 beta | Ki | Kinase | 856 | 158 |
28 | CHEMBL2971_Ki | JAK2 | Janus kinase 2 | Ki | Kinase | 976 | 120 |
29 | CHEMBL4203_Ki | CLK4 | Dual specificity protein kinase CLK4 | Ki | Kinase | 731 | 64 |
30 | CHEMBL2835_Ki | JAK1 | Janus kinase 1 | Ki | Kinase | 615 | 46 |
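
The Compounds and Cliffs columns above can be combined into a per-dataset cliff fraction, which is a quick way to compare how cliff-heavy each HSSMS dataset is. The sketch below is only illustrative: `cliff_fraction` is a hypothetical helper, and it assumes the Cliffs column counts compounds flagged as activity cliffs (rather than cliff pairs). The three rows are copied from the table.

```python
def cliff_fraction(n_compounds: int, n_cliffs: int) -> float:
    """Return cliffs / compounds (hypothetical helper, not part of any library)."""
    if n_compounds <= 0:
        raise ValueError("n_compounds must be positive")
    return n_cliffs / n_compounds


# (dataset, compounds, cliffs), copied from rows 1, 20 and 30 of the table above.
rows = [
    ("CHEMBL4792_Ki", 1471, 763),
    ("CHEMBL233_Ki", 3142, 1111),
    ("CHEMBL2835_Ki", 615, 46),
]

fractions = {name: cliff_fraction(n, c) for name, n, c in rows}

for name, frac in fractions.items():
    print(f"{name}: {frac:.3f}")
```

Under this assumption, the fraction ranges from roughly half of the compounds (OX2R) to under a tenth (JAK1).
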
3. The 3 matched molecular pair (MMP) datasets for activity cliff classification. Datasets are from ACGCN.
Dataset | Total Compounds | Total MMPs | MMP-Cliffs | MMP-nonCliffs |
---|---|---|---|---|
Thrombin | 3171 | 5751 | 317 | 4408 |
Mu opioid receptor | 3625 | 8725 | 219 | 7097 |
Melanocortin receptor 4 | 1858 | 7169 | 111 | 5750 |
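
The MMP-Cliffs and MMP-nonCliffs columns show how imbalanced the classification task is. As a rough sketch, the share of cliffs among the labeled pairs can be computed as below; `cliff_share` is a hypothetical helper, and the counts are copied from the table. Note that in the table the two labeled columns sum to less than Total MMPs, so this sketch uses only the labeled pairs.

```python
def cliff_share(n_cliffs: int, n_noncliffs: int) -> float:
    """Share of MMP-cliffs among labeled MMPs (hypothetical helper)."""
    total = n_cliffs + n_noncliffs
    if total <= 0:
        raise ValueError("need at least one labeled pair")
    return n_cliffs / total


# dataset -> (MMP-Cliffs, MMP-nonCliffs), copied from the table above.
mmp_counts = {
    "Thrombin": (317, 4408),
    "Mu opioid receptor": (219, 7097),
    "Melanocortin receptor 4": (111, 5750),
}

shares = {name: cliff_share(c, nc) for name, (c, nc) in mmp_counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of labeled MMPs are cliffs")
```

In all three datasets the positive class is a small minority of the labeled pairs, which is worth keeping in mind when choosing evaluation metrics.
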
4. The 10 datasets of ADMET properties for delta prediction. Datasets are from DeepDelta.
Dataset | Property | Size | Units |
---|---|---|---|
Free Solvation | Absorption | 642 | Experimental Hydration Free Energy in Water |
Caco2 | Absorption | 910 | Log(Papp) |
Aqueous Solubility | Absorption | 1128 | LogS |
Fraction Unbound, Brain | Distribution | 253 | Log(fu,brain) |
Volume of Distribution at Steady State | Distribution | 1130 | Log(Body/Blood Concentration in L/kg) |
Renal Clearance | Excretion | 636 | Log(CLr) |
Microsomal Clearance | Metabolism | 731 | Log(mL/min/kg cleared) |
Hepatic Clearance | Metabolism | 881 | Log(mL/min/kg cleared) |
Half-Life | Metabolism | 1321 | Log(Half-Life in Hours) |
Hemolytic Toxicity | Toxicity | 828 | Log(HD50) |
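
To make "delta prediction" concrete, the sketch below builds molecule pairs and labels each pair with the difference in property value. It is a minimal illustration, not the DeepDelta implementation: `make_delta_pairs` is a hypothetical helper, the SMILES strings and values are made-up toy data (not taken from any dataset above), and keeping both orderings plus self-pairs is an assumption of this sketch.

```python
from itertools import product


def make_delta_pairs(smiles, values):
    """Pair every molecule with every molecule; label = value_b - value_a.

    Hypothetical helper for illustration only.
    """
    if len(smiles) != len(values):
        raise ValueError("smiles and values must have the same length")
    records = list(zip(smiles, values))
    pairs = []
    for (s_a, y_a), (s_b, y_b) in product(records, repeat=2):
        pairs.append((s_a, s_b, y_b - y_a))
    return pairs


# Toy data, invented for this example.
toy_smiles = ["CCO", "CCN", "CCC"]
toy_values = [0.5, 1.0, -0.25]

pairs = make_delta_pairs(toy_smiles, toy_values)
for s_a, s_b, delta in pairs:
    print(f"{s_a} -> {s_b}: {delta:+.2f}")
```

With this pairing scheme a dataset of n molecules yields n² pairs, so even the smallest dataset in the table (253 compounds) would expand considerably before any train/test splitting.
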