Added HMM, NGram and Combinatorial baselines (molecularsets#77)
danpol authored Mar 26, 2020
1 parent b94468c commit 8bdfd14
Showing 83 changed files with 1,056 additions and 720,612 deletions.
49 changes: 49 additions & 0 deletions .gitattributes
@@ -7,3 +7,52 @@ moses/dataset/data/test_stats.npz filter=lfs diff=lfs merge=lfs -text
moses/dataset/data/test.csv.gz filter=lfs diff=lfs merge=lfs -text
moses/dataset/data/test_scaffolds.csv.gz filter=lfs diff=lfs merge=lfs -text
moses/dataset/data/train.csv.gz filter=lfs diff=lfs merge=lfs -text
+data/samples/combinatorial/combinatorial_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/combinatorial/metrics_combinatorial_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/hmm/hmm_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/hmm/metrics_hmm_3.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/latent_gan/latent_gan_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/aae/aae_3.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/aae/metrics_aae_3.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/char_rnn/char_rnn_all.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/ngram/metrics_ngram_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/vae/metrics_vae_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/vae/vae_all.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/ngram/ngram_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/aae/aae_all.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/aae/metrics_aae_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/ngram/metrics_ngram_3.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/latent_gan/latent_gan_all.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/vae/metrics_vae_3.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/latent_gan/metrics_latent_gan_3.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/ngram/ngram_3.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/vae/vae_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/vae/vae_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/char_rnn/char_rnn_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/combinatorial/combinatorial_3.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/hmm/hmm_all.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/ngram/metrics_ngram_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/ngram/ngram_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/char_rnn/metrics_char_rnn_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/hmm/hmm_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/latent_gan/metrics_latent_gan_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/combinatorial/combinatorial_all.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/char_rnn/metrics_char_rnn_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/combinatorial/combinatorial_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/combinatorial/metrics_combinatorial_3.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/hmm/hmm_3.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/latent_gan/latent_gan_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/aae/aae_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/aae/metrics_aae_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/char_rnn/char_rnn_3.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/latent_gan/latent_gan_3.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/latent_gan/metrics_latent_gan_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/ngram/ngram_all.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/combinatorial/metrics_combinatorial_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/hmm/metrics_hmm_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/hmm/metrics_hmm_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/vae/metrics_vae_1.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/vae/vae_3.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/aae/aae_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/char_rnn/char_rnn_2.csv filter=lfs diff=lfs merge=lfs -text
+data/samples/char_rnn/metrics_char_rnn_3.csv filter=lfs diff=lfs merge=lfs -text
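
The `.gitattributes` entries in this hunk route each listed sample and metrics file through Git LFS. As a rough way to audit such a file, the sketch below parses `.gitattributes`-style text and returns the patterns that carry `filter=lfs`. The function name `lfs_patterns` is hypothetical (it is not part of MOSES or Git), and the parser assumes that patterns contain no quoted spaces.

```python
def lfs_patterns(gitattributes_text):
    """Return path patterns whose attribute list includes filter=lfs.

    Hypothetical helper for illustration; assumes no quoted patterns.
    """
    patterns = []
    for raw_line in gitattributes_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Simplification: the first whitespace-separated field is the pattern,
        # everything after it is an attribute.
        pattern, *attributes = line.split()
        if "filter=lfs" in attributes:
            patterns.append(pattern)
    return patterns


example = """\
data/samples/hmm/hmm_1.csv filter=lfs diff=lfs merge=lfs -text
data/samples/ngram/ngram_1.csv filter=lfs diff=lfs merge=lfs -text
README.md text
"""
print(lfs_patterns(example))
```

Only the two `data/samples/...` entries are reported here, because `README.md` has no `filter=lfs` attribute.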
4 changes: 2 additions & 2 deletions .travis.yml
@@ -19,5 +19,5 @@ install:

script:
 - flake8
-- pylint --disable=all --enable=no-else-return,unused-variable,wrong-import-order moses/ scripts/
-- python tests/test_metrics.py
+- pylint --disable=all --enable=no-else-return,unused-variable,wrong-import-order moses/ scripts/ tests/
+- python -m unittest discover -p "test_*.py" -v
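
The new CI step relies on `unittest` discovery instead of running one test script by name, so any file matching `test_*.py` is picked up. A minimal sketch of a module that discovery would collect is shown below. The helper `unique_ratio` and the class `UniqueRatioTest` are hypothetical names made up for illustration and are not taken from the MOSES code base. The file also has to live somewhere discovery can import it.

```python
import unittest


def unique_ratio(items):
    """Hypothetical helper: fraction of distinct entries in a non-empty list."""
    return len(set(items)) / len(items)


class UniqueRatioTest(unittest.TestCase):
    """Discovery collects TestCase subclasses and runs their test_* methods."""

    def test_all_distinct(self):
        self.assertEqual(unique_ratio(["CCO", "CCN", "c1ccccc1"]), 1.0)

    def test_with_duplicates(self):
        self.assertAlmostEqual(unique_ratio(["CCO", "CCO", "CCN", "CCN"]), 0.5)
```

No `unittest.main()` call is needed in the file, since the `discover` command loads and runs the test cases itself.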
2 changes: 1 addition & 1 deletion Dockerfile
@@ -33,7 +33,7 @@ RUN conda install -yq numpy=1.16.0 scipy=1.2.0 matplotlib=3.0.1 \
&& conda install -yq -c rdkit rdkit=2019.09.3 \
&& conda install -yq -c pytorch pytorch=1.1.0 torchvision=0.2.1 \
&& conda clean -yq -a \
-&& pip install tensorflow-gpu==1.14
+&& pip install tensorflow-gpu==1.14 pomegranade==0.12.0

RUN git clone https://github.com/pcko1/Deep-Drug-Coder.git --branch moses \
&& cd Deep-Drug-Coder \
228 changes: 140 additions & 88 deletions README.md
@@ -2,7 +2,7 @@

[![Build Status](https://travis-ci.com/molecularsets/moses.svg?branch=master)](https://travis-ci.com/molecularsets/moses) [![PyPI version](https://badge.fury.io/py/molsets.svg)](https://badge.fury.io/py/molsets)

-Deep generative models such as generative adversarial networks, variational autoencoders, and autoregressive models are rapidly growing in popularity for the discovery of new molecules and materials. In this work, we introduce MOlecular SEtS (MOSES), a benchmarking platform to support research on machine learning for drug discovery. MOSES implements several popular molecular generation models and includes a set of metrics that evaluate the diversity and quality of generated molecules. MOSES is meant to standardize the research on molecular generation and facilitate the sharing and comparison of new models. Additionally, we provide a large-scale comparison of existing state of the art models and elaborate on current challenges for generative models that might prove fertile ground for new research. Our platform and source code are freely available here.
+Deep generative models are rapidly becoming popular for the discovery of new molecules and materials. Such models learn on a large collection of molecular structures and produce novel compounds. In this work, we introduce Molecular Sets (MOSES), a benchmarking platform to support research on machine learning for drug discovery. MOSES implements several popular molecular generation models and provides a set of metrics to evaluate the quality and diversity of generated molecules. With MOSES, we aim to standardize the research on molecular generation and facilitate the sharing and comparison of new models.

__For more details, please refer to the [paper](https://arxiv.org/abs/1811.12823).__

@@ -72,121 +72,173 @@ Besides standard uniqueness and validity metrics, MOSES provides other metrics t
<td><i>1.0</i></td>
<td><i>1.0</i></td>
<td><i>0.008</i></td>
-<td><i>0.476</i></td>
-<td><i>0.642</i></td>
-<td><i>0.586</i></td>
+<td><i>0.4755</i></td>
+<td><i>0.6419</i></td>
+<td><i>0.5859</i></td>
<td><i>1.0</i></td>
-<td><i>0.999</i></td>
-<td><i>0.991</i></td>
+<td><i>0.9986</i></td>
+<td><i>0.9907</i></td>
<td><i>0.0</i></td>
-<td><i>0.857</i></td>
-<td><i>0.851</i></td>
+<td><i>0.8567</i></td>
+<td><i>0.8508</i></td>
<td><i>1.0</i></td>
<td><i>1.0</i></td>
</tr>
<tr>
-<td>AAE</td>
-<td>0.937±0.034</td>
-<td><b>1.0±0.0</b></td>
-<td>0.997±0.002</td>
-<td>0.556±0.203</td>
-<td>1.057±0.237</td>
-<td>0.608±0.004</td>
-<td>0.568±0.005</td>
-<td>0.991±0.005</td>
-<td>0.99±0.004</td>
-<td>0.902±0.037</td>
-<td>0.079±0.009</td>
-<td>0.856±0.003</td>
-<td><b>0.85±0.003</b></td>
-<td>0.996±0.001</td>
-<td>0.793±0.028</td>
+<td>HMM</td>
+<td>0.076±0.0322</td>
+<td>0.623±0.1224</td>
+<td>0.5671±0.1424</td>
+<td>24.4661±2.5251</td>
+<td>25.4312±2.5599</td>
+<td>0.3876±0.0107</td>
+<td>0.3795±0.0107</td>
+<td>0.5754±0.1224</td>
+<td>0.5681±0.1218</td>
+<td>0.2065±0.0481</td>
+<td>0.049±0.018</td>
+<td>0.8466±0.0403</td>
+<td>0.8104±0.0507</td>
+<td>0.9024±0.0489</td>
+<td><b>0.9994±0.001</b></td>
+</tr>
+<tr>
+<td>NGram</td>
+<td>0.2376±0.0025</td>
+<td>0.974±0.0108</td>
+<td>0.9217±0.0019</td>
+<td>5.5069±0.1027</td>
+<td>6.2306±0.0966</td>
+<td>0.5209±0.001</td>
+<td>0.4997±0.0005</td>
+<td>0.9846±0.0012</td>
+<td>0.9815±0.0012</td>
+<td>0.5302±0.0163</td>
+<td>0.0977±0.0142</td>
+<td>0.8738±0.0002</td>
+<td>0.8644±0.0002</td>
+<td>0.9582±0.001</td>
+<td>0.9694±0.001</td>
+</tr>
+<tr>
+<td>Combinatorial</td>
+<td>0.9979±0.0003</td>
+<td>0.9983±0.0006</td>
+<td>0.9948±0.0005</td>
+<td>6.1626±0.0081</td>
+<td>6.7734±0.0106</td>
+<td>0.4226±0.0004</td>
+<td>0.4079±0.0004</td>
+<td>0.9151±0.0026</td>
+<td>0.9099±0.0026</td>
+<td>0.307±0.0187</td>
+<td>0.0928±0.0079</td>
+<td><b>0.8812±0.0003</b></td>
+<td><b>0.8741±0.0003</b></td>
+<td>0.7912±0.0021</td>
+<td>0.9913±0.0004</td>
</tr>
<tr>
<td>CharRNN</td>
-<td>0.975±0.026</td>
-<td><b>1.0±0.0</b></td>
-<td><b>0.999±0.0</b></td>
-<td><b>0.073±0.025</b></td>
-<td><b>0.52±0.038</b></td>
-<td>0.601±0.021</td>
-<td>0.565±0.014</td>
+<td>0.9748±0.0264</td>
<td><b>1.0±0.0</b></td>
-<td><b>0.998±0.0</b></td>
-<td>0.924±0.006</td>
-<td><b>0.11±0.008</b></td>
-<td>0.856±0.0</td>
-<td><b>0.85±0.0</b></td>
-<td>0.994±0.003</td>
-<td>0.842±0.051</td>
+<td><b>0.9994±0.0003</b></td>
+<td><b>0.0732±0.0247</b></td>
+<td><b>0.5204±0.0379</b></td>
+<td>0.6015±0.0206</td>
+<td>0.5649±0.0142</td>
+<td><b>0.9998±0.0002</b></td>
+<td>0.9983±0.0003</td>
+<td>0.9242±0.0058</td>
+<td><b>0.1101±0.0081</b></td>
+<td>0.8562±0.0005</td>
+<td>0.8503±0.0005</td>
+<td>0.9943±0.0034</td>
+<td>0.8419±0.0509</td>
</tr>
<tr>
-<td>JTN-VAE</td>
-<td><b>1.0</b></td>
-<td><b>1.0</b></td>
-<td><b>0.999</b></td>
-<td>0.422</td>
-<td>0.996</td>
-<td>0.556</td>
-<td>0.527</td>
-<td>0.996</td>
-<td>0.995</td>
-<td>0.892</td>
-<td>0.1</td>
-<td>0.851</td>
-<td>0.845</td>
-<td>0.978</td>
-<td>0.915</td>
+<td>AAE</td>
+<td>0.9368±0.0341</td>
+<td><b>1.0±0.0</b></td>
+<td>0.9973±0.002</td>
+<td>0.5555±0.2033</td>
+<td>1.0572±0.2375</td>
+<td>0.6081±0.0043</td>
+<td>0.5677±0.0045</td>
+<td>0.991±0.0051</td>
+<td>0.9905±0.0039</td>
+<td>0.9022±0.0375</td>
+<td>0.0789±0.009</td>
+<td>0.8557±0.0031</td>
+<td>0.8499±0.003</td>
+<td>0.996±0.0006</td>
+<td>0.7931±0.0285</td>
</tr>
<tr>
<td>VAE</td>
-<td>0.977±0.001</td>
+<td>0.9767±0.0012</td>
<td><b>1.0±0.0</b></td>
-<td>0.998±0.001</td>
-<td>0.099±0.013</td>
-<td>0.567±0.034</td>
-<td><b>0.626±0.0</b></td>
-<td><b>0.578±0.001</b></td>
-<td>0.999±0.0</td>
-<td><b>0.998±0.0</b></td>
-<td><b>0.939±0.002</b></td>
-<td>0.059±0.01</td>
-<td>0.856±0.0</td>
-<td><b>0.85±0.0</b></td>
-<td><b>0.997±0.0</b></td>
-<td>0.695±0.007</td>
+<td>0.9984±0.0005</td>
+<td>0.099±0.0125</td>
+<td>0.567±0.0338</td>
+<td><b>0.6257±0.0005</b></td>
+<td><b>0.5783±0.0008</b></td>
+<td>0.9994±0.0001</td>
+<td><b>0.9984±0.0003</b></td>
+<td><b>0.9386±0.0021</b></td>
+<td>0.0588±0.0095</td>
+<td>0.8558±0.0004</td>
+<td>0.8498±0.0004</td>
+<td><b>0.997±0.0002</b></td>
+<td>0.6949±0.0069</td>
</tr>
<tr>
+<td>JTN-VAE</td>
+<td><b>1.0</b></td>
+<td><b>1.0</b></td>
+<td>0.9992</td>
+<td>0.4224</td>
+<td>0.9962</td>
+<td>0.5561</td>
+<td>0.5273</td>
+<td>0.9962</td>
+<td>0.9948</td>
+<td>0.8925</td>
+<td>0.1005</td>
+<td>0.8512</td>
+<td>0.8453</td>
+<td>0.9778</td>
+<td>0.9153</td>
+</tr>
+<tr>
<td>LatentGAN</td>
-<td>0.897±0.002</td>
+<td>0.897±0.0024</td>
<td><b>1.0±0.0</b></td>
-<td>0.997±0.005</td>
-<td>0.296±0.021</td>
-<td>0.824±0.030</td>
-<td>0.538±0.001</td>
-<td>0.514±0.009</td>
-<td>0.999±0.003</td>
-<td><b>0.998±0.003</b></td>
-<td>0.886±0.006</td>
-<td>0.1±0.015</td>
-<td><b>0.857±0.0</b></td>
-<td><b>0.85±0.0</b></td>
-<td>0.973±0.001</td>
-<td><b>0.949±0.001</b></td>
+<td>0.997±0.0005</td>
+<td>0.296±0.0214</td>
+<td>0.8237±0.0295</td>
+<td>0.5377±0.0013</td>
+<td>0.5135±0.0009</td>
+<td>0.9987±0.0003</td>
+<td>0.9974±0.0003</td>
+<td>0.8864±0.006</td>
+<td>0.1004±0.0152</td>
+<td>0.8565±0.0008</td>
+<td>0.8504±0.0007</td>
+<td>0.9727±0.001</td>
+<td>0.9488±0.0014</td>
</tr>

</tbody>
</table>
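
Most cells in the table have the form `value±spread`. This hunk does not say how the spread is defined. The three numbered sample files per model (`_1`, `_2`, `_3`) suggest a statistic over independent runs, but that is an assumption. The sketch below shows one way such a cell could be produced, using the mean and population standard deviation rounded to four digits. The function name `format_mean_std` is hypothetical.

```python
from statistics import mean, pstdev


def format_mean_std(values, digits=4):
    """Format per-run metric values as 'mean±std'.

    Assumption: the spread is a population standard deviation over runs;
    the commit itself does not state this.
    """
    m = round(mean(values), digits)
    s = round(pstdev(values), digits)
    return f"{m}±{s}"


# Three identical runs give zero spread.
print(format_mean_std([1.0, 1.0, 1.0]))
```

Rounding with `round` rather than fixed-width formatting drops trailing zeros, which is consistent with entries such as `0.974±0.0108` in the table.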

-For comparison of molecular properties, we computed the Frèchet distance between distributions of molecules in the generated and test sets. Below, we provide plots for lipophilicity (logP), Synthetic Accessibility (SA), Quantitative Estimation of Drug-likeness (QED), Natural Product-likeness (NP) and molecular weight.

+For comparison of molecular properties, we computed the Frèchet distance between distributions of molecules in the generated and test sets. Below, we provide plots for lipophilicity (logP), Synthetic Accessibility (SA), Quantitative Estimation of Drug-likeness (QED) and molecular weight.

|logP|SA|
|----|--|
|![logP](images/logP.png)|![SA](images/SA.png)|
-|NP|QED|
-|![NP](images/NP.png)|![QED](images/QED.png)|
-|weight|
-|![weight](images/weight.png)|
+|weight|QED|
+|![weight](images/weight.png)|![QED](images/QED.png)|
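
The README text in this hunk compares property distributions (logP, SA, QED, weight) between generated and test molecules with a Fréchet distance, but it does not show the computation. As a sketch only, and assuming each one-dimensional property distribution is approximated by a Gaussian, the Fréchet distance reduces to `sqrt((mean_a - mean_b)^2 + (std_a - std_b)^2)`. The function name `gaussian_frechet_1d` is hypothetical and this is not necessarily how MOSES computes the plotted values.

```python
from math import sqrt
from statistics import mean, pstdev


def gaussian_frechet_1d(xs, ys):
    """Fréchet distance between Gaussian fits of two 1-D samples.

    Assumption: each sample is summarized by its mean and population
    standard deviation before the distance is taken.
    """
    mean_gap = mean(xs) - mean(ys)
    std_gap = pstdev(xs) - pstdev(ys)
    return sqrt(mean_gap ** 2 + std_gap ** 2)


# Hypothetical property values for two small sets of molecules.
generated = [0.0, 1.0, 2.0, 3.0]
reference = [2.0, 3.0, 4.0, 5.0]
print(gaussian_frechet_1d(generated, reference))
```

The two toy samples have the same spread and differ only by a shift of 2, so the distance here comes entirely from the gap between the means.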

# Installation
