NewMoses

Dependency

torch <= 1.13

Environment setup

Create and activate a conda environment named moses with python=3.8:

conda create -n moses python=3.8 -y
conda activate moses
pip install -r requirements.txt
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

If you are on a MacBook (or do not use CUDA), use these commands instead:

conda create -n moses python=3.8 -y
conda activate moses
pip install -r requirements.txt
pip install torch==1.12.0+cpu torchvision==0.13.0+cpu torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu
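
After either install path, it can help to confirm which device the installed PyTorch build can actually use. The sketch below is only an illustration: `pick_device` is a hypothetical helper, not part of this repository, and it returns a string in the same form as the `--device` option used in the training commands.

```python
import importlib.util


def pick_device(device_idx: int = 0) -> str:
    """Return "cuda:{device_idx}" if CUDA is usable, otherwise "cpu".

    Hypothetical helper for illustration; not part of NewMoses.
    """
    # If torch cannot be found, the environment setup did not complete.
    if importlib.util.find_spec("torch") is None:
        raise RuntimeError("torch is not installed in this environment")
    import torch

    # is_available() is False for CPU-only builds and for machines
    # without a usable GPU or driver.
    if torch.cuda.is_available():
        return f"cuda:{device_idx}"
    return "cpu"
```

Calling `print(pick_device())` inside the activated moses environment prints the value to pass to `--device`.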

Running the benchmark data and model

Train only one model per process, because of wandb tracking.
Use the command below to train a model.
Available benchmark models: aae, char_rnn, latentgan, organ, and vae
Available benchmark datasets: QM9, ZINC, ZINC250K
Before running the code, unzip the train.zip and test_stats.zip files of ZINC, which were compressed due to upload limitations.
If you use CUDA, add '--device cuda:{device_idx}'; otherwise add '--device cpu'.
To train a model on the SELFIES format, add '--use_selfies 1' when you run scripts/run.py
- if you omit it, the model is trained on the SMILES format by default
- (!!Caution!!) if you pass --use_selfies with any value (e.g. 0, 1, ..., True, False), the model is trained on SELFIES
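
The caution about `--use_selfies` matches a well-known argparse pitfall. The sketch below assumes the flag is declared with `type=bool`; that declaration is a guess and has not been checked against scripts/run.py. With `type=bool`, argparse calls `bool()` on the raw string, and every non-empty string (including "0" and "False") is truthy.

```python
import argparse

parser = argparse.ArgumentParser()
# Assumption: the flag is declared like this in scripts/run.py (not verified).
parser.add_argument("--use_selfies", type=bool, default=False)

for argv in ([], ["--use_selfies", "1"], ["--use_selfies", "0"], ["--use_selfies", "False"]):
    args = parser.parse_args(argv)
    # Only the empty argv leaves use_selfies at False.
    print(argv, "->", args.use_selfies)
```

Under that assumption, the only way to train on SMILES is to leave the flag out entirely.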

Example:

python scripts/run.py --device cuda:0 --model vae --use_selfies 1 --n_batch 2048
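
Since each process should train only one model, several models can be trained by launching scripts/run.py once per model. The following is a minimal sketch of such a launcher; `build_command` and `run_all` are hypothetical helpers, and they use only the options shown in this README. The `--use_selfies` option is omitted for SMILES training rather than set to 0.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(model: str, device: str = "cpu",
                  use_selfies: bool = False,
                  n_batch: Optional[int] = None) -> List[str]:
    """Build the argument list for one training run (hypothetical helper)."""
    cmd = [sys.executable, "scripts/run.py", "--device", device, "--model", model]
    if use_selfies:
        # Add the flag only when SELFIES is wanted; leave it out for SMILES.
        cmd += ["--use_selfies", "1"]
    if n_batch is not None:
        cmd += ["--n_batch", str(n_batch)]
    return cmd


def run_all(models: List[str], **kwargs) -> None:
    """Train each model in its own process, one after another."""
    for model in models:
        # check=True stops the loop if a training run fails.
        subprocess.run(build_command(model, **kwargs), check=True)
```

For example, `run_all(["aae", "vae"], device="cuda:0", n_batch=2048)` would start two separate training processes in sequence from the repository root.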

To use wandb, add the following options:

python scripts/run.py --device cuda:0 --model vae --use_selfies 1 --n_batch 2048 --wandb_entity {wandb_id} --wandb_project {project_name} --nowandb 0

How to run the VAE with a property predictor
- When you run the VAE with the property predictor, you can choose specific properties or all of them.

Example:

python scripts/run.py --device cuda:0 --model vae_property --reg_prop_tasks logP qed --n_batch 2048

Adding the Dataset

To train a model on your own dataset, add the split dataset, named train.csv and test.csv, in moses > dataset > data > {datasetname} > files. For example, we have already created the directories for the ZINC and QM9 datasets.
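
If your data is a single CSV file, it has to be split first. The sketch below shows one way to do that with the standard library only; `split_csv`, the 10% test fraction, and the fixed seed are all assumptions made for illustration, and the columns the training code expects are not specified here, so the header row is copied unchanged.

```python
import csv
import random
from pathlib import Path


def split_csv(source, out_dir, test_fraction=0.1, seed=0):
    """Shuffle the rows of `source` and write train.csv and test.csv to `out_dir`.

    Hypothetical helper; returns (n_train, n_test).
    """
    with open(source, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # keep the header row for both output files
        rows = list(reader)

    # Seeded shuffle so the split is reproducible.
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, part in (("test.csv", rows[:n_test]), ("train.csv", rows[n_test:])):
        with open(out / name, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
    return len(rows) - n_test, n_test
```

Point `out_dir` at the directory for your dataset so that the two files land where the training script looks for them.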

Sampling (Generate Sample using trained model)

n_samples: number of samples to generate
model_save_time: timestamp of the saved model folder
load_epoch: epoch of the checkpoint to load

python scripts/run_samples.py --model_save_time 20240515_021753 --model vae --data ZINC --load_epoch 080 --n_samples 1000
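
The example value 20240515_021753 looks like a year-month-day_hour-minute-second timestamp. Assuming that format holds for every saved model folder (it is inferred from this one example only), the most recent run can be picked from a list of folder names as sketched below; `parse_save_time` and `latest_run` are hypothetical helpers.

```python
from datetime import datetime
from typing import Iterable


def parse_save_time(name: str) -> datetime:
    """Parse a folder name such as "20240515_021753" (assumed format)."""
    return datetime.strptime(name, "%Y%m%d_%H%M%S")


def latest_run(names: Iterable[str]) -> str:
    """Return the folder name with the most recent timestamp."""
    return max(names, key=parse_save_time)
```

The returned name can then be passed as `--model_save_time`.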

Evaluation

device: cpu or cuda:{index}
n_jobs: number of workers used to evaluate the model

python scripts/run_eval.py --data ZINC --model vae --model_save_time 20240515_021753 --device cpu --n_jobs 8

Reference code

We adapted the code from https://github.com/molecularsets/moses for our project.
