torch <= 1.13
create and activate conda environment named moses
with python=3.8
conda create -n moses python=3.8 -y \
conda activate moses \
pip install -r requirements.txt \
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
if you use MacBook (or DO NOT USE cuda), use this code
conda create -n moses python=3.8 -y \
conda activate moses \
pip install -r requirements.txt \
pip install torch==1.12.0+cpu torchvision==0.13.0+cpu torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu
- Only train one model on each process b/c of wandb tracking
- run the below code for training the model
- You can use benchmark model: aae, char_rnn, latentgan, organ and vae
- You can use benchmark dataset: QM9, ZINC, ZINC250K
- Before running the code, make sure to unzip the
train.zip
andtest_stats.zip
files of ZINC, which were compressed due to upload limitations - If you use cuda, add '--device cuda:{device_idx}', else --device cpu
- If you want to train model with selfies format, you add the '--use_selfies 1' when you run the scripts/run.py
- if you don't add it, the model is trained by smiles format automatically
- (!!Caution!!) if you use --use_selfies with any kind of format (ex: 0,1, ..., True, False...), the model is trained by selfies
Example:
python scripts/run.py --device cuda:0 —model vae --use_selfies 1 --n_batch 2048
For use the wandb, you need to setup below file:
python scripts/run.py --device cuda:0 —model vae --use_selfies 1 --n_batch 2048 --wandb_entity {wandb_id} --wandb_project {project_name} --nowandb 0
- How to run VAE with property predictor
- When you run the vae with property predictor, you can choose certain properties or all.
Example:
python scripts/run.py --device cuda:0 —model vae_property --reg_prop_tasks logP qed --n_batch 2048
If you train model using your model, add the splited dataset named train.csv, test.csv in moses > dataset > data > {datasetname} > files For example, we have already make the directory for ZINC and QM9 dataset
- n_samples: how many samples do you want to generate
- model_save_time: the time of the model folder
- load_epoch: what epoch do you want to use
python scripts/run_samples.py --model_save_time 20240515_021753 --model vae --data ZINC --load_epoch 080 --n_samples 1000
- device: choose cpu or cuda:{index}
- n_jobs: How many workers for evaluating the models
python scripts/run_eval.py --data ZINC --model vae --model_save_time 20240515_021753 --device cpu --n_jobs 8
We re-generate the code from https://github.com/molecularsets/moses for our project.