ResGen: A Pocket-aware 3D Molecular Generation Model Based on Parallel Multi-scale Modeling
ResGen is a newly developed method for 3D pocket-aware molecular generation.
mamba env create -f resgen.yml
mamba activate resgen
(We recommend mamba over conda. If you prefer conda, replace mamba with conda in the commands.)
Alternatively, set up the environment manually:
mamba create -n resgen python=3.8
mamba install pytorch==1.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
mamba install pyg -c pyg
mamba install -c conda-forge rdkit
mamba install biopython -c conda-forge
mamba install pyyaml easydict python-lmdb -c conda-forge
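After installation, a quick import check can confirm that the environment is complete. The sketch below is not part of the ResGen code; the import names (for example Bio for biopython and yaml for pyyaml) are the usual ones for these packages, and the helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names assumed for the packages installed above
# (biopython is imported as "Bio", pyyaml as "yaml", pyg as "torch_geometric").
REQUIRED_MODULES = ["torch", "torch_geometric", "rdkit", "Bio", "yaml", "easydict", "lmdb"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the activated resgen environment; any name it reports can be reinstalled with the corresponding command above.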
The main training dataset is CrossDocked2020, which is used by most methods of this kind.
Note: the data is needed only for training. To use the pretrained model only, go directly to the generation part.
wget https://bits.csb.pitt.edu/files/crossdock2020/CrossDocked2020_v1.1.tgz -P data/crossdock2020/
tar -C data/crossdock2020/ -xzf data/crossdock2020/CrossDocked2020_v1.1.tgz
wget https://bits.csb.pitt.edu/files/it2_tt_0_lowrmsd_mols_train0_fixed.types -P data/crossdock2020/
wget https://bits.csb.pitt.edu/files/it2_tt_0_lowrmsd_mols_test0_fixed.types -P data/crossdock2020/
The original CrossDocked2020 dataset is 50 GB, which makes it slow to download and unpack. You can skip it and use Approach 1 or Approach 2 to prepare the training data.
You can download the processed data from this link. It is a processed version of the original files, prepared by Luoshi Tong.
Note: index.pkl and split_by_name.pt are downloaded automatically with the SurfGen code. index.pkl stores the information for each protein-ligand pair, while split_by_name.pt stores the train-test split of the dataset.
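Because the original dataset is large, it may help to check free disk space before starting the download. This is a minimal sketch using only the standard library; the helper name has_free_space and the 2x safety margin (archive plus extracted files) are assumptions, not figures from the ResGen authors.

```python
import shutil


def has_free_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    dataset_gb = 50          # size stated for the original dataset
    margin = 2               # assumption: room for both the archive and the extracted files
    if has_free_space(".", dataset_gb * margin):
        print("Enough free space for the original dataset.")
    else:
        print("Not enough free space; consider the processed data instead.")
```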
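To see what index.pkl contains before processing, it can be opened with the standard pickle module. The sketch below assumes the file holds a pickled list-like sequence with one entry per protein-ligand pair; the exact fields of each entry are not documented here, so the code only counts and previews them. The helper names are hypothetical.

```python
import pickle


def load_index(path):
    """Load a pickled index file (assumed to be a list-like sequence of entries)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize_index(index, n_preview=3):
    """Return the number of entries and the first few entries for inspection."""
    return {"num_entries": len(index), "preview": list(index[:n_preview])}


if __name__ == "__main__":
    # Path assumed from the extraction step; adjust to where index.pkl actually lives.
    index_path = "./data/crossdocked_pocket10/index.pkl"
    try:
        print(summarize_index(load_index(index_path)))
    except FileNotFoundError:
        print("index.pkl not found at", index_path)
```

split_by_name.pt has a .pt extension, so it can presumably be read with torch.load in the same way; its internal layout is likewise an assumption to verify by inspection.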
tar -xzvf crossdocked_pocket10.tar.gz
python process_data.py --raw_data ./data/crossdocked_pocket10
Alternatively, you can download the processed data: lmdb, key, and name2id.
The trained model's parameters can be downloaded here.
python gen.py --pdb_file ./examples/4iiy.pdb --sdf_file ./examples/4iiy_ligand.sdf --outdir ./examples
You can also follow the guide at generation/generation.ipynb.
We provide PDB ID 14gs as an example.
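To generate molecules for several pockets, the gen.py command shown above can be wrapped in a small driver script. This is a sketch under stated assumptions: it uses only the three flags that appear in the command above (--pdb_file, --sdf_file, --outdir), and the helper names build_gen_command and run_batch are hypothetical.

```python
import subprocess
import sys


def build_gen_command(pdb_file, sdf_file, outdir):
    """Build the gen.py command line using only the flags shown in this README."""
    return [
        sys.executable, "gen.py",
        "--pdb_file", str(pdb_file),
        "--sdf_file", str(sdf_file),
        "--outdir", str(outdir),
    ]


def run_batch(pairs, outdir):
    """Run gen.py once per (pdb_file, sdf_file) pair, stopping on the first failure."""
    for pdb_file, sdf_file in pairs:
        subprocess.run(build_gen_command(pdb_file, sdf_file, outdir), check=True)


if __name__ == "__main__":
    # Example pair taken from the command above; add more pairs as needed.
    pairs = [("./examples/4iiy.pdb", "./examples/4iiy_ligand.sdf")]
    for pdb_file, sdf_file in pairs:
        # Print the commands first; call run_batch(pairs, "./examples") to execute them.
        print(" ".join(build_gen_command(pdb_file, sdf_file, "./examples")))
```

Building the command as a list rather than a single string avoids shell quoting problems when file paths contain spaces.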
The training process is released as train.py. The following command is an example of how to train a model:
python train.py --config ./configs/train_res.yml --logdir logs
This project draws in part from GraphBP and Pocket2Mol, which are released under the GPL-v3 License and the MIT License. We thank the authors for their great work and code, and encourage interested readers to check their work as well.
If you find this work interesting, please cite:
@article{zhang2023resgen,
title={ResGen is a pocket-aware 3D molecular generation model based on parallel multiscale modelling},
author={Zhang, Odin and Zhang, Jintu and Jin, Jieyu and Zhang, Xujun and Hu, RenLing and Shen, Chao and Cao, Hanqun and Du, Hongyan and Kang, Yu and Deng, Yafeng and others},
journal={Nature Machine Intelligence},
volume={5},
number={9},
pages={1020--1030},
year={2023},
publisher={Nature Publishing Group UK London}
}