Skip to content

MCDTA: A 4D tensor‑enhanced multi‑dimensional convolutional neural network for accurate prediction of protein–ligand binding affinity

Notifications You must be signed in to change notification settings

dfhuang-AI/MCDTA

Repository files navigation

MCDTA

MCDTA: A 4D tensor‑enhanced multi‑dimensional convolutional neural network for accurate prediction of protein–ligand binding affinity

Requirements

python==3.7.12

rdkit==2023.3.2

numpy==1.21.6

pandas==1.3.5

biopython==1.81

scipy==1.7.3

torch==1.13.1+cu117

[torch_geometric](PyG Documentation — pytorch_geometric documentation (pytorch-geometric.readthedocs.io))==2.1.0

Example usage

  • Due to the protein files are too large, we put them into "Releases" module. You need to download and copy them to './data/' folder or './example/' folder in advance.

1. Use our pre-trained model

In this section,we provide the test set and two external validation sets data, you can directly conduct the following command to run our pre-trained model and get the results on the sets.

# Run the following command.
python test_pretrain.py -S XXX
# XXX represents test/csar/astex

2. Run on your datasets

In this section, you must provide .mol2 file of the ligand as well as .pdb file of the protein. We provide an example for data preparation and feature engineering based on the test set.

(1) Firstly, convert .pdb file into .fasta file by running the following command.

cd example/
python pdb_to_fasta.py

(2) Next, you need to keep ligands that could be properly read and converted into standard SMILES and corresponding proteins. You can get the pdbid of each complex, SMILES of each ligand, and corresponding binding affinity value by running the following command.

python canonical_smiles_generation.py

(3) Then, you need to keep complexes that could generate protein-ligand interaction representations. You can get the valid test set information and protein-ligand interaction grid by running the following command.

python gridFeaturize.py

(4) Then, you can choose to generate protein distance matrix by running the following command.

python pro_graph_feat.py

(5) Finally, when all the data is ready, you can copy all feature files into '../data/' folder and train your own model by running the following command.

cd ..
python train.py

About

MCDTA: A 4D tensor‑enhanced multi‑dimensional convolutional neural network for accurate prediction of protein–ligand binding affinity

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages