MCDTA

MCDTA: A 4D tensor‑enhanced multi‑dimensional convolutional neural network for accurate prediction of protein–ligand binding affinity

Requirements

python==3.7.12

rdkit==2023.3.2

numpy==1.21.6

pandas==1.3.5

biopython==1.81

scipy==1.7.3

torch==1.13.1+cu117

[torch_geometric](https://pytorch-geometric.readthedocs.io/)==2.1.0

Example usage

Because the protein files are too large for the repository, we have put them under "Releases". Download them and copy them to the './data/' folder or the './example/' folder before running the commands below.

1. Use our pre-trained model

In this section, we provide the data for the test set and two external validation sets. Run the following command to evaluate our pre-trained model on any of these sets.

# Run the following command.
python test_pretrain.py -S XXX
# XXX represents test/csar/astex
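
To evaluate all three sets in one go, the command above can be wrapped in a small script. This is a minimal sketch, assuming only the `-S` flag shown above and that the script is run from the repository root; `build_command` and `run_all` are hypothetical helper names, not part of the repository.

```python
import subprocess
import sys

# The three set names accepted by -S, as listed above.
SETS = ("test", "csar", "astex")


def build_command(set_name):
    """Return the argv list for evaluating one set with test_pretrain.py."""
    if set_name not in SETS:
        raise ValueError(f"unknown set {set_name!r}; expected one of {SETS}")
    return [sys.executable, "test_pretrain.py", "-S", set_name]


def run_all():
    """Run the pre-trained model on every set, stopping at the first failure."""
    for name in SETS:
        subprocess.run(build_command(name), check=True)
```

Calling `run_all()` launches the three evaluations one after another with the current Python interpreter.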

2. Run on your datasets

In this section, you must provide a .mol2 file for each ligand and a .pdb file for each protein. We provide an example of data preparation and feature engineering based on the test set.

(1) First, convert each .pdb file into a .fasta file by running the following command.

cd example/
python pdb_to_fasta.py
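
To show what this conversion involves, here is a minimal pure-Python sketch that reads the fixed-column ATOM records of a PDB file and writes one FASTA record per chain. It is not the repository's `pdb_to_fasta.py`, which may rely on Biopython (listed in the requirements) and handle more edge cases. The function name, the `>pdbid_chain` header format, the use of "X" for non-standard residues, and the restriction to the first model are all assumptions of this sketch.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_to_fasta(pdb_text, pdbid):
    """Build one FASTA record per chain from the ATOM records of a PDB file."""
    chains = {}   # chain ID -> list of one-letter codes, in file order
    seen = set()  # (chain, residue number + insertion code) already counted
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # assumption: only the first model is used
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        # PDB columns: residue name 18-20, chain 22, resSeq + iCode 23-27.
        res_name = line[17:20].strip()
        chain_id = line[21]
        res_key = (chain_id, line[22:27])
        if res_key in seen:
            continue  # later atoms of a residue that was already recorded
        seen.add(res_key)
        # Non-standard residues become "X" (an assumption of this sketch).
        chains.setdefault(chain_id, []).append(THREE_TO_ONE.get(res_name, "X"))
    records = []
    for chain_id, seq in chains.items():
        records.append(">{}_{}\n{}\n".format(pdbid, chain_id, "".join(seq)))
    return "".join(records)
```

Because the sequence is taken from ATOM records, residues missing from the structure are also missing from the output.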

(2) Next, keep only the ligands that can be read and converted into canonical SMILES, together with their corresponding proteins. Running the following command produces the PDB ID of each complex, the SMILES of each ligand, and the corresponding binding affinity value.

python canonical_smiles_generation.py
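
The filtering idea in this step can be sketched as follows. This is an illustration under stated assumptions, not the contents of `canonical_smiles_generation.py`: the function names, the `(pdbid, mol2_path, affinity)` input tuples, and the output dictionary keys are hypothetical. The RDKit calls used are `Chem.MolFromMol2File`, which returns `None` when a file cannot be parsed, and `Chem.MolToSmiles`, which produces canonical SMILES by default.

```python
def mol2_to_smiles(mol2_path):
    """Return the canonical SMILES of a .mol2 ligand, or None if parsing fails."""
    from rdkit import Chem  # imported lazily so the filter below runs without RDKit

    mol = Chem.MolFromMol2File(mol2_path)
    if mol is None:
        return None
    return Chem.MolToSmiles(mol)  # canonical by default


def keep_readable(complexes, to_smiles=mol2_to_smiles):
    """Keep only complexes whose ligand converts to SMILES.

    `complexes` is an iterable of (pdbid, mol2_path, affinity) tuples.
    """
    kept = []
    for pdbid, mol2_path, affinity in complexes:
        smiles = to_smiles(mol2_path)
        if smiles:  # drops both None and the empty string
            kept.append({"pdbid": pdbid, "smiles": smiles, "affinity": affinity})
    return kept
```

Passing the converter as an argument keeps the filter itself independent of RDKit, so the filter can be checked with a stub converter.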

(3) Then, keep only the complexes for which a protein-ligand interaction representation can be generated. Running the following command produces the valid test set information and the protein-ligand interaction grid.

python gridFeaturize.py
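
A protein-ligand grid of this kind is commonly built by voxelizing the atoms around the ligand. The sketch below shows that general idea only. It is not the featurization implemented in `gridFeaturize.py`, and the box size, the resolution, the two occupancy channels, and the function name `voxelize` are all assumptions chosen for illustration.

```python
import numpy as np


def voxelize(protein_xyz, ligand_xyz, box=20.0, resolution=1.0):
    """Count protein and ligand atoms per voxel in a cube centred on the ligand.

    Returns an array of shape (2, n, n, n): channel 0 is protein, channel 1 is ligand.
    """
    protein_xyz = np.asarray(protein_xyz, dtype=float).reshape(-1, 3)
    ligand_xyz = np.asarray(ligand_xyz, dtype=float).reshape(-1, 3)
    n = int(round(box / resolution))
    origin = ligand_xyz.mean(axis=0) - box / 2.0  # lower corner of the cube
    grid = np.zeros((2, n, n, n), dtype=np.float32)
    for channel, xyz in enumerate((protein_xyz, ligand_xyz)):
        idx = np.floor((xyz - origin) / resolution).astype(int)
        inside = np.all((idx >= 0) & (idx < n), axis=1)  # drop atoms outside the cube
        # np.add.at accumulates correctly when several atoms share a voxel.
        np.add.at(grid[channel], tuple(idx[inside].T), 1.0)
    return grid
```

A complex whose atoms cannot be placed on the grid, for example because the ligand coordinates are missing, would be one that fails to "generate an interaction representation" in the sense of this step.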

(4) Optionally, generate the protein distance matrix by running the following command.

python pro_graph_feat.py
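
A protein distance matrix is typically the table of pairwise distances between residues. As a minimal sketch, assuming one 3D coordinate per residue (for example the Cα atom), it can be computed with NumPy broadcasting. The function name and the choice of representative atom are assumptions; `pro_graph_feat.py` may use a different atom, a cutoff, or additional features.

```python
import numpy as np


def distance_matrix(residue_xyz):
    """Return the symmetric matrix of pairwise Euclidean distances between residues."""
    xyz = np.asarray(residue_xyz, dtype=float).reshape(-1, 3)
    diff = xyz[:, None, :] - xyz[None, :, :]  # shape (L, L, 3) via broadcasting
    return np.sqrt((diff ** 2).sum(axis=-1))
```

For a protein with L residues the result has shape (L, L), with zeros on the diagonal.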

(5) Finally, once all the data is ready, copy all feature files into the '../data/' folder and train your own model by running the following command.

cd ..
python train.py
