From cfa1a48d650736d9d077e944025a9ccceb159889 Mon Sep 17 00:00:00 2001 From: Wengong Jin Date: Thu, 3 Feb 2022 19:40:57 -0500 Subject: [PATCH] Update README.md --- README.md | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 1d34852..37908a6 100644 --- a/README.md +++ b/README.md @@ -15,9 +15,11 @@ Our model is tested in Linux with the following packages: Our data is retreived from the Structural Antibody Database (SAbDab). The training, validation, and test data (compressed) is located in `data/sabdab`. To train a RefineGNN for CDR-H3, please run ``` -python ab_train.py --cdr_type 3 --train_path data/sabdab/hcdr3_cluster/train.jsonl --val_path data/sabdab/hcdr3_cluster/val.jsonl --test_path data/sabdab/hcdr3_cluster/test.jsonl +python ab_train.py --cdr_type 3 --train_path data/sabdab/hcdr3_cluster/train_data.jsonl --val_path data/sabdab/hcdr3_cluster/val_data.jsonl --test_path data/sabdab/hcdr3_cluster/test_data.jsonl ``` -The default hyperparameters are: hidden layer dimension `--hidden_size 256`, number of message passing layers `--depth 4`, KNN neighborhood size `--K_neighbors 9`, and the framework residue block size `--block_size 8` (multi-resolution modeling, section 3.3). The training process requires 20~24GB GPU memory. During training, this script will report perplexity (PPL) and root-mean-square-error (RMSD) over the validation set. You can also train a RefineGNN for a different CDR region by changing `--cdr_type 2` (CDR-H2) and `--cdr_type 1` (CDR-H1). +The default hyperparameters are: hidden layer dimension `--hidden_size 256`, number of message passing layers `--depth 4`, KNN neighborhood size `--K_neighbors 9`, and the framework residue block size `--block_size 8` (multi-resolution modeling, section 3.3). + +During training, this script will report perplexity (PPL) and root-mean-square-error (RMSD) over the validation set. You can also train a RefineGNN for a different CDR region by changing `--cdr_type 2` (CDR-H2) and `--cdr_type 1` (CDR-H1). If you don't want to train RefineGNN from scratch, please load a pre-trained model and run inference on the test set by ``` @@ -25,9 +27,9 @@ python ab_train.py --cdr_type 3 --load_model ckpts/RefineGNN-hcdr3/model.best -- ``` where `--epoch 0` means zero training epochs. -Note: GPU memory consumption can be substantially reduced by removing the multi-resolution modeling component. If you have limited GPU memory, you can train a RefineGNN without multi-resolution modeling by +Note: The above training script usually requires 20~24GB GPU memory. The GPU memory consumption can be substantially reduced by removing the multi-resolution modeling component. If you have limited GPU memory, you can train a RefineGNN without multi-resolution modeling by ``` -python baseline_train.py --cdr_type 3 --train_path data/sabdab/hcdr3_cluster/train.jsonl --val_path data/sabdab/hcdr3_cluster/val.jsonl --test_path data/sabdab/hcdr3_cluster/test.jsonl --architecture RefineGNN_attonly +python baseline_train.py --cdr_type 3 --train_path data/sabdab/hcdr3_cluster/train_data.jsonl --val_path data/sabdab/hcdr3_cluster/val_data.jsonl --test_path data/sabdab/hcdr3_cluster/test_data.jsonl --architecture RefineGNN_attonly ``` The above training script usually consumes 4GB GPU memory. You can also train our AR-GNN baseline by setting `--architecture AR-GNN`. @@ -40,3 +42,11 @@ At test time, we generate 10000 CDR-H3 sequences for each antibody and select th ``` python rabd_test.py --load_model ckpts/RefineGNN-rabd/model.best ``` + +## CDR Structure Visualization +You can inspect predicted CDR structure by running the following script +``` +python print_cdr.py --data_path data/sabdab/hcdr3_cluster/test_data.jsonl --load_model ckpts/RefineGNN-hcdr3/model.best --rmsd_threshold 0.8 --save_dir pred_pdbs/ +``` +This script will print predicted CDR structures in the `pred_pdbs/` folder. You can visualize the generated CDR loops (i.e. 4bkl.pdb) in PyMOL. +Overall, there are still many failure cases and the structure prediction part needs to be improved.