ERGO is a deep learning based model for predicting TCR-peptide binding.
Check our web-tool at http://tcr.cs.biu.ac.il
pytorch 1.4.0
numpy 1.18.1
scikit-learn 0.22.1
The main module for training is ERGO.py.
For training, run:
python ERGO.py train model_type database specific gpu --model_file=model.pt --train_data_file=train_data --test_data_file=test_data
where:
- model_type is the type of TCR encoding: LSTM based with lstm, or autoencoder based with ae.
- database is the training database: McPAS-TCR with mcpas, or VDJdb with vdjdb.
- gpu is the CUDA device to use (e.g. cuda:0), or cpu for CPU (which might be much slower).
- --model_file is the file to which the model is saved after training.
- --train_data_file and --test_data_file are the train and test data files; you can set them to auto for defaults.
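The argument order above is easy to get wrong, so here is a minimal sketch of a wrapper that assembles the command line and rejects values this README does not list. The helper name build_ergo_command is hypothetical and not part of ERGO. The specific argument is passed through unchanged because its accepted values are not documented here, and the device check (cpu, or anything starting with cuda) is an assumption based on the example cuda:0.

```python
import shlex

MODEL_TYPES = {"lstm", "ae"}      # TCR encodings named in this README
DATABASES = {"mcpas", "vdjdb"}    # training databases named in this README


def build_ergo_command(mode, model_type, database, specific, device,
                       model_file, train_data_file="auto", test_data_file="auto"):
    """Return the argv list for one ERGO.py run (hypothetical helper)."""
    if mode not in {"train", "predict"}:
        raise ValueError(f"unknown mode: {mode!r}")
    if model_type not in MODEL_TYPES:
        raise ValueError(f"model_type must be one of {sorted(MODEL_TYPES)}")
    if database not in DATABASES:
        raise ValueError(f"database must be one of {sorted(DATABASES)}")
    # Assumption: a device is either "cpu" or a CUDA device string such as "cuda:0".
    if device != "cpu" and not device.startswith("cuda"):
        raise ValueError(f"unexpected device: {device!r}")
    # `specific` is forwarded as given; its meaning is not described in this README.
    return [
        "python", "ERGO.py", mode, model_type, database, specific, device,
        f"--model_file={model_file}",
        f"--train_data_file={train_data_file}",
        f"--test_data_file={test_data_file}",
    ]


if __name__ == "__main__":
    cmd = build_ergo_command("train", "lstm", "mcpas", "specific", "cuda:0", "model.pt")
    print(shlex.join(cmd))  # shell-ready string; pass the list to subprocess.run to execute
```

Returning a list rather than a single string lets you hand the result directly to subprocess.run without shell quoting concerns.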
If you are interested only in prediction and not in training ERGO models, it might be more convenient to use our web tool, available at http://tcr.cs.biu.ac.il. You can choose which model and training set to use, and get the binding scores of given TCRs and peptides from a csv file.
You can also predict using the ERGO.py module.
This is quite similar to training; run:
python ERGO.py predict model_type database specific gpu --model_file=model.pt --train_data_file=train_data --test_data_file=test_data
where:
- --model_file is the trained model file.
- --test_data_file is a csv file with TCR and peptide columns. See the example file on the ERGO website.
- All other command-line parameters are the same as in the training process.
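Before running prediction, it can help to sanity-check the csv file. The sketch below is only an illustration and not part of ERGO: the function name find_bad_rows is hypothetical, and it assumes the first column holds the TCR sequence, the second holds the peptide, there is no header row, and sequences use the 20 standard uppercase amino-acid letters. The example file on the ERGO website remains the reference for the real layout.

```python
import csv

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def find_bad_rows(csv_path):
    """Return (row_number, reason) pairs for rows that look malformed."""
    problems = []
    with open(csv_path, newline="") as handle:
        for row_number, row in enumerate(csv.reader(handle), start=1):
            if len(row) < 2:
                problems.append((row_number, "expected TCR and peptide columns"))
                continue
            # Assumed layout: column 0 is the TCR, column 1 is the peptide.
            tcr, peptide = row[0].strip(), row[1].strip()
            for name, seq in (("TCR", tcr), ("peptide", peptide)):
                if not seq or not set(seq) <= AMINO_ACIDS:
                    problems.append(
                        (row_number, f"{name} is empty or has non-amino-acid letters")
                    )
    return problems
```

An empty list means every row passed these checks; it does not guarantee that ERGO.py will accept the file.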
The trained models and some of the train/test datasets we used are stored in the models directory.
The autoencoder-based model requires a pre-trained TCR-autoencoder. To train the TCR-autoencoder, go to the TCR_Autoencoder directory using
cd TCR_Autoencoder
and run:
python train_tcr_autoencoder.py BM_data_CDR3s device model_file.pt
where device is a CUDA GPU device (e.g. 'cuda:0') or 'cpu' for the CPU.
The trained autoencoder will be saved in model_file as a PyTorch model.
You can use the already trained tcr_autoencoder.pt model instead.