Step 1: Installation

VariPred was trained with the software versions listed in "requirements.txt". Please install these requirements first.

$ git clone git@github.com:wlin16/VariPred.git
$ cd VariPred
$ conda create -n varipred python=3.8.5
$ conda activate varipred
$ conda install --file requirements.txt

=========================================================================================================

Step 2: Prepare dataset

The input for VariPred must contain the following columns:

- mutation info (e.g. NP_001035957.1_L847P), a.k.a. "target_id"
- mutation position, a.k.a. "aa_index"
- wild-type amino acid, a.k.a. "wt_aa"
- mutant-type amino acid, a.k.a. "mt_aa"
- wild-type sequence, a.k.a. "wt_seq"
- mutant-type sequence, a.k.a. "mt_seq"

A label is optional for prediction but necessary when preparing the training and test sets to re-train the model.

Please name the columns of the dataframe according to the example files under the "example" folder.
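
Before running the later steps, it may help to confirm that a CSV file carries the expected column names. The sketch below uses only the standard library. The six required names come from the list above; the label column name "label" and the helper `missing_columns` are assumptions for illustration, so please compare them against the example files.

```python
import csv

# Column names taken from the list in this step.
REQUIRED_COLUMNS = ("target_id", "aa_index", "wt_aa", "mt_aa", "wt_seq", "mt_seq")


def missing_columns(csv_path, need_label=False, label_column="label"):
    """Return the expected column names that are absent from the CSV header.

    label_column defaults to "label", which is an assumed name.
    """
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle), [])  # empty file -> empty header
    wanted = list(REQUIRED_COLUMNS)
    if need_label:
        wanted.append(label_column)
    return [name for name in wanted if name not in header]
```

For a prediction file, `missing_columns("target.csv")` should return an empty list; for training and test files, pass `need_label=True`.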

For data with UniProt IDs, please fetch the wild-type sequences with https://www.uniprot.org/id-mapping
For data with RefSeq IDs (NP IDs), please use the "prepare_dataset.py" script under the Dataset_preparation folder:
- Inside the Dataset_preparation folder, we provide an example file, "target.txt". The first parameter of the script is the name of the file (here "target", without the ".txt" extension).
```
$ cd Dataset_preparation
$ python3 prepare_dataset.py target
```
This produces a dataframe saved as "target.csv".

=========================================================================================================

Step 3: Train the model

Note: The weights of the trained model described in the publication are provided at VariPred/model/model.ckpt. Running the train_VariPred.sh script to re-train the model will replace these weights.

If you do not need to customize the model for a specific task or evaluate the performance of VariPred, and only want to use the trained model to predict the clinical impact of variants, please skip Step 3 and Step 4 and proceed directly to Step 5.

We recommend a GPU with at least 12 GB of memory, e.g. an NVIDIA GeForce 1080Ti.
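
A quick way to see how much memory the first CUDA device offers is sketched below. It relies on PyTorch being importable and returns None otherwise; the helper name `gpu_memory_gib` is hypothetical and is not part of VariPred.

```python
import importlib.util


def gpu_memory_gib():
    """Return the total memory of CUDA device 0 in GiB, or None if unavailable."""
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed in this environment
    import torch

    if not torch.cuda.is_available():
        return None  # no usable CUDA device
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


memory = gpu_memory_gib()
if memory is None:
    print("No CUDA device detected (or PyTorch is missing).")
else:
    print(f"GPU 0 has {memory:.1f} GiB of memory.")
```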

PyTorch must be installed; see https://pytorch.org/get-started/locally/

"train.csv" and "test.csv" are example training and test sets for re-training the model. "target.csv" is the example file for simple prediction.

- To prepare a training set and a test set, run the following commands:
```
$ python3 prepare_dataset.py VariPred_train
$ python3 prepare_dataset.py VariPred_test
```
This produces two dataframes, "VariPred_train.csv" and "VariPred_test.csv", under the example/dataset directory.

=========================================================================================================

Step 4: Fetch the embeddings and train the model

If you would like to re-train VariPred, embedding representations must be generated for both the training and test sets.
```
$ cd ../VariPred
```
In the "train_VariPred.sh" script, replace the variables with the paths where you stored your datasets, then run the script. This reports the performance of the model (MCC and AUC-ROC scores).

$ ./train_VariPred.sh
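
For readers unfamiliar with the two reported metrics, the sketch below computes them from their standard definitions in plain Python. It is an illustration only and does not reproduce the code in train_VariPred.sh; it assumes binary labels encoded as 0 and 1, and the function names `mcc` and `auc_roc` are hypothetical.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc_roc(y_true, scores):
    """AUC-ROC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the sample count, so suited to
    small illustrative inputs only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
print(round(mcc(labels, [1, 0, 0, 0]), 3))    # prints 0.577
print(auc_roc(labels, [0.9, 0.4, 0.5, 0.1]))  # prints 0.75
```

MCC is computed from thresholded predictions, while AUC-ROC is computed from the raw scores, which is why the two calls take different second arguments.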

=========================================================================================================

Step 5: Fetch the embeddings and predict the effects of variants

To predict the effects of variants with VariPred, replace the variables in the "predict.sh" script with the paths where you stored your data, then run the script. This gives the predicted clinical impact of each variant.
```
$ cd ../VariPred
$ ./predict.sh
```
