A python command line based script for generating proteome databases for variants proteins from a list of variants.
>python Variant_DB_generator.py proteome_database.fasta protein_variants_list.txt
usage: Variant_DB_generator.py [-h] -FA [-FA ...] -F [-F ...]
Custom generate proteome databases for protein variants such as SNPs from a list of SNPs corresponding to a protein.
positional arguments:
-FA Proteome database of interest
-F A .txt (text) file with a list of protein variants
optional arguments:
-h, --help show this help message and exit
>python extract_annotated_variants.py test.vcf -f feature_table.txt
usage: extract_annotated_variants.py [-h]
[-f FEATURE_TABLE [FEATURE_TABLE ...]]
-i [-i ...]
Extract protein coding variants from snpEff annotated .vcf file and save it in .txt format
positional arguments:
-i snpEff annotated .vcf files
optional arguments:
-h, --help show this help message and exit
-f FEATURE_TABLE [FEATURE_TABLE ...], --feature_table FEATURE_TABLE [FEATURE_TABLE ...]
If snpEff provides only locus tag info, we need to
extract protein accession details from feature table
downloaded from RefSeq ftp path
>python Variant_peps_uniqueness.py -h
usage: Variant_peps_uniqueness.py [-h]
-ip [-ip ...] -rf [-rf ...] -vf [-vf ...]
Uniqueness of variant peptides identified from the DDA database search can be
checked by matching it with reference and variant proteome databases
positional arguments:
-ip Exported PSMs of variant database search from Proteome
Discoverer
-rf Reference proteome database of the same species in fasta format
-vf Variant proteome database in fasta format used for the search
optional arguments:
-h, --help show this help message and exit