A Python 3 script for extracting SNPs located in candidate genes.

biologyzhangbo/SnpExt_Genebased

The first script is Gene_locat.py.
It requires two input files: a processed GFF file and a gene ID file.
To obtain the processed GFF file, download the annotation file of the target genome from Phytozome, then run the following command in Linux:
grep "gene" refgenome.gff > gene.refgenome.gff
A usable GFF file should have the following format:
Chr01 phytozomev10 gene 1951 2616 . + . ID=Sobic.001G000100.v3.1;Name=Sobic.001G000100;ancestorIdentifier=Sobic.001G000100.v2.1
Chr01 phytozomev10 gene 11180 14899 . - . ID=Sobic.001G000200.v3.1;Name=Sobic.001G000200;ancestorIdentifier=Sobic.001G000200.v2.1
Chr01 phytozomev10 gene 23399 24152 . - . ID=Sobic.001G000300.v3.1;Name=Sobic.001G000300;ancestorIdentifier=Sobic.001G000300.v2.1
Chr01 phytozomev10 gene 22391 42443 . - . ID=Sobic.001G000400.v3.1;Name=Sobic.001G000400;ancestorIdentifier=Sobic.001G000400.v2.1
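
The record layout above can be read in Python as well. The following is a minimal sketch, not code from Gene_locat.py: the function name parse_gff_gene_line is hypothetical, and it assumes that the attribute column contains no whitespace and that the gene name is stored under the Name key, as in the example lines.

```python
def parse_gff_gene_line(line):
    """Return (chrom, start, end, strand, name) for a 'gene' record, else None.

    Hypothetical helper for illustration; not part of Gene_locat.py.
    """
    if not line.strip() or line.startswith("#"):
        return None
    # GFF is normally tab-separated; fall back to any whitespace for
    # lines copied from this README, where tabs appear as spaces.
    fields = line.rstrip("\n").split("\t") if "\t" in line else line.split()
    if len(fields) < 9 or fields[2] != "gene":
        return None
    # Column 9 holds key=value pairs separated by semicolons.
    attrs = dict(item.split("=", 1) for item in fields[8].split(";") if "=" in item)
    return fields[0], int(fields[3]), int(fields[4]), fields[6], attrs.get("Name")


example = ("Chr01 phytozomev10 gene 1951 2616 . + . "
           "ID=Sobic.001G000100.v3.1;Name=Sobic.001G000100;"
           "ancestorIdentifier=Sobic.001G000100.v2.1")
print(parse_gff_gene_line(example))
```

Checking the third column explicitly is stricter than a plain text search for "gene", which also matches lines that merely mention the word elsewhere.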

The gene ID file should contain one gene name per line, as follows:
Sobic.001G355700
Sobic.002G484000
Sobic.005G821200
...

Then, run the command as follows:
python Gene_locat.py geneidfile gene.refgenome.gff genelocationfile
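
As a rough illustration of what this step computes, the sketch below looks up each requested gene name in the gene records and collects its chromosome and coordinates. It is written under assumptions: the function name locate_genes is hypothetical, gene names are matched against the Name attribute, and the actual layout of genelocationfile written by Gene_locat.py is not documented here and may differ.

```python
def locate_genes(gene_ids, gff_lines):
    """Map each requested gene name to (chrom, start, end).

    Hypothetical sketch; the real Gene_locat.py may work differently.
    """
    wanted = set(gene_ids)
    found = {}
    for line in gff_lines:
        fields = line.split()
        if len(fields) < 9 or fields[2] != "gene":
            continue  # skip comments, blank lines and non-gene features
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        name = attrs.get("Name")
        if name in wanted:
            found[name] = (fields[0], int(fields[3]), int(fields[4]))
    return found


gff_lines = [
    "Chr01 phytozomev10 gene 1951 2616 . + . ID=Sobic.001G000100.v3.1;Name=Sobic.001G000100",
    "Chr01 phytozomev10 gene 11180 14899 . - . ID=Sobic.001G000200.v3.1;Name=Sobic.001G000200",
]
locations = locate_genes(["Sobic.001G000200"], gff_lines)
for name, (chrom, start, end) in locations.items():
    print(name, chrom, start, end, sep="\t")
```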

The second script is SnpExt_Genebased.py.

The script requires two input files: a processed VCF file and a gene location file.
The processed VCF file can be obtained with the following commands, which save the header lines to a file named title and the variant records to processed.VCFfile:
grep "#" VCFfile > title
grep -v "#" VCFfile > processed.VCFfile
The gene location file can be produced by Gene_locat.py.
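
The header and record split can also be sketched in Python. This is an illustrative equivalent rather than part of the repository: the function name split_vcf is hypothetical and the sample lines are made up. It treats a line as a header only when it starts with "#", which is slightly stricter than searching for "#" anywhere in the line.

```python
def split_vcf(lines):
    """Split VCF lines into (header_lines, record_lines).

    Hypothetical helper; header lines are those starting with '#'.
    """
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if line.strip() and not line.startswith("#")]
    return header, records


# Made-up sample lines for illustration only.
sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "Chr01\t2000\t.\tA\tG\t50\tPASS\t.",
]
header, records = split_vcf(sample)
print(len(header), len(records))
```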
Then run the following commands; the last two prepend the saved header to the extracted records and remove the intermediate file:
python SnpExt_Genebased.py genelocationfile processed.VCFfile
cat title extract.vcf > extracted.vcf
rm extract.vcf
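
The extraction itself amounts to keeping the VCF records whose position falls inside one of the gene intervals. The sketch below shows that idea under stated assumptions: the function name extract_snps is hypothetical, gene locations are given as (chrom, start, end) tuples with 1-based inclusive coordinates as in GFF, and the sample records are made up. It is not the code of SnpExt_Genebased.py.

```python
def extract_snps(gene_locations, vcf_records):
    """Return the VCF records whose POS lies within any gene interval.

    Hypothetical sketch; gene_locations is an iterable of
    (chrom, start, end) with 1-based inclusive coordinates.
    """
    by_chrom = {}
    for chrom, start, end in gene_locations:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for record in vcf_records:
        if record.startswith("#"):
            continue  # header lines are handled separately
        fields = record.split("\t")
        if len(fields) < 2:
            continue  # not a well-formed record
        chrom, pos = fields[0], int(fields[1])  # VCF columns 1 and 2: CHROM, POS
        if any(start <= pos <= end for start, end in by_chrom.get(chrom, [])):
            kept.append(record)
    return kept


# Made-up records for illustration only.
genes = [("Chr01", 1951, 2616)]
vcf_records = [
    "Chr01\t1900\t.\tA\tG\t50\tPASS\t.",
    "Chr01\t2000\t.\tC\tT\t50\tPASS\t.",
    "Chr02\t2000\t.\tG\tA\t50\tPASS\t.",
]
kept = extract_snps(genes, vcf_records)
print("\n".join(kept))
```

A linear scan over the intervals is enough for a short candidate gene list; for thousands of genes, sorting the intervals and using a binary search per chromosome would scale better.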

Contact: [email protected]
