README.md (+4 -4)
@@ -84,13 +84,13 @@ tbl2asn can be found at https://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/
**After the above has completed there's still a bit of setup that needs to be done.**
- 1. Download a reference database over on the release tab, (https://github.com/rcs333/VAPiD/releases/). Download the .nhr .nsq and .nin file and place them in the VAPiD folder. You can download all of them to see which one works best for your case. Pick a different reference using the `--db` flag!
+ 1. Download a reference database over on the release tab, (https://github.com/rcs333/VAPiD/releases/). Download the .nhr .nsq and .nin file and place them in the VAPiD folder. You can download all of them to see which one works best for your case. Pick a different reference using the `--db` flag! (If you don't specify a database VAPiD will try all_virus, then compressed, then refseq automatically)
2. (Mac and Linux only )Install MAFFT using your favorite package manager ('brew install mafft' or 'sudo apt-get install mafft') or download and install from https://mafft.cbrc.jp/alignment/software/ for your appropriate system. (THIS IS ALREADY DONE FOR WINDOWS and is included under the GPL licence)
- 3. A example.sbt file that contains your name and information about the organization that you wish to submit your sequences to NCBI under. This .sbt file can be generated by filling out the form here: https://submit.ncbi.nlm.nih.gov/genbank/template/submission/
+ 3. A .sbt file that contains your name and information about the organization that you wish to submit your sequences to NCBI under. This .sbt file can be generated by filling out the form here: https://submit.ncbi.nlm.nih.gov/genbank/template/submission/ (The example.sbt file is included so that you can verify your installation, please don't use this for actual submissions)
- 4. Put the newly generated .sbt file onto your computer. (Generally it's easiest to just put it in the VAPiD folder). If you'll be submitting multiple sequences from different people you can generate more than one, put them all in this folder, and choose which one you want to use at run time.
+ 4. Put the newly generated .sbt file onto your computer. (Generally it's easiest to just put it in the VAPiD folder). If you'll be submitting multiple sequences from different people you can generate more than one, put them all in this folder, and choose which one you want to use at run time. (Although you can only select one .sbt per fasta input)
5. (Optional)
You can generate a .csv file with most metadata that you wish to associate with your sequences. The file should have across the top Strain (the name of the fasta sequences that you'll import) followed by columns with NCBI approved metadata https://www.ncbi.nlm.nih.gov/Sequin/modifiers.html
@@ -199,7 +199,7 @@ I do NOT reccomend batching submissions that mix these options. Also, if my solu
# Implementation Details and Important Notes
- A large problem is actually inconsistent spelling in GenBank sequence records or sequence records that do not have every protein annotated. The ESpell utility from NCBI is currently being used to check spelling on protein names. However this can result in certain protein names losing capitilization (i.e. IIIa3 will get changed to iiia3). Also novel sequences with muations directly in the stop codon alignment with the reference can cause a few extra stop codons to get added. First verify that your sequence is correct and if it is please email me or open an issue on the GitHub.
+ A large problem is actually inconsistent spelling in GenBank sequence records or sequence records that do not have every protein annotated. The ESpell utility from NCBI is currently being used to check spelling on protein names. However this can result in certain protein names losing capitilization (i.e. IIIa3 will get changed to iiia3).
# Future directions
Preprint is avalible at (https://www.biorxiv.org/content/early/2018/09/18/420463) and paper is currently under review.
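The setup steps in the first hunk can be checked with a small script. What follows is only a sketch, not part of VAPiD: the helper names `first_complete_database` and `write_metadata_csv` are hypothetical, the database file names (`all_virus.nhr` and so on) are an assumption based on the database names and extensions mentioned in the README, and the `collection-date` and `country` columns in the usage note are example modifiers that should be checked against the NCBI modifier list linked in step 5.

```python
import csv
from pathlib import Path

# Fallback order as stated in the README change for step 1.
DB_ORDER = ["all_virus", "compressed", "refseq"]
# The three file extensions the README says to download per database.
DB_EXTENSIONS = [".nhr", ".nsq", ".nin"]


def first_complete_database(folder, names=DB_ORDER):
    """Return the first name whose .nhr/.nsq/.nin files all exist in folder.

    Assumes files are named <name><extension>; returns None if no
    database is complete.
    """
    folder = Path(folder)
    for name in names:
        if all((folder / (name + ext)).is_file() for ext in DB_EXTENSIONS):
            return name
    return None


def write_metadata_csv(path, rows, modifiers):
    """Write a metadata sheet with 'Strain' first, then one column per modifier.

    rows is a list of dicts keyed by 'Strain' and the modifier names.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["Strain"] + list(modifiers))
        writer.writeheader()
        writer.writerows(rows)
```

For example, `first_complete_database(".")` run from the VAPiD folder reports which database a default run could find, and `write_metadata_csv("metadata.csv", [{"Strain": "sample1", "collection-date": "2018", "country": "USA"}], ["collection-date", "country"])` writes a one-row sheet whose `Strain` value should match the name of the corresponding fasta sequence.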