Flu Pipeline Notes

VDB

Upload documents to VDB

Download sequences and meta information from GISAID

In EPIFLU, select human as host, select HA as required segment, and select Submission Date >= date of the last upload to vdb
Ideally download about 5000 isolates at a time; this may require splitting downloads by submission date
Download Isolates as XLS with YYYY-MM-DD date format
Download Isolates as "Sequences (DNA) as FASTA"
- Select all DNA
- Set Fasta Header to 0: DNA Accession no., 1: Isolate name, 2: Isolate ID, 3: Segment, 4: Passage details/history, 5: Submitting lab
- Resulting header format: DNA Accession no. | Isolate name | Isolate ID | Segment | Passage details/history | Submitting lab
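
As a quick sanity check on a downloaded FASTA, the pipe-delimited header can be split into named fields. This is a minimal sketch, not the parser used by flu_upload.py: the key names in `HEADER_FIELDS`, the function `parse_gisaid_header` and the example header values are all hypothetical and chosen only for illustration.

```python
# Field order follows the header template configured in the GISAID download.
# The key names are illustrative assumptions, not fauna's actual field names.
HEADER_FIELDS = ["accession", "strain", "isolate_id", "locus", "passage", "submitting_lab"]


def parse_gisaid_header(header_line):
    """Split a '|'-delimited FASTA header line into a dict of named fields."""
    text = header_line.strip()
    if text.startswith(">"):
        text = text[1:]
    parts = [part.strip() for part in text.split("|")]
    if len(parts) != len(HEADER_FIELDS):
        raise ValueError("expected %d fields, got %d: %r"
                         % (len(HEADER_FIELDS), len(parts), header_line))
    return dict(zip(HEADER_FIELDS, parts))


# Made-up header, used only to show the shape of the result.
example = ">EPI000001 | A/Example/1/2020 | EPI_ISL_000001 | HA | E3 | Example Lab"
record = parse_gisaid_header(example)
print(record["strain"], record["locus"])
```

A header with the wrong number of fields raises `ValueError`, which would flag a download made with a different header template before anything is uploaded.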

Move files to fauna/data as gisaid_epiflu.xls and gisaid_epiflu.fasta.
Upload to vdb database

python2 vdb/flu_upload.py -db vdb -v flu --source gisaid --fname gisaid_epiflu
Recommend running with --preview to confirm strain names and locations are correctly parsed before uploading
- Add entries to the geo_synonyms, flu_strain_name_fix and flu_fix_location_label files to fix some of the formatting.

Update documents in VDB

All of these functions are quite slow given they run over ~600k documents. Use sparingly.

Update genetic grouping fields
- python2 vdb/flu_update.py -db vdb -v flu --update_groupings
- updates vtype, subtype, lineage
Update locations
- python2 vdb/flu_update.py -db vdb -v flu --update_locations
- updates division, country and region from location
Update passage_category fields
- python2 vdb/flu_update.py -db vdb -v flu --update_passage_categories
- updates passage_category based on the passage field

Download documents from VDB

python2 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_h3n2 --fstem h3n2
python2 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_h1n1pdm --fstem h1n1pdm
python2 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_vic --fstem vic
python2 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_yam --fstem yam
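
The four download commands differ only in lineage and file stem, so they can be generated from a small table. The sketch below only builds and prints the command lines; the names `LINEAGES` and `download_command` are hypothetical helpers, not part of fauna.

```python
# (lineage, fstem) pairs taken from the commands listed above.
LINEAGES = [
    ("seasonal_h3n2", "h3n2"),
    ("seasonal_h1n1pdm", "h1n1pdm"),
    ("seasonal_vic", "vic"),
    ("seasonal_yam", "yam"),
]


def download_command(lineage, fstem, locus="HA"):
    """Return the vdb download command as an argument list."""
    return [
        "python2", "vdb/flu_download.py", "-db", "vdb", "-v", "flu",
        "--select", "locus:" + locus, "lineage:" + lineage,
        "--fstem", fstem,
    ]


commands = [download_command(lineage, fstem) for lineage, fstem in LINEAGES]
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.check_call` from the fauna directory to run the downloads in sequence.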

TDB

Upload documents to TDB

Raw tables from NIMR reports

Convert NIMR report pdfs to csv files
Move csv files to subtype directory in fauna/data/
Upload to tdb database

python2 tdb/upload.py -db tdb -v flu --subtype h3n2 --ftype flat --fstem h3n2_nimr_titers
Recommend running with --preview to confirm strain names are correctly parsed before uploading
- Add entries to the HI_ref_name_abbreviations and HI_flu_strain_name_fix files to fix some strain names.

Flat files

Move line-list tsv files to fauna/data/
Upload to tdb database with python2 tdb/upload.py -db tdb -v flu --subtype h3n2 --ftype flat --fstem H3N2_HI_titers_upload

CDC files

Move line-list tsv files to fauna/data/
Upload HI titers to tdb database with python2 tdb/cdc_upload.py -db cdc_tdb -v flu --ftype flat --fstem HITest_Oct2019_to_Sep2020_titers
Upload FRA titers to tdb database with python2 tdb/cdc_upload.py -db cdc_tdb -v flu --ftype flat --fstem FRA_Oct2019_to_Sep2020_titers

Crick files

Move Excel documents to fauna/data/
Run python2 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem H3N2HIs
Run python2 tdb/crick_upload.py -db crick_tdb --assay_type fra --fstem H3N2VNs
Run python2 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem H1N1pdm09HIs
Run python2 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem BVicHIs
Run python2 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem BYamHIs
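
In the commands above, the assay type tracks the file stem suffix: stems ending in HIs use `--assay_type hi` and the stem ending in VNs uses `--assay_type fra`. The sketch below encodes that pattern to build the command lines. The mapping is an assumption read off these examples, not something confirmed from crick_upload.py, and `SUFFIX_TO_ASSAY` and `crick_command` are hypothetical names.

```python
# Assumption: the fstem suffix indicates the assay, as in the commands above
# ("HIs" -> hi, "VNs" -> fra). This is inferred, not taken from crick_upload.py.
SUFFIX_TO_ASSAY = {"HIs": "hi", "VNs": "fra"}

FSTEMS = ["H3N2HIs", "H3N2VNs", "H1N1pdm09HIs", "BVicHIs", "BYamHIs"]


def crick_command(fstem):
    """Return the crick_upload.py command for one Excel file stem."""
    for suffix, assay in SUFFIX_TO_ASSAY.items():
        if fstem.endswith(suffix):
            return [
                "python2", "tdb/crick_upload.py", "-db", "crick_tdb",
                "--assay_type", assay, "--fstem", fstem,
            ]
    raise ValueError("cannot infer assay type from fstem: " + fstem)


crick_commands = [crick_command(fstem) for fstem in FSTEMS]
for cmd in crick_commands:
    print(" ".join(cmd))
```

A stem with an unrecognised suffix raises `ValueError` rather than guessing an assay type.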

NIID files

Make sure NIID-Tokyo-WHO-CC/ is a sister directory to fauna/
Upload all titers with python2 tdb/upload_all.py --sources niid -db niid_tdb

VIDRL files

Make sure VIDRL-Melbourne-WHO-CC/ is a sister directory to fauna/
Upload all titers with python2 tdb/upload_all.py --sources vidrl -db vidrl_tdb
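
Both the NIID and VIDRL uploads rely on a data directory sitting next to fauna/. A small pre-flight check, sketched below, reports which sister directories are missing before upload_all.py is run. The function `missing_sister_dirs` and the `SISTER_DIRS` table are hypothetical helpers, not part of fauna.

```python
import os

# Directory names taken from the notes above, keyed by the --sources value.
SISTER_DIRS = {
    "niid": "NIID-Tokyo-WHO-CC",
    "vidrl": "VIDRL-Melbourne-WHO-CC",
}


def missing_sister_dirs(fauna_dir, sources):
    """Return the expected sister directories of fauna_dir that do not exist."""
    parent = os.path.dirname(os.path.abspath(fauna_dir))
    missing = []
    for source in sources:
        name = SISTER_DIRS[source]
        if not os.path.isdir(os.path.join(parent, name)):
            missing.append(name)
    return missing


# Run from inside fauna/; an empty list means both directories were found.
print(missing_sister_dirs(".", ["niid", "vidrl"]))
```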

Download documents from TDB

python2 tdb/download.py -db tdb -v flu --subtype h3n2
python2 tdb/download.py -db tdb -v flu --subtype h1n1pdm
python2 tdb/download.py -db tdb -v flu --subtype vic
python2 tdb/download.py -db tdb -v flu --subtype yam
