VGP repository for the genome assembly working group
- DNAnexus Workflow
- Docker images with WDL Workflow
- Pipeline for local run
- MitoVGP pipeline
- Instructions for AWS s3 genomeark
- Meta data
- Citation
The production of the VGP assemblies is performed on DNAnexus, which is available to anyone who registers. We welcome new trainees who are interested in leading the assembly of VGP and other genomes. Feel free to contact us.
- Tutorials: starting point for new trainees
- Workflows: workflows to run each assembly step
- Applets: individual applets in the workflow
- Retrieve job info
A scaffolding pipeline that runs on generic architectures and in Docker containers is available to the public. It includes a WDL implementation of the scaffolding portion of the VGP assembly, as well as some of the QC steps. Note that Falcon assembly and Arrow polishing are not included.
- WDL Pipeline: read the manual first
The local pipeline is available as bash scripts for each individual scaffolding, polishing, and evaluation step. These scripts were used to locally assemble the first 17 genomes described in Rhie et al. 2021.
Note that the scripts are optimized to run on a Slurm scheduler and were tested on Biowulf. All submitter scripts have the prefix `_submit_`.
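Because the submitter scripts share the `_submit_` prefix, they can be located in a local clone with a short shell helper. The following is a minimal sketch only: `list_submit_scripts` is a hypothetical helper name that is not part of this repository, and the directory passed to it is assumed to be a local checkout.

```shell
# Hypothetical helper (not part of the repository): list every file whose
# name starts with "_submit_" under a directory (default: current directory).
list_submit_scripts() {
    find "${1:-.}" -type f -name '_submit_*' | sort
}

# Assumption: the current directory is a local clone of this repository.
list_submit_scripts .
```

The helper only lists the scripts; each one still needs to be read for its own arguments before it is submitted to Slurm.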
The pipeline for generating mitochondrial sequences is available as a conda release.
This is only relevant for our collaborators and data managers who share sequencing data on genomeark that was not produced by VGP.
The meta data proposal and specifications. Actual meta data is stored in this repository.
- VGP assemblies and genome assembly pipeline: Rhie et al., Towards complete and error-free genome assemblies of all vertebrate species, bioRxiv 2020. doi: https://doi.org/10.1101/2020.05.22.110833
- Mitochondrial genome assembly pipeline: Formenti et al., Complete vertebrate mitogenomes reveal widespread gene duplications and repeats, bioRxiv 2020. doi: https://doi.org/10.1101/2020.06.30.177956