###Getting started

    git clone https://github.com/lh3/bwa.git
    cd bwa; make
    ./bwa index ref.fa
    ./bwa mem ref.fa read-se.fq.gz | gzip -3 > aln-se.sam.gz
    ./bwa mem ref.fa read1.fq read2.fq | gzip -3 > aln-pe.sam.gz

###Introduction
BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are designed for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as support for long reads and chimeric alignment, but BWA-MEM, the latest of the three, is generally recommended for high-quality queries because it is faster and more accurate. BWA-MEM also performs better than BWA-backtrack on 70-100bp Illumina reads.
For all the algorithms, BWA first needs to construct the FM-index for the reference genome (the `index` command). Alignment algorithms are invoked with different sub-commands: `aln/samse/sampe` for BWA-backtrack, `bwasw` for BWA-SW and `mem` for the BWA-MEM algorithm.
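
As a rough sketch of how the sub-commands map to the three algorithms, the wrappers below show one typical invocation of each. The function names and the `.sai`/`.sam` output naming are illustrative assumptions, not part of BWA; only the `bwa` sub-commands themselves come from the package, and `ref.fa` is assumed to be indexed already.

```shell
# Hypothetical helper functions; only the bwa sub-commands come from BWA itself.

# BWA-backtrack, single-end: aln finds the suffix-array coordinates of the
# reads, then samse converts them into SAM alignments.
backtrack_se() {
    ref=$1; reads=$2; prefix=$3
    bwa aln "$ref" "$reads" > "$prefix.sai" &&
    bwa samse "$ref" "$prefix.sai" "$reads" > "$prefix.sam"
}

# BWA-backtrack, paired-end: run aln on each end, then pair them with sampe.
backtrack_pe() {
    ref=$1; r1=$2; r2=$3; prefix=$4
    bwa aln "$ref" "$r1" > "$prefix.1.sai" &&
    bwa aln "$ref" "$r2" > "$prefix.2.sai" &&
    bwa sampe "$ref" "$prefix.1.sai" "$prefix.2.sai" "$r1" "$r2" > "$prefix.sam"
}

# BWA-SW and BWA-MEM read the query file directly and write SAM to stdout.
bwasw_se() { bwa bwasw "$1" "$2" > "$3.sam"; }
mem_se()   { bwa mem   "$1" "$2" > "$3.sam"; }
```

For example, `backtrack_se ref.fa reads.fq aln-se` would write `aln-se.sai` and `aln-se.sam` in the current directory.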
###Availability
BWA is released under GPLv3. The latest source code is freely
available at GitHub. Released packages can be downloaded at
SourceForge. After you acquire the source code, simply use `make`
to compile and copy the single executable `bwa` to the destination
you want. The only dependency of BWA is zlib.
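
The build-and-copy step can be sketched as a small helper. The function name and both directory arguments are assumptions made for illustration; the sketch also assumes that zlib and its headers are already installed, since `make` will otherwise fail.

```shell
# Hypothetical helper: build bwa inside its source directory, then copy the
# single executable to a destination directory of your choice.
install_bwa() {
    src_dir=$1; dest_dir=$2
    (cd "$src_dir" && make) &&       # compile in a subshell so the caller's cwd is unchanged
    mkdir -p "$dest_dir" &&          # create the destination if it does not exist
    cp "$src_dir/bwa" "$dest_dir/"
}
```

A call such as `install_bwa ./bwa "$HOME/bin"` would leave the executable at `$HOME/bin/bwa`; add that directory to `PATH` if you want to run `bwa` without a path prefix.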
###Seeking help
The detailed usage is described in the man page available together with the
source code. You can use `man ./bwa.1` to view the man page in a terminal. The
HTML version of the man page can be found at the BWA website. If you
have questions about BWA, you may sign up for the mailing list and then send
your questions to [email protected]. You may also ask questions
in forums such as BioStar and SEQanswers.
###Citing BWA
- Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]. (if you use the BWA-backtrack algorithm)
- Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589-595. [PMID: 20080505]. (if you use the BWA-SW algorithm)
- Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN]. (if you use the BWA-MEM algorithm or the fastmap command, or want to cite the whole BWA package)

Please note that the last reference is a preprint hosted at arXiv.org. I do not plan to submit it to a peer-reviewed journal in the near future.
###Frequently asked questions (FAQs)
####How to map sequences to GRCh38 with ALT contigs?
BWA-backtrack and BWA-MEM partially support mapping to a reference containing ALT contigs that represent alternative alleles highly divergent from the reference genome.

    # download the K8 executable required by bwa-helper.js
    # (-O sets the local file name; otherwise wget saves the file as "download")
    wget -O k8-0.2.1.tar.bz2 http://sourceforge.net/projects/lh3/files/k8/k8-0.2.1.tar.bz2/download
    tar -jxf k8-0.2.1.tar.bz2
    # download the ALT-to-GRCh38 alignment in the SAM format
    wget -O hs38.alt.sam.gz http://sourceforge.net/projects/bio-bwa/files/hs38.alt.sam.gz/download
    # download the GRCh38 sequences with ALT contigs
    wget ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
    # index and mapping
    bwa index -p hs38a GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
    bwa mem -h50 hs38a reads.fq | ./k8-linux bwa-helper.js genalt hs38.alt.sam.gz > out.sam

Here, option `-h50` asks bwa-mem to output multiple hits in the XA tag if the
read has 50 or fewer hits. For each SAM line containing the XA tag,
`bwa-helper.js genalt` decodes the alignments in the XA tag, groups hits lifted
to the same chromosomal region, adjusts mapping quality and outputs all the
hits overlapping the reported hit. A read may be mapped to both the primary
assembly and one or more ALT contigs, all with high mapping quality.

Note that this procedure assumes reads are single-end and may miss hits to
highly repetitive regions, as these hits will not be reported with option
`-h50`. `bwa-helper.js` is a prototype implementation not recommended for
production use.
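
To get a feel for what the XA tag holds, the sketch below lists the alternative hits recorded for each read. It is not part of BWA or `bwa-helper.js`: the function name is made up for illustration, and it assumes the XA layout described in the BWA man page, a semicolon-terminated list of `chr,pos,CIGAR,NM` entries.

```shell
# Hypothetical helper: print one "read<TAB>hit" line per alternative hit in XA.
# Assumes the XA format from the BWA man page: (chr,pos,CIGAR,NM;)*
list_xa_hits() {
    awk -F'\t' '
        /^@/ { next }                           # skip SAM header lines
        {
            for (i = 12; i <= NF; i++) {        # optional tags start at column 12
                if ($i ~ /^XA:Z:/) {
                    n = split(substr($i, 6), hits, ";")
                    for (j = 1; j <= n; j++)
                        if (hits[j] != "")      # the list ends with a trailing ";"
                            print $1 "\t" hits[j]
                }
            }
        }' "$@"
}
```

It reads a SAM file given as an argument or, with no argument, standard input, so it can be run as `list_xa_hits aln.sam` or placed after `bwa mem -h50 hs38a reads.fq |` in a pipeline.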