
Commit

updated README
lh3 committed May 25, 2013
1 parent 599e840 commit 607e11d
Showing 4 changed files with 94 additions and 47 deletions.
36 changes: 0 additions & 36 deletions README

This file was deleted.

73 changes: 73 additions & 0 deletions README.md
@@ -0,0 +1,73 @@
###Getting started

git clone https://github.com/lh3/bwa.git
cd bwa; make
./bwa index ref.fa
./bwa mem ref.fa read-se.fq.gz | gzip -3 > aln-se.sam.gz
./bwa mem ref.fa read1.fq read2.fq | gzip -3 > aln-pe.sam.gz

###Introduction

BWA is a software package for mapping low-divergent sequences against a large
reference genome, such as the human genome. It consists of three algorithms:
BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina
sequence reads up to 100bp, while the other two are for longer sequences ranging
from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as support for
long reads and chimeric alignment, but BWA-MEM, which is the latest, is
generally recommended for high-quality queries as it is faster and more
accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp
Illumina reads.

For all the algorithms, BWA first needs to construct the FM-index for the
reference genome (the **index** command). Alignment algorithms are invoked with
different sub-commands: **aln**/**samse**/**sampe** for BWA-backtrack,
**bwasw** for BWA-SW and **mem** for the BWA-MEM algorithm.

###Availability

BWA is released under [GPLv3][1]. The latest source code is [freely
available][2] on GitHub. Released packages can [be downloaded][3] from
SourceForge. After you acquire the source code, simply use `make` to compile
and copy the single executable `bwa` to the destination you want.

###Seeking help

The detailed usage is described in the man page available together with the
source code. You can use `man ./bwa.1` to view the man page in a terminal. The
[HTML version][4] of the man page can be found at the [BWA website][5]. If you
have questions about BWA, you may [sign up for the mailing list][6] and then send
the questions to [[email protected]][7]. You may also ask questions
in forums such as [BioStar][8] and [SEQanswers][9].

###Citing BWA

* Li H. and Durbin R. (2009) Fast and accurate short read alignment with
Burrows-Wheeler transform. *Bioinformatics*, **25**, 1754-1760. [PMID:
[19451168][10]]. (if you use the BWA-backtrack algorithm)

* Li H. and Durbin R. (2010) Fast and accurate long-read alignment with
Burrows-Wheeler transform. *Bioinformatics*, **26**, 589-595. [PMID:
[20080505][11]]. (if you use the BWA-SW algorithm)

* Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs
with BWA-MEM. [arXiv:1303.3997v1][12] [q-bio.GN]. (if you use the BWA-MEM
algorithm or the **fastmap** command)

Please note that the last reference is a preprint hosted at [arXiv.org][13]. I
do not plan to submit it to a peer-reviewed journal in the near future.



[1]: http://en.wikipedia.org/wiki/GNU_General_Public_License
[2]: https://github.com/lh3/bwa
[3]: http://sourceforge.net/projects/bio-bwa/files/
[4]: http://bio-bwa.sourceforge.net/bwa.shtml
[5]: http://bio-bwa.sourceforge.net/
[6]: https://lists.sourceforge.net/lists/listinfo/bio-bwa-help
[7]: mailto:[email protected]
[8]: http://biostars.org
[9]: http://seqanswers.com/
[10]: http://www.ncbi.nlm.nih.gov/pubmed/19451168
[11]: http://www.ncbi.nlm.nih.gov/pubmed/20080505
[12]: http://arxiv.org/abs/1303.3997
[13]: http://arxiv.org/
11 changes: 6 additions & 5 deletions bwa.1
@@ -1,4 +1,4 @@
.TH bwa 1 "23 April 2013" "bwa-0.7.4" "Bioinformatics tools"
.TH bwa 1 "24 May 2013" "bwa-0.7.5" "Bioinformatics tools"
.SH NAME
.PP
bwa - Burrows-Wheeler Alignment Tool
@@ -718,12 +718,13 @@ If you use the BWA-SW algorithm, please cite:
Li H. and Durbin R. (2010) Fast and accurate long-read alignment with
Burrows-Wheeler transform. Bioinformatics, 26, 589-595. [PMID: 20080505]
.PP
If you use the fastmap component of BWA, please cite:
If you use BWA-MEM or the fastmap component of BWA, please cite:
.PP
Li H. (2012) Exploring single-sample SNP and INDEL calling with whole-genome de
novo assembly. Bioinformatics, 28, 1838-1844. [PMID: 22569178]
Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with
BWA-MEM. arXiv:1303.3997v1 [q-bio.GN].
.PP
The BWA-MEM algorithm has not been published yet.
It is likely that the BWA-MEM manuscript will not appear in a peer-reviewed
journal.

.SH HISTORY
BWA is largely influenced by BWT-SW. It uses source codes from BWT-SW
21 changes: 15 additions & 6 deletions bwase.c
@@ -167,20 +167,29 @@ void bwa_cal_pac_pos(const bntseq_t *bns, const char *prefix, int n_seqs, bwa_se

#define SW_BW 50

bwa_cigar_t *bwa_refine_gapped_core(bwtint_t l_pac, const ubyte_t *pacseq, int len, ubyte_t *seq, int ref_shift, bwtint_t rb, int *n_cigar)
bwa_cigar_t *bwa_refine_gapped_core(bwtint_t l_pac, const ubyte_t *pacseq, int len, ubyte_t *seq, int ref_shift, bwtint_t *_rb, int *n_cigar)
{
bwa_cigar_t *cigar = 0;
uint32_t *cigar32 = 0;
ubyte_t *rseq;
int64_t k, re, rlen;
int64_t k, rb, re, rlen;
int8_t mat[25];

bwa_fill_scmat(1, 3, mat);
re = rb + len + ref_shift;
rb = *_rb; re = rb + len + ref_shift;
assert(re <= l_pac);
rseq = bns_get_seq(l_pac, pacseq, rb, re, &rlen);
assert(re - rb == rlen);
ksw_global(len, seq, rlen, rseq, 5, mat, 5, 1, SW_BW, n_cigar, &cigar32); // right extension
ksw_global(len, seq, rlen, rseq, 5, mat, 5, 1, SW_BW, n_cigar, &cigar32);
assert(*n_cigar > 0);
if ((cigar32[*n_cigar - 1]&0xf) == 1) cigar32[*n_cigar - 1] = (cigar32[*n_cigar - 1]>>4<<4) | 4; // change ending ins to soft clipping
if ((cigar32[0]&0xf) == 1) cigar32[0] = (cigar32[0]>>4<<4) | 4; // change beginning ins to soft clipping
if ((cigar32[*n_cigar - 1]&0xf) == 2) --*n_cigar; // delete ending del
if ((cigar32[0]&0xf) == 2) { // delete beginning del
*_rb += cigar32[0]>>4;
--*n_cigar;
memmove(cigar32, cigar32+1, (*n_cigar) * 4);
}
cigar = (bwa_cigar_t*)cigar32;
for (k = 0; k < *n_cigar; ++k)
cigar[k] = __cigar_create((cigar32[k]&0xf), (cigar32[k]>>4));
@@ -292,14 +301,14 @@ void bwa_refine_gapped(const bntseq_t *bns, int n_seqs, bwa_seq_t *seqs, ubyte_t
bwt_multi1_t *q = s->multi + j;
int n_cigar;
if (q->gap) { // gapped alignment
q->cigar = bwa_refine_gapped_core(bns->l_pac, pacseq, s->len, q->strand? s->rseq : s->seq, q->ref_shift, q->pos, &n_cigar);
q->cigar = bwa_refine_gapped_core(bns->l_pac, pacseq, s->len, q->strand? s->rseq : s->seq, q->ref_shift, &q->pos, &n_cigar);
q->n_cigar = n_cigar;
if (q->cigar) s->multi[k++] = *q;
} else s->multi[k++] = *q;
}
s->n_multi = k; // this squeezes out gapped alignments which failed the CIGAR generation
if (s->type == BWA_TYPE_NO_MATCH || s->type == BWA_TYPE_MATESW || s->n_gapo == 0) continue;
s->cigar = bwa_refine_gapped_core(bns->l_pac, pacseq, s->len, s->strand? s->rseq : s->seq, s->ref_shift, s->pos, &s->n_cigar);
s->cigar = bwa_refine_gapped_core(bns->l_pac, pacseq, s->len, s->strand? s->rseq : s->seq, s->ref_shift, &s->pos, &s->n_cigar);
if (s->cigar == 0) s->type = BWA_TYPE_NO_MATCH;
}
// generate MD tag