Multimapping and reference sequence junctions #379

s-a-nersisyan · 2022-11-01T18:37:22Z

Hello,

I've experienced an issue which could be IMHO treated as a bug. As documented, bwa concatenates all sequences from the reference fasta file, and some reads could map to the junction of two reference sequences – in this case the read is marked as unmapped (SAM flag = 4), though it still reports the position.

The problem arises with multiple mappers. As documented, bwa randomly selects one of the multiple alignments as the “primary” one, leaving the others in XA tag. The trick is that sometimes the “primary” random chosen position is exactly the undesired junction, and at the same time some good alignments are marked as secondary and stored in XA. In this case we think that the read is unmapped (flag = 4), though it actually could be mapped (sometimes even without errors). Moreover, if we have the same molecule present multiple times in fastq file, some of the reads could be mapped, while other (exactly same) reads will be unmapped – this is a random process. The only solution I invented to tackle this problem is to add poly-N sequences to the ends of each sequence in fasta file.

I understand that this bug is very-very unlikely in the most of real-world setups. However, I massively experienced this while trying to align short reads to Y RNA sequences: there are lot of repeats, and a lot of reads mapped to the Y RNA molecule junctions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multimapping and reference sequence junctions #379

Multimapping and reference sequence junctions #379

s-a-nersisyan commented Nov 1, 2022

Multimapping and reference sequence junctions #379

Multimapping and reference sequence junctions #379

Comments

s-a-nersisyan commented Nov 1, 2022