You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I've experienced an issue which could be IMHO treated as a bug. As documented, bwa concatenates all sequences from the reference fasta file, and some reads could map to the junction of two reference sequences – in this case the read is marked as unmapped (SAM flag = 4), though it still reports the position.
The problem arises with multiple mappers. As documented, bwa randomly selects one of the multiple alignments as the “primary” one, leaving the others in XA tag. The trick is that sometimes the “primary” random chosen position is exactly the undesired junction, and at the same time some good alignments are marked as secondary and stored in XA. In this case we think that the read is unmapped (flag = 4), though it actually could be mapped (sometimes even without errors). Moreover, if we have the same molecule present multiple times in fastq file, some of the reads could be mapped, while other (exactly same) reads will be unmapped – this is a random process. The only solution I invented to tackle this problem is to add poly-N sequences to the ends of each sequence in fasta file.
I understand that this bug is very-very unlikely in the most of real-world setups. However, I massively experienced this while trying to align short reads to Y RNA sequences: there are lot of repeats, and a lot of reads mapped to the Y RNA molecule junctions.
The text was updated successfully, but these errors were encountered:
Hello,
I've experienced an issue which could be IMHO treated as a bug. As documented, bwa concatenates all sequences from the reference fasta file, and some reads could map to the junction of two reference sequences – in this case the read is marked as unmapped (SAM flag = 4), though it still reports the position.
The problem arises with multiple mappers. As documented, bwa randomly selects one of the multiple alignments as the “primary” one, leaving the others in XA tag. The trick is that sometimes the “primary” random chosen position is exactly the undesired junction, and at the same time some good alignments are marked as secondary and stored in XA. In this case we think that the read is unmapped (flag = 4), though it actually could be mapped (sometimes even without errors). Moreover, if we have the same molecule present multiple times in fastq file, some of the reads could be mapped, while other (exactly same) reads will be unmapped – this is a random process. The only solution I invented to tackle this problem is to add poly-N sequences to the ends of each sequence in fasta file.
I understand that this bug is very-very unlikely in the most of real-world setups. However, I massively experienced this while trying to align short reads to Y RNA sequences: there are lot of repeats, and a lot of reads mapped to the Y RNA molecule junctions.
The text was updated successfully, but these errors were encountered: