{sample}.fastq.gz have incorrect sequence identifier string #6

Nicolas-Fernandez · 2021-05-03T12:32:31Z

Dear Fedonin,

I ran VirGenA with the option "assembling using reference, without msa" on reads that I had cleaned by aligning them to a reference genome (Bowtie2 + Samtools). Some tools accept the resulting FASTQ files without complaint (FastQC, FastQ Screen, DNASTAR), but VirGenA fails with this error:

java.io.IOException: File {sample}.fastq.gz have incorrect sequence identifier string

Some parameters:

Mode:
Reference Selector: false
Use Major: true

Data:
Reads Insertion Length: 1000 nt

Computing:
Thread Number: -1 threads
Batch Size: 1000 reads

Assembling:
Reference: {my_ref}.fasta
MSA: {my_msa}.fasta
Minimum Read Length: 50 nt
Uclust Identity (%): 0.95
Minimum Contig Length: 1000 nt
Delta (%): 0.05

Head of my FASTQ files before and after cleaning:

BEFORE

@FS10001377:5:BPA73114-2327:1:1101:1140:1000 1:N:0:4
AACATTGGCCGTGACAGCTTGACAAATGTTAAAAACACTATTAGCATA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@FS10001377:5:BPA73114-2327:1:1101:1360:1000 1:N:0:4
GCACATCACTACGCAACTTTAGAGCACATCACTACGCAACTTTAGAC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@FS10001377:5:BPA73114-2327:1:1101:2240:1000 1:N:0:4
GCTTATTGTTGGCGTTGCACTTCTTGCTGTTTTTCAG

AFTER

@FS10001377:5:BPA73114-2327:1:1101:1000:1260
GAGTTTAGTTCCCTTCCATCATATGCAGCTTTTGCTACTGTTCAAGAAGCTTATGAGCAGGCTGTTGCTAATGGTGATTCTGAAGTTGTTCTTAAAAAGTTGAAGAAGTCTTTGAA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@FS10001377:5:BPA73114-2327:1:1101:1000:1530
CTGCTTGCACTGATGACAATGCTTTAGCTTACTACAACACAACAAAGGGAGGTAGGTTTGTACTTTCACTGTTATCCGATTTACAGGATTTGAAATGGGCTAGATTCCCTAAGAGTGATGGAACTGGTACTATC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@FS10001377:5:BPA73114-2327:1:1101:1000:2010
GCCATTGTGTATTTAGTAAGACGTTGACGTGATATATGTGGTACCATGTCACCGTCTATTCTAAACTTAAAGAAGTCATGTTTAGCAACAGCTGGACAATCCTTAAGTAAATTATAAATTGTTTCTTCATGTTGGTAG

Is it because the last part of the header (1:N:0:4) is missing?
Or maybe something else?

Thank you very much,
Nicolas


Nicolas-Fernandez · 2021-05-03T14:56:37Z

Update: no more issue when using the option samtools fastq -N ("Always add either '/1' or '/2' to the end of read names, even when put into different files"). I guess the read names in each file.fastq.gz now record which read is R1 and which is R2. :)
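If re-exporting from the BAM with samtools fastq -N is not convenient, the same suffix can in principle be added to files that already exist. The following is a minimal Python sketch, not part of VirGenA or samtools; the function names open_text, add_mate_suffix and fix_fastq are hypothetical, and it assumes strict 4-line FASTQ records (no wrapped sequence lines). It appends '/1' or '/2' to the read name and keeps any comment after the first space.

```python
import gzip
from typing import IO


def open_text(path: str, mode: str) -> IO[str]:
    """Open a plain or gzip-compressed FASTQ file in text mode ('r' or 'w')."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def add_mate_suffix(header: str, mate: int) -> str:
    """Append '/1' or '/2' to the read name of one FASTQ header line.

    Anything after the first space (e.g. an Illumina '1:N:0:4' comment) is kept.
    """
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ header: {header!r}")
    name, sep, comment = header.rstrip("\n").partition(" ")
    return f"{name}/{mate}{sep}{comment}\n"


def fix_fastq(in_path: str, out_path: str, mate: int) -> int:
    """Rewrite every header of a 4-line-per-record FASTQ; return the read count."""
    n_reads = 0
    with open_text(in_path, "r") as src, open_text(out_path, "w") as dst:
        for i, line in enumerate(src):
            if i % 4 == 0:  # first line of each record is the '@' header
                line = add_mate_suffix(line, mate)
                n_reads += 1
            dst.write(line)
    return n_reads
```

Usage would be fix_fastq("sample_R1.fastq.gz", "sample_R1.fixed.fastq.gz", 1) and the same with mate=2 for the R2 file (file names here are placeholders).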

gFedonin · 2021-05-03T15:05:09Z

You are right! VirGenA searches for the first ' ' or '/' in each read name and trims everything from there on, so that read names in the R1 and R2 files match. Bowtie + samtools seem to do the same, trimming the ending, so the cleaned read names no longer contain either separator. Adding '/1' or '/2' (or ' 1:N:0:4') back fixes this, as you already found yourself.
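The matching rule described above can be illustrated with a short Python sketch. It is not VirGenA's code (VirGenA is Java); pair_key and mates_match are hypothetical names, and raising an error when neither ' ' nor '/' is present is an assumption chosen to reproduce the symptom reported in this issue.

```python
def pair_key(header: str) -> str:
    """Return the part of a FASTQ header used to match R1 and R2 mates.

    Cuts the read name at the first ' ' or '/', as described in the thread.
    Raising when neither separator exists is an assumption, not VirGenA's code.
    """
    name = header.rstrip("\n")
    if name.startswith("@"):
        name = name[1:]
    cuts = [i for i in (name.find(" "), name.find("/")) if i != -1]
    if not cuts:
        raise ValueError(f"incorrect sequence identifier string: {header!r}")
    return name[: min(cuts)]


def mates_match(r1_header: str, r2_header: str) -> bool:
    """True if two headers refer to the same sequenced fragment."""
    return pair_key(r1_header) == pair_key(r2_header)
```

With this rule, the "BEFORE" headers (name followed by ' 1:N:0:4') and headers written by samtools fastq -N (name followed by '/1') both yield a usable key, while the trimmed "AFTER" headers have no separator to cut at.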

gFedonin closed this as completed May 3, 2021
