
Fix read length plotting in aggregate mode #928

Merged · 1 commit · Jan 30, 2020

Conversation

@fanli-gcb (Contributor)

Small fix to the calculation of the read counts that pass each length cutoff when plotting in aggregate mode. Cum=10*as.vector(cums)/rc references the read count from the last sample processed; it should instead use the sum of the actual read counts used in the tabulation.

@benjjneb (Owner)

Can you provide a minimal example that demonstrates the bug in the current behavior? A quick inspection of the results with 3 random fastqs of different lengths, in both normal and aggregated mode, turned out as I expected.

@fanli-gcb (Contributor, Author)

Sure, see the attached example of 3 fastq files with 10k, 10k, and 100 reads.
fastqs.zip

Then:

fnFs <- c("S1.fastq.gz", "S2.fastq.gz", "small.fastq.gz")
plotQualityProfile(fnFs, aggregate=TRUE)

will yield the attached plot (Rplots).

The key is that the last element of fnFs is substantially smaller than the others: the read count from the last file processed gets used in the cumulative calculation, so the aggregated fractions end up scaled by the wrong denominator.

@benjjneb (Owner)

Ah, OK, I see it now: effectively the last rc was being multiplied by the number of files, which isn't the correct way to handle this, since rc*nrow(anndf) doesn't have to equal the total number of records across all the files.
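To make the arithmetic concrete, here is a small numeric sketch of why that normalization fails (illustrative only, written in Python rather than the actual dada2 R code; the variable names mirror the discussion but the setup is an assumption):

```python
# Illustrative sketch of the bug discussed above (not dada2 code).
# Three files with 10k, 10k, and 100 reads, as in the attached example.
read_counts = [10_000, 10_000, 100]

# Suppose every read reaches a given length cutoff, so the aggregated
# cumulative count at that cutoff is the total number of reads.
cums = sum(read_counts)        # 20100

rc_last = read_counts[-1]      # rc from the last file processed (100)
n_files = len(read_counts)     # plays the role of nrow(anndf)

# Buggy normalization: the denominator is effectively rc * n_files,
# which equals the true total only if all files have the same size.
buggy = cums / (rc_last * n_files)   # 20100 / 300 = 67.0 (inflated)

# Fixed normalization: divide by the actual total read count.
fixed = cums / sum(read_counts)      # 20100 / 20100 = 1.0 (100% pass)

print(buggy, fixed)
```

Note that the size of the error depends entirely on which file happens to be processed last, which is why reorderings of fnFs produce different distortions.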

@benjjneb (Owner)

LGTM, thanks!

@benjjneb benjjneb merged commit a9d25c7 into benjjneb:master Jan 30, 2020