Add processing of sample names consisting only of numbers #165

ValentineObrezanenko · 2024-12-18T08:33:33Z

Description of feature

Hi.
I ran into a problem: if a sample ID consists only of digits, some scripts treat it as a number rather than a string. A number is then compared with a string, which causes an error. I think this is worth addressing. It affects, for example, the run_drimseq_filter.R and suppa_split_file scripts. It could be handled with the line samps$sample_id <- as.character(samps$sample_id).
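A minimal sketch of the coercion problem in base R, assuming the samplesheet is read with read.csv (the CSV content and the variable names samps_raw and samps_chr are made up for illustration, not taken from the pipeline). It also shows that converting after reading cannot recover leading zeros, so declaring the column type at read time may be the safer variant:

```r
# Hypothetical samplesheet whose sample IDs consist only of digits.
csv_text <- "sample_id,condition\n007,control\n101,treated"

# Default behaviour: the column is parsed as integer.
samps_raw <- read.csv(text = csv_text)
class(samps_raw$sample_id)            # "integer"

# Converting afterwards gives strings, but the leading zeros are already gone.
as.character(samps_raw$sample_id)     # "7" "101"

# Declaring the column as character at read time keeps the IDs verbatim.
samps_chr <- read.csv(text = csv_text,
                      colClasses = c(sample_id = "character"))
samps_chr$sample_id                   # "007" "101"
```

If the IDs never carry leading zeros, the as.character() line suggested above should be equivalent.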

gianfilippo · 2025-01-04T05:32:04Z

Hi,

Similar issue here. My sample IDs start with a number and end with a character. The run_drimseq_filter.R script fails with the following error:
Error in DRIMSeq::dmDSdata(counts = counts, samples = samps) :
all(samples$sample_id %in% colnames(counts)) is not TRUE
Calls: -> stopifnot
Execution halted

This happens because, after the salmon count file is read, a count data.frame is created:
counts <- data.frame(gene_id = tx2gene$gene_id, feature_id = tx2gene$tx,cts)

At this point the sample IDs are not handled properly: an "X" is prepended to each one, which eventually results in a mismatch.

I added "check.names = F" to fix this, but I am not sure this is the best way.
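The renaming can be reproduced in base R without the pipeline. The sketch below uses toy stand-ins for cts and tx2gene (their contents are invented here) and compares the default data.frame() behaviour with check.names = FALSE:

```r
# Toy stand-ins for the objects used in run_drimseq_filter.R (contents are made up).
cts <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("1A", "2B")))
tx2gene <- data.frame(gene_id = c("g1", "g1"), tx = c("t1", "t2"))

# Default: data.frame() runs the column names through make.names(),
# which prepends "X" to names that start with a digit.
mangled <- data.frame(gene_id = tx2gene$gene_id, feature_id = tx2gene$tx, cts)
colnames(mangled)   # "gene_id" "feature_id" "X1A" "X2B"

# With check.names = FALSE the original sample IDs are kept.
kept <- data.frame(gene_id = tx2gene$gene_id, feature_id = tx2gene$tx, cts,
                   check.names = FALSE)
colnames(kept)      # "gene_id" "feature_id" "1A" "2B"
```

Note that column names such as 1A are not syntactically valid R names, so later code has to access them with kept[["1A"]] or backticks rather than kept$1A.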

Thanks

gianfilippo · 2025-01-04T17:25:16Z

UPDATE
After the previous correction, a similar error is raised:
Error: all(samples$sample %in% colnames(counts)) is not TRUE

This one originates from the following call:
d <- DRIMSeq::dmDSdata(counts = counts, samples = samps)

Again, this is because the DRIMSeq::dmDSdata function does not handle column names starting with numbers and prepends an "X". As a result, the count matrix is saved with column names that do not match the sample names.

At the moment I have added the following line:
colnames(d.counts) = colnames(counts)

after this bit of code:

# Take count data
d.counts <- counts(d)

This should be ok, since the two DRIMSeq calls, dmDSdata and dmFilter, do not filter out samples, only features, as far as I understand.

The job completed successfully.
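A possible alternative to copying the column names over by position is to map them back by name. The sketch below is only an illustration: restore_sample_names is a hypothetical helper, the toy d_counts stands in for the output of counts(d), and it assumes the renaming follows the same rules as base R's make.names():

```r
# Hypothetical helper: map make.names()-style column names back to the original IDs.
restore_sample_names <- function(count_df, sample_ids) {
  # Lookup table: mangled name -> original sample ID
  lookup <- setNames(sample_ids, make.names(sample_ids))
  hit <- colnames(count_df) %in% names(lookup)
  colnames(count_df)[hit] <- unname(lookup[colnames(count_df)[hit]])
  count_df
}

# Toy stand-in for the data.frame returned by counts(d).
d_counts <- data.frame(gene_id = "g1", feature_id = "t1", X1A = 5, X2B = 7)

restored <- restore_sample_names(d_counts, c("1A", "2B"))
colnames(restored)   # "gene_id" "feature_id" "1A" "2B"
```

Matching by name does not depend on the column order staying the same, but it becomes ambiguous if two sample IDs map to the same mangled name (for example "1-A" and "1.A").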

ValentineObrezanenko added the enhancement New feature or request label Dec 18, 2024
