Add processing of sample names consisting only of numbers #165

Open
ValentineObrezanenko opened this issue Dec 18, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@ValentineObrezanenko

Description of feature

Hi.
I ran into a problem: when a sample ID consists only of digits, some scripts treat it as a number. A number is then compared with a string, which causes an error. I think this is worth addressing. Examples are the run_drimseq_filter.R script and suppa_split_file. It could be handled with a line such as samps$sample_id <- as.character(samps$sample_id).
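
As a minimal sketch of the coercion suggested above: the sample sheet below is hypothetical (the column names and values are made up for illustration and are not taken from the pipeline), and it only shows how an all-digit ID column is read as a numeric type and then converted to character.

```r
# Hypothetical sample sheet; the sample IDs consist only of digits.
samps <- read.csv(
  text = "sample_id,condition\n101,control\n102,treated",
  stringsAsFactors = FALSE
)

class(samps$sample_id)  # "integer": read.csv guessed a numeric type

# Coerce to character so that later comparisons against column names
# are string-to-string.
samps$sample_id <- as.character(samps$sample_id)
class(samps$sample_id)  # "character"
```

Note that coercing after reading cannot restore leading zeros (an ID written as 007 would already have become 7), so reading the column as character in the first place, for example via the colClasses argument of read.csv, may be the safer option.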

@ValentineObrezanenko ValentineObrezanenko added the enhancement New feature or request label Dec 18, 2024
@gianfilippo

Hi,

Similar issue here. My sample IDs start with a number and end with a character. The run_drimseq_filter.R script fails with the following error:
Error in DRIMSeq::dmDSdata(counts = counts, samples = samps) :
all(samples$sample_id %in% colnames(counts)) is not TRUE
Calls: -> stopifnot
Execution halted

This happens because, after the salmon count file is read, a count data.frame is created:
counts <- data.frame(gene_id = tx2gene$gene_id, feature_id = tx2gene$tx,cts)

At this point the sample IDs are not handled properly: an "X" is prepended to each one, eventually resulting in a mismatch with the sample names.

I added "check.names = F" to this data.frame() call to fix this, but I am not sure it is the best way.
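
The renaming described above can be reproduced in base R without the pipeline. This is only a sketch with a made-up count matrix (the names cts, "1A" and "2B" are illustrative); it shows that data.frame() by default passes column names through make.names(), and that check.names = FALSE keeps them unchanged.

```r
# Hypothetical count matrix with sample names that start with a digit.
cts <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("1A", "2B")))

# Default behaviour: names that are not syntactically valid are altered.
mangled <- data.frame(feature_id = c("tx1", "tx2"), cts)
colnames(mangled)  # "feature_id" "X1A" "X2B"

# With check.names = FALSE the original sample names survive.
kept <- data.frame(feature_id = c("tx1", "tx2"), cts, check.names = FALSE)
colnames(kept)     # "feature_id" "1A" "2B"
```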

Thanks

@gianfilippo

UPDATE
After the previous correction, a similar error is raised:
Error: all(samples$sample %in% colnames(counts)) is not TRUE

This is originally caused by the following call:
d <- DRIMSeq::dmDSdata(counts = counts, samples = samps)

Again, this is because the DRIMSeq::dmDSdata function does not handle colnames starting with numbers and prepends an "X". This results in a count matrix saved with colnames that do not match the sample names.

For now, I have added the following line:
colnames(d.counts) = colnames(counts)

after this bit of code:

# Take count data
d.counts <- counts(d)

This should be ok, since the two DRIMSeq calls, dmDSdata and dmFilter, do not filter out samples, only features, as far as I understand.

The job completed successfully.
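
The positional reassignment above relies on the columns coming back in the same order. A possible alternative is to map the altered names back by name. The helper below is hypothetical (restore_sample_names is not part of DRIMSeq or the pipeline), and it assumes the alteration follows base R's make.names(), which is an assumption about what happens inside DRIMSeq rather than something confirmed here.

```r
# Hypothetical helper: map names altered by make.names() back to the originals.
restore_sample_names <- function(mangled, original) {
  # Lookup table: altered name -> original name.
  lookup <- setNames(original, make.names(original))
  # Only touch names that have a known original; leave the rest as they are.
  hit <- mangled %in% names(lookup)
  mangled[hit] <- lookup[mangled[hit]]
  unname(mangled)
}

original <- c("gene_id", "feature_id", "1A", "2B")
mangled  <- c("gene_id", "feature_id", "X1A", "X2B")
restored <- restore_sample_names(mangled, original)
restored  # "gene_id" "feature_id" "1A" "2B"
```

In the script this might be applied as colnames(d.counts) <- restore_sample_names(colnames(d.counts), colnames(counts)). It would not be safe if two original names collapse to the same altered name (for example "1A" and "X1A"), so a check for duplicates beforehand would be sensible.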
