Suggestion #37

Open
Potatoconomy opened this issue Jan 10, 2021 · 3 comments

Comments

@Potatoconomy

Hey! Thanks for the great program.

There are two things that would, in my eyes, really round out the utility of this tool.

  1. Removal of polyA/T tails. Only a subset of my reads still contains an A/T tail, so headclipping to remove this bias also clips reads that have already had the tail removed.

  2. Nucleotide tail clipping on a subset of reads. Right now, it tailclips all reads, but my FastQC report shows that I should only be tailclipping the longest reads in my FASTQ file.

Once again, thanks for the program!

@wdecoster
Owner

Hi,

Thanks for the suggestions! I'll give them some thought, but have a question for each:

  1. Do you suggest removing only 'exact' polyA/T tails (only AAAAAAA or only TTTTTTTTTTTTTT), or (as I assume) also allowing some noise in those stretches?
  2. How do you think this should be implemented? For example, an option like --clip-when-length 10000 that lets the user specify the read length from which the clipping rules apply?
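
To make question 2 concrete, here is a minimal sketch of what length-conditional tail clipping could look like. The function name `clip_if_long` and both parameter names are hypothetical; `--clip-when-length` is only an option floated in this thread, not an existing NanoFilt flag.

```python
def clip_if_long(seq, qual, tailcrop, clip_when_length):
    """Crop `tailcrop` bases from the 3' end, but only for long reads.

    Hypothetical helper: reads shorter than `clip_when_length` are
    returned unchanged, so short reads are not clipped.
    """
    if tailcrop > 0 and len(seq) >= clip_when_length:
        # Crop the sequence and its quality string by the same amount
        return seq[:-tailcrop], qual[:-tailcrop]
    return seq, qual


# A 12-base read is clipped when the threshold is 10, but not when it is 20
print(clip_if_long("ACGTACGTACGT", "IIIIIIIIIIII", 4, 10))
print(clip_if_long("ACGTACGTACGT", "IIIIIIIIIIII", 4, 20))
```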

Cheers,
Wouter

@Potatoconomy
Author

Hey,

I am working with nanopore reads, which have a per-nucleotide error rate of around 15%. For this reason, the noise would have to be accounted for, likely with a sliding-window technique. Prinseq, which I use, removes exact polyA/T tails, but this still leaves me with a +10% T bias at the beginning of my reads and a slight A bias at the end.
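
A rough sketch of the sliding-window idea, to show what I mean. This is not how NanoFilt or Prinseq work; the function names and the defaults (`window=10`, `min_frac=0.8`, `min_tail=10`) are assumptions chosen for illustration and would need tuning on real data.

```python
def trim_poly_tail(seq, base="A", window=10, min_frac=0.8, min_tail=10):
    """Trim a noisy run of `base` from the 3' end of `seq`.

    A window slides from the 3' end toward the 5' end; the tail is
    extended for as long as at least `min_frac` of the window is `base`.
    """
    s = seq.upper()
    n = len(s)
    cut = n  # index where the tail starts; n means "no tail found"
    start = n - window
    while start >= 0:
        if s[start:start + window].count(base) < min_frac * window:
            break
        cut = start
        start -= 1
    # The last accepted window may begin with non-tail bases; skip them
    while cut < n and s[cut] != base:
        cut += 1
    # Ignore tails that are too short to be trusted
    if n - cut < min_tail:
        return seq
    return seq[:cut]


def trim_poly_head(seq, base="T", **kwargs):
    """Trim a noisy run of `base` from the 5' end by reversing the read."""
    return trim_poly_tail(seq[::-1], base=base, **kwargs)[::-1]


# A 20-base polyA tail containing two miscalled bases (C and G)
read = "GATTACAGGC" + "AAAAACAAAAGAAAAAAAAA"
print(trim_poly_tail(read))
```

Reads without a tail come back unchanged, so this could run over all reads without clipping the ones that have already been trimmed. For a FASTQ record, the quality string would need to be cut at the same position.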

Right now, my pipeline is to do some trimming with NanoFilt and then follow that up with the A/T trimming with Prinseq. This has given me the least nucleotide bias so far, although there is still some present.

For the 2nd suggestion, I had actually misinterpreted my FastQC report and forgot that there were fewer reads with longer lengths, hence increasing the variance of my data in that region.

Thanks,
Patrick

@wdecoster
Owner

So that leaves us with only suggestion 1? Okay, I'll think about how best to implement this.

@wdecoster wdecoster transferred this issue from wdecoster/nanofilt Jul 2, 2024