Renaming "deblur final table"? #58

Open
fedarko opened this issue Sep 25, 2019 · 2 comments

fedarko commented Sep 25, 2019

When I first started downloading data from Qiita, it seemed to me like `deblur final table` (`all.biom`) was the table I should be using as a starting point, but from doing some digging it looks like `deblur reference hit table` (`reference-hit.biom`) is the recommended table for use in typical 16S analyses. No big deal, I can rerun my analysis with `reference-hit.biom` instead of `all.biom` :)

I know there are existing docs explaining the differences between these BIOMs (see references at the bottom for some of what I've found), but these are all external resources ([2] and [3] are linked from the "Help" dropdown in Qiita, but you have to dig a bit to find the info on Deblur). In my opinion, the actual Qiita user interface doesn't explain this super well. Furthermore, I think other people have had the same confusion I've had and have used all.biom in 16S studies; see the full thread of [5]. I can also see that this issue has been brought up before in #16, but it doesn't seem like that issue has been resolved.

I believe it might be worthwhile to do some or all of the following:

  1. Rename `deblur final table` to something like `deblur reference hit and reference non-hit table`, or `deblur non-positive-filtering table`, or `deblur all.biom ("final") table`.
    • In any case, I think that labelling this as the "final" output of deblur is unclear when it isn't actually what most users will want to use in their analyses.
  2. Add a sentence or two giving some context (and/or links to some of the references below) in the artifact details for deblur outputs. For example, for `reference-hit.biom`: "This deblur artifact was positive-filtered against a reference database of 16S sequences in an attempt to remove non-16S sequences. We recommend using it for most 16S analyses."
    • It looks like the idea in #16 ("Remove unfiltered output from deblur?") was to "add warnings", which I'd imagine being something like this for `all.biom`: "This deblur artifact was not positive-filtered. We recommend not using it for normal 16S analyses, but it may be useful for other marker-gene studies."
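To make the two suggestions above concrete, here is a minimal sketch of how a display name and a context note could be attached to each output file. The `ARTIFACT_INFO` mapping and the `describe` helper are hypothetical names made up for illustration, not existing Qiita or plugin code; the display names and note text are just the wording proposed in this issue.

```python
# Hypothetical mapping from deblur output file to a display name and a UI note.
# The names and notes are suggestions from this issue, not existing Qiita code.
ARTIFACT_INFO = {
    "reference-hit.biom": {
        "name": "deblur reference hit table",
        "note": (
            "This deblur artifact was positive-filtered against a reference "
            "database of 16S sequences in an attempt to remove non-16S "
            "sequences. We recommend using it for most 16S analyses."
        ),
    },
    "all.biom": {
        "name": "deblur non-positive-filtering table",
        "note": (
            "This deblur artifact was not positive-filtered. We recommend not "
            "using it for normal 16S analyses, but it may be useful for other "
            "marker-gene studies."
        ),
    },
}


def describe(filename):
    """Return the text that could be shown in the artifact details."""
    info = ARTIFACT_INFO[filename]
    return f"{info['name']} ({filename}): {info['note']}"


print(describe("all.biom"))
```

Keeping the name and the note side by side would mean a later rename only has to touch one place, but where this text actually lives in Qiita is something the maintainers would know better than I do.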

I am happy to discuss further/help out as needed—I think this will help people choose the correct outputs for their analyses, and alleviate confusion in general.

[1] https://github.com/biocore/deblur#input-and-output-files
[2] https://qiita.ucsd.edu/static/doc/html/processingdata/index.html#deblurring (doesn't go into a lot of detail)
[3] https://cmi-workshop.readthedocs.io/en/latest/qiita-16S-processing.html#the-deblur-workflow
[4] https://forum.qiime2.org/t/transferring-qiita-artifacts-to-qiime2/4790
[5] https://forum.qiime2.org/t/deblur-without-16s-filter/3968/

antgonza (Member) commented Oct 8, 2019

Thank you @fedarko.

I think `deblur reference-hit filtered` and `deblur without filtering` could be good names. What do you think?

BTW, changing the names within the plugin will change the names of the output artifacts, like the ones displayed here:
[Screenshot: Screen Shot 2019-10-03 at 8 58 06 AM]

However, it will not change the names of the ones generated/merged for Analysis, which currently look like this:
[Screenshot: Screen Shot 2019-10-08 at 11 16 25 AM]
Note that changing those would require modifying the main Qiita code rather than this plugin.

fedarko (Author) commented Oct 8, 2019

I like the suggested names, but I think they're somewhat inaccurate: both of these artifacts still have had negative filtering (e.g. of PhiX / adapter sequences) applied, right? So in a sense both of these artifacts have had "filtering" done.

Maybe something like `deblur positive-and-negative-filtered` and `deblur only-negative-filtered` would convey the same sort of message while being more accurate.
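As a toy illustration of the distinction I mean, here is a sketch that treats the two filters as set operations. It assumes, as above, that both outputs get negative filtering and only `reference-hit.biom` gets positive filtering. The sequence IDs and function names are made up, and real Deblur matches sequences against its databases by alignment rather than by exact ID membership.

```python
# Toy data: all sequence IDs below are made up for illustration.
denoised = {"seq_16S_a", "seq_16S_b", "seq_phix", "seq_adapter", "seq_other"}
known_artifacts = {"seq_phix", "seq_adapter"}  # stand-in for the negative-filter database
reference_16s = {"seq_16S_a", "seq_16S_b"}     # stand-in for the positive-filter database


def negative_filter(seqs, artifacts):
    """Drop sequences that match known artifacts (e.g. PhiX / adapters)."""
    return seqs - artifacts


def positive_filter(seqs, reference):
    """Keep only sequences that match the reference database."""
    return seqs & reference


# Analogue of all.biom: negative-filtered only.
all_table = negative_filter(denoised, known_artifacts)
# Analogue of reference-hit.biom: negative- and positive-filtered.
reference_hit = positive_filter(all_table, reference_16s)
# Whatever passed the negative filter but did not hit the reference.
reference_non_hit = all_table - reference_hit

print(sorted(all_table))
print(sorted(reference_hit))
print(sorted(reference_non_hit))
```

In this picture neither table is "unfiltered", which is why I'd prefer names that say which filters were applied.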

Re: the multiple `dflt_name` artifacts, I think renaming those would also be a good idea (I know one of them has had the insertion tree filter applied, but being explicit about this in the graph would be much clearer for users IMO). I know this sort of concern has come up before on the main Qiita repo, but since this problem still remains for analyses I believe it would be worth fixing there. (Can write up an issue for this in biocore/qiita if you want.)
