Not find the Binning_refiner.stats as mentioned in Usage_tutorial.md #24

Open
Thexiyang opened this issue Apr 24, 2018 · 15 comments

Comments

@Thexiyang

Thexiyang commented Apr 24, 2018

Hi, I am checking the output of the Bin_refinement module, but I did not find Binning_refiner.stats or Binning_refiner, which are mentioned in Usage_tutorial.md. Everything else is there. There are also two empty bins in metaWRAP_bins, which I think should be removed. Let me know if this is an issue.

Thanks,

@ursky
Collaborator

ursky commented Apr 25, 2018

I actually took the Binning_refiner bins out of the final plot because I thought it was confusing, since the module is also called Bin_refinement. I should probably remove that from the tutorial... If you are curious about what that would look like, the other figure has binsABC, which is actually the same as Binning_refiner: it is the result of running Binning_refiner on all three inputs.

And by the two empty bins, do you mean they are .fa files with a size of 0 bytes? Is there anything in them?

@Thexiyang
Author

Thanks for the explanation. Now I get it. I suggest removing it from the tutorial, as it might confuse beginners like me.

And yes, they are 0 bytes in size, but the others are fine. So in total I have 264 good bins plus 2 bins of size 0. I just did not understand why they are there. I should mention that metaWRAP significantly improved the bin quality.

Just another question. The two bins with the highest abundance according to the Quant_bins module have the lowest completeness (only 50%). I define good bins with -c 50 -x 10. What could be the reason for this? I would expect them to have good completeness given their high abundance.
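
For reference, the `-c 50 -x 10` cut-off mentioned above can be read as a simple filter on completeness and contamination. This is only a minimal sketch, not metaWRAP's own code: the function name, the record layout, and the example values are hypothetical, and it assumes both thresholds are inclusive.

```python
def is_good_bin(completeness, contamination,
                min_completion=50.0, max_contamination=10.0):
    """Return True if a bin meets both cut-offs.

    Inclusive bounds are an assumption made for this sketch.
    """
    return completeness >= min_completion and contamination <= max_contamination


# Hypothetical completeness/contamination estimates in percent;
# these values are illustrative and not taken from this thread.
example_bins = {
    "bin.A": (95.2, 1.3),
    "bin.B": (50.0, 4.8),
    "bin.C": (48.9, 0.5),
    "bin.D": (88.0, 12.4),
}

good = sorted(
    name
    for name, (comp, cont) in example_bins.items()
    if is_good_bin(comp, cont)
)
print(good)
```

With these made-up values, a bin sitting right at 50% completeness still passes, which is why a bin can be "good" by this definition and still be the least complete one in the set.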

@ursky
Collaborator

ursky commented Apr 25, 2018

Can you check if those two bins are in the metaWRAP.stats file?

@Thexiyang
Author

They are not there. CheckM just ignored them.
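
The check above (are all bin files listed in the stats file?) can be scripted. The following is a minimal sketch that assumes the stats file is tab-delimited with a header row and the bin name in the first column, and that bin files end in `.fa`. The function name and the paths in the commented call are hypothetical.

```python
import csv
from pathlib import Path


def bins_missing_from_stats(bin_dir, stats_path, ext=".fa"):
    """Return names of bin files in bin_dir that have no row in the stats table.

    Assumes a tab-delimited table with one header line and the bin name
    (without file extension) in the first column.
    """
    with open(stats_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header line
        listed = {row[0] for row in reader if row}
    # Path.stem drops only the final suffix, so "bin.1.fa" becomes "bin.1".
    on_disk = {p.stem for p in Path(bin_dir).glob(f"*{ext}")}
    return sorted(on_disk - listed)


# Hypothetical usage; adjust the paths to your own output folder:
# print(bins_missing_from_stats("metaWRAP_bins", "metaWRAP.stats"))
```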

@ursky
Collaborator

ursky commented Apr 25, 2018

One more thing, are they in the binsO folder in the work directory?

@Thexiyang
Author

they are in binsO

@ursky
Collaborator

ursky commented Apr 25, 2018

But they are empty there too, right?

@Thexiyang
Author

yes, the same. all 0 size

@ursky
Collaborator

ursky commented Apr 25, 2018

And I'm guessing they are also in binsM, but are not empty?

@Thexiyang
Author

Thexiyang commented Apr 25, 2018

sorry misunderstood your questions. yes, you are right!

@ursky
Collaborator

ursky commented Apr 25, 2018

I found the issue. It looks like the de-replication stage of the bin consolidation resulted in two bins that have no contigs at all. This is an artifact resulting from your low min completion parameter. Basically, ignore them! Everything is good.
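
For anyone who wants to confirm which bins are empty before ignoring them, here is a minimal sketch that lists zero-byte `.fa` files in a bin folder. The function name and the folder name in the commented call are hypothetical; nothing is deleted or moved.

```python
from pathlib import Path


def find_empty_bins(bin_dir, ext=".fa"):
    """Return the file names of bins in bin_dir whose size is 0 bytes."""
    return sorted(
        p.name
        for p in Path(bin_dir).glob(f"*{ext}")
        if p.is_file() and p.stat().st_size == 0
    )


# Hypothetical usage; point it at your own bin folder:
# for name in find_empty_bins("metaWRAP_bins"):
#     print("empty bin:", name)
```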

For future users, I put a patch into metaWRAP v=0.8.4 that fixes this. It will come out in the next couple weeks.

Thanks for your feedback!

As for your other question about high-abundance bins with poor completion metrics, this is unfortunately very common. I see it in my data all the time. The reason is that these high-abundance species also often have high strain heterogeneity. This confuses both the assembler and the function that estimates contig coverage, resulting in poor bins. If you really care about those organisms, you can try to assemble and bin single samples individually (or in small groups) in the hope that this reduces the coverage and heterogeneity to the point where you can assemble and bin them better.

@Thexiyang
Author

Thanks!

What about reassembly? My last attempt with the reassembly module did not work out, as it got stuck on one bin for almost 12 hours. Would reassembly improve the completeness of these target bins? Should I give it another try?

@ursky
Collaborator

ursky commented Apr 25, 2018

Bin reassembly will most likely increase bin completion only moderately, but it should significantly reduce bin contamination. Have a look at the reassembly benchmarks in the publication.

And yeah, the reassembly can be very slow for bins that have a very high number of reads mapping to them. The module runs on all the bins in parallel (limited by your thread count, of course), but with 1 thread per bin, which is why it's so slow for those very high-abundance ones. It speeds things up for most users, but not all...

I actually just released metaWRAP v=0.8.4, which has a new parallelization option. Now you can choose to run without the parallelization feature, which means the bins will be reassembled one by one, but using all the threads available. This will help you overcome your issue with that one bin!
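
To see why one very large bin can dominate the run time, here is a toy model of the two scheduling strategies described above. It is a rough sketch and not how metaWRAP is implemented: the function names, the per-bin costs, and the assumed speedup from giving one bin all the threads are hypothetical, and real assemblers rarely scale linearly with thread count.

```python
import heapq


def parallel_one_thread_per_bin(bin_costs, threads):
    """Wall time when each bin runs on a single thread, `threads` bins at a time.

    bin_costs are single-thread run times; each bin goes to whichever
    worker slot becomes free first.
    """
    workers = [0.0] * threads  # time at which each worker slot is free
    heapq.heapify(workers)
    for cost in bin_costs:
        free_at = heapq.heappop(workers)
        heapq.heappush(workers, free_at + cost)
    return max(workers)


def sequential_all_threads(bin_costs, speedup):
    """Wall time when bins run one by one, each sped up by `speedup`.

    `speedup` is the assumed gain from using all threads on one bin.
    """
    return sum(cost / speedup for cost in bin_costs)


# Hypothetical single-thread costs in hours: nine small bins and one huge bin.
skewed = [1] * 9 + [100]
# Hypothetical even workload: ten bins of equal size.
even = [1] * 10

for label, costs in (("skewed", skewed), ("even", even)):
    print(label,
          parallel_one_thread_per_bin(costs, threads=10),
          sequential_all_threads(costs, speedup=4))
```

Under these made-up numbers, the skewed workload takes 100 hours in parallel mode because the huge bin is stuck on one thread, versus 27.25 hours one by one. For the even workload the parallel mode wins (1 hour versus 2.5), which matches the point that parallel reassembly helps most users but not those with a few very high-abundance bins.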

@Thexiyang
Author

Thexiyang commented Apr 25, 2018

thanks. I will update it to the new version and rerun reassembly.

@ursky ursky reopened this Sep 14, 2018