Feature Request: recursive process #1563

Open
lindenb opened this issue Apr 10, 2020 · 9 comments

Comments

@lindenb
Contributor

lindenb commented Apr 10, 2020

New feature

As seen on Twitter (https://twitter.com/yokofakun/status/1248533372630155264), I'd like a way to describe a recursive process that calls itself until a condition is met.

I have no idea how it could be described in a DSL:

Usage scenario

  • I have a large list of VCFs that I want to merge in a given region. The number of VCFs is too large
    for bcftools merge or bcftools concat: loading the indexes and merging the variants would take days and too much memory. The idea is to merge the list of VCFs by divide and conquer.

  • The same applies to BAM files.

  • I want to find the rare variants that are present in my family but absent from 10,000 BAMs:

    • call family-sample1 vs. family-sample2, keep the common variants
    • call family-sample3 only for the variants of the previous step, keep the common variants
    • call family-sample4 only for the variants of the previous step, keep the common variants
    • call family-sample5 only for the variants of the previous step, keep the common variants
    • call control-bams1, remove the variants
    • call control-bams2, remove the variants
    • (...)
    • call control-bamsN, remove the variants
      At the end, we have the rare variants.
  • GATK CombineGVCFs. 10,000 GVCFs to be combined:

    • Combine 100 GVCFs, 100 times
    • Combine the 100 resulting GVCFs
    • Genotype the last GVCF
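
The batching in the scenarios above can be sketched as simple arithmetic. This is only an illustration in Python, not Nextflow code: `plan_rounds` and its parameters are hypothetical names, and the sketch assumes every job merges up to a fixed number of files into exactly one output.

```python
from typing import List


def plan_rounds(n_inputs: int, batch_size: int) -> List[int]:
    """Return the number of merge jobs needed in each round.

    Assumption: each job merges up to `batch_size` files into one file,
    and rounds repeat until a single file remains.
    """
    if n_inputs < 1 or batch_size < 2:
        raise ValueError("need at least one input and a batch size >= 2")
    rounds = []
    remaining = n_inputs
    while remaining > 1:
        # Integer ceiling division: a partially filled batch still needs a job.
        jobs = (remaining + batch_size - 1) // batch_size
        rounds.append(jobs)
        remaining = jobs  # each job emits exactly one merged file
    return rounds


# The GATK scenario: 10,000 GVCFs combined 100 at a time.
print(plan_rounds(10_000, 100))  # [100, 1]
```

All jobs within one round are independent of each other, which is what makes the strategy attractive for parallel execution.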

Suggest implementation

it's not clear to me how you could implement this idea :-)

recursive process mergeVcf {
    recursionPoolSize 10
    stopRecursionWhenPoolSize 1

    input:
    val vcfs from vcf_list.andThen(self.merged.collect())

    output:
    file("merged${task.recursionLevel}.bcf") into merged

    script:
    """
    bcftools merge -O b -o merged${task.recursionLevel}.bcf ${vcfs.join(" ")}
    """
}
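
One possible reading of the semantics proposed above, sketched in Python rather than in Nextflow: split the pool into chunks of `recursionPoolSize`, merge the chunks in parallel, and recurse on the outputs until one item is left. The names `recursive_merge` and `merge_sorted` are hypothetical, and `merge_sorted` is only a stand-in for `bcftools merge` that works on lists of integers instead of files.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def recursive_merge(items: Sequence[T], pool_size: int,
                    merge: Callable[[Sequence[T]], T]) -> T:
    """Merge `items` level by level until a single item remains."""
    if not items:
        raise ValueError("nothing to merge")
    if pool_size < 2:
        raise ValueError("pool_size must be >= 2 or the recursion never shrinks")
    if len(items) == 1:
        # Mirrors 'stopRecursionWhenPoolSize 1' in the proposal.
        return items[0]
    chunks = [items[i:i + pool_size] for i in range(0, len(items), pool_size)]
    # Chunks of the same level do not depend on each other: run them in parallel.
    with ThreadPoolExecutor() as pool:
        merged = list(pool.map(merge, chunks))
    return recursive_merge(merged, pool_size, merge)


def merge_sorted(chunk: Sequence[List[int]]) -> List[int]:
    """Stand-in for 'bcftools merge': sorted union of the input lists."""
    return sorted(set().union(*chunk))


inputs = [[i, i + 1] for i in range(25)]   # 25 small "files"
result = recursive_merge(inputs, 10, merge_sorted)
print(result == list(range(26)))  # True
```

With 25 inputs and a pool size of 10, the first level runs three merges in parallel and the second level runs one.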
@lindenb
Contributor Author

lindenb commented Apr 10, 2020

see also http://plindenbaum.blogspot.com/2014/12/divide-and-conquer-in-makefile.html (2014)

Divide-and-conquer in a #Makefile : recursivity and #parallelism.

https://pbs.twimg.com/media/B4GoInWIQAAuwp1.png:large

@pditommaso
Member

pditommaso commented Apr 10, 2020

Interesting. I think it could be done using the feedback pattern

@lindenb
Contributor Author

lindenb commented Apr 10, 2020

@pditommaso ohh, that's new to me!

@lindenb
Contributor Author

lindenb commented Apr 10, 2020

Unless I'm wrong, your solution would be sequential only?

In the example you provided, the lines are added one after the other by each process. So the steps cannot be run in parallel, unlike in a divide-and-conquer strategy (?)

Furthermore, when merging BAM files, the main BAM would grow bigger and bigger, and hence the process would get slower and slower...

Am I wrong?
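
A rough cost model for the two concerns above, as a Python sketch under loud assumptions: every input has size 1, a merged file is as large as the sum of its inputs, and the cost of a merge is the amount of data it reads. The function names are hypothetical.

```python
from typing import Tuple


def sequential_cost(n: int) -> Tuple[int, int]:
    """Feedback-style loop: merge one new file into a growing accumulator.

    Returns (dependent steps, total units of data read).
    """
    acc_size, read, steps = 1, 0, 0
    for _ in range(n - 1):
        read += acc_size + 1   # re-read the accumulator plus one new file
        acc_size += 1
        steps += 1
    return steps, read


def tree_cost(n: int, fan_in: int = 2) -> Tuple[int, int]:
    """Divide and conquer: merge `fan_in` files at a time, level by level.

    Returns (dependent levels, total units of data read).
    """
    sizes = [1] * n
    read, levels = 0, 0
    while len(sizes) > 1:
        read += sum(sizes)     # every file of the level is read once
        sizes = [sum(sizes[i:i + fan_in]) for i in range(0, len(sizes), fan_in)]
        levels += 1
    return levels, read


print(sequential_cost(8))  # (7, 35)
print(tree_cost(8))        # (3, 24)
```

Under these assumptions the sequential loop has a chain of 7 dependent steps for 8 files, while the tree has 3 levels whose merges can run concurrently, and it reads less data in total.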

@pditommaso
Member

Umm, thinking about it more, the collect makes things more complicated, and something like the loop above does not work because the collect would force the process to wait for the overall completion.

@stale

stale bot commented Sep 23, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 23, 2020
@stale stale bot closed this as completed Nov 22, 2020
@pditommaso pditommaso reopened this Nov 23, 2020
@stale stale bot removed the stale label Nov 23, 2020
@stale

stale bot commented Apr 22, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 22, 2021
@stale stale bot closed this as completed Jun 21, 2021
@pditommaso pditommaso reopened this Jun 21, 2021
@stale stale bot removed the stale label Jun 21, 2021
@pditommaso
Member

Not sure it's the same feature as the one envisioned in this issue; however, support for recursion has been added recently: #2521

@lindenb
Contributor Author

lindenb commented Dec 24, 2021

@pditommaso thank you, I'll have a look at this new feature!

3 participants