Workflow6: Workflow4 + Workflow5 + demo_mv #21

r06942072 · 2018-11-13T22:09:05Z

Link:
https://github.com/NAL-i5K/CWL_Common-Workflow-Language/tree/dev/demo_workflow6

wget
checksums (Workflow5)
gunzip (Workflow4)
tree
mv

hsiaoyi0504 · 2018-11-22T05:13:46Z

This comment is related to what I commented earlier in #20.

Current usage of the pipeline looks like this:

cwl-runner 1st-workflow.cwl 1st-workflow-job.yml
cwl-runner block_wget.cwl 1st-workflow-job.yml
cwl-runner block_gunzip.cwl 1st-workflow-job.yml
cwl-runner block_gitclone.cwl 1st-workflow-job.yml
cwl-runner block_tree.cwl 1st-workflow-job.yml
cwl-runner block_mv.cwl 1st-workflow-job.yml

Although it's easy to write another script that collects these commands together, using the pipeline this way looks much like what we would do without CWL at all:

... do something
wget ...
gunzip ...
git clone ...
tree ...
mv ...

Then why do we need to use CWL?

The likely reason, as far as I can see (maybe I am wrong), is that the functionality covered by each block we define is too small. That leaves no difference between using the blocks and running the commands one by one. As I imagine it, a block in a pipeline should group 5 to 10 commands together. For example, wget and gunzip should be put in the same block, together with creating the initial file directory.

r06942072 · 2018-11-26T17:43:42Z

The ultimate goal of CWL: do all the work with a single command

All the code named demo_workflowX in the CWL repo is executed with one single command, like the one below:

cwl-runner 1st-workflow.cwl 1st-workflow-job.yml

We provide two arguments to the cwl-runner command:
The first argument: a CWL file of class Workflow, which specifies all the external inputs and the steps.
The second argument: a YAML file that specifies the input values.
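
As a rough sketch of how these two files fit together: the step wiring, the input name `url`, and the output names `downloaded` and `unpacked` below are assumptions for illustration, not the actual contents of the repo. The Workflow file might look something like this:

```yaml
# 1st-workflow.cwl (hypothetical contents, for illustration only)
cwlVersion: v1.0
class: Workflow

inputs:
  url: string                  # assumed external input, filled in by the job file

outputs:
  unpacked:
    type: File
    outputSource: gunzip/unpacked

steps:
  wget:
    run: block_wget.cwl        # building block; assumed to accept `url`
    in:
      url: url
    out: [downloaded]
  gunzip:
    run: block_gunzip.cwl      # building block; assumed to accept `archive`
    in:
      archive: wget/downloaded # output of the previous step feeds this one
    out: [unpacked]
```

The job file would then only supply values for the external inputs:

```yaml
# 1st-workflow-job.yml (hypothetical contents)
url: https://example.org/genome.fa.gz   # placeholder URL, not a real data file
```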

Lego brick (building block) and Lego house (workflow)

A workflow is basically a combination of building blocks.
I imagine that every building block in CWL is like a Lego brick.
We use a bunch of Lego bricks to build a Lego house.
I prefer small Lego bricks, because they make it possible to build a delicate Lego house.
The benefit of using small pieces is that they give us the flexibility to build whatever kind of house we want.

Why is my design rule to wrap only a single command in one building block?

So far, every building block is a CommandLineTool that wraps only one Linux command.
This makes it much easier to develop and to maintain in the long term.
We can reuse a building block by putting it into the corresponding step of any workflow.
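
For illustration, a single-command building block of this kind could be sketched as follows. This is not the repo's actual block_gunzip.cwl; the input name `archive`, the output name `unpacked`, and the choice to capture `gunzip -c` output via stdout are all assumptions:

```yaml
# Hypothetical single-command building block wrapping gunzip
cwlVersion: v1.0
class: CommandLineTool
baseCommand: [gunzip, -c]      # decompress to standard output

inputs:
  archive:                     # assumed input name
    type: File
    inputBinding:
      position: 1              # passed as the first positional argument

outputs:
  unpacked:                    # assumed output name
    type: stdout               # shorthand: capture stdout as a File

# Name the captured file after the input with its last extension removed,
# e.g. genome.fa.gz becomes genome.fa
stdout: $(inputs.archive.nameroot)
```

Because the block only declares its input and output, any workflow step can reuse it by pointing `run:` at the file and wiring its own sources to `archive`.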

One benefit of using CWL

There is an online tool called CWL-viewer that makes it easy to understand and demo the idea.
CWL-viewer link: https://view.commonwl.org/

hsiaoyi0504 · 2018-11-27T14:58:52Z

I still don't get why we need to create a wrapper for only one Linux command.

I agree that it is easy to develop, but in terms of long-term maintenance, a wrapper around a single command means another layer of complexity and another possible source of bugs (compared with using the original Linux command directly). Does it really help maintainability, or does it hurt it?

"I prefer the small piece of lego brick, because it is able to build a delicate lego house.
The benefit of using small piece is that it can provide flexibility to achieve whatever kind of house we want to build."

I totally agree with that. However, it seems to me that wrapping a single Linux command doesn't always increase flexibility. For the gunzip case it does, because it gives us the flexibility to determine where the file should be placed, but I don't see such a benefit for mv, cp, wget, and tree.

r06942072 · 2018-11-27T15:35:44Z

This is an issue worth discussing:
What should the basic unit look like in CWL?

In my design, the reason there is only one Linux command per block is simple.
The original intention of the CommandLineTool class in CWL seems to be to use a single baseCommand field to keep tools isolated.

Helpful Link:
https://www.biostars.org/p/229095/

hsiaoyi0504 · 2018-11-27T18:04:13Z

"What is the basic unit should look like in cwl ?"

It's a great question, but I don't think there is a perfect answer for that.

Although I don't think there is a common answer, in our case I do have an opinion, based on projects similar to what we are doing (see below links). For this project, I think each block should represent a somewhat high-level operation rather than an implementation detail.

My observation comes from other existing projects:

Basic units are like prep_align_input, process_align, and postprocess_align.
It's similar to how we prepare a block diagram in a paper. Would you put moving (mv) or copying (cp) one file into a directory as a step in your paper's block diagram?

We can probably make use of what we already have. According to our internal wiki, organism on-boarding has been divided into several steps, and it seems to me that no step would require more than 5 blocks. For example, the first step, "set up data directories and get data", can be divided into at least two blocks (set_up_data_directories and get_data), but we probably don't want to implement a block that merely runs wget on one of the data files the pipeline requires.

Another thought is that each block should be unit-testable and worth testing (https://github.com/ncbi/pgap/tree/master/progs/unit_tests).

r06942072 · 2018-11-29T22:20:29Z

I think I got your point.
I agree that, in the end, each block should have a high-level meaning rather than reflect how it's implemented.
But first I would like to focus on making sure that CWL really works and helps the automated organism onboarding pipeline.
Once the workflow is functional, the next step is to wrap the blocks into nicer units, and I believe there is a way to do it.
For example, the SubworkflowFeatureRequirement provided by CWL lets us build a workflow of workflows.
We can connect several blocks into a workflow and present it as a single unit when we demo the pipeline.
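
A minimal sketch of that idea, assuming a hypothetical file get_data.cwl that is itself a Workflow (for example one chaining the wget and gunzip blocks) with an input `url` and an output `unpacked`. None of these names come from the repo:

```yaml
# Hypothetical outer workflow that treats another workflow as one unit
cwlVersion: v1.0
class: Workflow

requirements:
  SubworkflowFeatureRequirement: {}   # needed when a step runs a Workflow

inputs:
  url: string                         # assumed external input

outputs:
  genome:
    type: File
    outputSource: get_data/unpacked

steps:
  get_data:                           # high-level unit shown in the demo
    run: get_data.cwl                 # assumed sub-workflow, not a CommandLineTool
    in:
      url: url
    out: [unpacked]
```

With this layout the small single-command blocks stay reusable, while the outer workflow exposes only higher-level steps such as get_data.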

r06942072 self-assigned this Nov 13, 2018

r06942072 changed the title ~~Workflow6: three things~~ Workflow6: gitclone, wget, tree, mv Nov 14, 2018

r06942072 changed the title ~~Workflow6: gitclone, wget, tree, mv~~ Workflow6: Workflow4 + Workflow5 + demo_mv Nov 16, 2018

r06942072 closed this as completed Nov 26, 2018

r06942072 reopened this Nov 26, 2018

r06942072 mentioned this issue Nov 26, 2018

Building Block: block_mv #20

Closed

r06942072 closed this as completed Feb 12, 2019
