
Collapse collection after a pick value returns expression.json file #19245

Open
Delphine-L opened this issue Dec 3, 2024 · 6 comments

@Delphine-L
Contributor

Describe the bug
I have a "Collapse collection" step in a workflow that runs on a collection of fastqsanger.gz files. It follows a "Pick value" step that appears to return a proper collection, but the Collapse output is an expression.json file (see screenshot). If I run a simple workflow with just the Collapse step, I get the expected fastqsanger.gz file; the same happens if the "Pick value" step has an existing collection as its first option. It looks as if, despite showing as a collection, the Pick value output picked the empty collection? Am I not supposed to use Pick value with collections?

Galaxy Version and/or server at which you observed the bug
Galaxy Version: main
Commit: (run git rev-parse HEAD if you run this Galaxy server)

Browser and Operating System
Operating System: macOS
Browser: Chrome

To Reproduce
Steps to reproduce the behavior:

  1. Import history : https://usegalaxy.org/u/delphinel/h/test-collapse
  2. Import workflow : https://usegalaxy.org/u/delphinel/w/test-collapse
  3. Run the workflow with the following parameters:
    • Hi-C Forward reads: forward
    • Hi-C Reverse reads: reverse
    • Trim Hi-C reads?: yes (it works fine if I select "no" here)
  4. See that the final results of the collapses are expression.json files

Expected behavior
The outputs of the Pick value tools should be the sorted collections if cutadapt is skipped, and running the Collapse tool on them should return a fastqsanger.gz file.


@Delphine-L
Contributor Author

Also, in a bigger workflow, the downstream steps are perfectly fine with these expression.json datasets as inputs (e.g. Hifiasm, which expects FASTQ). Is it just a display issue?


@martenson
Member

How does the dataset that is incorrectly identified as expression.json look inside? Is it possibly the valid result you expected, just with the wrong datatype attached?

@Delphine-L
Contributor Author

Yes, the dataset looks correct inside. Even in Galaxy, clicking the eye icon shows a correct preview, so it seems to know to decompress it?
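One way to answer "how does it look inside" outside Galaxy is to inspect the downloaded bytes directly. The sketch below is a hypothetical helper (not part of Galaxy): it checks for the two gzip magic bytes and then verifies that the first decompressed record has the four-line FASTQ shape. It is a quick heuristic, not a full FASTQ validator.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_like_fastq_gz(path: str) -> bool:
    """Heuristic: is `path` gzip-compressed with a FASTQ-shaped first record?"""
    with open(path, "rb") as raw:
        if raw.read(2) != GZIP_MAGIC:
            return False
    # Read the first four lines of the decompressed text (one FASTQ record).
    with gzip.open(path, "rt") as fh:
        header, seq, plus, qual = (fh.readline().rstrip("\n") for _ in range(4))
    # Header starts with '@', separator with '+', sequence and quality lengths match.
    return header.startswith("@") and plus.startswith("+") and len(seq) == len(qual) > 0
```

If this returns True for a dataset Galaxy labels expression.json, the content is fine and only the datatype label is wrong.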

@mvdbeek mvdbeek self-assigned this Dec 10, 2024
@Delphine-L
Contributor Author

Delphine-L commented Dec 13, 2024

Note: this causes a "metadata generation failed" error when the Collapse job runs on Jetstream2 (when run from vgp.usegalaxy.org). https://usegalaxy.org/datasets/f9cad7b01a4721352a60240adfe1fef1/details

@natefoo
Member

natefoo commented Dec 13, 2024

It's probably not the underlying cause of the datatype issue, but it's interesting that set_meta() in Galaxy seems not to choke on the fastqsanger.gz file not being JSON, whereas in Pulsar it does:

Traceback (most recent call last):
  File "/cvmfs/main.galaxyproject.org/galaxy/lib/galaxy/metadata/set_metadata.py", line 523, in set_metadata_portable
    set_meta(dataset, file_dict)
  File "/cvmfs/main.galaxyproject.org/galaxy/lib/galaxy/metadata/set_metadata.py", line 205, in set_meta
    set_meta_with_tool_provided(
  File "/cvmfs/main.galaxyproject.org/galaxy/lib/galaxy/metadata/set_metadata.py", line 141, in set_meta_with_tool_provided
    dataset_instance.datatype.set_meta(dataset_instance, **set_meta_kwds)
  File "/cvmfs/main.galaxyproject.org/galaxy/lib/galaxy/datatypes/text.py", line 164, in set_meta
    obj = json.load(f)
          ^^^^^^^^^^^^
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/[email protected]/lib/python3.11/json/__init__.py", line 293, in load
    return loads(fp.read(),
                 ^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
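The error at byte 0x8b, position 1, is what you would expect when a gzip stream (which always begins with the bytes 0x1f 0x8b) is decoded as UTF-8 text. The minimal sketch below reproduces it with Python's standard library and shows a defensive loader that returns None instead of raising. `load_json_or_none` is a hypothetical helper for illustration, not Galaxy's set_meta() code.

```python
import gzip
import json
import os
import tempfile


def load_json_or_none(path: str):
    """Parse `path` as UTF-8 JSON; return None if it is not valid UTF-8 JSON."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except (UnicodeDecodeError, json.JSONDecodeError):
        # e.g. a gzipped FASTQ that has been labelled expression.json
        return None


if __name__ == "__main__":
    # Hypothetical demo file: one gzipped FASTQ record saved under a .json name.
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "wb") as fh:
        fh.write(gzip.compress(b"@r1\nACGT\n+\nIIII\n"))
    try:
        # A plain json.load fails the same way as in the traceback above.
        with open(path, encoding="utf-8") as fh:
            json.load(fh)
    except UnicodeDecodeError as exc:
        print(exc)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
    print(load_json_or_none(path))  # None
    os.remove(path)
```

Whether set_meta() should guard like this or simply never run JSON metadata on such files is a separate design question; the sketch only shows why the decode fails.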

@mvdbeek
Member

mvdbeek commented Dec 20, 2024

So this is a little tricky: the nested pick_param_value tool picks up the input extension before the input pick_param_value tool is done, and until that input job is actually done the extension is expression.json. This is fine until the Collapse collection tool also uses format_source and sets its output to expression.json. The cleanest solution here is to set the output datatype for the pick_param tool in the workflow editor to fastqsanger.gz, but of course that means you can't use fastqsanger as input.
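The ordering problem described above can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not Galaxy's scheduler: `Dataset`, `inherit_ext`, and `run` are hypothetical names, and `inherit_ext` only mimics the idea that a format_source-style output copies its input's extension at the moment the step is scheduled.

```python
from dataclasses import dataclass
from typing import Optional

# Extension shown for the outer Pick value output until its job finishes.
PLACEHOLDER_EXT = "expression.json"


@dataclass
class Dataset:
    ext: str


def inherit_ext(source: Dataset, override: Optional[str] = None) -> str:
    """Toy stand-in for format_source: copy the source's ext *now*, unless overridden."""
    return override if override is not None else source.ext


def run(override: Optional[str] = None) -> str:
    # Outer Pick value output: still carries the placeholder ext.
    outer = Dataset(ext=PLACEHOLDER_EXT)
    # Nested Pick value is scheduled before `outer` finishes, so it copies the placeholder
    # (or uses an explicit datatype set in the workflow editor, modelled by `override`).
    nested = Dataset(ext=inherit_ext(outer, override))
    # Later the outer job finishes and its real datatype becomes known; too late for `nested`.
    outer.ext = "fastqsanger.gz"
    # Collapse collection also derives its ext from its input.
    collapsed = Dataset(ext=inherit_ext(nested))
    return collapsed.ext


print(run())                           # expression.json  (the reported behaviour)
print(run(override="fastqsanger.gz"))  # fastqsanger.gz   (workaround)
```

In this model the override plays the role of setting the output datatype in the workflow editor, and it also shows the trade-off mentioned above: the override is fixed, so an uncompressed fastqsanger input would end up mislabelled.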
