Fix workflow links and update wrapper prefix #7

Merged
merged 25 commits into from
Jan 5, 2024
Linting
percyfal committed Jan 5, 2024
commit 27803f124751b02dd2deb23f972ca7079eb8dc57
26 changes: 12 additions & 14 deletions .github/workflows/main.yaml
@@ -13,16 +13,15 @@ jobs:
steps:
- uses: actions/checkout@v4
- name: Lint workflow
#uses: ezherman/snakemake-github-action@5027c0e706ada924ab91e0501bd92185fc98de3c
uses: snakemake/[email protected]
with:
directory: .
snakefile: workflow/Snakefile
args: "--lint"
stagein:
micromamba install -y -c conda-forge rsync
ln -s /usr/bin/micromamba /usr/bin/mamba
ln -s /usr/bin/micromamba /usr/bin/conda
micromamba install -y -c conda-forge rsync;
ln -s /usr/bin/micromamba /usr/bin/mamba;
ln -s /usr/bin/micromamba /usr/bin/conda;
Testing:
runs-on: ubuntu-latest
needs: Linting
@@ -31,23 +30,22 @@ jobs:
- name: Checkout repository and submodules
uses: actions/checkout@v4
- name: Test workflow (local test data)
#uses: ezherman/snakemake-github-action@5027c0e706ada924ab91e0501bd92185fc98de3c
uses: snakemake/[email protected]
with:
directory: .test
snakefile: workflow/Snakefile
args: "--use-conda --conda-frontend mamba --show-failed-logs -j 10 --conda-cleanup-pkgs cache --wrapper-prefix file:///github/workspace/workflow/wrappers"
stagein:
micromamba install -y -c conda-forge rsync
ln -s /usr/bin/micromamba /usr/bin/mamba
ln -s /usr/bin/micromamba /usr/bin/conda
args: "--use-conda --conda-frontend mamba --show-failed-logs -j 10 --retries 1 --rerun-incomplete --conda-cleanup-pkgs cache --wrapper-prefix file:///github/workspace/workflow/wrappers"
stagein:
micromamba install -y -c conda-forge rsync;
ln -s /usr/bin/micromamba /usr/bin/mamba;
ln -s /usr/bin/micromamba /usr/bin/conda;
- name: Test report
uses: snakemake/[email protected]
with:
directory: .test
snakefile: workflow/Snakefile
args: "--report report.zip --wrapper-prefix file:///github/workspace/workflow/wrappers"
stagein:
micromamba install -y -c conda-forge rsync
ln -s /usr/bin/micromamba /usr/bin/mamba
ln -s /usr/bin/micromamba /usr/bin/conda
stagein:
micromamba install -y -c conda-forge rsync;
ln -s /usr/bin/micromamba /usr/bin/mamba;
ln -s /usr/bin/micromamba /usr/bin/conda;
50 changes: 46 additions & 4 deletions .pre-commit-config.yaml
@@ -2,13 +2,42 @@ repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: check-merge-conflict
- id: check-added-large-files
- id: debug-statements
- id: mixed-line-ending
- id: check-case-conflict
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- id: end-of-file-fixer
- repo: https://github.com/asottile/reorder_python_imports
rev: v3.12.0
hooks:
- id: reorder-python-imports
args: [--application-directories=python, --unclassifiable-application-module=_tskit]
- repo: https://github.com/asottile/pyupgrade
rev: v3.15.0
hooks:
- id: pyupgrade
args: [--py3-plus, --py37-plus]
- repo: https://github.com/psf/black
rev: 23.12.1
rev: 23.3.0
hooks:
- id: black
language_version: python3
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: "v0.1.11"
hooks:
- id: ruff
args:
[ "--per-file-ignores=tests/test_utils.py:E501,manticore/tests/wm/snakemake.py:E501",
]
- repo: https://github.com/asottile/blacken-docs
rev: 1.16.0
hooks:
- id: blacken-docs
additional_dependencies: [black==22.12.0]
language_version: python3
- repo: https://github.com/snakemake/snakefmt
rev: v0.8.5
hooks:
@@ -17,6 +46,19 @@ repos:
hooks:
- id: lint
name: Snakemake lint
entry: snakemake --lint text
entry: snakemake --configfile config/config.yaml --configfile config/envmodules.yaml --lint text -v -s
language: system
files: ''
log_file: 'snakemake-lint.txt'
files: '.*\.smk$|Snakefile'
exclude: .*/test-.*\.smk$
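The `files` and `exclude` patterns on the lint hook decide which paths trigger `snakemake --lint`. A minimal sketch of that filter follows, assuming pre-commit applies both patterns with Python `re.search` semantics; the helper name `is_linted` and the sample paths are hypothetical and not part of this repository.

```python
import re

# Patterns copied from the hook definition in the diff above.
FILES = re.compile(r".*\.smk$|Snakefile")
EXCLUDE = re.compile(r".*/test-.*\.smk$")


def is_linted(path: str) -> bool:
    """Return True if `path` matches `files` and does not match `exclude`.

    Assumption: both patterns are applied with re.search, so they may
    match anywhere in the path unless anchored.
    """
    return bool(FILES.search(path)) and not EXCLUDE.search(path)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for p in ["workflow/Snakefile", "workflow/rules/quast.smk",
              "workflow/rules/test-setup.smk", "README.md"]:
        print(p, is_linted(p))
```

Note that the exclude pattern requires a `/` before `test-`, so under this reading a top-level `test-foo.smk` would still be linted.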
- repo: https://github.com/DavidAnson/markdownlint-cli2
rev: v0.11.0
hooks:
- id: markdownlint-cli2
files: \.(md|qmd)$
types: [file]
exclude: LICENSE.md
- id: markdownlint-cli2-fix
files: \.(md|qmd)$
types: [file]
exclude: LICENSE.md
49 changes: 28 additions & 21 deletions README.md
@@ -1,7 +1,9 @@
# Snakemake workflow: Assembly evaluation workflow

[![Snakemake](https://img.shields.io/badge/snakemake-≥5.25.0-brightgreen.svg)](https://snakemake.bitbucket.io)
[![Build status](https://github.com/NBISweden/assemblyeval-smk/workflows/Tests/badge.svg?branch=main)](https://github.com/NBISweden/assemblyeval-smk/actions?query=workflow%3ATests) ![License](https://img.shields.io/badge/license-MIT-blue.svg)
[![Build
status](https://github.com/NBISweden/assemblyeval-smk/workflows/Tests/badge.svg?branch=main)](https://github.com/NBISweden/assemblyeval-smk/actions?query=workflow%3ATests)
![License](https://img.shields.io/badge/license-MIT-blue.svg)

Snakemake workflow for genome assembly evaluation.

@@ -15,11 +17,14 @@ available, its DOI (see above; currently N/A).

## Features

- align transcripts to a reference sequence with [gmap](http://research-pub.gene.com/gmap/)
- estimate gene body coverage with [genecovr](https://github.com/NBISweden/genecovr)
- run [quast](http://bioinf.spbau.ru/quast), [jellyfish](http://www.genome.umd.edu/jellyfish.html), [busco](https://busco.ezlab.org), and [kraken2](https://ccb.jhu.edu/software/kraken2/)
- summarize quality metrics with [MultiQC](https://multiqc.info)
- (WIP): run some of the steps for adding reads, coverage files, and
* align transcripts to a reference sequence with [gmap](http://research-pub.gene.com/gmap/)
* estimate gene body coverage with [genecovr](https://github.com/NBISweden/genecovr)
* run [quast](http://bioinf.spbau.ru/quast),
[jellyfish](http://www.genome.umd.edu/jellyfish.html),
[busco](https://busco.ezlab.org), and
[kraken2](https://ccb.jhu.edu/software/kraken2/)
* summarize quality metrics with [MultiQC](https://multiqc.info)
* (WIP): run some of the steps for adding reads, coverage files, and
sequences to the [blobtoolkit](https://blobtoolkit.genomehubs.org) viewer

## Quickstart
@@ -50,14 +55,12 @@ details) . In addition, edit the following files:
`reads.tsv`
Raw sequence read files.


NOTE: the config directory doesn't have to be in the workflow source
directory, in which case snakemake must be invoked with the full path
to the Snakemake file:

snakemake -s /path/to/manticore-smk/workflow/Snakefile


### Step 3: Install Snakemake

Install Snakemake using
@@ -117,7 +120,8 @@ The report contains documentation and results from the workflow.

### Step 6: Contribute back

In case you have also changed or added steps, please consider contributing them back to the original repository:
In case you have also changed or added steps, please consider
contributing them back to the original repository:

1. [Fork](https://help.github.com/en/articles/fork-a-repo) the
original repo to a personal or lab account.
@@ -140,7 +144,6 @@ For a quick overview of example configuration files, see
and the test configuration
[.test/config/config.yaml](https://github.com/NBISweden/assemblyeval-smk/blob/main/.test/config/config.yaml)


### Schemas

All configuration files are evaluated against [configuration
@@ -177,10 +180,14 @@ examples.
The tabular input files can also be represented as lists in yaml
format. For instance, the following tabular data

<!-- markdownlint-disable MD010 -->

species version fasta
foo v1 resources/assembly.v1.fasta
foo v2 resources/assembly.v2.fasta

<!-- markdownlint-enable MD010 -->

would be represented as follows in yaml format

- species: foo
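The README's mapping from a tab-separated table to a list of records can be sketched in Python. This is an illustration only, using the standard library; the function name `tsv_to_records` is hypothetical, and the workflow's own loading code is not shown in this diff.

```python
import csv
import io

# The example table from the README, with hard tabs as column separators.
TSV = (
    "species\tversion\tfasta\n"
    "foo\tv1\tresources/assembly.v1.fasta\n"
    "foo\tv2\tresources/assembly.v2.fasta\n"
)


def tsv_to_records(text: str) -> list:
    """Convert tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


records = tsv_to_records(TSV)

if __name__ == "__main__":
    # Each record corresponds to one "- species: ..." item in the yaml form.
    for record in records:
        print(record)
```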
@@ -240,7 +247,6 @@ Quast will calculate quality metrics of an assembly.
quast:
ids: ["foo_v2", "foo_v1"]


#### kraken2

kraken2 assigns taxonomic ids to sequences and is used for
@@ -254,21 +260,19 @@ taxids over windows.
window_size: 20000
npartitions: 50


#### MultiQC

MultiQC doesn't have a configuration section per se. Instead, it will
collect and plot results for the following applications:

- busco
- jellyfish
- quast
- kraken2
* busco
* jellyfish
* quast
* kraken2

Plot configurations can be tweaked in a multiqc configuration file
`multiqc_config.yaml`.


### Rule configuration

Every rule has a corresponding configuration entry, with keywords for
@@ -281,17 +285,20 @@ instance, to change `options` for `jellyfish_count_chunk`, add
jellyfish_count_chunk:
options: --size 20G --canonical



## Testing

Test cases are in the subfolder `.test`. They are automatically
executed via continuous integration with [GitHub
Actions](https://github.com/features/actions). To run the tests, cd to
`.test` and issue

snakemake --use-conda --conda-frontend mamba --show-failed-logs --cores 2 --conda-cleanup-pkgs cache -s ../workflow/Snakefile --wrapper-prefix file://$(pwd)/../workflow/wrappers
snakemake --use-conda --conda-frontend mamba \
--show-failed-logs --cores 2 --conda-cleanup-pkgs cache \
-s ../workflow/Snakefile \
--wrapper-prefix file://$(pwd)/../workflow/wrappers

Once the test run has finished, create a report and view it:

snakemake --cores 1 -s ../workflow/Snakefile --wrapper-prefix file://$(pwd)/../workflow/wrappers --report report.html
snakemake --cores 1 -s ../workflow/Snakefile \
--wrapper-prefix file://$(pwd)/../workflow/wrappers \
--report report.html
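Both test commands pass `--wrapper-prefix` as a `file://` URI built from the current directory. A hedged sketch of constructing the same URI in Python follows; the helper name `wrapper_prefix` and the example path are hypothetical, and the sketch assumes a POSIX filesystem.

```python
from pathlib import Path


def wrapper_prefix(test_dir: Path) -> str:
    """Build a file:// URI for ../workflow/wrappers relative to `test_dir`.

    resolve() collapses the ".." component and makes the path absolute,
    which Path.as_uri() requires.
    """
    return (test_dir / ".." / "workflow" / "wrappers").resolve().as_uri()


if __name__ == "__main__":
    # Hypothetical checkout location, for illustration only.
    print(wrapper_prefix(Path("/tmp/repo/.test")))
```

In CI the equivalent value is hard-coded as `file:///github/workspace/workflow/wrappers`, as the workflow file in this commit shows.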
@@ -1,7 +1,8 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import itertools

import pandas as pd
import snakemake

input = snakemake.input
output = snakemake.output
3 changes: 1 addition & 2 deletions workflow/scripts/assemblyeval_kraken2_gather_reports.py
@@ -1,10 +1,9 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Read and merge kraken2 results output
#
from functools import reduce
import pandas as pd
import snakemake


def load(fn):
3 changes: 1 addition & 2 deletions workflow/scripts/assemblyeval_kraken2_python_make_windows.py
@@ -1,6 +1,5 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import numpy as np
import snakemake
from pybedtools import BedTool


25 changes: 9 additions & 16 deletions workflow/scripts/assemblyeval_pybedtools_make_chunks.py
@@ -1,11 +1,8 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import re
import pybedtools
from tqdm import tqdm
from snakemake.shell import shell
import snakemake
from snakemake.utils import logger
from tqdm import tqdm

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

@@ -15,8 +12,8 @@

regions = []
with open(faidx) as fh:
for l in tqdm(fh.readlines()):
fields = l.split("\t")
for line in tqdm(fh.readlines()):
fields = line.split("\t")
i = pybedtools.Interval(fields[0], 0, int(fields[1]))
regions.append(i)

@@ -25,22 +22,18 @@

try:
assert len(bed) >= npart
except AssertionError as e:
except AssertionError:
logger.warning(
(
f"Number of regions smaller than number of partitions: '{len(bed)} < {npart}': "
f"lower the number of partitions "
)
f"Number of regions smaller than number of partitions: '{len(bed)} < {npart}': "
f"lower the number of partitions "
)
raise

try:
assert partition < npart
except AssertionError as e:
except AssertionError:
logger.error(
(
f"partition number {partition} larger than the maximum number of partitions {npart}"
)
f"partition number {partition} larger than the maximum number of partitions {npart}"
)
raise
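The hunks above read a FASTA index into whole-sequence intervals and then validate the partition settings. A standalone sketch of those two steps follows, without `pybedtools`. It swaps the `assert` plus re-raise pattern for explicit `ValueError`s, which is a different technique from the script's; the function names are hypothetical.

```python
def read_faidx_regions(lines):
    """Turn .fai lines (name<TAB>length<TAB>...) into (name, 0, length) tuples."""
    regions = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        regions.append((fields[0], 0, int(fields[1])))
    return regions


def check_partition(n_regions: int, npart: int, partition: int) -> None:
    """Raise ValueError if the partition settings are inconsistent."""
    if n_regions < npart:
        raise ValueError(
            f"Number of regions smaller than number of partitions: "
            f"'{n_regions} < {npart}': lower the number of partitions"
        )
    if partition >= npart:
        raise ValueError(
            f"partition number {partition} must be smaller than "
            f"the number of partitions {npart}"
        )


if __name__ == "__main__":
    # Two made-up index lines, for illustration only.
    fai = ["chr1\t1000\t6\t60\t61\n", "chr2\t500\t1030\t60\t61\n"]
    regions = read_faidx_regions(fai)
    check_partition(len(regions), npart=2, partition=1)
    print(regions)
```

Explicit exceptions keep the checks active when Python runs with `-O`, which strips `assert` statements.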

3 changes: 1 addition & 2 deletions workflow/scripts/assemblyeval_save_config.py
@@ -1,7 +1,6 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import snakemake
import yaml
from collections import OrderedDict

inconfig = snakemake.config

5 changes: 3 additions & 2 deletions workflow/wrappers/bio/gmap/map/wrapper.py
@@ -5,12 +5,13 @@

import os
import re
import snakemake
from snakemake.shell import shell

# Check genome size
gmap = "gmap"
with open(snakemake.input.log, "r") as fh:
m = re.search("Total genomic length = (\d+) bp", "\n".join(fh.readlines()))
with open(snakemake.input.log) as fh:
m = re.search(r"Total genomic length = (\d+) bp", "\n".join(fh.readlines()))
try:
if int(m.group(1)) > 2**32:
gmap = "gmapl"
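The genome-size check in this hunk can be isolated as a small function, sketched below. The regular expression and the `2**32` threshold are taken from the wrapper; the function name `select_gmap` is hypothetical, and falling back to `gmap` when the log has no matching line is an assumption, since the wrapper's `except` branch lies outside the hunk.

```python
import re


def select_gmap(log_text: str) -> str:
    """Pick the gmap executable from the genome length reported in a log.

    Assumption: if the log has no 'Total genomic length' line, the
    default 'gmap' is used.
    """
    m = re.search(r"Total genomic length = (\d+) bp", log_text)
    if m is not None and int(m.group(1)) > 2**32:
        return "gmapl"  # large-genome variant
    return "gmap"


if __name__ == "__main__":
    # Made-up log lines, for illustration only.
    print(select_gmap("Total genomic length = 5000000000 bp"))
    print(select_gmap("Total genomic length = 1000 bp"))
```

The raw-string prefix introduced by this commit matters here: without it, `\d` in a normal string literal is an invalid escape sequence that newer Python versions warn about.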
4 changes: 1 addition & 3 deletions workflow/wrappers/bio/jellyfish/count/wrapper.py
@@ -1,15 +1,13 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
__author__ = "Per Unneberg"
__copyright__ = "Copyright 2020, Per Unneberg"
__email__ = "[email protected]"
__license__ = "MIT"

import os
import re
import gzip
import snakemake
from snakemake.shell import shell
from snakemake.utils import logger

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

5 changes: 1 addition & 4 deletions workflow/wrappers/bio/jellyfish/histo/wrapper.py
@@ -1,15 +1,12 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
__author__ = "Per Unneberg"
__copyright__ = "Copyright 2020, Per Unneberg"
__email__ = "[email protected]"
__license__ = "MIT"

import os
import re
import gzip
import snakemake
from snakemake.shell import shell
from snakemake.utils import logger

log = snakemake.log_fmt_shell(stdout=True, stderr=True)
