Write a Python program called sampler.py that will probabilistically sample one or more input FASTA files into an output directory. The inputs for this program will be generated by your synth.py program.
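Probabilistic sampling here means that each read is kept or dropped independently, so the number of reads written can vary from run to run. The sketch below is a minimal, stdlib-only version of that idea. It assumes a simple FASTA layout (a ">" header line followed by one or more sequence lines), and read_fasta and sample are hypothetical names. A real solution might use a library parser such as Biopython's SeqIO instead.

```python
import io
import random
from typing import Iterable, Iterator, List, Optional, TextIO, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(fh: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in fh:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:  # flush the final record
        yield header, ''.join(chunks)


def sample(records: Iterable[Record], percent: float,
           rng: random.Random) -> List[Record]:
    """Keep each record independently with probability `percent`."""
    return [rec for rec in records if rng.random() < percent]


recs = list(read_fasta(io.StringIO('>1\nACGT\n>2\nGG\nTT\n>3\nA\n')))
print(recs)  # [('1', 'ACGT'), ('2', 'GGTT'), ('3', 'A')]
print(len(sample(recs, 0.5, random.Random(1))))  # depends on the seed
```

Because rng.random() returns values in [0, 1), a percent of 1.0 keeps every record and 0.0 keeps none.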
You can run make fasta to create files of 1K, 10K, and 100K reads in this directory. You can then use these files to test your program:
$ ./sampler.py -m 2 tests/inputs/n1k.fa
1: n1k.fa
Wrote 2 sequences from 1 file to directory "out".
$ cat out/n1k.fa
>34
AACATCAGGTATGGTCATCAGTTTTAGGATTTGAAGTAATTCTTCGCGAATCTTCGATCT
CTATAGGATCAGGAATTATACTTAACTTTATACTATAAGTGAAATAAACTCACTATGAAA
TTGGTAGTGGAACAGCAGAAGTTCAGATGATTTATCAGAAAAGTAATAGTGAGTAATCCT
TTAGATTTA
>40
TAGATTGCATCAGGGATTCAGGGCTGACCTTGTTGCACAGCATAAACAACTGATACACAC
AGACTATCTACTATACCATAAACATCTTGCTACTACAATTTCAGGTTCCTATGGATTTAA
TTGGCGCTTTATTTATCTGA
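The example output shows three things. The -m cap stops writing after two records. The output file keeps the input's basename inside out/. Sequences are wrapped at 60 characters per line. Below is a minimal sketch of that write step. It assumes that records are (header, sequence) tuples and that a --max of 0 means "no limit". The write_sample and summary helpers are hypothetical names, not part of the assignment.

```python
import os
import random
import tempfile
from typing import Iterable, Tuple


def write_sample(records: Iterable[Tuple[str, str]], in_path: str,
                 outdir: str, percent: float, max_reads: int,
                 rng: random.Random) -> int:
    """Write a random subset of records to outdir/<basename of in_path>.

    Each record is kept with probability `percent`; when max_reads > 0,
    writing stops once that many records are written (0 = no limit).
    Returns the number of records written.
    """
    os.makedirs(outdir, exist_ok=True)
    out_path = os.path.join(outdir, os.path.basename(in_path))
    num_written = 0
    with open(out_path, 'wt') as out_fh:
        for header, seq in records:
            if rng.random() >= percent:
                continue  # this read was not selected
            out_fh.write(f'>{header}\n')
            # Wrap sequences at 60 columns, as in the example output
            for start in range(0, len(seq), 60):
                out_fh.write(seq[start:start + 60] + '\n')
            num_written += 1
            if max_reads and num_written >= max_reads:
                break
    return num_written


def summary(num_seqs: int, num_files: int, outdir: str) -> str:
    """Build the final status line, pluralizing as needed."""
    seqs = 'sequence' if num_seqs == 1 else 'sequences'
    files = 'file' if num_files == 1 else 'files'
    return (f'Wrote {num_seqs} {seqs} from {num_files} {files} '
            f'to directory "{outdir}".')


# Demo: keep everything (percent=1.0) but stop after 2 reads.
with tempfile.TemporaryDirectory() as tmp:
    demo = [('34', 'ACGT' * 30), ('40', 'TTGA'), ('41', 'GGCC')]
    n = write_sample(demo, 'tests/inputs/n1k.fa', os.path.join(tmp, 'out'),
                     1.0, 2, random.Random(1))
    print(summary(n, 1, 'out'))  # Wrote 2 sequences from 1 file to directory "out".
```

The cap is checked after each write. As a result, the program stops reading a large input as soon as it has enough reads, so it does not have to scan the whole file.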
Here is the usage your program should create for -h or --help:
$ ./sampler.py -h
usage: sampler.py [-h] [-f format] [-p reads] [-m max] [-s seed] [-o DIR]
                  FILE [FILE ...]

Probabilistically subset FASTA files

positional arguments:
  FILE                  Input FASTA/Q file(s)

optional arguments:
  -h, --help            show this help message and exit
  -f format, --format format
                        Input file format (default: fasta)
  -p reads, --percent reads
                        Percent of reads (default: 0.1)
  -m max, --max max     Maximum number of reads (default: 0)
  -s seed, --seed seed  Random seed value (default: None)
  -o DIR, --outdir DIR  Output directory (default: out)
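One way to produce this usage with the standard library's argparse is sketched below. The metavars and defaults, combined with ArgumentDefaultsHelpFormatter, generate the "(default: ...)" suffixes shown above. Some parts are assumptions about what tests such as test_bad_pct, test_bad_seed, test_bad_format, and test_bad_file expect:

* the percent must fall strictly between 0 and 1;
* the format must be fasta or fastq;
* every input file must exist.

File arguments are kept as plain paths rather than open handles to keep the sketch simple.

```python
import argparse
import os
from typing import List, Optional


def get_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse and validate command-line arguments (sketch)."""
    parser = argparse.ArgumentParser(
        prog='sampler.py',
        description='Probabilistically subset FASTA files',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    parser.add_argument('file', metavar='FILE', nargs='+',
                        help='Input FASTA/Q file(s)')
    parser.add_argument('-f', '--format', metavar='format',
                        choices=['fasta', 'fastq'], default='fasta',
                        help='Input file format')
    parser.add_argument('-p', '--percent', metavar='reads', type=float,
                        default=0.1, help='Percent of reads')
    parser.add_argument('-m', '--max', metavar='max', type=int, default=0,
                        help='Maximum number of reads')
    parser.add_argument('-s', '--seed', metavar='seed', type=int,
                        default=None, help='Random seed value')
    parser.add_argument('-o', '--outdir', metavar='DIR', default='out',
                        help='Output directory')

    args = parser.parse_args(argv)

    # Assumed rule: percent is a fraction strictly between 0 and 1
    if not 0 < args.percent < 1:
        parser.error(f'--percent "{args.percent}" must be between 0 and 1')

    # Assumed rule: every input path must be an existing file
    for path in args.file:
        if not os.path.isfile(path):
            parser.error(f'No such file: "{path}"')

    return args
```

Every error path goes through argparse, which prints the usage and exits with status 2. A non-integer --seed or an unknown --format is rejected by type=int and choices before the custom checks run. Note that Python 3.10 and later print "options:" where older versions print "optional arguments:".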
A passing test suite looks like this:
$ make test
python3 -m pytest -xv --disable-pytest-warnings --flake8 --pylint --pylint-rcfile=../pylintrc --mypy sampler.py tests/sampler_test.py
============================= test session starts ==============================
...
collected 15 items
sampler.py::FLAKE8 PASSED [ 6%]
sampler.py::mypy PASSED [ 12%]
tests/sampler_test.py::FLAKE8 SKIPPED [ 18%]
tests/sampler_test.py::mypy PASSED [ 25%]
tests/sampler_test.py::test_exists PASSED [ 31%]
tests/sampler_test.py::test_usage PASSED [ 37%]
tests/sampler_test.py::test_bad_file PASSED [ 43%]
tests/sampler_test.py::test_bad_pct PASSED [ 50%]
tests/sampler_test.py::test_bad_seed PASSED [ 56%]
tests/sampler_test.py::test_bad_format PASSED [ 62%]
tests/sampler_test.py::test_defaults_one_file PASSED [ 68%]
tests/sampler_test.py::test_fastq_input PASSED [ 75%]
tests/sampler_test.py::test_defaults_multiple_file PASSED [ 81%]
tests/sampler_test.py::test_max_reads PASSED [ 87%]
tests/sampler_test.py::test_options PASSED [ 93%]
::mypy PASSED [100%]
===================================== mypy =====================================
Success: no issues found in 2 source files
======================== 15 passed, 1 skipped in 3.73s =========================
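A test that checks the exact output for a fixed --seed can only pass if the seed fully determines which reads are kept. The pytest-style sketch below checks that property. It uses a hypothetical sampled_ids helper that seeds its own random.Random instance instead of the global generator. This is one way to keep results reproducible, not necessarily how the provided tests are written.

```python
import random
from typing import List, Optional


def sampled_ids(ids: List[str], percent: float,
                seed: Optional[int]) -> List[str]:
    """Hypothetical helper: select IDs with a private, seeded RNG."""
    rng = random.Random(seed)  # seed=None seeds from the OS, so runs differ
    return [i for i in ids if rng.random() < percent]


def test_seed_is_reproducible() -> None:
    """The same seed must select exactly the same reads."""
    ids = [str(n) for n in range(1000)]
    first = sampled_ids(ids, 0.1, 1)
    assert first == sampled_ids(ids, 0.1, 1)
    assert 0 < len(first) < len(ids)
```

A private Random instance keeps the selection independent of any other code that calls the module-level random functions.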
Ken Youens-Clark