Name		Name	Last commit message	Last commit date
parent directory ..
slurm_files		slurm_files
README.md		README.md
calib_parameter_tests.sh		calib_parameter_tests.sh
calib_parameter_tests_plotting.py		calib_parameter_tests_plotting.py
calib_scalability_tests.sh		calib_scalability_tests.sh
calib_scalability_tests_plotting.py		calib_scalability_tests_plotting.py
real_tests.sh		real_tests.sh
real_tests_plotting.py		real_tests_plotting.py
simulated_tests.sh		simulated_tests.sh

README.md

Calib Experiments Scripts

We performed four experiments for Calib. The results of the experiments are here. This directory contains scripts for running those experiments on Slurm Workload Manager. However, the Slurm files generated are still BASH files and can be run using:

bash <slurm_file_name>

Note: All Slurm scripts must be run from Calib's root directory!

Simulated Datasets

To run the simulated dataset tests, you need first to clone the other tools by running:

cd <CALIB_ROOT_DIRECTORY>
git submodule update --init --recursive

Then run the Slurm generating and running script:

cd <CALIB_ROOT_DIRECTORY>
slurum_scripts/simulated_tests.sh <dataset_name>

There are three datasets tests reported in Calib's paper,

small: Has 100K molecules and 100 barcode tags
medium: Has 1M molecules and 5K barcode tags
large Has 1M molecules and 25K barcode tags

The results will be in TSV files off different tools. For small:

simulating/datasets/randS_42/barL_8.barNum_100/geneNum_35.refName_hg38.geneList_COSMIC_cancer_genes/ref_hg38.molMin_150.molMu_300.molDev_25.molNum100000/pcrC_7.pcrDR_0.6.pcrER_0.00005/seqMach_HS25.readL_150/*_benchmarks.tsv

For medium:

simulating/datasets/randS_42/barL_8.barNum_25000/geneNum_35.refName_hg38.geneList_COSMIC_cancer_genes/ref_hg38.molMin_150.molMu_300.molDev_25.molNum1000000/pcrC_7.pcrDR_0.6.pcrER_0.00005/seqMach_HS25.readL_150/*_benchmarks.tsv

For large:

simulating/datasets/randS_42/barL_8.barNum_5000/geneNum_35.refName_hg38.geneList_COSMIC_cancer_genes/ref_hg38.molMin_150.molMu_300.molDev_25.molNum1000000/pcrC_7.pcrDR_0.6.pcrER_0.00005/seqMach_HS25.readL_150/*_benchmarks.tsv

Note on running things without Slurm

If you wish to run the Slurm scripts on BASH, you need to run one Slurm file at a time. The Slurm files have dependency on one another, and things can break down if the dependency is not respected. This is true especially in case of simulating the datasets. If two scripts end up trying to generate the same reads at the same time, they will end up inevitably corrupting the simulation outputs.

Real dataset

To run the simulated dataset tests, you need first to clone the other tools by running:

cd <CALIB_ROOT_DIRECTORY>
git submodule update --init --recursive

Then run the Slurm generating and running script:

cd <CALIB_ROOT_DIRECTORY>
slurm_scripts/real_tests.sh <R1.fastq> <R2.fastq> <panel.hg19.bed> <output_directory>

<R1.fastq> and <R2.fastq> are the real dataset FASTQ files which can be downloaded from here. panel.hg19.bed is the panel of targeted regions used to pull down material for sequencing. Finally, output directory is where the results will be put of running the complete pipeline.

Notes

If you are not going to use Slurm, follow the same note mentioned in the simulated dataset runs above.

The real dataset testing pipeline assumes that samtools is in your $PATH. If samtools is installed somewhere else, please edit the corresponding variable in slurm_scripts/real_tests.sh.

Scalability of multithreading

Simply run:

cd <CALIB_ROOT_DIRECTORY>
slurm_scripts/calib_scalability_tests.sh

Feel free to edit the variables and for loop in the script to try different parameters. The same not applies regarding not running Slurm as above.

Parameter selection

Simply run:

cd <CALIB_ROOT_DIRECTORY>
slurm_scripts/calib_parameter_tests.sh

Feel free to edit the variables and for loop in the script to try different parameters. The same not applies regarding not running Slurm as above.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

slurm_scripts

slurm_scripts

README.md

Calib Experiments Scripts

Simulated Datasets

Note on running things without Slurm

Real dataset

Notes

Scalability of multithreading

Parameter selection

Files

slurm_scripts

Directory actions

More options

Directory actions

More options

Latest commit

History

slurm_scripts

Folders and files

parent directory

README.md

Calib Experiments Scripts

Simulated Datasets

Note on running things without Slurm

Real dataset

Notes

Scalability of multithreading

Parameter selection