Test data to be used for automated testing with the nf-core pipelines

Introduction

nf-core is a collection of high quality Nextflow pipelines.

Documentation

nf-core/test-datasets comes with documentation in the docs/ directory.

Data generation

STARsolo / AlevinQC Testdata

Please ask Olga Botvinnik for details on how this data was generated and subsetted.

Kallisto/Bustools Testdata

The [reference/kallisto] and [testdata/kallisto] folders hold test data that was subsetted so that it can run on automated continuous integration services, which impose memory and time restrictions. The data is taken from this howto article. The files were subsetted with these commands:

zcat SRR8599150_S1_L001_R1_001.fastq.gz | head -n 5000 > SRR8599150_S1_L001_R1_001.sub5000.fastq
zcat SRR8599150_S1_L001_R2_001.fastq.gz | head -n 5000 > SRR8599150_S1_L001_R2_001.sub5000.fastq
zcat Mus_musculus.GRCm38.cdna.all.fa.gz | sed -n '433032,517910 p'
zcat Mus_musculus.GRCm38.96.gtf.gz | grep -e '^#' -e '^19' > Mus_musculus.GRCm38.96.chr19.gtf

## New reference files for kb wrapper (requires genomic fasta)
## kb can handle gzipped files and we therefore use gzipped references to keep them small
zgrep "chr19" gencode.vM26.annotation.gtf.gz | gzip > chr19.gtf.gz
zcat chr19.gtf.gz | head -10000 | gzip > gencode.VM26.chr19_10k.gtf.gz ## The gtf only contains a part of chr19 to keep it small
zcat chr19.fa.gz | head -100000 | gzip > chr19_100k.fa.gz ## The fasta only contains sequences for the genes defined in the gtf to keep it small
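
A FASTQ record spans four lines, so the `head -n 5000` calls above keep 1250 complete reads per file. The sketch below expresses the subset size in reads rather than lines, which makes it harder to cut a record in half by accident and keeps R1 and R2 in sync. It assumes the standard 4-line FASTQ layout; the function name `subset_fastq` and the demo file names are hypothetical and not part of this repository.

```shell
# Hypothetical helper: keep the first N reads of a gzipped FASTQ file.
# $1: gzipped FASTQ, $2: number of reads to keep, $3: output file
# Assumes 4 lines per record (no wrapped sequence lines).
subset_fastq() {
    gzip -dc "$1" | head -n "$(( $2 * 4 ))" > "$3"
}

# Self-contained demo on a tiny synthetic file with 3 reads
tmpdir="$(mktemp -d)"
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nGGCC\n+\nIIII\n' | gzip > "$tmpdir/demo.fastq.gz"

# Keep 2 reads, i.e. 8 lines
subset_fastq "$tmpdir/demo.fastq.gz" 2 "$tmpdir/demo.sub2.fastq"
n_lines="$(wc -l < "$tmpdir/demo.sub2.fastq" | tr -d ' ')"
echo "lines kept: $n_lines"
```

Calling the helper with 1250 reads would reproduce the 5000-line subsets used here.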

The GTF file contains annotation for more than just chr19, but a large portion of its exons lie on chr19, so it gives somewhat meaningful results. The cDNA file was inspected manually to determine an appropriate line range with chr19 entries for testing.
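
One way to check which sequences a subsetted annotation actually covers is to tally the first GTF column. This is a minimal sketch, not the procedure used to build these files; the function name `count_gtf_seqnames` is hypothetical and a small synthetic file stands in for the real annotation.

```shell
# Hypothetical helper: count annotation records per sequence name (GTF column 1).
# Header lines starting with '#' are skipped; input is expected to be gzipped.
count_gtf_seqnames() {
    gzip -dc "$1" \
        | awk -F '\t' '!/^#/ { n[$1]++ } END { for (s in n) print s "\t" n[s] }' \
        | LC_ALL=C sort
}

# Synthetic demo annotation: two chr19 records, one chr1 record, one header line
tmpdir="$(mktemp -d)"
printf '##description: demo\nchr19\tHAVANA\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\nchr19\tHAVANA\texon\t1\t50\t.\t+\t.\tgene_id "g1";\nchr1\tHAVANA\tgene\t5\t90\t.\t-\t.\tgene_id "g2";\n' | gzip > "$tmpdir/demo.gtf.gz"

counts="$(count_gtf_seqnames "$tmpdir/demo.gtf.gz")"
printf '%s\n' "$counts"
```

If the output lists sequences other than the expected chromosome, the subset is broader than its file name suggests.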

Updated reference file

The assembly and annotation of GRCm39 were downloaded from GENCODE.

samtools faidx GRCm39.genome.fa chr19 > GRCm39.genome.chr19.fa
grep chr19 "gencode.vM27.annotation.gtf"
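
A plain `grep chr19` matches the string anywhere on a line and prints to standard output. A stricter variant is sketched below, under the assumption that only the `#` header lines and records whose first column is exactly `chr19` are wanted. The function name `extract_chr` and the demo file names are illustrative only.

```shell
# Hypothetical helper: keep header lines and records whose first column
# equals the requested sequence name exactly.
# $1: uncompressed GTF file, $2: sequence name (e.g. chr19)
extract_chr() {
    awk -F '\t' -v chr="$2" '/^#/ || $1 == chr' "$1"
}

# Synthetic demo: the chr1 record mentions "chr19" in its attributes,
# so a plain grep would keep it while the column match does not.
tmpdir="$(mktemp -d)"
printf '##provider: demo\nchr19\tHAVANA\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\nchr1\tHAVANA\tgene\t5\t90\t.\t-\t.\tgene_id "g2"; note "near chr19 paralog";\n' > "$tmpdir/demo.gtf"

extract_chr "$tmpdir/demo.gtf" chr19 > "$tmpdir/demo.chr19.gtf"
n_strict="$(grep -vc '^#' "$tmpdir/demo.chr19.gtf")"
n_grep="$(grep -c chr19 "$tmpdir/demo.gtf")"
echo "records kept by column match: $n_strict, lines matched by plain grep: $n_grep"
```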

Support

For further information or help, don't hesitate to get in touch on our Slack, or click here for an invite.
