Binaries of ngs-bits are available via Bioconda. Alternatively, ngs-bits can be built from sources:
- Binaries for Linux/macOS
- From sources for Linux/macOS
- From sources for Windows
Changes already implemented in GIT master for next release:
- GSvar: improved cfDNA sample handling (batch import of cfDNA panels, queue analysis, ...)
- VcfAnnotateFromBed: added multi-threading support
- MappingQC: added support for cfDNA samples
Changes in release 2021_06:
- General: Improved GRCh38 support in several tools.
- General: Using BGZIP for compressed VCFs now to allow indexing them with tabix.
- VcfAnnotateFromBed: Made separator configurable; Added check for separator in source BED file; Fixed broken output VCF if input has no FORMAT column.
- VcfAnnotateFromVcf: Fixed crash in VCF header parser.
- NGSDExportSamples: Added ancestry column.
- SampleAncestry: Improved runtime and memory use.
- SampleGender: Improved runtime for algorithm 'hetx'.
- SomaticQC: Added support for mutect2.
- NGSD:
  - Added disease status 'Unclear' to table 'sample'.
  - Added table 'processed_sample_ancestry'.
  - Added percent occupied to 'runqc_lane' (for Illumina NovaSeq).
For older releases see the releases page.
Please report any issues or questions to the ngs-bits issue tracker.
Have a look at the ECCB'2018 poster.
The documentation of individual tools is linked in the tools list below.
For some tools the documentation pages contain only the command-line help, for other tools they contain more information.
ngs-bits is provided under the MIT license and is based on other open source software:
- htslib for HTS data format support (BAM, VCF, ...)
- SimpleCrypt for weak encryption
- QR-Code-generator for QR code generation
ngs-bits contains a lot of tools that are used for NGS-based diagnostics in our institute.
Some of the tools need the NGSD, a database that contains for example gene, transcript and exon data.
Installation instructions for the NGSD can be found here.
- SeqPurge - A highly-sensitive adapter trimmer for paired-end short-read data.
- SampleSimilarity - Calculates pairwise sample similarity metrics from VCF/BAM files.
- SampleGender - Determines sample gender based on a BAM file.
- SampleAncestry - Estimates the ancestry of a sample based on variants.
- CnvHunter - CNV detection from targeted resequencing data using non-matched control samples.
- RohHunter - ROH detection based on a variant list annotated with AF values.
- UpdHunter - UPD detection from trio variant data.
The default output format of the quality control tools is qcML, an XML-based format for -omics quality control. It consists of an XML schema, which defines the overall structure of the format, and an ontology, which defines the QC metrics that can be used.
- ReadQC - Quality control tool for FASTQ files.
- MappingQC - Quality control tool for a BAM file.
- VariantQC - Quality control tool for a VCF file.
- SomaticQC - Quality control tool for tumor-normal pairs (paper and example output data).
- TrioMaternalContamination - Detects maternal contamination of a child using SNPs from parents.
- BamClipOverlap - (Soft-)Clips paired-end reads that overlap.
- BamDownsample - Downsamples a BAM file to the given percentage of reads.
- BamFilter - Filters a BAM file by multiple criteria.
- BamHighCoverage - Determines high-coverage regions in a BAM file.
- BamToFastq - Converts a BAM file to FASTQ files (paired-end only).
- BedAdd - Merges regions from several BED files.
- BedAnnotateFromBed - Annotates BED file regions with information from a second BED file.
- BedAnnotateGC - Annotates the regions in a BED file with GC content.
- BedAnnotateGenes - Annotates BED file regions with gene names (needs NGSD).
- BedChunk - Splits regions in a BED file to chunks of a desired size.
- BedCoverage - Annotates the regions in a BED file with the average coverage in one or several BAM files.
- BedExtend - Extends the regions in a BED file by n bases.
- BedGeneOverlap - Calculates how much of each overlapping gene is covered (needs NGSD).
- BedHighCoverage - Detects high-coverage regions from a BAM file.
- BedInfo - Prints summary information about a BED file.
- BedIntersect - Intersects two BED files.
- BedLowCoverage - Calculates regions of low coverage based on an input BED and BAM file.
- BedMerge - Merges overlapping regions in a BED file.
- BedReadCount - Annotates the regions in a BED file with the read count from a BAM file.
- BedShrink - Shrinks the regions in a BED file by n bases.
- BedSort - Sorts the regions in a BED file.
- BedSubtract - Subtracts one BED file from another BED file.
- BedToFasta - Converts a BED file to a FASTA file (based on the reference genome).
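
To illustrate the kind of operation BedMerge in the list above performs, here is a minimal Python sketch of merging overlapping regions. It is not the ngs-bits implementation: the function name `merge_regions` and the tuple layout are hypothetical, regions are assumed to use half-open BED coordinates, and directly adjacent regions are left unmerged in this sketch.

```python
from typing import List, Tuple

# Hypothetical region type for this sketch: (chromosome, start, end),
# with 0-based, half-open coordinates as used in BED files.
Region = Tuple[str, int, int]


def merge_regions(regions: List[Region]) -> List[Region]:
    """Merge overlapping regions on the same chromosome."""
    merged: List[Region] = []
    # Sorting by (chromosome, start, end) puts overlapping regions next to each other.
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlap with the previous region: extend its end if needed.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    example = [("chr1", 150, 300), ("chr1", 100, 200), ("chr2", 10, 20)]
    print(merge_regions(example))
```

The sketch sorts chromosomes lexicographically, which is sufficient for merging but does not reproduce any particular chromosome ordering.
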
- FastqAddBarcode - Adds sequences from separate FASTQ as barcodes to read IDs.
- FastqConvert - Converts the quality scores from Illumina 1.5 offset to Sanger/Illumina 1.8 offset.
- FastqConcat - Concatenates several FASTQ files into one output FASTQ file.
- FastqDownsample - Downsamples paired-end FASTQ files.
- FastqExtract - Extracts reads from a FASTQ file according to an ID list.
- FastqExtractBarcode - Moves molecular barcodes of reads to a separate file.
- FastqExtractUMI - Moves unique molecular identifiers from read sequence to read ID.
- FastqFormat - Determines the quality score offset of a FASTQ file.
- FastqList - Lists read IDs and base counts.
- FastqMidParser - Counts the number of occurrences of each MID/index/barcode in a FASTQ file.
- FastqToFasta - Converts FASTQ to FASTA format.
- FastqTrim - Trims start/end bases from the reads in a FASTQ file.
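
The quality score conversion that FastqConvert is described as performing above can be sketched in a few lines of Python. This is an illustration of the general technique, not the ngs-bits code: it assumes the usual encodings, where Illumina 1.5 stores Phred scores with an ASCII offset of 64 and Sanger/Illumina 1.8 with an offset of 33. The function name `illumina15_to_sanger` is hypothetical.

```python
ILLUMINA_15_OFFSET = 64  # assumed ASCII offset of Illumina 1.5 quality strings
SANGER_OFFSET = 33       # assumed ASCII offset of Sanger/Illumina 1.8 quality strings


def illumina15_to_sanger(quality: str) -> str:
    """Re-encode a FASTQ quality string from offset 64 to offset 33."""
    converted = []
    for char in quality:
        phred = ord(char) - ILLUMINA_15_OFFSET
        if phred < 0:
            # Characters below '@' cannot occur in an offset-64 quality string.
            raise ValueError(f"invalid Illumina 1.5 quality character: {char!r}")
        converted.append(chr(phred + SANGER_OFFSET))
    return "".join(converted)


if __name__ == "__main__":
    # 'h' encodes Phred 40 with offset 64; with offset 33 that is 'I'.
    print(illumina15_to_sanger("hhhh"))
```

Only the quality line of each FASTQ record changes in such a conversion; the header, sequence and separator lines stay as they are.
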
- VcfAnnotateFromBed - Annotates the INFO column of a VCF with data from a BED file.
- VcfAnnotateFromVcf - Annotates the INFO column of a VCF with data from another VCF file (or multiple VCF files if a config file is provided).
- VcfBreakMulti - Breaks multi-allelic variants into several lines, making sure that allele-specific INFO/SAMPLE fields are still valid.
- VcfCheck - Checks a VCF file for errors.
- VcfExtractSamples - Extracts one or several samples from a VCF file.
- VcfFilter - Filters a VCF based on the given criteria.
- VcfLeftNormalize - Normalizes all variants and shifts indels to the left in a VCF file.
- VcfSort - Sorts variant lists according to chromosomal position.
- VcfStreamSort - Sorts entries of a VCF file according to genomic position using a stream.
- VcfToBedpe - Converts a VCF file containing structural variants to BEDPE format.
- VcfToTsv - Converts a VCF file to a tab-separated text file.
- BedpeAnnotateFromBed - Annotates a BEDPE file with information from a BED file.
- BedpeFilter - Filters a BEDPE file by region.
- BedpeGeneAnnotation - Annotates a BEDPE file with gene information from the NGSD (needs NGSD).
- BedpeToBed - Converts a BEDPE file into a BED file.
- NGSDAnnotateSV - Annotates the structural variants of a given BEDPE file by the NGSD counts (needs NGSD).
- SvFilterAnnotations - Filters a structural variant list in BEDPE format based on variant annotations.
- GenesToApproved - Replaces gene symbols by approved symbols using the HGNC database (needs NGSD).
- GenesToBed - Converts a text file with gene names to a BED file (needs NGSD).
- NGSDExportGenes - Lists genes from NGSD (needs NGSD).
- PhenotypesToGenes - Converts a phenotype list to a list of matching genes (needs NGSD).
- PhenotypeSubtree - Returns all sub-phenotypes of a given phenotype (needs NGSD).