Figure: visualization of the different minimizer schemes supported in Digest and a code example using the library.
- `Digest` is a C++ library that supports various sub-sampling schemes for $k$-mers in DNA sequences.
- The `Digest` library uses the rolling hash function from ntHash to order the $k$-mers in a window.
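To illustrate what "rolling" means here, the sketch below hashes every $k$-mer of a sequence with a simple polynomial hash, updating each hash from the previous one in constant time. This is not ntHash and not Digest's code: the base `kBase`, the nucleotide codes, and the function names are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy base for the polynomial hash (an assumption, unrelated to ntHash).
constexpr std::uint64_t kBase = 131;

// Toy nucleotide encoding; any non-ACGT character maps to 0.
std::uint64_t base_code(char c) {
    switch (c) {
        case 'A': return 1;
        case 'C': return 2;
        case 'G': return 3;
        case 'T': return 4;
        default:  return 0;
    }
}

// Hash one k-mer from scratch in O(k). Arithmetic wraps modulo 2^64.
std::uint64_t direct_hash(const std::string& seq, std::size_t start, std::size_t k) {
    std::uint64_t h = 0;
    for (std::size_t j = 0; j < k; ++j) h = h * kBase + base_code(seq[start + j]);
    return h;
}

// Hash every k-mer, spending O(1) per k-mer after the first one.
std::vector<std::uint64_t> rolling_hashes(const std::string& seq, std::size_t k) {
    assert(k > 0);
    std::vector<std::uint64_t> out;
    if (seq.size() < k) return out;

    std::uint64_t top = 1;  // kBase^(k-1), the weight of the outgoing character
    for (std::size_t j = 1; j < k; ++j) top *= kBase;

    std::uint64_t h = direct_hash(seq, 0, k);
    out.push_back(h);
    for (std::size_t i = 1; i + k <= seq.size(); ++i) {
        // Remove the character that left the window, shift, add the new one.
        h = (h - base_code(seq[i - 1]) * top) * kBase + base_code(seq[i + k - 1]);
        out.push_back(h);
    }
    return out;
}
```

Each entry of the returned vector is the hash of the $k$-mer starting at that index, so it matches what `direct_hash` would compute for the same position.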

After cloning from GitHub, we use the Meson build system to install the library.
- `PREFIX` is the absolute path of the directory where the library files (`*.h` and `*.a` files) will be installed.
  - IMPORTANT: `PREFIX` should not be the root directory of the `Digest/` repo, to avoid any issues with installation.
- These commands generate `include` and `lib` folders inside the `PREFIX` folder.
git clone https://github.com/VeryAmazed/digest.git
cd digest
meson setup --prefix=<PREFIX> --buildtype=release build
meson install -C build
If your coding project uses `Meson` to build the executable(s), you can include a file called `subprojects/digest.wrap` in your repository and let Meson install Digest for you.
To use Digest in your C++ project, you just need to include the header files (`*.h`) and library file (`*.a`) that were installed in the first step. Assuming that `install/` is the directory you installed them in, here is how you can compile:
g++ -std=c++17 -o main main.cpp -I install/include/ -L install/lib -lnthash
There are three types of minimizer schemes that can be used:
- Windowed Minimizer
- Modimizer
- Syncmer
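As a rough illustration of how these three schemes differ, the sketch below applies the commonly used definitions to a vector of precomputed $k$-mer hashes (one hash per $k$-mer start position). It is not Digest's implementation: the function names are hypothetical, ties are broken toward the leftmost $k$-mer, and Digest's own tie-breaking and parameter conventions may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Hashes = std::vector<std::uint64_t>;
using Positions = std::vector<std::size_t>;

// Index of the smallest hash in h[begin, begin + w); leftmost wins on ties.
std::size_t argmin_in_window(const Hashes& h, std::size_t begin, std::size_t w) {
    std::size_t best = begin;
    for (std::size_t i = begin + 1; i < begin + w; ++i)
        if (h[i] < h[best]) best = i;
    return best;
}

// Windowed minimizer: in every window of w consecutive k-mers, select the
// k-mer with the smallest hash. A k-mer selected by several consecutive
// windows is reported once.
Positions window_minimizers(const Hashes& h, std::size_t w) {
    assert(w > 0);
    Positions out;
    for (std::size_t s = 0; s + w <= h.size(); ++s) {
        std::size_t m = argmin_in_window(h, s, w);
        if (out.empty() || out.back() != m) out.push_back(m);
    }
    return out;
}

// Modimizer: select every k-mer whose hash is congruent to `target` mod `mod`.
Positions modimizers(const Hashes& h, std::uint64_t mod, std::uint64_t target = 0) {
    assert(mod > 0);
    Positions out;
    for (std::size_t i = 0; i < h.size(); ++i)
        if (h[i] % mod == target) out.push_back(i);
    return out;
}

// Syncmer (closed variant): select a window of w k-mers, reported by its
// start position, when its smallest hash is at the first or last k-mer.
Positions syncmers(const Hashes& h, std::size_t w) {
    assert(w > 0);
    Positions out;
    for (std::size_t s = 0; s + w <= h.size(); ++s) {
        std::uint64_t lo = h[argmin_in_window(h, s, w)];
        if (h[s] == lo || h[s + w - 1] == lo) out.push_back(s);
    }
    return out;
}
```

For the hashes `{5, 3, 8, 6, 2, 9, 4}` with `w = 3`, this sketch selects positions `{1, 4}` as windowed minimizers and windows `{1, 2, 4}` as syncmers; with `mod = 2` it selects positions `{2, 3, 4, 6}` as modimizers. Only the modimizer decision depends on a single $k$-mer in isolation; the other two depend on the neighbouring $k$-mers in the window.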
The general steps to use Digest are as follows: (1) include the relevant header files, (2) declare the Digest object, and (3) find the positions where the minimizers are present in the sequence.
#include "digest/digester.hpp"
#include "digest/window_minimizer.hpp"
digest::WindowMin<digest::BadCharPolicy::WRITEOVER, digest::ds::Adaptive> digester (dna, 15, 7);
std::vector<size_t> output;
digester.roll_minimizer(100, output);
- This code snippet will find up to 100 windowed minimizers and store their positions in the vector called `output`.
- `digest::BadCharPolicy::WRITEOVER` means that any time the code encounters a non-`ACTG` character, it will replace it with an `A`.
  - `digest::BadCharPolicy::SKIPOVER` will skip any $k$-mers with non-`ACTG` characters.
- `digest::ds::Adaptive` is our recommended data structure for finding the minimum value in a window (see the wiki for other options).
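The difference between the two bad-character policies can be sketched on a plain string, independent of Digest. The helpers below are hypothetical and only mimic the behaviour described above. They treat uppercase `A`, `C`, `T`, `G` as the only valid characters, which is an assumption of this example rather than a statement about how Digest handles case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumption for this sketch: only uppercase A, C, T, G count as valid.
bool is_actg(char c) {
    return c == 'A' || c == 'C' || c == 'T' || c == 'G';
}

// WRITEOVER-style behaviour: every invalid character is replaced by 'A',
// so every k-mer of the sequence remains usable.
std::string writeover(std::string seq) {
    for (char& c : seq)
        if (!is_actg(c)) c = 'A';
    return seq;
}

// SKIPOVER-style behaviour: return the start positions of the k-mers that
// contain no invalid character; k-mers overlapping one are skipped.
std::vector<std::size_t> skipover_starts(const std::string& seq, std::size_t k) {
    assert(k > 0);
    std::vector<std::size_t> starts;
    std::size_t run = 0;  // length of the run of valid characters ending at i
    for (std::size_t i = 0; i < seq.size(); ++i) {
        run = is_actg(seq[i]) ? run + 1 : 0;
        if (run >= k) starts.push_back(i + 1 - k);
    }
    return starts;
}
```

For the sequence `ACNGTAC` with `k = 3`, `writeover` yields `ACAGTAC` (five usable $k$-mers), while `skipover_starts` keeps only the $k$-mers starting at positions 3 and 4 (`GTA` and `TAC`).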
If you would like to obtain both the positions and hash values for each minimizer, you can pass a vector of paired integers to do so.
std::vector<std::pair<size_t, size_t>> output;
digester.roll_minimizer(100, output);
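Once the pairs are available, a common next step is to recover the $k$-mer string behind each one. The helper below is a hypothetical sketch, not part of Digest. It assumes each pair is ordered as (position, hash) and that the position is the start index of the $k$-mer in the input sequence; check both against the Digest documentation before relying on them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: extract the k-mer at each reported position.
// Assumes hit.first is a start index into seq and hit.second is the hash.
std::vector<std::string> kmers_at(
    const std::string& seq,
    const std::vector<std::pair<std::size_t, std::size_t>>& hits,
    std::size_t k) {
    std::vector<std::string> out;
    out.reserve(hits.size());
    for (const auto& hit : hits) {
        assert(hit.first + k <= seq.size());  // position must fit a full k-mer
        out.push_back(seq.substr(hit.first, k));
    }
    return out;
}
```

With the snippet above, this would be called as `kmers_at(dna, output, 15)`, since 15 is the $k$-mer size passed to the digester.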