Pangenome graphs built from raw sets of alignments may have complex local structures generated by common patterns of genome variation. These local nonlinearities can introduce difficulty in downstream analyses, visualization, and interpretation of variation graphs.
smoothxg finds blocks of paths that are collinear within a variation graph.
It applies partial order alignment to each block, yielding an acyclic variation graph.
Then, to yield a "smoothed" graph, it walks the original paths to lace these subgraphs together.
The resulting graph only contains cyclic or inverting structures larger than the chosen block size, and is otherwise manifold linear.
In addition to providing a linear structure to the graph, smoothxg can be used to extract the consensus pangenome graph by applying the heaviest bundle algorithm to each chain.
To find blocks, smoothxg applies a greedy algorithm that assumes that the graph nodes are sorted according to their occurrence in the graph's embedded paths.
The path-guided stochastic gradient descent based 1D sort implemented in odgi sort -Y is designed to provide this kind of sort.
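
To illustrate why the greedy block finder depends on a good 1D sort, here is a minimal Python sketch of one way such a pass could work. It is not smoothxg's implementation: the function name `greedy_blocks`, the block weight (node length times the number of path steps on the node), and the `max_block_weight` threshold are all assumptions made for this toy example.

```python
from typing import Dict, List


def greedy_blocks(sorted_nodes: List[int],
                  node_length: Dict[int, int],
                  node_path_steps: Dict[int, int],
                  max_block_weight: int) -> List[List[int]]:
    """Group nodes that are adjacent in the sort order into blocks.

    Hypothetical weight of a node: its length times the number of path
    steps that visit it. A block is closed as soon as adding the next
    node would push its total weight above max_block_weight.
    """
    blocks: List[List[int]] = []
    current: List[int] = []
    weight = 0
    for node in sorted_nodes:
        w = node_length[node] * node_path_steps[node]
        # Close the running block if this node would overflow it.
        if current and weight + w > max_block_weight:
            blocks.append(current)
            current, weight = [], 0
        current.append(node)
        weight += w
    if current:
        blocks.append(current)
    return blocks


# Toy input: five nodes, already in 1D sort order.
sorted_nodes = [1, 2, 3, 4, 5]
node_length = {1: 10, 2: 5, 3: 20, 4: 1, 5: 8}
node_path_steps = {1: 3, 2: 2, 3: 3, 4: 1, 5: 3}
blocks = greedy_blocks(sorted_nodes, node_length, node_path_steps,
                       max_block_weight=50)
print(blocks)  # [[1, 2], [3], [4, 5]]
```

Because the pass only ever looks at the next node in the sort order, nodes that are close together on the embedded paths but far apart in the sort would end up in different blocks. This is the reason the input order matters.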
smoothxg is built with cmake:
cmake -H. -Bbuild && cmake --build build -- -j4