alexyyl edited this page Oct 27, 2019 · 6 revisions

Welcome to the FSV-Fragsifier wiki!

Fragsifier is a short tandem repeat (STR) sequence extraction tool that uses sequence models to identify STR sequences.

Input file types

The Fragsifier algorithm performs STR extraction on each line of the input file, so any file type that stores one sequence/read per line is a valid input. When given a FASTQ file, Fragsifier skips the header and quality lines.
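The line-by-line input handling can be sketched as follows. This is a minimal illustration, not Fragsifier's own reader: the function name `iter_reads` is hypothetical, and it assumes the standard four-line FASTQ record (header, sequence, `+` separator, quality), so keeping only the sequence line of each record also drops the `+` line.

```python
from typing import Iterable, Iterator


def iter_reads(lines: Iterable[str], fastq: bool) -> Iterator[str]:
    """Hypothetical helper: yield the lines that would be passed to STR extraction.

    For FASTQ input, assume the standard four-line record layout and keep only
    the second line (the sequence) of each record. For any other row-based file,
    every line is treated as one read.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\r\n")
        if fastq and i % 4 != 1:
            continue  # skip header, '+' separator and quality lines
        yield line


# Usage: two FASTQ records yield only their sequence lines.
fq = "@r1\nCTTCTT\n+\nIIIIII\n@r2\nTCTATCTA\n+\nIIIIIIII\n"
print(list(iter_reads(fq.splitlines(), fastq=True)))
```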

Outputs

Fragsifier produces two output files: an extractions file containing the sequence extracted from each line of the input file, and a sequences file containing the cumulative read counts for each unique sequence.

Extractions file

Each line in the extractions file reports the STR sequence extracted from the corresponding input line/read; the line is left empty if no STR was found. Fields are separated by colons and give the STR locus, the orientation of the sequence (forward or reverse), the extracted sequence, and the flanking sequence alignment score.

DYS481:F:CTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTCTT:25.0
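For downstream analysis, an extractions-file line can be split back into its four colon-separated fields. The sketch below is an assumption-based illustration: the names `Extraction` and `parse_extraction` are hypothetical, and the orientation codes are assumed to be `F`/`R` (only `F` appears in the example above).

```python
from typing import NamedTuple, Optional


class Extraction(NamedTuple):
    """Hypothetical record for one extractions-file line."""
    locus: str
    orientation: str  # assumed "F" (forward) or "R" (reverse)
    sequence: str
    score: float      # flanking sequence alignment score


def parse_extraction(line: str) -> Optional[Extraction]:
    """Parse one line; return None for an empty line (no STR found)."""
    line = line.strip()
    if not line:
        return None
    # Exactly four colon-separated fields are expected; unpacking raises
    # ValueError if the line has a different number of fields.
    locus, orientation, sequence, score = line.split(":")
    return Extraction(locus, orientation, sequence, float(score))


print(parse_extraction("DYS481:F:CTTCTTCTT:25.0"))
```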

Sequences file

Each line in the sequences file reports the read counts for one unique STR sequence. Fields are separated by commas and give the STR locus, the extracted sequence, the number of reads in forward orientation, the number of reads in reverse orientation, the total number of reads, and the allele name/number derived from the sequence.

DYS389I,TCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTA,5780,0,5780,13
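The relationship between the two output files can be sketched in Python: `parse_sequence_line` reads one comma-separated sequences-file row, and `count_reads` shows how per-read (locus, orientation, sequence) results could be accumulated into per-sequence forward/reverse/total counts. Both names are hypothetical, the `F`/`R` orientation codes are an assumption, and allele calling is left out because the naming rule is not described here. Rows are sorted by locus and sequence only to make the output deterministic; Fragsifier's own ordering may differ.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class SequenceCount(NamedTuple):
    """Hypothetical record for one sequences-file line."""
    locus: str
    sequence: str
    forward: int
    reverse: int
    total: int
    allele: str  # kept as text in case allele names are not plain integers


def parse_sequence_line(line: str) -> SequenceCount:
    """Split one comma-separated sequences-file line into typed fields."""
    locus, sequence, fwd, rev, total, allele = line.strip().split(",")
    return SequenceCount(locus, sequence, int(fwd), int(rev), int(total), allele)


def count_reads(
    extractions: Iterable[Tuple[str, str, str]],
) -> List[Tuple[str, str, int, int, int]]:
    """Accumulate (locus, orientation, sequence) records into
    (locus, sequence, forward, reverse, total) rows (allele omitted)."""
    counts = Counter(extractions)
    keys = sorted({(locus, seq) for locus, _, seq in counts})
    rows = []
    for locus, seq in keys:
        fwd = counts[(locus, "F", seq)]  # Counter returns 0 for missing keys
        rev = counts[(locus, "R", seq)]
        rows.append((locus, seq, fwd, rev, fwd + rev))
    return rows


print(parse_sequence_line("DYS389I,TCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTA,5780,0,5780,13"))
```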

Custom model training

Current limitations

Because Fragsifier uses repeat stretches to identify STRs, non-repetitive markers in the input data (such as Amelogenin and SNPs) are not detected.
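To illustrate why a repeat-based approach cannot see non-repetitive markers, the sketch below finds the longest perfect tandem repeat in a read with a naive regular expression. This is a swapped-in technique for illustration only, not Fragsifier's sequence-model method; the function name `longest_repeat_run` and its default parameters (motif length 2 to 6, at least 3 copies) are hypothetical choices.

```python
import re
from typing import Optional, Tuple


def longest_repeat_run(seq: str, min_period: int = 2, max_period: int = 6,
                       min_copies: int = 3) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: return (motif, copies) for the longest perfect
    tandem repeat in an uppercase ACGT sequence, or None if there is none."""
    best: Optional[Tuple[str, int]] = None
    best_len = 0
    for k in range(min_period, max_period + 1):
        # A k-mer followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            run_len = m.end() - m.start()
            if run_len > best_len:  # strict '>' keeps the shortest period on ties
                best_len = run_len
                best = (m.group(1), run_len // k)
    return best


print(longest_repeat_run("AACTTCTTCTTCTTGG"))  # a CTT repeat stretch is found
print(longest_repeat_run("ACGTGCATGCCA"))      # no repeat stretch, so nothing to report
```

A read without any repeat stretch, like the second one, gives such an approach nothing to anchor on, which is the situation for markers such as Amelogenin and SNPs.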
