Could you support efficient queries across the annotations by building an FM-index over the concatenated annotation strings from the VCF file? A second compressed bitvector could mark where each variant's annotation starts in the concatenated text (basically storing a variant-to-annotation mapping).
Then you could subset to a given set of records with a particular annotation by finding the ranks of the occurrences of a given pattern in the auxiliary bitvector.
I guess this wouldn't help much when you have to compare floats in the annotations and the annotation is included in all records. Then you end up needing to compare lots of values to execute the query. There might also be a way around this though.
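To make the idea concrete, here is a minimal sketch of the FM-index plus start-bitvector scheme described above. It is only an illustration under stated assumptions: the suffix array is built naively, the "compressed bitvector" is a plain list with prefix sums standing in for a real rank structure, and the separator characters, function names, and sample annotation strings are all hypothetical.

```python
from collections import Counter
from itertools import accumulate

SEP = "\x01"  # hypothetical record separator, assumed absent from annotations
END = "\x00"  # sentinel that sorts before every other character


def build_index(annotations):
    """Index the concatenation of per-variant annotation strings."""
    text = SEP.join(annotations) + END
    # Naive suffix array: fine for a sketch, far too slow for real data.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]  # i == 0 wraps around to END
    # c_table[ch]: number of characters in text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch, row in occ.items():
            row[i + 1] = row[i] + (1 if b == ch else 0)
    # Bitvector with a 1 at the first position of each record.
    starts = [0] * len(text)
    pos = 0
    for ann in annotations:
        starts[pos] = 1
        pos += len(ann) + 1  # skip the annotation and its separator
    # rank[i]: number of 1 bits in starts[:i + 1] (uncompressed stand-in).
    rank = list(accumulate(starts))
    return {"sa": sa, "c": c_table, "occ": occ, "rank": rank}


def find_records(index, pattern):
    """Return sorted record ids whose annotation contains pattern."""
    lo, hi = 0, len(index["sa"])
    # Backward search: extend the match one character at a time, right to left.
    for ch in reversed(pattern):
        if ch not in index["c"]:
            return []
        lo = index["c"][ch] + index["occ"][ch][lo]
        hi = index["c"][ch] + index["occ"][ch][hi]
        if lo >= hi:
            return []
    # Map each text position to its record with a rank query on the bitvector.
    return sorted({index["rank"][index["sa"][k]] - 1 for k in range(lo, hi)})


# Made-up annotation strings, one per variant record.
annotations = [
    "missense_variant|BRCA2",
    "synonymous_variant|TP53",
    "missense_variant|TP53",
]
index = build_index(annotations)
print(find_records(index, "missense"))  # [0, 2]
```

The pattern search itself touches only the index, so its cost depends on the pattern length rather than the number of records. The numeric-comparison case mentioned above is not covered by this sketch.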
BGT has a different design from VCF. I see annotating each VCF as a waste of resources, so I encourage using a single variant annotation file for all BGT databases. You locate a particular row in BGT by an allele string like "11:10000:1:C". Currently, BGT reads through the variant annotation file to collect allele strings and then finds the matching rows in BGT. It is reasonably fast. The preferred way is really to have a proper disk-based database backend for annotations. SQLite could be an option. Cassandra would be better if performance becomes an issue.
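As a rough illustration of the SQLite option, the sketch below keeps annotations in a single table keyed by allele string and returns the allele strings that match a filter, which could then be used to locate rows in BGT. This is not how BGT works today; the table layout, column names, function names, and sample rows are all hypothetical, and only the allele string format is taken from the example above.

```python
import sqlite3


def build_annotation_db(rows, path=":memory:"):
    """Create a hypothetical annotation table keyed by allele string."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation ("
        "allele TEXT PRIMARY KEY, gene TEXT, impact TEXT, af REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO annotation VALUES (?, ?, ?, ?)", rows
    )
    # Secondary index so gene lookups do not scan the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_gene ON annotation (gene)")
    conn.commit()
    return conn


def alleles_matching(conn, gene, max_af):
    """Return allele strings for one gene with allele frequency <= max_af."""
    cur = conn.execute(
        "SELECT allele FROM annotation "
        "WHERE gene = ? AND af <= ? ORDER BY allele",
        (gene, max_af),
    )
    return [row[0] for row in cur]


# Made-up rows: (allele string, gene, impact, allele frequency).
rows = [
    ("11:10000:1:C", "GENE_A", "HIGH", 0.001),
    ("11:10050:1:T", "GENE_A", "LOW", 0.200),
    ("13:20000:1:G", "GENE_B", "HIGH", 0.005),
]
conn = build_annotation_db(rows)
print(alleles_matching(conn, "GENE_A", 0.01))  # ['11:10000:1:C']
```

Storing numeric fields as typed columns also lets the database handle float comparisons directly, instead of parsing them out of annotation text at query time.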