minigraph requires GFA 1.0 overlaps, which are optional in the spec #6
Comments
Duplicate of #1. Minigraph is meant more for mapping to a reference graph, and I don't think a reference graph should allow overlaps. Overlaps are also tricky to work with, so implementing the feature will take time. Minigraph may support overlaps in the future, but unfortunately that won't happen soon.
I think these two issues are related but maybe not duplicates. The issue isn't the graph structure but its format in this case. I have a graph with no overlaps (CIGAR
This is valid GFA1 and my graph is a reference graph (i.e., constructed from a reference genome backbone with reference-relative variation added). minigraph just refused to parse it because it expects the sixth field (CIGAR) to be present.

Edit: I agree though that reference graphs should not have overlaps. That would make everything a lot harder!
Sorry for misreading your question. My initial intention was to require CIGAR because L-lines may have tags. Making CIGAR optional will complicate tags. In addition,
Yes - I can file an issue on the spec to clarify whether "CIGAR optional" means no field present or CIGAR field == empty string. I have only ever had GFA files where, if tags are present, there is also a CIGAR. There's some room for clarification on that. I think a good fix would be to note in the README/docs:
I think that would be sufficient to keep users from falling into this trap, without having to change the source code.
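For anyone hitting this before the docs are updated, a small preprocessing step can add the missing field. The following is a minimal sketch, not part of minigraph: the function name `normalize_link_line` and the choice to rewrite an empty or `*` overlap as `0M` are assumptions made for illustration. It uses the SAM-style `XX:T:value` pattern to tell a tag from a CIGAR, which is exactly the ambiguity that arises when the sixth field is absent but tags are present.

```python
import re

# SAM-style optional tag such as "NM:i:0"; used to tell a tag from a CIGAR.
TAG_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]:[AifZJHB]:")


def normalize_link_line(line, default_overlap="0M"):
    """Return an L-line with an explicit overlap field.

    Lines that are not L-lines are returned unchanged (minus the newline).
    Assumption: a missing, empty, or "*" overlap means "no overlap".
    """
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "L":
        return line.rstrip("\n")
    core, rest = fields[:5], fields[5:]
    if rest and not TAG_RE.match(rest[0]):
        # A sixth field exists and is not a tag, so treat it as the overlap.
        overlap = rest[0] if rest[0] not in ("", "*") else default_overlap
        tags = rest[1:]
    else:
        # No sixth field, or the sixth field is already a tag.
        overlap, tags = default_overlap, rest
    return "\t".join(core + [overlap] + tags)
```

Applied line by line to a GFA file, this leaves `S` lines and well-formed `L` lines untouched and only fills in the overlap where it is missing.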
The GFA specs state that overlaps for `L` lines are optional, but minigraph seems to require these. I think this happens because there is no way to avoid parsing a CIGAR string in this code block.

For GFA 2.0, the "*" placeholder is used to denote a lack of an overlap CIGAR. GFA 1.0 doesn't specify a placeholder, just that the field is optional. I have always assumed that a GFA with no CIGAR in the `L/E` lines implies a non-overlap (i.e., a CIGAR of `0M`).

Would it be possible to adjust the default condition so that the parser can handle all valid GFA 1 files?
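To make the request concrete, here is a rough sketch of what a more permissive default could look like. It is written in Python for brevity and is not minigraph's actual parser; `Link`, `parse_link`, and `overlap_or_default` are hypothetical names, and treating an absent overlap as `0M` is my assumption rather than something the spec states.

```python
import re
from typing import List, NamedTuple, Optional

TAG_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]:[AifZJHB]:")  # e.g. "NM:i:0"
CIGAR_RE = re.compile(r"^([0-9]+[MIDNSHPX=])+$")          # e.g. "5M", "0M"


class Link(NamedTuple):
    from_seg: str
    from_orient: str
    to_seg: str
    to_orient: str
    overlap: Optional[str]  # None when the file gives no CIGAR
    tags: List[str]


def parse_link(line: str) -> Link:
    """Parse an L-line, accepting a missing, empty, or "*" overlap field."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5 or fields[0] != "L":
        raise ValueError("not an L-line: %r" % line)
    rest = fields[5:]
    overlap = None
    if rest and not TAG_RE.match(rest[0]):
        field = rest.pop(0)
        if field not in ("", "*"):
            if not CIGAR_RE.match(field):
                raise ValueError("bad overlap field: %r" % field)
            overlap = field
    return Link(fields[1], fields[2], fields[3], fields[4], overlap, rest)


def overlap_or_default(link: Link, default: str = "0M") -> str:
    """Assumed default: no CIGAR given means no overlap."""
    return link.overlap if link.overlap is not None else default
```

Keeping `overlap` as `None` rather than silently substituting `0M` lets the caller decide whether a missing CIGAR should be accepted or rejected.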