Trailing comma on P lines after pruning with odgi prune
#549
Comments
The pruning is indeed the problem. @AndreaGuarracino Any ideas how this can happen? Some empty steps are left over?
I just wanted to add that the pruning is also incomplete. When I try to prune this graph to remove nodes with 0 path coverage, with the commands mentioned before, the resulting graph still has nodes with no path going through them. Why would I have been busy lately, so I did not open another issue. But I can do it and explain it in more detail if you want.
I've never used
@AndreaGuarracino, honestly I have as much clue on how to cut it appropriately as you do... I guess a way of doing it would be to compute the path coverages, take the list of nodes with cov = 0, and randomly pick one and extract a big enough subgraph around such a node, and then Do you imagine other approaches?
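A minimal sketch of the coverage step from this comment, assuming the graph is available as GFA v1 text where each `P` line carries a comma-separated list of oriented segment IDs in its third column. The function name `zero_coverage_nodes` is hypothetical and not part of odgi; it only lists candidate nodes and does not extract a subgraph.

```python
def zero_coverage_nodes(gfa_lines):
    """Return the IDs of S-line segments that no P line visits."""
    segments = set()
    visited = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments.add(fields[1])
        elif fields[0] == "P":
            for step in fields[2].split(","):
                if step:  # skip empty steps, e.g. those left by a trailing comma
                    visited.add(step[:-1])  # drop the +/- orientation suffix
    return sorted(segments - visited)
```

Any node returned here is one that the pruning was expected to remove, so it could serve as the seed for a small reproducing subgraph.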
I'm lazier than you xD I thought you could divide the graph in two (half of the paths in graph 1 and half of the paths in graph 2), take the part that still triggers the problem, and divide it in two, ... until you have a fairly small graph. Or your approach.
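The halving idea could be sketched roughly as follows. Both `shrink_paths` and the `triggers_problem` callback are hypothetical names: the callback stands in for whatever builds a graph from the given paths, prunes it, and reports whether the trailing comma still appears.

```python
def shrink_paths(paths, triggers_problem):
    """Halve the path list repeatedly while one half still triggers the problem."""
    current = list(paths)
    while len(current) > 1:
        mid = len(current) // 2
        first, second = current[:mid], current[mid:]
        if triggers_problem(first):
            current = first
        elif triggers_problem(second):
            current = second
        else:
            break  # the problem only shows up with paths from both halves
    return current
```

If neither half reproduces the issue on its own, the loop stops and returns the current set, which is still smaller than or equal to the original.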
Dear odgi team,
While working on #548 (see that issue for current context), I could not load my graph with my tools after pruning. After digging a bit, I discovered that some of the pruned graph's P lines have trailing `,`s, which goes against convention. To be clear, what I mean is something like this:
You can check this yourself by taking the P lines of the graph I shared with @subwaystation and running the following sed command:
If you check the contents of `path_issues_og.txt`, you will find the following:
The third column should have the last node of the path, but you can see that the affected paths do not because those had the trailing comma and thus also matched the `sed` expression.
I thought this was due to pruning the last nodes of a path from the graph, which then had to be removed from the P lines, and the function in `odgi sort` that does it is not removing the trailing `,`. However, in this case, I removed nodes using `odgi prune -TEc 1`, which should remove only nodes having no paths crossing them. Hence, the P lines should not have been touched by `odgi prune` in the first place.
This may need more digging, but I would love to hear your thoughts.
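For anyone who wants to run a similar check without sed, here is a rough Python sketch. It is not the sed command referred to above; it assumes GFA v1 `P` lines with the path name in the second column and the comma-separated segment list in the third, and the function name `paths_with_trailing_comma` is hypothetical.

```python
def paths_with_trailing_comma(gfa_lines):
    """Return the names of P lines whose segment list ends with a comma."""
    bad = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "P" and len(fields) >= 3 and fields[2].endswith(","):
            bad.append(fields[1])
    return bad
```

Since the function accepts any iterable of lines, an open file handle for the pruned GFA can be passed to it directly.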
Cheers,