[Feature] [Example] Graph matching routines with DGLGraphs (dmlc#1935)
* Adding graph matching routines * Adding graph matching routines Co-authored-by: Ubuntu <[email protected]> Co-authored-by: Zihao Ye <[email protected]>
1 parent 4ef01db, commit 4097fa2
Showing 3 changed files with 941 additions and 0 deletions.
@@ -0,0 +1,69 @@
# Graph Matching Routines

Implementation of several algorithms for computing the Graph Edit Distance (GED) between two DGLGraphs G1 and G2. The graph edit distance generalizes the string edit distance to graphs. The following four algorithms are implemented:

- astar: Calculates the exact GED using the A* graph traversal algorithm, with the heuristic proposed by Riesen, Fankhauser, and Bunke [1].
- beam: Calculates an approximate GED using the A* graph traversal algorithm, with a threshold on the size of the open list [2].
- bipartite: Calculates an approximate GED using linear assignment on the nodes, solved with the Jonker-Volgenant (JV) algorithm [3].
- hausdorff: Calculates an approximate GED based on Hausdorff matching [4].

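
To make the quantity being computed concrete, here is a minimal brute-force sketch of the exact GED for very small directed graphs. It is not the code added in this commit: the function name `brute_force_ged` is hypothetical, and it assumes the default cost model described in this README (substitutions cost 0, every node or edge insertion or deletion costs 1). It enumerates every partial node mapping, so it is only practical for a handful of nodes.

```python
from itertools import combinations, permutations


def brute_force_ged(num_nodes1, edges1, num_nodes2, edges2):
    """Exact GED for tiny directed graphs (hypothetical helper).

    Assumes substitutions cost 0 and each node/edge insertion or deletion
    costs 1. Tries every partial injective mapping of G1 nodes to G2 nodes.
    """
    e1, e2 = set(edges1), set(edges2)
    best = float("inf")
    for k in range(min(num_nodes1, num_nodes2) + 1):
        for kept in combinations(range(num_nodes1), k):
            for image in permutations(range(num_nodes2), k):
                mapping = dict(zip(kept, image))
                # Unmapped G1 nodes are deleted, unmapped G2 nodes are inserted.
                node_cost = (num_nodes1 - k) + (num_nodes2 - k)
                # A G1 edge is preserved only if its image is an edge of G2.
                matched = sum(
                    1
                    for u, v in e1
                    if u in mapping and v in mapping
                    and (mapping[u], mapping[v]) in e2
                )
                # Every unmatched G1 edge is deleted, every unmatched G2 edge inserted.
                edge_cost = (len(e1) - matched) + (len(e2) - matched)
                best = min(best, node_cost + edge_cost)
    return float(best)


# A directed path on 5 nodes, and the same path with edge (1, 2) removed.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
broken = [(0, 1), (2, 3), (3, 4)]
print(brute_force_ged(5, path, 5, path))    # 0.0
print(brute_force_ged(5, path, 5, broken))  # 1.0
```

The number of mappings grows factorially with the number of nodes, which is why exact search needs a good heuristic (astar) or has to be replaced by an approximation (beam, bipartite, hausdorff).
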
### Dependencies
- lapjv (https://github.com/src-d/lapjv): We use the lapjv implementation to solve the linear assignment problem because of its scalability. Another option is the linear assignment solver provided by SciPy (scipy.optimize.linear_sum_assignment).

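
As a rough illustration of the assignment step behind the bipartite option, the sketch below builds the square (n+m) x (n+m) cost matrix commonly used for this approximation and solves it with `scipy.optimize.linear_sum_assignment`. It is a simplified sketch under stated assumptions, not the code added in this commit: only node costs are considered (edge costs are ignored), and the helper name `bipartite_node_assignment` is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def bipartite_node_assignment(sub_cost, del_cost, ins_cost):
    """Node-only assignment sketch (hypothetical helper, edges ignored).

    sub_cost: (n, m) substitution costs, del_cost: (n,), ins_cost: (m,).
    Returns (total_cost, mapping), where mapping[i] is the G2 node assigned
    to G1 node i, or None if node i is deleted.
    """
    sub_cost = np.asarray(sub_cost, dtype=float)
    del_cost = np.asarray(del_cost, dtype=float)
    ins_cost = np.asarray(ins_cost, dtype=float)
    n, m = sub_cost.shape
    # Finite stand-in for "forbidden", larger than any feasible total cost.
    big = sub_cost.sum() + del_cost.sum() + ins_cost.sum() + 1.0

    cost = np.zeros((n + m, n + m))
    cost[:n, :m] = sub_cost                           # substitutions
    cost[:n, m:] = big                                # deletions: diagonal only
    cost[np.arange(n), m + np.arange(n)] = del_cost
    cost[n:, :m] = big                                # insertions: diagonal only
    cost[n + np.arange(m), np.arange(m)] = ins_cost
    # The bottom-right block stays 0 (dummy-to-dummy assignments are free).

    rows, cols = linear_sum_assignment(cost)
    total = float(cost[rows, cols].sum())
    mapping = [None] * n
    for r, c in zip(rows, cols):
        if r < n and c < m:
            mapping[r] = int(c)
    return total, mapping


# Three G1 nodes, two G2 nodes: substituting equal ids is free.
sub = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
total, mapping = bipartite_node_assignment(sub, np.ones(3), np.ones(2))
print(total, mapping)
```

Swapping in lapjv would only change the solver call; the cost matrix construction stays the same.
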
### Usage

Examples of usage are provided in examples.py. The function signature and an example are also given below:

```python
graph_edit_distance(G1, G2, node_substitution_cost=None, edge_substitution_cost=None, G1_node_deletion_cost=None, G2_node_insertion_cost=None, G1_edge_deletion_cost=None, G2_edge_insertion_cost=None, algorithm='bipartite', max_beam_size=100)
"""
Parameters
----------
G1, G2 : DGLGraphs
node_substitution_cost, edge_substitution_cost : 2D numpy arrays
    node_substitution_cost[i,j] is the cost of substituting node i of G1 with node j of G2; edge_substitution_cost is defined analogously for edges. If None, a default cost of 0 is used.
G1_node_deletion_cost, G1_edge_deletion_cost : 1D numpy arrays
    G1_node_deletion_cost[i] is the cost of deleting node i of G1; G1_edge_deletion_cost is defined analogously for edges. If None, a default cost of 1 is used.
G2_node_insertion_cost, G2_edge_insertion_cost : 1D numpy arrays
    G2_node_insertion_cost[i] is the cost of inserting node i of G2; G2_edge_insertion_cost is defined analogously for edges. If None, a default cost of 1 is used.
algorithm : string
    Algorithm used to calculate the edit distance. One of 'astar', 'beam', 'bipartite' or 'hausdorff'.
max_beam_size : int
    Maximum number of nodes in the open list. Only used when the algorithm is 'beam'.

Returns
-------
A tuple of three objects: (edit_distance, node_mapping, edge_mapping)
    edit_distance is the calculated edit distance (float).
    node_mapping is a tuple of size two, containing the node assignments of the two graphs respectively, e.g., node_mapping[0][i] is the node mapping of node i of graph G1 (None means that the node is deleted). edge_mapping is defined analogously for edges.
    For 'hausdorff', node_mapping and edge_mapping are returned as None, as this approximation does not return a unique edit path.

Examples
--------
>>> import dgl
>>> from ged import graph_edit_distance
>>> src1 = [0, 1, 2, 3, 4, 5]
>>> dst1 = [1, 2, 3, 4, 5, 6]
>>> src2 = [0, 1, 3, 4, 5]
>>> dst2 = [1, 2, 4, 5, 6]
>>> G1 = dgl.DGLGraph((src1, dst1))
>>> G2 = dgl.DGLGraph((src2, dst2))
>>> distance, node_mapping, edge_mapping = graph_edit_distance(G1, G1, algorithm='astar')
>>> print(distance)
0.0
>>> distance, node_mapping, edge_mapping = graph_edit_distance(G1, G2, algorithm='astar')
>>> print(distance)
1.0
"""
```
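
The returned node_mapping can be turned into a readable list of edit operations. The helper below is a hypothetical sketch, not part of this commit. It assumes node_mapping is a pair of sequences indexed by node id, as described above, and that a None entry in the second sequence marks a G2 node that is inserted (the README only states the meaning of None for G1).

```python
def describe_node_mapping(node_mapping):
    """List node edit operations from a (g1_to_g2, g2_to_g1) pair.

    Hypothetical helper: assumes None in g1_to_g2 means "deleted" and
    None in g2_to_g1 means "inserted".
    """
    g1_to_g2, g2_to_g1 = node_mapping
    ops = []
    for i, j in enumerate(g1_to_g2):
        if j is None:
            ops.append(f"delete G1 node {i}")
        else:
            ops.append(f"substitute G1 node {i} -> G2 node {j}")
    for j, i in enumerate(g2_to_g1):
        if i is None:
            ops.append(f"insert G2 node {j}")
    return ops


# Hand-written mapping for illustration only (not the output of a real call):
# G1 has 3 nodes, G2 has 4 nodes.
example_mapping = ([0, 1, None], [0, 1, None, None])
for op in describe_node_mapping(example_mapping):
    print(op)
```

For the 'hausdorff' algorithm the mappings are None, so a caller would need to check for that before using a helper like this.
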
### References

[1] Riesen, Kaspar, Stefan Fankhauser, and Horst Bunke. "Speeding Up Graph Edit Distance Computation with a Bipartite Heuristic." MLG. 2007.

[2] Neuhaus, Michel, Kaspar Riesen, and Horst Bunke. "Fast Suboptimal Algorithms for the Computation of Graph Edit Distance." Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). 2006.

[3] Fankhauser, Stefan, Kaspar Riesen, and Horst Bunke. "Speeding Up Graph Edit Distance Computation through Fast Bipartite Matching." International Workshop on Graph-Based Representations in Pattern Recognition. 2011.

[4] Fischer, Andreas, et al. "A Hausdorff Heuristic for Efficient Computation of Graph Edit Distance." Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). 2014.

@@ -0,0 +1,69 @@
from ged import graph_edit_distance
import dgl
import numpy as np


# Edge lists (source and destination node ids) of the two graphs
src1 = [0, 1, 2, 3, 4, 5]
dst1 = [1, 2, 3, 4, 5, 6]

src2 = [0, 1, 3, 4, 5]
dst2 = [1, 2, 4, 5, 6]


G1 = dgl.DGLGraph((src1, dst1))
G2 = dgl.DGLGraph((src2, dst2))


# Exact edit distance with astar search
distance, node_mapping, edge_mapping = graph_edit_distance(G1, G1, algorithm='astar')
print(distance)  # 0.0
distance, node_mapping, edge_mapping = graph_edit_distance(G1, G2, algorithm='astar')
print(distance)  # 1.0

# With user-input cost matrices
node_substitution_cost = np.empty((G1.number_of_nodes(), G2.number_of_nodes()))
G1_node_deletion_cost = np.empty(G1.number_of_nodes())
G2_node_insertion_cost = np.empty(G2.number_of_nodes())

edge_substitution_cost = np.empty((G1.number_of_edges(), G2.number_of_edges()))
G1_edge_deletion_cost = np.empty(G1.number_of_edges())
G2_edge_insertion_cost = np.empty(G2.number_of_edges())

# Node substitution cost of 0 when node-ids are same, else 1
node_substitution_cost.fill(1.0)
for i in range(G1.number_of_nodes()):
    for j in range(G2.number_of_nodes()):
        if i == j:
            node_substitution_cost[i, j] = 0.0

# Node insertion/deletion cost of 1
G1_node_deletion_cost.fill(1.0)
G2_node_insertion_cost.fill(1.0)

# Edge substitution cost of 0
edge_substitution_cost.fill(0.0)

# Edge insertion/deletion cost of 0.5
G1_edge_deletion_cost.fill(0.5)
G2_edge_insertion_cost.fill(0.5)

distance, node_mapping, edge_mapping = graph_edit_distance(
    G1, G2,
    node_substitution_cost, edge_substitution_cost,
    G1_node_deletion_cost, G2_node_insertion_cost,
    G1_edge_deletion_cost, G2_edge_insertion_cost,
    algorithm="astar")

print(distance)  # 0.5


# Approximate edit distance with beam search; it is greater than or equal to the exact edit distance
distance, node_mapping, edge_mapping = graph_edit_distance(G1, G2, algorithm='beam', max_beam_size=2)
print(distance)  # 3.0

# Approximate edit distance with bipartite heuristic; it is greater than or equal to the exact edit distance
distance, node_mapping, edge_mapping = graph_edit_distance(G1, G2, algorithm='bipartite')
# The value can differ between runs, because the intermediate LAP used in this approximation may have multiple optimal solutions
print(distance)  # 9.0


# Approximate edit distance with hausdorff heuristic; it is less than or equal to the exact edit distance
distance, node_mapping, edge_mapping = graph_edit_distance(G1, G2, algorithm='hausdorff')
print(distance)  # 0.0