- Ubuntu
- C++ 11
- GCC 4.8
- Intel C++ Compiler
- Boost (optional; required only for link prediction)
Place the prepared data file [file].txt in the NR_Dataset folder. The first row of the file is the number of nodes; each subsequent row describes one edge as [outNode] [inNode]. Note that directed and undirected graphs must be distinguished.
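For illustration, a small directed graph with 4 nodes and 5 edges could be stored like this (a hypothetical toy file, assuming whitespace-separated, 0-based node IDs):

```
4
0 1
0 2
1 2
2 3
3 0
```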
Datasets used in our paper are provided in NR_Dataset.
| Data Set | Directed | N (nodes) | M (edges) |
|---|---|---|---|
| BlogCatalog | No | 10312 | 333983 |
| Flickr | No | 80513 | 5899882 |
| YouTube | No | 1138499 | 2990443 |
| WikiVote | Yes | 7115 | 103689 |
| Slashdot | Yes | 82168 | 870161 |
| Euro | No | 399 | 5993 |
| Brazil | No | 131 | 1003 |
Move the files in the frPCA folder to the root directory, then compile:
bash compile.sh
We provide two versions of the code to ensure reproducibility.
The first is an SVD version based on Eigen 3.x.
./STRAP_SVD_U <graph_name> <data_path> <emb_path> <alpha> <iteration> <error> <threads>
Parameters
- graph_name: name of target graph
- data_path: path to load source file
- emb_path: path to save embedding files
- alpha: decay factor for Personalized PageRank (PPR)
- iteration: number of iterations for the SVD
- error: error threshold for Backward Push
- threads: number of threads to use
Examples
For an undirected graph:
./STRAP_SVD_U BlogCatalog-u NR_Dataset/ NR_EB/ 0.5 12 0.00001 24
For a directed graph:
./STRAP_SVD_D wikivote NR_Dataset/ NR_EB/ 0.5 12 0.00001 24
Results in our paper are all based on the second version, which uses frPCA (fast randomized PCA) to get better performance.
./STRAP_FRPCA_U <graph_name> <data_path> <emb_path> <alpha> <iteration> <error> <threads>
Parameters
- iteration: number of iterations for frPCA
- all other parameters are the same as above
Examples
For an undirected graph:
./STRAP_FRPCA_U BlogCatalog-u NR_Dataset/ NR_EB/ 0.5 12 0.00001 24
For a directed graph:
./STRAP_FRPCA_D wikivote NR_Dataset/ NR_EB/ 0.5 12 0.00001 24
Train embeddings on the full graph and then reconstruct it. The code to calculate the reconstruction precision is provided:
./NET_RE_U BlogCatalog-u strap_frpca_u
./NET_RE_D wikivote strap_frpca_d
For large graphs, such as YouTube, we sample a subgraph to perform reconstruction.
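The precision computation can be sketched in Python as follows; this is a minimal illustration for small graphs, assuming the source/target embeddings have already been loaded into NumPy arrays U and V (the edge set and loading step are hypothetical; the provided NET_RE binaries are the reference implementation):

```python
import numpy as np

def reconstruction_precision(U, V, edges, k):
    """Precision@k for graph reconstruction on a small graph.

    U[i] / V[j] are the source / target embeddings of nodes i and j
    (U == V for undirected graphs); `edges` is the set of true
    (source, target) pairs; k is the number of top-scored pairs to check.
    """
    scores = U @ V.T                      # proximity score for every node pair
    np.fill_diagonal(scores, -np.inf)     # ignore self-loops
    top = np.argpartition(scores, -k, axis=None)[-k:]   # k highest-scored pairs
    rows, cols = np.unravel_index(top, scores.shape)
    hits = sum((int(i), int(j)) in edges for i, j in zip(rows, cols))
    return hits / k
```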
First, split the graph into training and testing sets and generate negative samples. The resulting datasets are saved separately in LP_Dataset. The ratio of the testing set can be specified:
./GEN_DATA_U BlogCatalog-u 0.5
./GEN_DATA_D wikivote 0.5
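The splitting step can be sketched as follows (a rough Python sketch under the edge-list format above; the GEN_DATA binaries are the reference implementation, and the sampling details here are assumptions):

```python
import random

def split_edges(edges, n, test_ratio=0.5, seed=0):
    """Split an edge list into train/test and draw one negative (non-edge) per test edge."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    cut = int(len(edges) * (1 - test_ratio))
    train, test_pos = edges[:cut], edges[cut:]
    edge_set = set(edges)
    test_neg = []
    while len(test_neg) < len(test_pos):   # rejection-sample node pairs that are not edges
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and (u, v) not in edge_set:
            test_neg.append((u, v))
    return train, test_pos, test_neg
```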
Then compute embeddings of the training set and predict missing edges via their proximity scores:
./LINK_PRE_U BlogCatalog-u strap_frpca_u
./LINK_PRE_D wikivote strap_frpca_d
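A minimal sketch of the scoring step, assuming the score of a candidate edge (u, v) is the inner product of u's source embedding and v's target embedding (an assumption on our part; the LINK_PRE binaries are the reference implementation):

```python
import numpy as np

def link_scores(U, V, pairs):
    """Inner-product score of each candidate edge (u, v)."""
    pairs = np.asarray(pairs)
    return np.einsum("ij,ij->i", U[pairs[:, 0]], V[pairs[:, 1]])

def precision_at_k(U, V, test_pos, test_neg, k):
    """Rank positive and negative test pairs together and report precision@k."""
    candidates = list(test_pos) + list(test_neg)
    labels = np.array([1] * len(test_pos) + [0] * len(test_neg))
    order = np.argsort(-link_scores(U, V, candidates))
    return labels[order[:k]].mean()
```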
Train a classifier using the embeddings of the full graph, the provided labels, and the training set. Performance is evaluated in terms of average Micro-F1 and average Macro-F1. This part is implemented in Python 3.4 with scikit-learn 0.20.1:
python labelclassification.py BlogCatalog-u strap_frpca_u
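For reference, a minimal scikit-learn sketch of this evaluation, assuming the node embeddings X and the binary label matrix Y are already loaded as NumPy arrays (variable names are hypothetical; labelclassification.py is the actual script):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

def evaluate(X, Y, train_ratio=0.5, seed=0):
    """One-vs-rest logistic regression; returns (Micro-F1, Macro-F1)."""
    X_tr, X_te, Y_tr, Y_te = train_test_split(
        X, Y, train_size=train_ratio, random_state=seed)
    clf = OneVsRestClassifier(LogisticRegression(solver="liblinear"))
    clf.fit(X_tr, Y_tr)
    Y_pred = clf.predict(X_te)
    return (f1_score(Y_te, Y_pred, average="micro"),
            f1_score(Y_te, Y_pred, average="macro"))
```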
Please cite our paper if you choose to use our code.
@inproceedings{10.1145/3292500.3330860,
author = {Yin, Yuan and Wei, Zhewei},
title = {Scalable Graph Embeddings via Sparse Transpose Proximities},
year = {2019},
isbn = {9781450362016},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3292500.3330860},
doi = {10.1145/3292500.3330860},
booktitle = {Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
pages = {1429--1437},
numpages = {9},
keywords = {network representation learning, personalized pagerank, graph embedding},
location = {Anchorage, AK, USA},
series = {KDD ’19}
}