Ilias Giechaskiel, George Panagopoulos, Eiko Yoneki, 44th International Conference on Parallel Processing (ICPP), Beijing, China, Sep. 2015. DOI: 10.1109/ICPP.2015.46.
This repository contains the following:
- Ilias Giechaskiel's MPhil Dissertation.
- The University of Cambridge Computer Laboratory Technical Report.
- The ICPP Pre-Print.
- The ICPP Presentation.
All the code is under the code/src directory, with executables under code/bin.
Everything can be compiled by just executing make
from the code directory. This will create object files under code/obj, with the executables in code/bin.
If you want to alter the source code, at a high-level, the files are as follows:
mgt.[h/cpp]
implements the MGT algorithm with our modifications.localmgt.cpp
contains a main to run MGT locally, whilepdtlclient.cpp
andpdtlmaster.cpp
implement our distributed PDTL framework.highdegreehandler.[h/cpp]
implements the algorithm for the case when there are high-degree vertices, andinmem.cpp
implements one of the simple in-memory algorithms.fileparser.[h/cpp]
andfileconverter.[h/cpp]
implement various parsing and conversion functions, with the main inparser.cpp
.- Everything else is used to make the code more modular.
The code/bin directory contains all the executable binaries.
With the exception of some of the parser utilities, all binaries require the input in the following format: the graph must be undirected, and is split in 2 files, a degree file (ending in .deg
), and an adjacency file (ending in .adj
). The degree file contains the degrees of the vertices as v, d(v)
(with v
ranging from 0
to |V|
), while the adjacency file just stores the neighbors of each vertex in order (and sorted).
This format was chosen for compatibility with the original MGT implementation, which is explained in the manual. The format is also explained further in Section V-B of our paper, and a real example is given in code/graphs.
How to execute the various binaries is discussed below. The inputs and outputs always refer to the base name of the .deg/.adj
filenames.
Use this for an in-memory triangle listing algorithm. Simply execute inmem.bin input output ordered
, where input
is the base input name, output
is a 0 for counting, and non-0 for listing, and ordered
is non-0 if the adjacency list is already ordered.
Use this to remove high-degree vertices before orientation. Execute highdegreehandler.bin input output maxdeg report
, where input
is the (base) input name, output
is the (base) output name for the low-degree graph, maxdeg
is the maximum degree to allow in the new graph, and report
is non-0 for listing (instead of counting).
Use this for our version of the MGT algorithm. Execute mgt.bin filename maxdeg output mem instances
, where filename
is the (base) input name, output
is non-0 for listing, mem
is the maximum memory (in MB) to allocate per thread, and instances
is the number of threads to use.
maxdeg
is 0 if orientation has not yet been performed, while it is non-zero when the file is already oriented, and has a maximum out-degree equal to maxdeg
.
Use these two binaries to execute the distributed version of our algorithms. Run pdtlclient.bin port delete
on the remote machines, where port
refers to the port number to listen for incoming connections and delete
is non-zero to delete the files after the counting/listing for the particular graph has finished. This needs to be manually shutdown, e.g. via Ctrl-C.
pdtlmaster.bin filename maxdeg memsize instances output ip port mem instances ...
is used to run the master. filename
is the name of graph, maxdeg
is as for mgt.bin
(0 for orientation, non-zero for already oriented), memsize
and instances
is the memory (in MB) per thread and number of threads to allocate to the master, and output
is once again non-0 for listing.
For each client, add the following four arguments: ip
and port
for the IPv4 address and port of the client, instances
for the number of threads, and mem
the memory (in MB) per thread.
parser.bin
contains all the utilities for converting between different file formats. input
and output
always refer to the basenames of the input and output graphs.
Run parser.bin order input output
to order the neighbors of all vertices. This function requires memory proportional to the maximum degree.
Run parser.bin undirect input output
to convert a directed graph into an undirected graph. This function requires memory proportional to the total number of edges.
Run parser.bin orient input output [mem] [numthreads]
to orient the given graph. Optionally, add a mem
parameter to specify the amount of memory to allocate (in MB), per thread (0 for unlimited), and numthreads
to specify the number of threads.
Run parser.bin convert input output opt/xstream
to convert the graph from the PDTL format to either opt
or xstream
format.
Run parser.bin parse input output snap/xstream [mem] [vn]
to convert a graph from either the snap
or the xstream
format into the format required by PDTL. mem
optionally specifies the maximum amount of memory to allocate (0 for unlimited), and in the case of xstream
, vn
is equal to either 2 or 3, to indicate the type of X-Stream edges used.
The repository also contains an example of a real graph in the format required by PDTL. This is the Berkley-Stanford web graph from the SNAP Repository. Note that orientation has not yet been performed on this graph.