The versatility of the Resource Description Framework (RDF) has allowed many web services to publish very large datasets that are impractical to process on a single computer. Therefore, many distributed SPARQL engines on shared-nothing computer clusters have emerged. Some utilize distributed frameworks such as MapReduce; others implement proprietary distributed processing; and some rely on expensive pre-processing for data partitioning. These systems exhibit a variety of trade-offs that are not well understood, due to the lack of any comprehensive quantitative and qualitative evaluation. In this paper, we present a survey of 21 state-of-the-art systems that cover the entire spectrum of distributed RDF data processing, categorize them by several characteristics, and explain their similarities and differences. Then, we select 11 representative systems and perform an extensive experimental evaluation with respect to pre-processing cost, query performance, scalability, and workload adaptability, using a variety of large synthetic and real datasets with up to 4.2B triples. Our results provide valuable insights for practitioners to understand the trade-offs for their usage scenarios. Finally, we publish our evaluation framework online, including all datasets and workloads, so that researchers can compare their novel systems against the existing ones.
Please see our technical report for details.
All queries used in our experimental evaluation are available in the `queries` folder, including both the individual benchmark queries and the query workloads.
System | Download |
---|---|
AdPart | https://github.com/razen-alharbi/AdPart |
TriAD | Contact Author: [email protected] |
gStoreD | https://github.com/bnu05pp/gStoreD |
SHAPE | https://sites.google.com/site/gtshape/ |
DREAM | https://github.com/CMU-Q/DREAM |
H2RDF+ | https://github.com/zcourts/h2rdf/tree/master/H2RDF%2Bv0.2 |
S2RDF | http://dbis.informatik.uni-freiburg.de/forschung/projekte/DiPoS/S2RDF.html |
S2X | http://dbis.informatik.uni-freiburg.de/forschung/projekte/DiPoS/S2X.html |
CliqueSquare | https://team.inria.fr/oak/projects/cliquesquare/ |
SHARD | https://svn.code.sf.net/p/shard-3store/code/ |
H-RDF-3X | Contact Author: [email protected] |