SPARQL Anything is a system for Semantic Web re-engineering that allows users to ... query anything with SPARQL.
SPARQL Anything uses a single generic abstraction, called Facade-X, for all data source formats. Facade-X is a simplistic meta-model used by sparql.anything transformers to generate RDF data from diverse data sources. Intuitively, Facade-X uses a subset of RDF to represent the source content as it is, but in RDF. The model combines two types of elements: containers and literals. A Facade-X object always has a single root container. Container members are key-value pairs, where keys are either RDF properties or container membership properties, and values are either RDF literals or other containers. This is a generic example of a Facade-X data object (more examples below):
@prefix fx: <http://sparql.xyz/facade-x/ns/> .
@prefix xyz: <http://sparql.xyz/facade-x/data/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
[] a fx:Root ; rdf:_1 [
xyz:someKey "some value" ;
rdf:_1 "another value with unspecified key" ;
rdf:_2 [
rdf:type xyz:MyType ;
rdf:_1 "another value"
]
] .
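The container/literal model described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the triplifier that sparql.anything ships: the function name `facade_x`, the blank node labels `_:b0`, `_:b1`, ... and the use of prefixed names as plain strings are all assumptions made for this example. Dictionary keys become `xyz:` properties, list positions become container membership properties (`rdf:_1`, `rdf:_2`, ...), and nested dictionaries or lists become nested containers.

```python
from itertools import count


def facade_x(data):
    """Return (subject, predicate, object) tuples for a JSON-like dict or list.

    Illustrative only: keys are not escaped and the input is assumed to be
    a dict or a list at the top level.
    """
    ids = count()
    triples = []

    def container(value):
        # Every dict or list becomes one container (a blank node here).
        node = f"_:b{next(ids)}"
        if isinstance(value, dict):
            members = [(f"xyz:{key}", v) for key, v in value.items()]
        else:
            members = [(f"rdf:_{i}", v) for i, v in enumerate(value, start=1)]
        for predicate, member in members:
            if isinstance(member, (dict, list)):
                # Nested structure: the value is another container.
                triples.append((node, predicate, container(member)))
            else:
                # Scalar: the value is a literal.
                triples.append((node, predicate, member))
        return node

    root = container(data)
    # Facade-X has exactly one root container.
    triples.insert(0, (root, "rdf:type", "fx:Root"))
    return triples


if __name__ == "__main__":
    for triple in facade_x([{"name": "Friends", "stars": ["Courteney Cox"]}]):
        print(triple)
```

Keeping keys and list positions as the only two kinds of predicate is what lets a single query style work across formats: a query only needs to know whether it is following a named key or a position in a sequence.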
SPARQL Anything extends the Apache Jena ARQ processors by overloading the SERVICE operator, as in the following example:
Suppose the following JSON file is given as input (also available at https://raw.githubusercontent.com/SPARQL-Anything/sparql.anything/main/examples/example1.json):
[
{
"name":"Friends",
"genres":[
"Comedy",
"Romance"
],
"language":"English",
"status":"Ended",
"premiered":"1994-09-22",
"summary":"Follows the personal and professional lives of six twenty to thirty-something-year-old friends living in Manhattan.",
"stars":[
"Jennifer Aniston",
"Courteney Cox",
"Lisa Kudrow",
"Matt LeBlanc",
"Matthew Perry",
"David Schwimmer"
]
},
{
"name":"Cougar Town",
"genres":[
"Comedy",
"Romance"
],
"language":"English",
"status":"Ended",
"premiered":"2009-09-23",
"summary":"Jules is a recently divorced mother who has to face the unkind realities of dating in a world obsessed with beauty and youth. As she becomes older, she starts discovering herself.",
"stars":[
"Courteney Cox",
"David Arquette",
"Bill Lawrence",
"Linda Videtti Figueiredo",
"Blake McCormick"
]
}
]
With SPARQL Anything you can select the TV series starring "Courteney Cox" with the following SPARQL query:
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?seriesName
WHERE {
SERVICE <x-sparql-anything:https://raw.githubusercontent.com/SPARQL-Anything/sparql.anything/main/examples/example1.json> {
?tvSeries xyz:name ?seriesName .
?tvSeries xyz:stars ?star .
?star ?li "Courteney Cox" .
}
}
and get this result without having to transform the JSON to RDF yourself:
seriesName |
---|
"Cougar Town" |
"Friends" |
Currently, the system supports the following formats: "json", "html", "xml", "csv", "bin", "png", "jpeg", "jpg", "bmp", "tiff", "tif", "ico", "txt" ... but the possibilities are limitless!
By default, these formats are triplified as follows.
JSON
Input | Triplification |
---|---|
{ |
@prefix xyz: <http://sparql.xyz/facade-x/data/> . |
HTML
Input | Triplification |
---|---|
<html> |
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . |
XML
Input | Triplification |
---|---|
<breakfast_menu> |
@prefix xyz: <http://sparql.xyz/facade-x/data/> . |
CSV
Input | Triplification |
---|---|
[email protected],2070,Laura,Grey |
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . |
BIN, PNG, JPEG, JPG, BMP, TIFF, TIF, ICO
TXT
Input | Triplification |
---|---|
Hello World! |
[ <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> "Hello World!" ] . |
Metadata
Input | Triplification |
---|---|
<https://raw.githubusercontent.com/ianare/exif-samples/master/jpg/Canon_40D.jpg> |
sparql.anything acts as a virtual endpoint that can be queried exactly like a remote SPARQL endpoint. To instruct the query processor to delegate the execution to Facade-X, you must use the following URI scheme within SERVICE clauses.
x-sparql-anything ':' ([option] ('=' [value])? ','?)+
A minimal URI that uses only the resource locator is also possible.
x-sparql-anything ':' URL
In this case sparql.anything guesses the data source type from the file extension.
Option name | Description | Valid Values | Default Value |
---|---|---|---|
location | The URL of the data source. | Any valid URL. | Mandatory |
root | The IRI of generated root resource. | Any valid IRI. | location + '#' |
media-type | The media-type of the data source. | Any valid Media-Type. Supported media-types: application/xml, image/png, text/html, application/octet-stream, application/json, image/jpeg, image/tiff, image/bmp, text/csv, image/vnd.microsoft.icon, text/plain | No value (the media-type will be guessed from the file extension) |
namespace | The namespace prefix for the properties that will be generated. | Any valid namespace prefix. | http://sparql.xyz/facade-x/data/ |
blank-nodes | It tells sparql.anything whether or not to generate blank nodes. | true/false | true |
triplifier | It forces sparql.anything to use a specific triplifier for transforming the data source | A canonical name of a Java class | No value |
charset | The charset of the data source. | Any charset. | UTF-8 |
metadata | It tells sparql.anything to extract metadata from the data source and to store it in the named graph with URI <http://sparql.xyz/facade-x/data/metadata> | true/false | false |
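As a rough illustration of the URI scheme and the options above, the following Python sketch assembles a SERVICE IRI from a location and an optional dictionary of options. The helper name `service_iri` and the `example.org` URLs are hypothetical. The sketch does not escape commas or equals signs inside values, since the text above does not say how such characters are handled.

```python
def service_iri(location, options=None):
    """Build an x-sparql-anything IRI (illustrative sketch, no escaping)."""
    if not options:
        # Minimal form: only the resource locator.
        return "x-sparql-anything:" + location

    def render(value):
        # The option table lists booleans as lowercase true/false.
        return str(value).lower() if isinstance(value, bool) else str(value)

    pairs = {"location": location, **options}
    body = ",".join(f"{name}={render(value)}" for name, value in pairs.items())
    return "x-sparql-anything:" + body


if __name__ == "__main__":
    print(service_iri("https://example.org/data.json"))
    print(service_iri("https://example.org/data.csv",
                      {"media-type": "text/csv", "blank-nodes": False}))
```

The resulting string is what goes between the angle brackets of a SERVICE clause.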
HTML
Option name | Description | Valid Values | Default Value |
---|---|---|---|
html.selector | A CSS selector that restricts the HTML tags to consider for the triplification. | Any valid CSS selector. | No Value |
CSV
Option name | Description | Valid Values | Default Value |
---|---|---|---|
csv.format | The format of the input CSV file. | Any predefined CSVFormat of the Apache Commons CSV library | DEFAULT |
csv.headers | It tells the CSV triplifier to use the headers of the CSV file for minting the properties of the generated triples. | true/false | false |
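The effect of `csv.headers` can be pictured with a small Python sketch. It assumes that each CSV row becomes one container whose members are the cells of that row, which the text here does not spell out in full. The function name `csv_rows` and the sample data are made up for illustration. With headers disabled, cells are keyed by position. With headers enabled, the first row is consumed and its values are used to mint `xyz:` properties.

```python
import csv
import io


def csv_rows(text, headers=False):
    """Return one dict per data row, keyed by the property each cell would get."""
    rows = list(csv.reader(io.StringIO(text)))
    # With headers enabled, the first row names the properties and is not data.
    names = rows.pop(0) if headers and rows else None
    containers = []
    for row in rows:
        members = {}
        for i, cell in enumerate(row, start=1):
            if names and i <= len(names):
                key = f"xyz:{names[i - 1]}"  # header names are not escaped here
            else:
                key = f"rdf:_{i}"  # positional container membership property
            members[key] = cell
        containers.append(members)
    return containers


if __name__ == "__main__":
    sample = "email,id\nlaura@example.com,2070\n"
    print(csv_rows(sample))
    print(csv_rows(sample, headers=True))
```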
BIN, PNG, JPEG, JPG, BMP, TIFF, TIF, ICO
Option name | Description | Valid Values | Default Value |
---|---|---|---|
bin.encoding | The encoding to use for generating the representation of the file. | BASE64 | BASE64 |
TXT
Option name | Description | Valid Values | Default Value |
---|---|---|---|
txt.regex | It tells sparql.anything to evaluate a regular expression on the data source. In this case the slots will be filled with the bindings of the regex. | Any valid regular expression | No value |
txt.group | It tells sparql.anything to generate slots by using a specific group of the regular expression. | Any integer | No value |
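One plausible reading of `txt.regex` and `txt.group` is sketched below in Python. It assumes that each match of the regular expression fills one slot, and that `txt.group` selects which capture group of each match is used. The exact behaviour of the real triplifier may differ, and Python's `re` dialect is not identical to Java's. The function name `txt_slots` is hypothetical.

```python
import re


def txt_slots(text, regex=None, group=None):
    """Return (property, value) pairs for the slots of a text source."""
    if regex is None:
        # Default behaviour shown above: the whole text is a single slot.
        values = [text]
    else:
        # Assumption: one slot per match; group 0 means the whole match.
        values = [m.group(group or 0) for m in re.finditer(regex, text)]
    return [(f"rdf:_{i}", value) for i, value in enumerate(values, start=1)]


if __name__ == "__main__":
    print(txt_slots("Hello World!"))
    print(txt_slots("Hello World!", regex=r"\w+"))
    print(txt_slots("a=1 b=2", regex=r"(\w)=(\d)", group=2))
```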
An executable JAR can be obtained from the Releases page.
The jar can be executed as follows:
usage: java -jar sparql.anything-<version> -q query [-f format] [-i
filepath] [-l path] [-o filepath]
-f,--format <string> OPTIONAL - Format of the output
file. Supported values: JSON, XML,
CSV, TEXT, TTL, NT, NQ. [Default:
TEXT or TTL]
-i,--input <input> OPTIONAL - The path to a SPARQL
result set file to be used as
input. When present, the query is
pre-processed by substituting
variable names with values from the
bindings provided. The query is
repeated for each set of bindings
in the input result set.
-l,--load <load> OPTIONAL - The path to one RDF file
or a folder including a set of
files to be loaded. When present,
the data is loaded in memory and
the query executed against it.
-o,--output <file> OPTIONAL - The path to the output
file. [Default: STDOUT]
-p,--output-pattern <outputPattern> OPTIONAL - Output filename pattern,
e.g. 'myfile-?friendName.json'.
Variables should start with '?' and
refer to bindings from the input
file. This option can only be used
in combination with 'input' and is
ignored otherwise. This option
overrides 'output'.
-q,--query <query> The path to the file storing the
query to execute or the query
itself.
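The output pattern described for `-p` can be illustrated with a short Python sketch. It is only a guess at the substitution step: `?name` placeholders are replaced with the value bound to `name` in one row of the input result set. The function name `expand_output_pattern` and the sample binding are hypothetical, and any sanitising of values for use in file names is left out.

```python
import re


def expand_output_pattern(pattern, bindings):
    """Replace ?var placeholders in a filename pattern with bound values."""
    def replace(match):
        name = match.group(1)
        # Leave the placeholder untouched when the variable is not bound.
        return str(bindings.get(name, match.group(0)))

    return re.sub(r"\?(\w+)", replace, pattern)


if __name__ == "__main__":
    print(expand_output_pattern("myfile-?friendName.json", {"friendName": "Alice"}))
```

Since the query is repeated for each set of bindings in the input, a pattern like this would yield one output file per input row.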
Logging can be configured by adding the following JVM option (SLF4J):
-Dorg.slf4j.simpleLogger.defaultLogLevel=trace
We conducted a comparative evaluation of sparql.anything with respect to the state-of-the-art methods RML and SPARQL Generate.
SPARQL Anything is distributed under the Apache 2.0 License.