- Introduction
- Installation
- Debian Repo
- Change Log
- 2023-05-17 rdf2sparql.pl
- 2023-05-05 rdfpuml.pl
- 2023-04-29 rdfpuml.pl
- 2023-04-19 rdf2sparql.pl
- 2022-08-23 rdf2sparql.pl
- 2022-08-23 rdfpuml.pl
- 2022-08-15 rdf2sparql.pl
- 2022-04-08 rdf2ontorefine.pl
- 2021-09-02 Unicode Processing
- 2020-09-17 rdf2rml: logicalTable
- 2020-06-01 rdf2tarql.pl
- 2020-06-01 rdf2rml.sh, rdf2rml.ru
- 2020-05-30 rdf2rml: inverse edge
- 2018-11-14 Avoid puml:stereotype class node
- 2018-06-29 Bug: class and puml:InlineProperty
- 2018-04-05 Arrow Attributes
- To Do Tasks
See this presentation:
RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation. Alexiev, V. In Semantic Web in Libraries 2016 (SWIB 16), Bonn, Germany, November 2016. Presentation, HTML, PDF, Video
RDF is a graph data model, so the best way to understand RDF data schemas (ontologies, application profiles, RDF shapes) is with a diagram. Many RDF visualization tools exist, but they either focus on large graphs (where the details are not easily visible), or the visualization results are not satisfactory, or manual tweaking of the diagrams is required.
We describe a tool rdfpuml that makes true diagrams directly from Turtle examples using PlantUML and GraphViz.
Diagram readability is of prime concern, and rdfpuml introduces various diagram control mechanisms using triples in the `puml:` namespace.
Special attention is paid to inlining and visualizing various Reification mechanisms (described with PRV).
We give examples from Getty CONA, Getty Museum, AAC (mappings of museum data to CIDOC CRM),
Multisensor (NIF and FrameNet), EHRI (Holocaust Research into Jewish social networks), Duraspace (Portland Common Data Model for holding metadata in institutional repositories), Video annotation.
If the example instances include embedded source field names, they can describe a mapping precisely. I’ve implemented a few more tools to generate transformations:
- rdf2rml generates R2RML transformations for RDBMS tables or SQL queries. Compared to R2RML, this saves about 15x in complexity and is competitive with the dedicated DSL YARRML
- rdf2sparql generates OntoRefine or TARQL transformations from CSV/TSV that take the form of SPARQL UPDATE (for direct GraphDB loading) or CONSTRUCT (for conversion to RDF). (Subsumes two deprecated tools, `rdf2tarql` and `rdf2ontorefine`.)
See http://twitter.com/hashtag/rdfpuml for news, diagrams and announcements.
If you use this software, please cite it as shown above.
- GitHub shows a link “About > Cite this repository” (see about-citation-files)
- CITATION.cff describes both the software and the above presentation. It’s a YAML CFF file, see https://citation-file-format.github.io/
- CITATION.bib describes only the above presentation. It’s a BibTeX file
- http://rawgit2.com/VladimirAlexiev/rdf2rml/master/doc/rdfpuml.html
- http://rawgit2.com/VladimirAlexiev/rdf2rml/master/doc/rdf2rml.html
- http://rawgit2.com/VladimirAlexiev/rdf2rml/master/doc/rdf2sparql.html (subsumes `rdf2tarql` and `rdf2ontorefine`)
Documentation source:
- ./doc/rdfpuml.pod
- ./doc/rdf2rml.pod
- ./doc/rdf2sparql.pod (subsumes `rdf2tarql` and `rdf2ontorefine`)
The following works use or mention this software:
- V. Alexiev, A. Kiryakov, P. Tarkalanov (2017) euBusinessGraph: Company and economic data for innovative products and services. 13th International Conference on Semantic Systems (Semantics 2017)
- L. Zhuhadar, M. Ciampa (2017). Leveraging learning innovations in cognitive computing with massive data sets: Using the offshore Panama papers leak to discover patterns. Computers in Human Behavior. doi:10.1016/j.chb.2017.12.013
- C. Debruyne, D. Lewis, D. O’Sullivan (October 2018). Generating Executable Mappings from RDF Data Cube Data Structure Definitions. In Confederated International Conferences “On the Move to Meaningful Internet Systems” (OTM 2018), pages 333-350. doi:10.1007/978-3-030-02671-4_21
- V. Alexiev (2018). Museum Linked Open Data: Ontologies, Datasets, Projects (invited report). In Digital Presentation and Preservation of Cultural and Scientific Heritage (DIPP 2018). Volume 8, pages 19-50. Burgas, Bulgaria, September 2018
- A.D. Junior (2019). A Jigsaw Puzzle Metaphor for Representing Linked Data Mappings. PhD Thesis, Knowledge and Data Engineering Group (KDEG), Trinity College, Dublin, Ireland
- V. Alexiev, P. Tarkalanov, N. Georgiev, L. Pavlova (2020). Bulgarian Icons in Wikidata and EDM. Digital Presentation and Preservation of Cultural and Scientific Heritage (DIPP 2020).
- Matjaz Rihtar. https://github.com/mrihtar/rdfgraph: inspired by `rdfpuml`, written in Python 2.7, uses Redland’s `librdf` library. I worked with Matjaz in the euBusinessGraph project.
Check out this repo and add `rdf2rml/bin` to your path.
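As a minimal sketch of that step, assuming the repository lives in `$HOME/rdf2rml` (the `RDF2RML_HOME` variable is a name made up for this example, not something the tools read), the following adds `bin` to `PATH` only once, so it should be safe to paste into a shell profile:

```shell
# Assumption: the repo was cloned beforehand, e.g. with
#   git clone https://github.com/VladimirAlexiev/rdf2rml.git "$HOME/rdf2rml"
# RDF2RML_HOME is a hypothetical variable used only by this sketch.
RDF2RML_HOME="${RDF2RML_HOME:-$HOME/rdf2rml}"

# Prepend rdf2rml/bin to PATH unless it is already there.
case ":$PATH:" in
  *":$RDF2RML_HOME/bin:"*) ;;
  *) PATH="$RDF2RML_HOME/bin:$PATH" ;;
esac
export PATH
```

Running the snippet a second time leaves `PATH` unchanged, which avoids the usual pile-up of duplicate entries.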
Install the following prerequisites:
- both tools: Perl. Tested with version 5.22 on Windows (cygwin and Strawberry).
- rdfpuml:
  - GraphViz
  - PlantUML. You need a recent version for new features like arrow length and color. I’m currently running 1.2018.10beta7. See in particular plantuml class diagrams.
  - Perl modules: use `cpan` or `cpanm` to install them: `RDF::Trine RDF::Query Encode FindBin Carp::Always Slurp`
  - `RDF::Prefixes::Curie`. This is my own module located in ./lib, and rdfpuml needs `FindBin` to locate it.
- rdf2rml:
  - Apache Jena: `riot`, `update`. Tested with version 3.1.0 of 2016-05-10.
  - cat, grep, rm
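To see which of the Perl modules listed above still need installing, a small check along these lines may help. `missing_perl_modules` is a hypothetical helper written for this README, not part of rdf2rml; it simply tries to load each module with `perl -M`.

```shell
# Print the name of every Perl module (given as arguments) that fails to load.
# Hypothetical helper; it reports everything as missing if perl itself is absent.
missing_perl_modules() {
  for module in "$@"; do
    perl -M"$module" -e 1 >/dev/null 2>&1 || printf '%s\n' "$module"
  done
}

# Module list copied from the prerequisites above.
# RDF::Prefixes::Curie is left out because it ships in ./lib.
missing=$(missing_perl_modules RDF::Trine RDF::Query Encode FindBin Carp::Always Slurp)
if [ -n "$missing" ]; then
  # $missing is deliberately unquoted so the names print on one line
  echo "Missing Perl modules:" $missing
  echo "Install them with e.g.: cpanm" $missing
else
  echo "All Perl modules are present"
fi
```

The check only confirms that a module loads; it does not compare versions.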
Jonas Smedegaard (@jonassmedegaard, dr at jones fullstop dk) has volunteered for some of the tasks below. His development is at https://salsa.debian.org/debian/rdf2rml/branches. To adopt changes, do something like this.
- To merge all commits in the `salsa/develop` branch:

      cd rdf2rml # i.e. your local clone of your Github project
      git remote add salsa https://salsa.debian.org/debian/rdf2rml.git
      git fetch salsa
      git merge salsa/develop

- To adopt only single commits from the `salsa/develop` branch, issue `remote` and `fetch` as above, then issue:

      git cherry-pick $commit1 $commit2 $commit3
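Before merging or cherry-picking, it can help to see which commits the other branch has that your checkout lacks. The sketch below uses plain `git log` with a two-dot range; `pending_commits` is a hypothetical helper name, and `salsa/develop` is only its default argument.

```shell
# List commits reachable from another ref but not from HEAD, oldest first.
# Hypothetical helper; run it inside your local clone after `git fetch salsa`.
pending_commits() {
  git log --oneline --no-merges --reverse "HEAD..${1:-salsa/develop}"
}

# Example (assumes the "salsa" remote was added and fetched as described above):
#   pending_commits                # commits on salsa/develop not yet merged
#   git cherry-pick <hash> ...     # adopt the ones you want
```

The abbreviated hashes in the first column are what `git cherry-pick` expects as arguments.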
- Support “Conditional Nodes”, i.e. URLs that are conditional on the existence of some fields.
- issue 21: Round brackets in fields (eg `"(name)"`) and URLs (eg `<type/(type)>`) are not mangled to square brackets anymore
- issue 18: Add `puml:option` for `left to right direction` etc
- issue 19: Implement filter function, see `test/filter-content`
- issue 20: Allow dynamic graph (computed from a data column), see `test/graphs-crunchbase`
Datatype attachment, eg `strdt(?var,xsd:date)`, now outputs to `?var_xsd_date` to avoid conflict with input field names in `ALL_UPPERCASE`
- Handle blank-node types that occur on owl:Restriction (see VladimirAlexiev#10 and `test/blank-node`)
- Duplicate `rdfpuml.bat, puml.bat` as shell scripts `rdfpuml, puml` for use in Makefiles across Linux and Windows
Merge `rdf2tarql` and `rdf2ontorefine` to one tool `rdf2sparql`
Add script to generate OntoRefine SPARQL Update queries from model.
Use Perl option `-C` when invoking for proper Unicode processing. See doc section rdfpuml.html#Unicode
Use URL for logicalTable instead of blank node, so that R2RML generated from different models for different tables can be merged more easily. Warning: this assumes that all instances of one subjectMap use the same query.
Add rdf2tarql.pl script to generate TARQL script (CSV-RDF conversion) from model.
- Improve script to abort if the first pipeline step (“update”) fails
- Improve script to work on Cygwin (invokes the Jena tools as `riot.bat` and `update.bat`)
- Filter out harmless warnings from Jena update’s error log for datatypes like `xsd:integer, xsd:date` etc since the mention of a source field doesn’t match the syntax of such literals.
- If a node has single outgoing link and no SQL query/table (`puml:label`), propagate that property backward across the link into the node (previously that was done only for incoming links)
When an edge `Y-P-X` is recorded in the RDB table of `X` (as foreign key) or in an association table, it is awkward to specify that table in the node `Y`.
So I added this SPARQL UPDATE clause:
- If a node ?y has no SQL, is not Inlined, has a single outgoing edge, then add the SQL of its counterparty ?x as default
I often define `puml:stereotype` for some classes in `prefixes.ttl`.
If such a class is not used in a particular Turtle file, rdfpuml should avoid emitting a disconnected puml class for it.
- `stereotypes`: Avoid emitting
- `has_statements_different_from`: Check that a node has statements other than puml:stereotype
When a type is also used with puml:InlineProperty, it caused this error:

    Can't locate object method "uri_value" via package "RDF::Trine::Node::Literal" at rdfpuml.pl line 261.
      main::puml_qname(RDF::Trine::Node::Literal=ARRAY(0x4fd0920)) called at rdfpuml.pl line 279
      main::puml_node2(RDF::Trine::Node::Literal=ARRAY(0x4fd0920)) called at rdfpuml.pl line 128

An inline is converted to a literal, but rdf:type is always assumed to be a URL.
Test: ./test/regression/type-inlineProperty.ttl
Add arrow attributes (dotted, dashed, bold) and length
Test: ./test/regression/arrowLen.ttl
Help needed for the following tasks. Post bugs and enhancement requests to this repo!
There’s a pull request VladimirAlexiev#7 that dockerizes the installation. As of 18-Sep-2019 it’s undergoing code review.
A docker image exists on the public Nexus of Ontotext. To run it, type:
`docker run -v <your directory>:/files --rm docker-registry.ontotext.com/rdf2rml:latest`
where `<your directory>` is a local directory holding your `.ttl` files.
**2023-05-19**
Uses the following versions:
- rdf2rml: commit [d4e97d5](https://github.com/VladimirAlexiev/rdf2rml/commit/d4e97d5dd29da45fb97b685d824f2906ba973722)
- plantuml: [1.2023.7](https://plantuml.com/download)
- jena: [4.8.0](https://jena.apache.org/download/)
- `sort` is added at various places to make the tool more deterministic, i.e. independent of order of RDF statements in the input file. However, this will interfere with the ability to control the layout, especially of disconnected components (see layout_new_line)
- Some regression tests are added.
In the case `Y-P-X` described above:
- Also need to record `?y puml:property ?p` so this prop name can be added to ?y’s subject map
- When making ?map, take `puml:property` into account
- But ?map is made many times, and copy-paste is no good…
- Also, this should be done in some cases but not others…
- So it’s better to record `?y puml:map ?map` …
Add ttl with non-ASCII chars: Accented, Cyrillic, French, etc.
- Accented: `"Rudolf Mössbauer"` in ./test/TRR/societyMember.ttl
./lib/RDF/Prefixes/Curie.pm remembers `@base` and uses that for URL shortening.
Once perlrdf#131 is fixed, eliminate this dependency (local module)
`rdfpuml` shortens URLs using prefixes only from `prefixes.ttl`, but should also use prefixes defined in the individual input file.
Now it only supports Turtle, because it concatenates `prefixes.ttl` to the main file.
If it can collect all prefixes from RDF files, such concatenation won’t be needed
Issue #1: plantuml is slow to start up, so we’d like to process a bunch of `puml` files at once.
The best way is to have a smarter script or `Makefile` that uses the following http://plantuml.com/command-line features:
- Keep the intermediate `puml` files (the current `Makefile` doesn’t preserve them)
- Run `plantuml` on a whole folder (with `-r[ecurse]` it can even recurse through subfolders)
- Use `-checkmetadata` to skip `png` files that don’t need to be regenerated. (The whole `puml` text is stored in the `png`, so `plantuml` can quickly check that there are no changes)
- The `Makefile` should start `plantuml` only once, if any of the `puml` files is newer than its respective `png` file
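The last point can be sketched in shell rather than in `make`. This is only an illustration: `stale_pumls` and `render_if_stale` are hypothetical helpers, and the sketch assumes a `plantuml` launcher on the path that accepts the `-checkmetadata` and `-recurse` options mentioned above.

```shell
# Print every .puml file under a directory whose .png is missing or older.
# Hypothetical helper, not part of rdf2rml.
stale_pumls() {
  find "$1" -name '*.puml' | while IFS= read -r puml; do
    png="${puml%.puml}.png"
    # Report the file if its PNG does not exist or the puml is newer than it.
    if [ ! -e "$png" ] || [ "$puml" -nt "$png" ]; then
      printf '%s\n' "$puml"
    fi
  done
}

# Start PlantUML once for the whole folder, and only if something is stale.
# Assumption: "plantuml" is a wrapper script or alias for the PlantUML jar.
render_if_stale() {
  if [ -n "$(stale_pumls "$1")" ]; then
    plantuml -checkmetadata -recurse "$1"
  fi
}
```

A call such as `render_if_stale ./test` would then pay the PlantUML startup cost at most once per run, and `-checkmetadata` would let PlantUML itself skip the diagrams whose text has not changed.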
Before I discovered the `-checkmetadata` option, I had the idea that `rdfpuml` could put several diagrams in one `puml` file:

    @startuml file1.png
    # made from file1.ttl
    @enduml
    @startuml file2.png
    # made from file2.ttl
    @enduml

However, this interferes with `make` processing that regenerates only `png` for changed `ttl` files, and makes things less modular overall.
Trine (Perl RDF) is end of life. Attean is the new generation
Write Turtle, see diagram (easy to do)
See ./ideas
- See arrows, arrows-2 from https://github.com/anoff/blog/tree/master/static/assets/plantuml/diagrams
- Arrow styles and colors (bold, dashed etc): https://mrhaki.blogspot.com/2016/12/plantuml-pleasantness-get-plantuml.html
`plantuml -pattern` regexes: `dotted|dashed|plain|bold|hidden|norank|single|thickness`
Local layout options are described in Help on Layout:
- “hidden” makes a constraint between two nodes, but does not draw the link (`rdfpuml` already implements this)
- “norank” ignores a link for layout purposes (same as graphviz `constraint=false`)
- “together” groups classes as if they were in the same package (i.e. puts them in a graphviz cluster)
Global options include (eg see this diagram):
And there are a lot more undocumented features: https://forum.plantuml.net/7095
Ability to describe custom reification situations using the Property Reification Vocabulary (PRV)
Plantuml now has MindMap and WBS (or OBS) diagrams that use a simple bulleted syntax to draw hierarchies.
It would be nice to use this to draw hierarchies of individuals, in particular taxonomies.
Here are examples of the two styles:
A new tool `rdf2soml` to generate Ontotext Platform SOML from RDF examples.
What’s missing? Most importantly: property cardinality and virtual inverses.
PlantUML can show arrow cardinalities, and this simple and natural PlantUML code:

    X "0:1" -left-> "1:m" Y : prop/\ninvProp

is depicted as follows:
We have two options for expressing this in triples:
##### model triples
:X :prop :Y.
##### puml triples
<< :X :prop :Y >>
puml:arrow puml:left; # direction
puml:min 1; puml:max puml:inf; # cardinality
puml:inverseAlias [puml:min 0; puml:max 1; puml:name "invProp"]. # virtual inverse
- Pros: very natural
- Cons:
- Perl RDF doesn’t support RDF*, and few editors support it either.
- Annotating a triple does not assert it, so we need to assert it as well
##### model triples
:X :prop :Y.
##### puml triples
:X puml:left :Y. # direction
:X :prop [ # a puml:Cardinality; # may need this marker class to skip the node from the diagram
puml:min 1; puml:max puml:inf; # cardinality
puml:object :Y; # only needed if X has several relations "prop" and they need different annotations
puml:inverseAlias [puml:min 0; puml:max 1; puml:name "invProp"] # virtual inverse
].
Issue #8: discussion with Thomas Francart of Sparna
I developed this SHACL to PlantUML converter, in Java, based on TopQuadrant SHACL lib, and the result is at https://shacl-play.sparna.fr/play/draw and code at https://github.com/sparna-git/shacl-play/tree/master/shacl-diagram
I don’t have a strong opinion on the example you provide, an alternative idea that comes to my mind is
:node1 :link [
rdf:value :node2;
puml:min 1 ;
puml:max 2 ;
]
But this changes the structure of the example graph itself, which might not be convenient
R2RML works great for RDBMS, but how about other sources? Extend rdf2rml to generate: