A tool that defines mapping between RO-Crate and RDA maDMP standards and allows generating RO-Crate(s) from maDMP and vice versa.

GhaithArf/ro-crate-rda-madmp-mapper

RO-Crate RDA maDMP Mapper

DOI License: MIT

RO-Crate RDA maDMP Mapper is a tool that defines a mapping between RO-Crate and RDA maDMP. It can generate RO-Crate(s) from an maDMP (one to many) and an maDMP from RO-Crate(s) (many to one).

Mapping

Assumptions

A DMP can include multiple datasets, whereas an RO-Crate includes only one root dataset, which can contain other nested datasets. It is assumed that most of the metadata of the root dataset (e.g. funding, contributors, authors) is equivalent to the metadata of the DMP. Nested datasets are included as elements of distributions.
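As a rough illustration of the last assumption, the sketch below turns the direct hasPart children of a root dataset into distribution entries. The function name parts_to_distributions and the in-memory layout are hypothetical and not taken from the tool's source; only the contentSize to byte_size and encodingFormat to format pairs come from the mapping described in this README.

```python
def parts_to_distributions(root, graph):
    """Map the direct hasPart children of the root dataset to distributions.

    Hypothetical helper for illustration; not the tool's actual code.
    """
    by_id = {entity["@id"]: entity for entity in graph}
    parts = root.get("hasPart", [])
    if isinstance(parts, dict):  # JSON-LD allows a single reference
        parts = [parts]
    distributions = []
    for ref in parts:
        part = by_id.get(ref["@id"])
        if part is None:
            continue  # dangling reference, nothing to map
        distribution = {}
        if "contentSize" in part:
            distribution["byte_size"] = part["contentSize"]
        if "encodingFormat" in part:
            distribution["format"] = part["encodingFormat"]
        distributions.append(distribution)
    return distributions


# Made-up example data: one file and one nested dataset under the root.
root = {"@id": "./", "@type": "Dataset",
        "hasPart": [{"@id": "data.csv"}, {"@id": "sub/"}]}
graph = [
    root,
    {"@id": "data.csv", "@type": "File",
     "contentSize": "1024", "encodingFormat": "text/csv"},
    {"@id": "sub/", "@type": "Dataset", "hasPart": [{"@id": "sub/deep.txt"}]},
    {"@id": "sub/deep.txt", "@type": "File", "contentSize": "5"},
]
distributions = parts_to_distributions(root, graph)
```

Only the first level of hasPart is visited, so sub/deep.txt does not produce a distribution of its own.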

Every entity in an RO-Crate must have an @id; otherwise the generated JSON-LD would be malformed. However, DMP entities do not always have an identifier. When an identifier is absent, the title is used as the @id.
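The fallback described above could look roughly like the following. This is a minimal sketch: the function name entity_id and its id_key parameter are hypothetical, and it assumes the DMP entity is a plain dict whose identifier, when present, sits under a nested "identifier" key (as in dataset_id).

```python
def entity_id(dmp_entity, id_key):
    """Return an @id for a DMP entity, falling back to its title.

    Hypothetical helper for illustration; not the tool's actual code.
    """
    identifier = (dmp_entity.get(id_key) or {}).get("identifier")
    if identifier:
        return identifier
    # Assumption from the text: with no identifier, the title doubles as @id.
    return dmp_entity["title"]


# Made-up example entities.
with_id = {"title": "My data",
           "dataset_id": {"identifier": "https://doi.org/10.1234/x", "type": "doi"}}
without_id = {"title": "My data"}

id_a = entity_id(with_id, "dataset_id")
id_b = entity_id(without_id, "dataset_id")
```

An entity with neither an identifier nor a title raises KeyError in this sketch; how the tool handles that case is not stated here.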

The RO-Crate website explicitly states that schema.org metadata can be used to supplement an RO-Crate. Attributes that are present in a DMP but missing in RO-Crate are therefore covered by other Linked Data vocabularies (example: cost).

Mapped Properties

| RO-Crate attribute | @type of RO-Crate property | DMP attribute | Parent of DMP attribute | Comment/Assumption |
|---|---|---|---|---|
| contactPoint | ContactPoint | contact | dmp | It is assumed that the contact person for the DMP is the same as the contact person of the dataset. |
| @id | ContactPoint | identifier | contact_id | |
| email | ContactPoint | mbox | contact | |
| name | ContactPoint | name | contact | |
| contactType | ContactPoint | type | contact | These attributes are not exactly equivalent, but they are close enough. |
| author/creator | Dataset | contributor | dmp | |
| @id | person | identifier | contributor_id | |
| affiliation | person | role | contributor | |
| name | person | name | contributor | |
| email | person | mbox | contributor | |
| cost | Dataset | cost | dmp | In a DMP, cost is a list of costs related to data management. The cost in an RO-Crate, however, may not include all of them. |
| costCurrency | cost | currency_code | cost | This is not explicitly mentioned on the RO-Crate website, but cost properties can be found in the JSON-LD context used for RO-Crates. |
| description | cost | description | cost | |
| @id | cost | title | cost | |
| value | cost | value | cost | |
| Language | Dataset | language | dmp | |
| description | Dataset | description | dataset | |
| name | Dataset | title | dataset | |
| identifier | Dataset | identifier | dataset_id | |
| Dataset | hasPart | distribution | dataset | RO-Crates usually include many sub-datasets. These usually carry information about the main dataset, such as encoding and size, so they are mapped to distributions. |
| File | hasPart | distribution | dataset | RO-Crates usually include many sub-files, and sub-datasets have sub-files of their own. Deeply nested files are not taken into consideration because they do not align with the concept of a distribution. |
| downloadUrl | DataDownload | download_url | distribution | |
| contentUrl | DataDownload | access_url | distribution | |
| endDate | DataDownload | available_until | distribution | |
| contentSize | Dataset | byte_size | distribution | |
| contentSize | Dataset/File | byte_size | distribution | |
| encodingFormat | Dataset/File | format | distribution | |
| name | contentLocation | geo_location | host | The name is assumed to be a country. This is not always the case, but it is better than losing the information. |
| title | RepositoryObject | title | host | |
| description | RepositoryObject | description | host | |
| @id | RepositoryObject | url | host | |
| availability | RepositoryObject | availability | host | availability is not defined in exactly the same way in RO-Crate and DMP: the format of the input value differs. |
| license | Dataset/File | license | distribution | |
| @id | license | license_ref | license | |
| identifier | license | license_ref | license | |
| datePublished | Dataset | issued | dataset | |
| keywords | Dataset | keyword | dataset | |
| description | Organisation | description | project | |
| name | Organisation | title | project | |
| endDate | Organisation | end | project | |
| startDate | Organisation | start | project | |
| @id | Grant | identifier | funder_id | |
| @id | CreativeWork | identifier | metadata_standard_id | |
| Language | CreativeWork | language | dmp metadata | |
| description | CreativeWork | description | dmp metadata | |
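A few rows of the table above can be read as a field-by-field translation. The sketch below applies the contact rows (identifier, mbox, name) to build a ContactPoint entity. The function name contact_to_contact_point is hypothetical, the sample values are made up, and contactType is left out because the table notes that it is only approximately equivalent.

```python
def contact_to_contact_point(contact):
    """Translate a DMP contact into an RO-Crate ContactPoint entity.

    Hypothetical helper for illustration; not the tool's actual code.
    """
    return {
        "@id": contact["contact_id"]["identifier"],  # identifier (contact_id) -> @id
        "@type": "ContactPoint",
        "email": contact["mbox"],                     # mbox -> email
        "name": contact["name"],                      # name -> name
    }


# Made-up example contact.
contact = {
    "contact_id": {"identifier": "https://orcid.org/0000-0000-0000-0000",
                   "type": "orcid"},
    "mbox": "jane.doe@example.org",
    "name": "Jane Doe",
}
contact_point = contact_to_contact_point(contact)
```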

Unmapped Properties of DMP

| DMP attribute | Comment |
|---|---|
| dmp | There is no direct mapping to dmp, since RO-Crate is an approach to packaging research data with their metadata, whereas the DMP concept considers the bigger picture. |
| contact_id, contributor_id, dmp_id, dataset_id, hasPart, host, project, metadata_standard_id, funder_id | These attributes have no equivalent attribute, but their children do. An equivalent is not needed, so they are not classified as missing. |
| type | The "type" of "contact_id" and "contributor_id" has a different meaning than "@type". There is no equivalent. |
| created | The DMP creation date should be the time at which the DMP is generated. Therefore, it does not have an equivalent in RO-Crates. |
| modified | The DMP creation date is the same as the modified date in this case. |
| personal_data, preservation_statement, security_and_privacy, data_access, storage_type, pid_system, certified_with, backup_type, backup_frequency, ethical_issues_description, ethical_issues_exist, ethical_issues_report, data_quality_assurance, sensitive_data | Almost all attributes that concern quality, privacy, ethics and security are missing in RO-Crate and cannot be translated. Note: accessMode in https://w3id.org/ro/crate/1.0/context is different from data_access. |
| support_versioning | This attribute has no equivalent in the context. However, specific versions can be recorded with other attributes. |
| funding_status | The definition provided in the official DMP description does not match any of the descriptions in RO-Crate. |
| identifier, description, title | These attributes have to be entered by the user. It is wrong to assume that the DMP's identifier is the same as the dataset's identifier, so the user should create a new identifier for the DMP. It is also not always logical to give the DMP the name of the dataset. |

Unmapped Properties of ro-crate

| RO-Crate attribute | Comment |
|---|---|
| @context, @graph, conformsTo, about | These attributes have no equivalent because they are specific to JSON-LD, whereas DMP follows JSON Schema rather than JSON-LD. When an RO-Crate is generated automatically, they are inserted, since they can be determined programmatically. When DMPs are generated, they are ignored. |
| hasPart, hasMember | DMPs have no equivalent of these attributes. However, they are accounted for by including their values in distributions. |
| publisher, sameAs, temporalCoverage | These attributes are not included as explicitly in DMPs. |
| CreateAction, UpdateAction | The CreateAction and UpdateAction classes model the contributions of Context Entities of type Person or Organization. This is not present in DMPs. |
| latitude, longitude | Places are described more thoroughly in RO-Crates. |

Usage

In order to map between both standards, the following command should be used:

python mapper.py -i <input_path> -o <output_path>

Where input_path represents a path to the input folder (or file) representing an maDMP or a RO-Crate project and output_path represents a path where the generated files will be stored.

The mapping direction (maDMP to RO-Crate or RO-Crate to maDMP) does not need to be specified explicitly, since the tool determines it itself.
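How the tool decides on the direction is not described here. One plausible way to do it, shown purely as a sketch, is to inspect the top-level keys of the input document: an maDMP JSON file has a top-level "dmp" object, while RO-Crate metadata is JSON-LD with an "@graph" array. The function name detect_format is hypothetical and is not part of mapper.py.

```python
import json


def detect_format(document):
    """Guess whether a parsed JSON document is an maDMP or RO-Crate metadata.

    Hypothetical helper for illustration; the tool's own logic may differ.
    """
    if "dmp" in document:
        return "madmp"
    if "@graph" in document:
        return "rocrate"
    raise ValueError("unrecognised metadata document")


# Made-up minimal documents, parsed from JSON strings.
madmp_doc = json.loads('{"dmp": {"title": "Example DMP"}}')
rocrate_doc = json.loads('{"@context": "https://w3id.org/ro/crate/1.0/context", "@graph": []}')

madmp_kind = detect_format(madmp_doc)
rocrate_kind = detect_format(rocrate_doc)
```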

Examples

Several maDMP and RO-Crate examples are also provided. They are structured in the following way:


├── madmp
│   ├── calculation-of-nice-sunny-days
│   ├── closed
│   ├── dataset-many
│   ├── funded-project
│   ├── life-expectancy-prediction
│   ├── long
│   ├── minimal-content
│   ├── multilayer-perceptron-on-hypothyroid
│   ├── swedish-motor-insurance
│   └── world-development-indicators
└── rocrate
    ├── drug_consumption
    ├── Glop_Pot
    ├── GTM
    ├── NursingResidentStuff
    └── world_development_indicators_visualization

For maDMPs, 5 of the examples were made by students as part of Data Stewardship at Vienna University of Technology. The other 5 are taken from the examples provided by the RDA-DMP-Common-Standard.

For RO-Crates, 3 examples are taken from the official RO-Crate website, and the other 2 were created specifically to test the coverage of the mapping using existing datasets.

Demo

A demo of the mapping between both standards can be executed using the following command:

python demo.py

This demo uses the content of the examples folder to generate mapped files and stores them in demo_files.

Running tests

The unit tests stored under tests can be run with pytest. To do so, navigate to the tests folder and run pytest -vv. The unit tests cover some methods of the individual modules; they do not test the mapping functionality itself.

Contributors

Ghaith Arfaoui

Maroua Jaoua

License

MIT License
