docs/guides/metadata.md: Copy edit, add and remove content
duncandewhurst committed Sep 4, 2023
1 parent a28eadb commit 8c4ab6e
Showing 1 changed file with 34 additions and 40 deletions.
74 changes: 34 additions & 40 deletions docs/guides/metadata.md
@@ -1,74 +1,68 @@
# How to publish RDLS metadata

Metadata enables datasets to be found by human and machine searches, and helps users to easily identify the contents of a dataset. It is strongly encouraged that any risk dataset being uploaded online has metadata prepared and uploaded with it.

This page provides an overview of the process for publishing Risk Data Library Standard (RDLS) metadata and how-to guides for specific topics.
This page provides an [overview](#overview) of the process for publishing Risk Data Library Standard (RDLS) metadata and [how-to guides](#how-to-guides) for specific topics.

## Overview

The process for publishing RDLS metadata can be divided into three phases:

- [Prepare your metadata](#prepare-your-metadata)
- [Check your metadata](#check-your-metadata)
- [Convert and validate your metadata](#convert-and-validate-your-metadata)
- [Publish your metadata](#publish-your-metadata)

### Prepare your metadata

You can prepare RDLS metadata in either spreadsheet format or JSON format.
Data catalog systems typically use [JavaScript Object Notation (JSON)](https://www.json.org/) as a data-interchange format, so your likely goal is to publish RDLS metadata in JSON format.

Whilst you can *author* RDLS metadata in JSON format, it is difficult and time-consuming to author JSON data 'by hand'. Therefore, we provide open source tools that you can use to author RDLS metadata in a more user-friendly spreadsheet format and to convert it to JSON format.

If you are authoring new metadata by hand or converting existing metadata from a spreadsheet, the suggested approach is to [use the RDLS spreadsheet template](#use-the-rdls-spreadsheet-template).

If you are exporting existing metadata from a data catalog or database and you have access to a software developer, the suggested approach is to [export data in JSON format](#export-data-in-json-format).

If your risk datasets use terms from existing taxonomies or classifications, use the [taxonomy mappings](mappings/index.md) to identify the equivalent codes in RDLS.

#### Use the RDLS spreadsheet template

The RDLS spreadsheet template is a tool to enable publishers to create RDLS metadata in Excel (.xlsx) format. The spreadsheet is generated directly from the RDLS JSON schema and can be converted back into JSON format for validation and publication using tools such as the Risk Data Library metadata toolkit.
If you plan to describe the spatial coverage of your risk data using coordinates, you may need to [transform your coordinates to the correct coordinate reference system](#transform-coordinates-between-coordinate-reference-systems).

Guidance on how to use the spreadsheet template can be found in the [README](https://github.com/GFDRR/rdls-spreadsheet-template#readme) section of the spreadsheet template GitHub repository.
#### Use the RDLS spreadsheet template

#### Export data in JSON format
The RDLS spreadsheet template is a tool for authoring RDLS metadata in spreadsheet format.

The JSON format reflects the structure of the schema, is useful to developers who want to use the data to build web apps, and offers a ‘base’ format that other publication formats can be converted to and from. The JSON format of the standard is at the heart of the RDLS Data Review Toolkit. You should author JSON in the tool of your choice and then use the RDLS Data Review Toolkit to validate your .json files against the RDLS.
To download the template and learn how to use it, read its [documentation](https://github.com/GFDRR/rdls-spreadsheet-template#readme).

### Check your metadata
Once you have entered your metadata using the template, the next step is to [convert it to JSON format and validate it against the RDLS schema](#convert-and-validate-your-metadata).

You ought to regularly use the RDLS Data Review Tool to check the structure and format of your metadata as you generate it. This will help ensure that your metadata is compatible with RDLS tools and is comparable with other RDLS metadata.
#### Export data in JSON format

If your metadata is in JSON format, you need to [package your RDLS metadata](#package-your-rdls-metadata) before submitting it to the RDLS Data Review Tool.
If you plan to export RDLS metadata from an existing system in JSON format, you first need to identify how your existing metadata 'maps' to RDLS - that is, identifying which [data elements](https://en.wikipedia.org/wiki/Data_element) within your system match which RDLS [fields](../reference/schema.md) and [codes](../reference/codelists.md). You then need to implement your mapping in code. JSON is a widely used format so most programming languages and database engines provide support for exporting data in JSON format.
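
Implementing a mapping in code can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the source record, its keys (`name`, `summary`, `file_link`) and the target field names (`title`, `description`, `resources`, `download_url`) are assumptions made for the example, so check every field name against the RDLS schema before relying on it.

```python
import json

# Hypothetical mapping from keys in an existing catalog to RDLS field names.
# Both sides are assumptions; verify the target names against the RDLS schema.
FIELD_MAP = {
    "name": "title",
    "summary": "description",
}


def to_rdls(record):
    """Convert one record from the (hypothetical) source system to a dict
    shaped like an RDLS dataset."""
    dataset = {}
    for source_key, rdls_key in FIELD_MAP.items():
        value = record.get(source_key)
        if value not in (None, ""):  # omit empty values rather than emit nulls
            dataset[rdls_key] = value
    if record.get("file_link"):
        # Assumed structure: one resource per source file link.
        dataset["resources"] = [{"download_url": record["file_link"]}]
    return dataset


# Invented example record, for illustration only.
source_record = {
    "name": "Example flood hazard map",
    "summary": "Modelled flood depths.",
    "file_link": "https://example.org/flood.tif",
}
print(json.dumps(to_rdls(source_record), indent=2))
```

Keeping the mapping in a single dictionary makes it easy to review against the schema and to extend as you identify more matching data elements.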

The Data Review Tool reports any structural issues with your metadata. It validates your metadata against the RDLS schema, checking whether your metadata makes sense and appears in the correct place within the schema.
It is strongly suggested that you do not author RDLS metadata in JSON format 'by hand'. However, if you do choose this approach, you should use a text editor with support for JSON formatting and validation, such as [Visual Studio Code](https://code.visualstudio.com/docs/languages/json).

You ought to use real data for testing, wherever possible. Using fictional data can lead to false positives and missed errors in your data pipeline.
If you don’t yet have enough real data to generate all the necessary metadata (for example, the dataset hasn’t been published yet so you don’t have any resource URLs, or the hazard event is ongoing and so an end date is not yet available), you should try to collect enough real data for at least one dataset with at least one resource.
In either case, you need to structure and format your data according to the [RDLS schema](../reference/schema.md).

If you can't collect enough real data for testing, then you ought to create realistic and coherent test data:
Once you have prepared your RDLS metadata in JSON format, the next step is to [validate it against the RDLS schema](#convert-and-validate-your-metadata).

- use real hazards and locations
- use plausible dates and values
- avoid using placeholder values
- avoid setting multiple data elements to the same value.
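
The last two points can be checked roughly in code. The sketch below is an assumption-laden illustration rather than part of any RDLS tool: the list of placeholder strings is invented for the example, and the function names are hypothetical. It walks parsed JSON metadata and warns about placeholder strings and about string values that are repeated across data elements.

```python
from collections import Counter

# Illustrative list of placeholder strings; extend it to suit your data.
PLACEHOLDERS = {"", "n/a", "na", "tbc", "tbd", "test", "todo", "xxx", "lorem ipsum"}


def leaf_values(node, path=""):
    """Yield (path, value) pairs for every non-container value in parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaf_values(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from leaf_values(value, f"{path}/{index}")
    else:
        yield path, node


def check_test_data(metadata):
    """Return a list of human-readable warnings about suspicious test data."""
    warnings = []
    strings = [(p, v) for p, v in leaf_values(metadata) if isinstance(v, str)]
    # Flag strings that look like placeholders.
    for path, value in strings:
        if value.strip().lower() in PLACEHOLDERS:
            warnings.append(f"{path}: placeholder value {value!r}")
    # Flag non-placeholder strings that appear in more than one data element.
    counts = Counter(value for _, value in strings)
    for value, count in counts.items():
        if count > 1 and value.strip().lower() not in PLACEHOLDERS:
            warnings.append(f"{value!r} is used for {count} data elements")
    return warnings
```

Some repetition is legitimate (for example, the same code used in several places), so treat the warnings as prompts for review rather than as errors.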
### Convert and validate your metadata

**Action**: Upload some data to the RDLS Data Review Tool.
The [RDLS Convertor](https://metadata.riskdatalibrary.org) is a web-based tool for converting RDLS metadata between spreadsheet and JSON format and for validating it against the RDLS schema. You can submit data to the convertor in either spreadsheet or JSON format.

### Publish your metadata
You ought to regularly use the RDLS Convertor to validate the structure and format of your metadata. This ensures that your metadata is compatible with tools designed to work with RDLS metadata.

#### Publish to an open data catalog
If your metadata is in JSON format, you need to [package your RDLS metadata](#package-your-rdls-metadata) before submitting it to the RDLS Convertor.

##### Access-restricted data
The RDLS Convertor reports any issues with the structure and format of your metadata. You ought to fix the issues it reports before publishing your metadata.

RDLS metadata may be produced to describe data that is access-restricted. Such metadata can still be published to an open data catalog to advertise the existence of the restricted data.
If you prefer to use command-line tools, you can use [Flatten Tool](https://flatten-tool.readthedocs.io/) to convert RDLS metadata between spreadsheet and JSON format and you can use [Lib CoVE RDLS](https://github.com/GFDRR/rdls-lib-cove) to validate your metadata against the RDLS schema.

If there is a unique, non-access-restricted URL for the resource being described, give it as the `download_url`. If, however, the unique resource URL automatically redirects users without access rights to, for example, a generic landing page, an access request page or a restriction warning page, give it as the `access_url` instead.
Once you've resolved any issues with the structure and format of your data, the next step is to [publish it](#publish-your-metadata).

#### Publish to an internal or access-restricted catalog
### Publish your metadata

RDLS was designed for data that would be openly published. However, it is also suitable for access-restricted data catalogs, such as commercial data products or internal catalogs for individual institutions.
The steps involved in publishing your RDLS metadata will depend on the specific data catalog or website to which you are adding your risk datasets.

If the users who will have access to the catalog will have the same access rights to the datasets being described, you do not need to take any additional steps in preparing your metadata.
If you are adding data to the World Bank Data Catalog, refer to the [internal guidance for World Bank users](https://github.com/GFDRR/rdl-standard/blob/dev/internal_guide_rdl_on_WBdataCatalog.md).

If the users who will have access to the catalog will not necessarily have the same access rights to the datasets being described, follow the additional guidance for publishing [access-restricted data](#access-restricted-data) to an open data catalog.
If you are publishing an access-restricted resource, see [how to publish an access-restricted resource](#publish-an-access-restricted-resource).

## How-to guides

@@ -109,19 +103,15 @@ If you are writing your own software or if you prefer to use the command line, s

If you prefer to use a graphical user interface, several web-based tools are available, for example [Online UUID Generator](https://www.uuidgenerator.net/).
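
If you are writing your own software in Python, the standard library's `uuid` module can generate identifiers without a web-based tool. This is a minimal sketch; it assumes that a random (version 4) UUID is appropriate for your use, which you should confirm against the rest of this guide.

```python
import uuid

# uuid4() returns a random (version 4) UUID.
identifier = str(uuid.uuid4())
print(identifier)
```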

### Declare the version of RDLS schema that describes your metadata

To publish RDLS metadata you must declare the version of the RDLS schema used. By declaring the version of the schema used, validation tools can correctly validate the metadata, and users know which schema to refer to when interpreting the metadata.

The `Link` object provides the means to declare the schema version used. Both the JSON schema and the spreadsheet template will automatically populate the `links` array with a `Link` object that provides the canonical URL of the current version of the RDLS schema (`href`) and the appropriate [IANA](https://www.iana.org/assignments/link-relations/link-relations.xhtml) code to describe the link between this URL and the RDLS metadata being published (`rel`).

### Package your RDLS metadata

RDLS metadata in JSON format must be packaged within a container object prior to publication. A simple [package schema](../reference/package_schema.md) is provided. The package schema can contain RDLS metadata for multiple datasets.
To package your RDLS metadata, use the structure and format described by the [package schema](../reference/package_schema.md).
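
As a rough sketch, packaging can be as simple as wrapping a list of dataset objects in a single container object. The function name is hypothetical and the key name `datasets` is an assumption; confirm the container's structure against the package schema before publishing.

```python
import json


def package_datasets(datasets, key="datasets"):
    """Wrap a list of RDLS dataset dicts in a single container object.

    The default key name is an assumption; confirm it against the package schema.
    """
    if not isinstance(datasets, list):
        raise TypeError("datasets must be a list of dicts")
    return {key: datasets}


# Invented example datasets, for illustration only.
package = package_datasets([{"title": "Example dataset A"}, {"title": "Example dataset B"}])
print(json.dumps(package, indent=2))
```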

### Transform coordinates between coordinate reference systems

Within your RDLS metadata, you can specify the coordinates of each resource using the `bbox`, `geometry` and `centroid` fields within the `Location` object. All coordinates must be given in the [WGS84 coordinate reference system](https://datatracker.ietf.org/doc/html/rfc7946#section-4) (CRS) as required by GeoJSON. If the coordinates in your data sources are specified in a different CRS, before publishing your RDLS metadata, you first need to transform the coordinates to the correct CRS.
Coordinates in RDLS metadata need to be specified using the World Geodetic System 1984 (WGS 84) datum, with longitude and latitude units of decimal degrees. This is equivalent to the coordinate reference system identified by the Open Geospatial Consortium URN urn:ogc:def:crs:OGC::CRS84.

If the coordinates in your data sources are specified in a different CRS, before publishing your RDLS metadata, you first need to transform the coordinates to the correct CRS.

If your data pipeline includes a Geographic Information System such as ArcGIS or QGIS, these tools can transform coordinates from one CRS to another. If you are writing your own software, or if you prefer to use the command line, several libraries and tools are available, for example:

@@ -135,3 +125,7 @@ If you prefer to use a graphical user interface, several web-based tools are ava
- [epsg.io](https://epsg.io/transform)

The WGS84 CRS is equivalent to EPSG:4326 with reversed axes. If your chosen transformation tool does not support it, you can instead transform your coordinates to EPSG:4326 and then manually reorder them into longitude, latitude order.
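
The manual reordering step can be sketched in Python. The function names and the sample coordinates are invented for illustration, and the bounding box order (west, south, east, north) assumes that the `bbox` field follows the GeoJSON convention.

```python
def to_lon_lat(lat_lon_pairs):
    """Reorder (latitude, longitude) pairs into (longitude, latitude)."""
    return [(lon, lat) for lat, lon in lat_lon_pairs]


def bounding_box(lon_lat_pairs):
    """Return [west, south, east, north] for (longitude, latitude) pairs.

    Does not handle geometries that cross the antimeridian.
    """
    lons = [lon for lon, _ in lon_lat_pairs]
    lats = [lat for _, lat in lon_lat_pairs]
    return [min(lons), min(lats), max(lons), max(lats)]


# Illustrative coordinates in EPSG:4326 axis order (latitude, longitude).
points = [(27.7, 85.3), (28.2, 84.0)]
print(bounding_box(to_lon_lat(points)))
```

Swapping the axes in one clearly named function, rather than inline, makes it harder to apply the swap twice by accident.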

### Publish an access-restricted resource

If a resource is not available directly from a non-access-restricted URL, you ought to publish the URL of the page that describes the arrangements for obtaining access to the resource in the [`Resource.access_url`](rdls_schema.json,/$defs/Resource,access_url) field.
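
If you generate your metadata in code, the choice of field can be sketched as a small helper. The function name is hypothetical, and the sketch assumes that resources available directly are described with a `download_url` field; confirm both field names against the RDLS schema.

```python
def resource_url_field(url, directly_downloadable):
    """Return the URL under the field that matches how the resource is obtained.

    Assumes `download_url` for resources available directly and `access_url`
    for a page that describes how to obtain access.
    """
    key = "download_url" if directly_downloadable else "access_url"
    return {key: url}


# Invented example URL, for illustration only.
print(resource_url_field("https://example.org/request-access", directly_downloadable=False))
```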
