Merge remote-tracking branch 'IQSS/develop' into IQSS/9185-contact_email_updates
qqmyers committed Apr 24, 2023
2 parents eb0a446 + e4961fe commit c982ec7
Showing 89 changed files with 2,437 additions and 961 deletions.
1 change: 1 addition & 0 deletions doc/release-notes/3913-delete-file-endpoint
@@ -0,0 +1 @@
Support for deleting files using native API: http://preview.guides.gdcc.io/en/develop/api/native-api.html#deleting-files
2 changes: 1 addition & 1 deletion doc/release-notes/5.13-release-notes.md
@@ -80,7 +80,7 @@ See [Metadata Blocks](https://guides.dataverse.org/en/5.13/api/native-api.html#m

### Advanced Database Settings

You can now enable advanced database connection pool configurations useful for debugging and monitoring as well as other settings. Of particular interest may be `sslmode=require`. See the new [Database Persistence](https://guides.dataverse.org/en/5.13/installation/config.html#database-persistence) section of the Installation Guide for details. (PR #8915)
You can now enable advanced database connection pool configurations useful for debugging and monitoring as well as other settings. Of particular interest may be `sslmode=require`, though installations already setting this parameter in the Postgres connection string will need to move it to `dataverse.db.parameters`. See the new [Database Persistence](https://guides.dataverse.org/en/5.13/installation/config.html#database-persistence) section of the Installation Guide for details. (PR #8915)
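
For example (a sketch, not taken from the release notes themselves; it assumes `dataverse.db.parameters` follows the standard MicroProfile Config environment-variable mapping of uppercasing and replacing dots with underscores):

```
# Move sslmode out of the Postgres connection string and into the new setting.
export DATAVERSE_DB_PARAMETERS='sslmode=require'
```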

### Support for Cleaning up Leftover Files in Dataset Storage

30 changes: 30 additions & 0 deletions doc/release-notes/7000-pidproviders.md
@@ -0,0 +1,30 @@
# Changes to PID Provider JVM Settings

In preparation for a future feature that will allow multiple PID providers to be used at the same time, all JVM settings for PID providers
can now be configured using MicroProfile Config. At the same time, they were renamed to match the name
of the provider being configured.

Please watch your log files for deprecation warnings. Your old settings will still be picked up, but you should migrate
to the new names to avoid unnecessary log clutter and to prepare for future changes. An example message
looks like this:

```
[#|2023-03-31T16:55:27.992+0000|WARNING|Payara 5.2022.5|edu.harvard.iq.dataverse.settings.source.AliasConfigSource|_ThreadID=30;_ThreadName=RunLevelControllerThread-1680281704925;_TimeMillis=1680281727992;_LevelValue=900;|
Detected deprecated config option doi.username in use. Please update your config to use dataverse.pid.datacite.username.|#]
```

Here is a list of the new settings:

- dataverse.pid.datacite.mds-api-url
- dataverse.pid.datacite.rest-api-url
- dataverse.pid.datacite.username
- dataverse.pid.datacite.password
- dataverse.pid.handlenet.key.path
- dataverse.pid.handlenet.key.passphrase
- dataverse.pid.handlenet.index
- dataverse.pid.permalink.base-url
- dataverse.pid.ezid.api-url
- dataverse.pid.ezid.username
- dataverse.pid.ezid.password

See also http://preview.guides.gdcc.io/en/develop/installation/config.html#persistent-identifiers-and-publishing-datasets
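
As a sketch only (the username value is a placeholder, and the environment-variable form assumes the standard MicroProfile Config dots-to-underscores naming convention), migrating one deprecated option on Payara might look like this:

```
# Replace the deprecated doi.username JVM option with its new name.
./asadmin delete-jvm-options '-Ddoi.username=old-user'
./asadmin create-jvm-options '-Ddataverse.pid.datacite.username=old-user'

# Alternatively, supply the value as a MicroProfile Config environment variable.
export DATAVERSE_PID_DATACITE_USERNAME=old-user
```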
8 changes: 8 additions & 0 deletions doc/release-notes/8424-signposting.md
@@ -0,0 +1,8 @@
# Signposting for Dataverse

This release adds [Signposting](https://signposting.org/) support to Dataverse to improve machine discoverability of datasets and files.

The following MicroProfile Config options are now available (these can be treated as JVM options):

- dataverse.signposting.level1-author-limit
- dataverse.signposting.level1-item-limit
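
For example (a sketch; the value 5 simply illustrates the documented default), these could be set as JVM options on Payara:

```
./asadmin create-jvm-options '-Ddataverse.signposting.level1-author-limit=5'
./asadmin create-jvm-options '-Ddataverse.signposting.level1-item-limit=5'
```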
1 change: 1 addition & 0 deletions doc/release-notes/9063-session-api-auth.md
@@ -0,0 +1 @@
A feature flag called "api-session-auth" has been added temporarily to aid in the development of the new frontend (#9063) but will be removed once bearer tokens (#9229) have been implemented. There is a security risk (CSRF) in enabling this flag! Do not use it in production! For more information, see http://preview.guides.gdcc.io/en/develop/installation/config.html#feature-flags
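
A minimal sketch of enabling it for development only, assuming the `dataverse.feature.<flag>` JVM option naming described in the linked feature-flags documentation:

```
# Development only; do not enable in production (CSRF risk).
./asadmin create-jvm-options '-Ddataverse.feature.api-session-auth=true'
```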
1 change: 1 addition & 0 deletions doc/release-notes/9150-improved-external-vocab-supprt.md
@@ -0,0 +1 @@
It is now possible to write external vocabulary scripts that target a single child field in a metadata block. Example scripts are now available at https://github.com/gdcc/dataverse-external-vocab-support that can be configured to support lookup from the Research Organization Registry (ROR) for the Author Affiliation field and from the CrossRef Funding Registry (Fundreg) for the Funding Information/Agency field, both in the standard Citation metadata block. Application of these scripts to other fields, and the development of other scripts targeting child fields, are now possible.
3 changes: 3 additions & 0 deletions doc/release-notes/9277-nonstopmode-pdf-guides.md
@@ -0,0 +1,3 @@
An experimental version of the guides in PDF format is available at <http://preview.guides.gdcc.io/_/downloads/en/develop/pdf/>

Advice for contributors to documentation who want to help improve the PDF is available at http://preview.guides.gdcc.io/en/develop/developers/documentation.html#pdf-version-of-the-guides
7 changes: 7 additions & 0 deletions doc/release-notes/9374-binder-orig.md
@@ -0,0 +1,7 @@
Files downloaded from Binder are now in their original format.

For example, data.dta (a Stata file) will be downloaded instead of data.tab (the archival version Dataverse creates as part of a successful ingest).

This should make it easier to write code to reproduce results, since the dataset authors and subsequent researchers are likely operating on the original file format rather than the format that Dataverse creates.

For details, see #9374, <https://github.com/jupyterhub/repo2docker/issues/1242>, and <https://github.com/jupyterhub/repo2docker/pull/1253>.
@@ -1,6 +1,6 @@
Tool Type Scope Description
Data Explorer explore file A GUI which lists the variables in a tabular data file allowing searching, charting and cross tabulation analysis. See the README.md file at https://github.com/scholarsportal/dataverse-data-explorer-v2 for the instructions on adding Data Explorer to your Dataverse.
Whole Tale explore dataset A platform for the creation of reproducible research packages that allows users to launch containerized interactive analysis environments based on popular tools such as Jupyter and RStudio. Using this integration, Dataverse users can launch Jupyter and RStudio environments to analyze published datasets. For more information, see the `Whole Tale User Guide <https://wholetale.readthedocs.io/en/stable/users_guide/integration.html>`_.
Binder explore dataset Binder allows you to spin up custom computing environments in the cloud (including Jupyter notebooks) with the files from your dataset. `Installation instructions <https://github.com/data-exp-lab/girder_ythub/issues/10>`_ are in the Data Exploration Lab girder_ythub project.
Binder explore dataset Binder allows you to spin up custom computing environments in the cloud (including Jupyter notebooks) with the files from your dataset. `Installation instructions <https://github.com/data-exp-lab/girder_ythub/issues/10>`_ are in the Data Exploration Lab girder_ythub project. See also :ref:`binder`.
File Previewers explore file A set of tools that display the content of files - including audio, html, `Hypothes.is <https://hypothes.is/>`_ annotations, images, PDF, text, video, tabular data, spreadsheets, GeoJSON, zip, and NcML files - allowing them to be viewed without downloading the file. The previewers can be run directly from github.io, so the only required step is using the Dataverse API to register the ones you want to use. Documentation, including how to optionally brand the previewers, and an invitation to contribute through GitHub are in the README.md file. Initial development was led by the Qualitative Data Repository and the spreadsheet previewer was added by the Social Sciences and Humanities Open Cloud (SSHOC) project. https://github.com/gdcc/dataverse-previewers
Data Curation Tool configure file A GUI for curating data by adding labels, groups, weights and other details to assist with informed reuse. See the README.md file at https://github.com/scholarsportal/Dataverse-Data-Curation-Tool for the installation instructions.
76 changes: 76 additions & 0 deletions doc/sphinx-guides/source/admin/discoverability.rst
@@ -0,0 +1,76 @@
Discoverability
===============

Datasets are made discoverable by a variety of methods.

.. contents:: |toctitle|
:local:

DataCite Integration
--------------------

If you are using `DataCite <https://datacite.org>`_ as your DOI provider, when datasets are published, metadata is pushed to DataCite, where it can be searched. For more information, see :ref:`:DoiProvider` in the Installation Guide.

OAI-PMH (Harvesting)
--------------------

The Dataverse software supports a protocol called OAI-PMH that facilitates harvesting dataset metadata from one system into another. For details on harvesting, see the :doc:`harvestserver` section.

Machine-Readable Metadata on Dataset Landing Pages
--------------------------------------------------

As recommended in `A Data Citation Roadmap for Scholarly Data Repositories <https://doi.org/10.1101/097196>`_, the Dataverse software embeds metadata on dataset landing pages in a variety of machine-readable ways.

Dublin Core HTML Meta Tags
++++++++++++++++++++++++++

The HTML source of a dataset landing page includes "DC" (Dublin Core) ``<meta>`` tags such as the following::

<meta name="DC.identifier" content="..."
<meta name="DC.type" content="Dataset"
<meta name="DC.title" content="..."

Schema.org JSON-LD Metadata
+++++++++++++++++++++++++++

The HTML source of a dataset landing page includes Schema.org JSON-LD metadata like this::

  <script type="application/ld+json">{"@context":"http://schema.org","@type":"Dataset","@id":"https://doi.org/...


.. _discovery-sign-posting:

Signposting
+++++++++++

The Dataverse software supports `Signposting <https://signposting.org>`_. This allows machines to request more information about a dataset through the `Link <https://tools.ietf.org/html/rfc5988>`_ HTTP header.

There are two Signposting profile levels, Level 1 and Level 2. In this implementation:

* Level 1 links are shown `as recommended <https://signposting.org/FAIR/>`_ in the "Link"
  HTTP header, which can be fetched by sending an HTTP HEAD request, e.g. ``curl -I https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.5072/FK2/KPY4ZC``.
  The number of author and file links in the Level 1 header can be configured as described below.
* The Level 2 linkset can be fetched by visiting the dedicated linkset page for
  that artifact. The linkset URL appears among the Level 1 links with ``rel="linkset"``.

Note: authors without an author link (identifier URL) are not counted or shown in any profile/linkset.

The following configuration options are available:

- :ref:`dataverse.signposting.level1-author-limit`

  Sets the maximum number of authors shown in the Level 1 profile.
  If the number of authors (with identifier URLs) exceeds this value, no author links are shown in the Level 1 profile.
  The default is 5.

- :ref:`dataverse.signposting.level1-item-limit`

  Sets the maximum number of items/files shown in the Level 1 profile. Datasets with
  more files than this limit will not show any file links in the Level 1 profile; they appear in the Level 2 linkset only.
  The default is 5.

See also :ref:`signposting-api` in the API Guide.

Additional Discoverability Through Integrations
-----------------------------------------------

See :ref:`integrations-discovery` in the Integrations section for additional discovery methods you can enable.
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/admin/index.rst
@@ -14,6 +14,7 @@ This guide documents the functionality only available to superusers (such as "da

dashboard
external-tools
discoverability
harvestclients
harvestserver
metadatacustomization
21 changes: 12 additions & 9 deletions doc/sphinx-guides/source/admin/integrations.rst
@@ -147,21 +147,27 @@ Compute Button

The "Compute" button is still highly experimental and has special requirements such as use of a Swift object store, but it is documented under "Setting up Compute" in the :doc:`/installation/config` section of the Installation Guide.

.. _wholetale:

Whole Tale
++++++++++

`Whole Tale <https://wholetale.org>`_ enables researchers to analyze data using popular tools including Jupyter and RStudio with the ultimate goal of supporting publishing of reproducible research packages. Users can
`import data from a Dataverse installation
<https://wholetale.readthedocs.io/en/stable/users_guide/manage.html>`_ via identifier (e.g., DOI, URI, etc) or through the External Tools integration. For installation instructions, see the :doc:`external-tools` section or the `Integration <https://wholetale.readthedocs.io/en/stable/users_guide/integration.html#dataverse-external-tools>`_ section of the Whole Tale User Guide.

.. _binder:

Binder
++++++

Researchers can launch Jupyter Notebooks, RStudio, and other computational environments by entering the DOI of a dataset in a Dataverse installation on https://mybinder.org
Researchers can launch Jupyter Notebooks, RStudio, and other computational environments by entering the DOI of a dataset in a Dataverse installation at https://mybinder.org

A Binder button can also be added to every dataset page to launch Binder from there. Instructions on enabling this feature can be found under :doc:`external-tools`.

A Binder button can also be added to every dataset page to launch Binder from there. See :doc:`external-tools`.
Additionally, institutions can self host `BinderHub <https://binderhub.readthedocs.io/en/latest/>`_ (the software that powers mybinder.org), which lists the Dataverse software as one of the supported `repository providers <https://binderhub.readthedocs.io/en/latest/developer/repoproviders.html#supported-repoproviders>`_.

Institutions can self host BinderHub. The Dataverse Project is one of the supported `repository providers <https://binderhub.readthedocs.io/en/latest/developer/repoproviders.html#supported-repoproviders>`_.
.. _renku:

Renku
+++++
@@ -179,15 +185,12 @@ Avgidea Data Search

Researchers can use a Google Sheets add-on to search for Dataverse installation's CSV data and then import that data into a sheet. See `Avgidea Data Search <https://www.avgidea.io/avgidea-data-platform.html>`_ for details.

.. _integrations-discovery:

Discoverability
---------------

Integration with `DataCite <https://datacite.org>`_ is built in to the Dataverse Software. When datasets are published, metadata is sent to DataCite. You can further increase the discoverability of your datasets by setting up additional integrations.

OAI-PMH (Harvesting)
++++++++++++++++++++

The Dataverse Software supports a protocol called OAI-PMH that facilitates harvesting datasets from one system into another. For details on harvesting, see the :doc:`harvestserver` section.
A number of built-in features related to data discovery are listed under :doc:`discoverability`, but you can further increase the discoverability of your data by setting up integrations.

SHARE
+++++
4 changes: 3 additions & 1 deletion doc/sphinx-guides/source/admin/make-data-count.rst
@@ -146,7 +146,9 @@ Configuring Your Dataverse Installation for Make Data Count Citations

Please note: as explained in the note above about limitations, this feature is not available to Dataverse installations that use Handles.

To configure your Dataverse installation to pull citations from the test vs. production DataCite server see :ref:`doi.dataciterestapiurlstring` in the Installation Guide.
To configure your Dataverse installation to pull citations from the test vs.
production DataCite server, see :ref:`dataverse.pid.datacite.rest-api-url` in
the Installation Guide.

Please note that in the curl example, Bash environment variables are used with the idea that you can set a few environment variables and copy and paste the examples as is. For example, "$DOI" could become "doi:10.5072/FK2/BL2IBM" by issuing the following export command from Bash:

75 changes: 73 additions & 2 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -2084,10 +2084,34 @@ The response is a JSON object described in the :doc:`/api/external-tools` sectio
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
export VERSION=1.0
export TOOL_ID=1
curl -H "X-Dataverse-key: $API_TOKEN" -H "Accept:application/json" "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/toolparams/$TOOL_ID?persistentId=$PERSISTENT_IDENTIFIER"
.. _signposting-api:

Retrieve Signposting Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dataverse supports :ref:`discovery-sign-posting` as a discovery mechanism.
Signposting involves the addition of a `Link <https://tools.ietf.org/html/rfc5988>`__ HTTP header that provides summary information on GET and HEAD requests for the dataset page, plus a separate ``/linkset`` API call for retrieving additional information.

Here is an example of a "Link" header:

``Link: <https://doi.org/10.5072/FK2/YD5QDG>;rel="cite-as", <https://doi.org/10.5072/FK2/YD5QDG>;rel="describedby";type="application/vnd.citationstyles.csl+json",<https://demo.dataverse.org/api/datasets/export?exporter=schema.org&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/json+ld", <https://schema.org/AboutPage>;rel="type",<https://schema.org/Dataset>;rel="type", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.5072/FK2/YD5QDG>;rel="license", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/linkset?persistentId=doi:10.5072/FK2/YD5QDG> ; rel="linkset";type="application/linkset+json"``

The URL for linkset information is discoverable under the ``rel="linkset";type="application/linkset+json"`` entry in the "Link" header, such as in the example above.

The response includes a JSON object conforming to the `Signposting <https://signposting.org>`__ specification.
Signposting is not supported for draft dataset versions.

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG
  export VERSION=1.0

  curl -H "Accept:application/json" "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/linkset?persistentId=$PERSISTENT_IDENTIFIER"
Files
-----

@@ -2444,6 +2468,49 @@ The fully expanded example above (without environment variables) looks like this
-F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}' \
"https://demo.dataverse.org/api/files/:persistentId/replace?persistentId=doi:10.5072/FK2/AAA000"
Deleting Files
~~~~~~~~~~~~~~
Delete an existing file where ``ID`` is the database id of the file to delete or ``PERSISTENT_ID`` is the persistent id (DOI or Handle, if it exists) of the file.
Note that the behavior of deleting files depends on whether the dataset has ever been published.

- If the dataset has never been published, the file will be deleted forever.
- If the dataset has been published, the file is deleted from the draft (and future published versions).
- If the dataset has been published, the deleted file can still be downloaded because it was part of a published version.
A curl example using an ``ID``
.. code-block:: bash
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export ID=24
curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE $SERVER_URL/api/files/$ID
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/files/24
A curl example using a ``PERSISTENT_ID``
.. code-block:: bash
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000
curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "$SERVER_URL/api/files/:persistentId?persistentId=$PERSISTENT_ID"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/files/:persistentId?persistentId=doi:10.5072/FK2/AAA000"
Getting File Metadata
~~~~~~~~~~~~~~~~~~~~~
@@ -3260,7 +3327,7 @@ Each user can get a dump of their basic information in JSON format by passing in

curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/users/:me

.. _pids-api:


Managing Harvesting Server and Sets
-----------------------------------
@@ -3371,6 +3438,7 @@ The fully expanded example above (without the environment variables) looks like
Only users with superuser permissions may delete harvesting sets.


.. _managing-harvesting-clients-api:

Managing Harvesting Clients
@@ -3519,6 +3587,9 @@ Self-explanatory:
Only users with superuser permissions may delete harvesting clients.
.. _pids-api:
PIDs
----
