Merge remote-tracking branch 'IQSS/develop' into IQSS/9185-contact_email_updates
qqmyers committed Apr 24, 2023
2 parents eb0a446 + e4961fe commit c982ec7
Showing 89 changed files with 2,437 additions and 961 deletions.
1 change: 1 addition & 0 deletions doc/release-notes/3913-delete-file-endpoint
@@ -0,0 +1 @@
Support for deleting files using native API: http://preview.guides.gdcc.io/en/develop/api/native-api.html#deleting-files
2 changes: 1 addition & 1 deletion doc/release-notes/5.13-release-notes.md
@@ -80,7 +80,7 @@ See [Metadata Blocks](https://guides.dataverse.org/en/5.13/api/native-api.html#m

### Advanced Database Settings

You can now enable advanced database connection pool configurations useful for debugging and monitoring as well as other settings. Of particular interest may be `sslmode=require`. See the new [Database Persistence](https://guides.dataverse.org/en/5.13/installation/config.html#database-persistence) section of the Installation Guide for details. (PR #8915)
You can now enable advanced database connection pool configurations useful for debugging and monitoring as well as other settings. Of particular interest may be `sslmode=require`, though installations already setting this parameter in the Postgres connection string will need to move it to `dataverse.db.parameters`. See the new [Database Persistence](https://guides.dataverse.org/en/5.13/installation/config.html#database-persistence) section of the Installation Guide for details. (PR #8915)
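
For example (a sketch, not taken from the release notes themselves; it assumes `dataverse.db.parameters` follows the standard MicroProfile Config environment-variable mapping of uppercasing and replacing dots with underscores):

```
# Move sslmode out of the Postgres connection string and into the new setting.
export DATAVERSE_DB_PARAMETERS='sslmode=require'
```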

### Support for Cleaning up Leftover Files in Dataset Storage

30 changes: 30 additions & 0 deletions doc/release-notes/7000-pidproviders.md
@@ -0,0 +1,30 @@
# Changes to PID Provider JVM Settings

In preparation for a future feature that will allow multiple PID providers to be used at the same time, all JVM settings for PID providers
can now be configured using MicroProfile Config. At the same time, they were renamed to match the name
of the provider being configured.

Please watch your log files for deprecation warnings. Your old settings will still be picked up, but you should migrate
to the new names to avoid unnecessary log clutter and to prepare for future changes. An example message
looks like this:

```
[#|2023-03-31T16:55:27.992+0000|WARNING|Payara 5.2022.5|edu.harvard.iq.dataverse.settings.source.AliasConfigSource|_ThreadID=30;_ThreadName=RunLevelControllerThread-1680281704925;_TimeMillis=1680281727992;_LevelValue=900;|
Detected deprecated config option doi.username in use. Please update your config to use dataverse.pid.datacite.username.|#]
```

Here is a list of the new settings:

- dataverse.pid.datacite.mds-api-url
- dataverse.pid.datacite.rest-api-url
- dataverse.pid.datacite.username
- dataverse.pid.datacite.password
- dataverse.pid.handlenet.key.path
- dataverse.pid.handlenet.key.passphrase
- dataverse.pid.handlenet.index
- dataverse.pid.permalink.base-url
- dataverse.pid.ezid.api-url
- dataverse.pid.ezid.username
- dataverse.pid.ezid.password

See also http://preview.guides.gdcc.io/en/develop/installation/config.html#persistent-identifiers-and-publishing-datasets
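
As a sketch only (the username value is a placeholder, and the environment-variable form assumes the standard MicroProfile Config dots-to-underscores naming convention), migrating one deprecated option on Payara might look like this:

```
# Replace the deprecated doi.username JVM option with its new name.
./asadmin delete-jvm-options '-Ddoi.username=old-user'
./asadmin create-jvm-options '-Ddataverse.pid.datacite.username=old-user'

# Alternatively, supply the value as a MicroProfile Config environment variable.
export DATAVERSE_PID_DATACITE_USERNAME=old-user
```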
8 changes: 8 additions & 0 deletions doc/release-notes/8424-signposting.md
@@ -0,0 +1,8 @@
# Signposting for Dataverse

This release adds [Signposting](https://signposting.org/) support to Dataverse to improve machine discoverability of datasets and files.

The following MicroProfile Config options are now available (these can be treated as JVM options):

- dataverse.signposting.level1-author-limit
- dataverse.signposting.level1-item-limit
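
For example (a sketch; the value 5 simply illustrates the documented default), these could be set as JVM options on Payara:

```
./asadmin create-jvm-options '-Ddataverse.signposting.level1-author-limit=5'
./asadmin create-jvm-options '-Ddataverse.signposting.level1-item-limit=5'
```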
1 change: 1 addition & 0 deletions doc/release-notes/9063-session-api-auth.md
@@ -0,0 +1 @@
A feature flag called "api-session-auth" has been added temporarily to aid in the development of the new frontend (#9063) but will be removed once bearer tokens (#9229) have been implemented. There is a security risk (CSRF) in enabling this flag! Do not use it in production! For more information, see http://preview.guides.gdcc.io/en/develop/installation/config.html#feature-flags
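
A minimal sketch of enabling it for development only, assuming the `dataverse.feature.<flag>` JVM option naming described in the linked feature-flags documentation:

```
# Development only; do not enable in production (CSRF risk).
./asadmin create-jvm-options '-Ddataverse.feature.api-session-auth=true'
```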
1 change: 1 addition & 0 deletions doc/release-notes/9150-improved-external-vocab-supprt.md
@@ -0,0 +1 @@
It is now possible to write external vocabulary scripts that target a single child field in a metadata block. Example scripts are now available at https://github.com/gdcc/dataverse-external-vocab-support that can be configured to support lookup from the Research Organization Registry (ROR) for the Author Affiliation field and from the CrossRef Funding Registry (Fundreg) for the Funding Information/Agency field, both in the standard Citation metadata block. Application of these scripts to other fields, and the development of other scripts targeting child fields, are now possible.
3 changes: 3 additions & 0 deletions doc/release-notes/9277-nonstopmode-pdf-guides.md
@@ -0,0 +1,3 @@
An experimental version of the guides in PDF format is available at <http://preview.guides.gdcc.io/_/downloads/en/develop/pdf/>

Advice for contributors to documentation who want to help improve the PDF is available at http://preview.guides.gdcc.io/en/develop/developers/documentation.html#pdf-version-of-the-guides
7 changes: 7 additions & 0 deletions doc/release-notes/9374-binder-orig.md
@@ -0,0 +1,7 @@
Files downloaded from Binder are now in their original format.

For example, data.dta (a Stata file) will be downloaded instead of data.tab (the archival version Dataverse creates as part of a successful ingest).

This should make it easier to write code to reproduce results, since the dataset authors and subsequent researchers are likely operating on the original file format rather than the format that Dataverse creates.

For details, see #9374, <https://github.com/jupyterhub/repo2docker/issues/1242>, and <https://github.com/jupyterhub/repo2docker/pull/1253>.
@@ -1,6 +1,6 @@
Tool Type Scope Description
Data Explorer explore file A GUI which lists the variables in a tabular data file allowing searching, charting and cross tabulation analysis. See the README.md file at https://github.com/scholarsportal/dataverse-data-explorer-v2 for the instructions on adding Data Explorer to your Dataverse.
Whole Tale explore dataset A platform for the creation of reproducible research packages that allows users to launch containerized interactive analysis environments based on popular tools such as Jupyter and RStudio. Using this integration, Dataverse users can launch Jupyter and RStudio environments to analyze published datasets. For more information, see the `Whole Tale User Guide <https://wholetale.readthedocs.io/en/stable/users_guide/integration.html>`_.
Binder explore dataset Binder allows you to spin up custom computing environments in the cloud (including Jupyter notebooks) with the files from your dataset. `Installation instructions <https://github.com/data-exp-lab/girder_ythub/issues/10>`_ are in the Data Exploration Lab girder_ythub project.
Binder explore dataset Binder allows you to spin up custom computing environments in the cloud (including Jupyter notebooks) with the files from your dataset. `Installation instructions <https://github.com/data-exp-lab/girder_ythub/issues/10>`_ are in the Data Exploration Lab girder_ythub project. See also :ref:`binder`.
File Previewers explore file A set of tools that display the content of files - including audio, html, `Hypothes.is <https://hypothes.is/>`_ annotations, images, PDF, text, video, tabular data, spreadsheets, GeoJSON, zip, and NcML files - allowing them to be viewed without downloading the file. The previewers can be run directly from github.io, so the only required step is using the Dataverse API to register the ones you want to use. Documentation, including how to optionally brand the previewers, and an invitation to contribute through GitHub are in the README.md file. Initial development was led by the Qualitative Data Repository and the spreadsheet previewer was added by the Social Sciences and Humanities Open Cloud (SSHOC) project. https://github.com/gdcc/dataverse-previewers
Data Curation Tool configure file A GUI for curating data by adding labels, groups, weights and other details to assist with informed reuse. See the README.md file at https://github.com/scholarsportal/Dataverse-Data-Curation-Tool for the installation instructions.
76 changes: 76 additions & 0 deletions doc/sphinx-guides/source/admin/discoverability.rst
@@ -0,0 +1,76 @@
Discoverability
===============

Datasets are made discoverable by a variety of methods.

.. contents:: |toctitle|
:local:

DataCite Integration
--------------------

If you are using `DataCite <https://datacite.org>`_ as your DOI provider, when datasets are published, metadata is pushed to DataCite, where it can be searched. For more information, see :ref:`:DoiProvider` in the Installation Guide.

OAI-PMH (Harvesting)
--------------------

The Dataverse software supports a protocol called OAI-PMH that facilitates harvesting dataset metadata from one system into another. For details on harvesting, see the :doc:`harvestserver` section.

Machine-Readable Metadata on Dataset Landing Pages
--------------------------------------------------

As recommended in `A Data Citation Roadmap for Scholarly Data Repositories <https://doi.org/10.1101/097196>`_, the Dataverse software embeds metadata on dataset landing pages in a variety of machine-readable ways.

Dublin Core HTML Meta Tags
++++++++++++++++++++++++++

The HTML source of a dataset landing page includes "DC" (Dublin Core) ``<meta>`` tags such as the following::

<meta name="DC.identifier" content="..."
<meta name="DC.type" content="Dataset"
<meta name="DC.title" content="..."

Schema.org JSON-LD Metadata
+++++++++++++++++++++++++++

The HTML source of a dataset landing page includes Schema.org JSON-LD metadata like this::

  <script type="application/ld+json">{"@context":"http://schema.org","@type":"Dataset","@id":"https://doi.org/...


.. _discovery-sign-posting:

Signposting
+++++++++++

The Dataverse software supports `Signposting <https://signposting.org>`_. This allows machines to request more information about a dataset through the `Link <https://tools.ietf.org/html/rfc5988>`_ HTTP header.

There are two Signposting profile levels, Level 1 and Level 2. In this implementation:

* Level 1 links are shown `as recommended <https://signposting.org/FAIR/>`_ in the "Link"
  HTTP header, which can be fetched by sending an HTTP HEAD request, e.g. ``curl -I https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.5072/FK2/KPY4ZC``.
  The number of author and file links in the Level 1 header can be configured as described below.
* The Level 2 linkset can be fetched by visiting the dedicated linkset page for
  that artifact. The linkset URL appears among the Level 1 links with ``rel="linkset"``.

Note: authors without an author link (identifier URL) are not counted or shown in any profile/linkset.

The following configuration options are available:

- :ref:`dataverse.signposting.level1-author-limit`

  Sets the maximum number of authors shown in the Level 1 profile.
  If the number of authors (with identifier URLs) exceeds this value, no author links are shown in the Level 1 profile.
  The default is 5.

- :ref:`dataverse.signposting.level1-item-limit`

  Sets the maximum number of items/files shown in the Level 1 profile. Datasets with
  more files than this limit will not show any file links in the Level 1 profile; they appear in the Level 2 linkset only.
  The default is 5.

See also :ref:`signposting-api` in the API Guide.

Additional Discoverability Through Integrations
-----------------------------------------------

See :ref:`integrations-discovery` in the Integrations section for additional discovery methods you can enable.
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/admin/index.rst
@@ -14,6 +14,7 @@ This guide documents the functionality only available to superusers (such as "da

dashboard
external-tools
discoverability
harvestclients
harvestserver
metadatacustomization
21 changes: 12 additions & 9 deletions doc/sphinx-guides/source/admin/integrations.rst
@@ -147,21 +147,27 @@ Compute Button

The "Compute" button is still highly experimental and has special requirements such as use of a Swift object store, but it is documented under "Setting up Compute" in the :doc:`/installation/config` section of the Installation Guide.

.. _wholetale:

Whole Tale
++++++++++

`Whole Tale <https://wholetale.org>`_ enables researchers to analyze data using popular tools including Jupyter and RStudio with the ultimate goal of supporting publishing of reproducible research packages. Users can
`import data from a Dataverse installation
<https://wholetale.readthedocs.io/en/stable/users_guide/manage.html>`_ via identifier (e.g., DOI, URI, etc) or through the External Tools integration. For installation instructions, see the :doc:`external-tools` section or the `Integration <https://wholetale.readthedocs.io/en/stable/users_guide/integration.html#dataverse-external-tools>`_ section of the Whole Tale User Guide.

.. _binder:

Binder
++++++

Researchers can launch Jupyter Notebooks, RStudio, and other computational environments by entering the DOI of a dataset in a Dataverse installation on https://mybinder.org
Researchers can launch Jupyter Notebooks, RStudio, and other computational environments by entering the DOI of a dataset in a Dataverse installation at https://mybinder.org

A Binder button can also be added to every dataset page to launch Binder from there. Instructions on enabling this feature can be found under :doc:`external-tools`.

A Binder button can also be added to every dataset page to launch Binder from there. See :doc:`external-tools`.
Additionally, institutions can self host `BinderHub <https://binderhub.readthedocs.io/en/latest/>`_ (the software that powers mybinder.org), which lists the Dataverse software as one of the supported `repository providers <https://binderhub.readthedocs.io/en/latest/developer/repoproviders.html#supported-repoproviders>`_.

Institutions can self host BinderHub. The Dataverse Project is one of the supported `repository providers <https://binderhub.readthedocs.io/en/latest/developer/repoproviders.html#supported-repoproviders>`_.
.. _renku:

Renku
+++++
@@ -179,15 +185,12 @@ Avgidea Data Search

Researchers can use a Google Sheets add-on to search for Dataverse installation's CSV data and then import that data into a sheet. See `Avgidea Data Search <https://www.avgidea.io/avgidea-data-platform.html>`_ for details.

.. _integrations-discovery:

Discoverability
---------------

Integration with `DataCite <https://datacite.org>`_ is built in to the Dataverse Software. When datasets are published, metadata is sent to DataCite. You can further increase the discoverability of your datasets by setting up additional integrations.

OAI-PMH (Harvesting)
++++++++++++++++++++

The Dataverse Software supports a protocol called OAI-PMH that facilitates harvesting datasets from one system into another. For details on harvesting, see the :doc:`harvestserver` section.
A number of built-in features related to data discovery are listed under :doc:`discoverability`, but you can further increase the discoverability of your data by setting up integrations.

SHARE
+++++
4 changes: 3 additions & 1 deletion doc/sphinx-guides/source/admin/make-data-count.rst
@@ -146,7 +146,9 @@ Configuring Your Dataverse Installation for Make Data Count Citations

Please note: as explained in the note above about limitations, this feature is not available to Dataverse installations that use Handles.

To configure your Dataverse installation to pull citations from the test vs. production DataCite server see :ref:`doi.dataciterestapiurlstring` in the Installation Guide.
To configure your Dataverse installation to pull citations from the test vs.
production DataCite server, see :ref:`dataverse.pid.datacite.rest-api-url` in
the Installation Guide.

Please note that in the curl example, Bash environment variables are used with the idea that you can set a few environment variables and copy and paste the examples as is. For example, "$DOI" could become "doi:10.5072/FK2/BL2IBM" by issuing the following export command from Bash:

75 changes: 73 additions & 2 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -2084,10 +2084,34 @@ The response is a JSON object described in the :doc:`/api/external-tools` sectio
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
export VERSION=1.0
export TOOL_ID=1
curl -H "X-Dataverse-key: $API_TOKEN" -H "Accept:application/json" "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/toolparams/$TOOL_ID?persistentId=$PERSISTENT_IDENTIFIER"
.. _signposting-api:

Retrieve Signposting Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dataverse supports :ref:`discovery-sign-posting` as a discovery mechanism.
Signposting involves the addition of a `Link <https://tools.ietf.org/html/rfc5988>`__ HTTP header that provides summary information on GET and HEAD requests for the dataset page, plus a separate ``/linkset`` API call for retrieving additional information.

Here is an example of a "Link" header:

``Link: <https://doi.org/10.5072/FK2/YD5QDG>;rel="cite-as", <https://doi.org/10.5072/FK2/YD5QDG>;rel="describedby";type="application/vnd.citationstyles.csl+json",<https://demo.dataverse.org/api/datasets/export?exporter=schema.org&persistentId=doi:10.5072/FK2/YD5QDG>;rel="describedby";type="application/json+ld", <https://schema.org/AboutPage>;rel="type",<https://schema.org/Dataset>;rel="type", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.5072/FK2/YD5QDG>;rel="license", <https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/linkset?persistentId=doi:10.5072/FK2/YD5QDG> ; rel="linkset";type="application/linkset+json"``

The URL for linkset information is discoverable under the ``rel="linkset";type="application/linkset+json"`` entry in the "Link" header, such as in the example above.

The response includes a JSON object conforming to the `Signposting <https://signposting.org>`__ specification.
Signposting is not supported for draft dataset versions.

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG
  export VERSION=1.0

  curl -H "Accept:application/json" "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/linkset?persistentId=$PERSISTENT_IDENTIFIER"
Files
-----

@@ -2444,6 +2468,49 @@ The fully expanded example above (without environment variables) looks like this
-F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}' \
"https://demo.dataverse.org/api/files/:persistentId/replace?persistentId=doi:10.5072/FK2/AAA000"
Deleting Files
~~~~~~~~~~~~~~
Delete an existing file where ``ID`` is the database id of the file to delete or ``PERSISTENT_ID`` is the persistent id (DOI or Handle, if it exists) of the file.
Note that the behavior of deleting files depends on whether the dataset has ever been published.

- If the dataset has never been published, the file will be deleted forever.
- If the dataset has been published, the file is deleted from the draft (and future published versions).
- If the dataset has been published, the deleted file can still be downloaded because it was part of a published version.
A curl example using an ``ID``
.. code-block:: bash
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export ID=24
curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE $SERVER_URL/api/files/$ID
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/files/24
A curl example using a ``PERSISTENT_ID``
.. code-block:: bash
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000
curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "$SERVER_URL/api/files/:persistentId?persistentId=$PERSISTENT_ID"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/files/:persistentId?persistentId=doi:10.5072/FK2/AAA000"
Getting File Metadata
~~~~~~~~~~~~~~~~~~~~~
@@ -3260,7 +3327,7 @@ Each user can get a dump of their basic information in JSON format by passing in

curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/users/:me

.. _pids-api:


Managing Harvesting Server and Sets
-----------------------------------
@@ -3371,6 +3438,7 @@ The fully expanded example above (without the environment variables) looks like
Only users with superuser permissions may delete harvesting sets.


.. _managing-harvesting-clients-api:

Managing Harvesting Clients
@@ -3519,6 +3587,9 @@ Self-explanatory:
Only users with superuser permissions may delete harvesting clients.
.. _pids-api:
PIDs
----
