Releases: gdcc/pyDataverse
v0.3.4
pyDataverse v0.3.4
We are excited to announce the release of pyDataverse version 0.3.4! This update brings important fixes and new features, aimed at enhancing functionality and security for users.
Key Updates:
- Bug Fix: We have addressed an issue related to the `replace_datafile` functionality, where metadata passed was not being updated correctly. This bug was initially reported in the datalad-dataverse GitHub repository. The issue has now been resolved, ensuring smoother file replacements with accurate metadata updates.
- New Authentication Feature: A special thanks to @shoeffner for contributing a base implementation of the `httpx`-based authentication. This update transitions the existing API-Token authentication flow into a more secure variant. It strengthens the overall authentication process, making it safer for users.
- Future Authentication Support: In addition to improving current security, the new authentication framework opens the door to future enhancements. pyDataverse is now well-positioned to support other authentication protocols such as OpenID Connect (OIDC) and SAML, which will be introduced in upcoming releases.
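The idea of a pluggable authentication flow can be sketched with the standard library alone. This is only an illustration: the names `TokenHeaderAuth`, `BearerAuth` and `prepare_headers` are hypothetical and are not the pyDataverse or `httpx` API. The sketch assumes that Dataverse API tokens are sent in the `X-Dataverse-key` header and that an OIDC-style flow would use a bearer token.

```python
from typing import Dict, Optional, Protocol


class Auth(Protocol):
    """Anything that can add credentials to a header mapping."""

    def apply(self, headers: Dict[str, str]) -> Dict[str, str]: ...


class TokenHeaderAuth:
    """Hypothetical strategy: send an API token in the X-Dataverse-key header."""

    def __init__(self, token: str) -> None:
        if not token:
            raise ValueError("API token must not be empty")
        self.token = token

    def apply(self, headers: Dict[str, str]) -> Dict[str, str]:
        return {**headers, "X-Dataverse-key": self.token}


class BearerAuth:
    """Hypothetical strategy: send a bearer token, as an OIDC flow might."""

    def __init__(self, token: str) -> None:
        if not token:
            raise ValueError("Bearer token must not be empty")
        self.token = token

    def apply(self, headers: Dict[str, str]) -> Dict[str, str]:
        return {**headers, "Authorization": f"Bearer {self.token}"}


def prepare_headers(auth: Optional[Auth] = None) -> Dict[str, str]:
    """Build request headers; credentials are added only if auth is given."""
    headers = {"User-Agent": "pydataverse"}
    return auth.apply(headers) if auth is not None else headers
```

Because the request code only depends on the `apply` method, a new protocol could be supported by adding another strategy class without touching the request functions.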
In the following, find a complete list of merged pull requests:
What's Changed
- fix calendar link in README by @pdurbin in #190
- Fix handling of empty metadata when uploading data files by @shoeffner in #207
- Run pre-commit run --all by @shoeffner in #196
- Update Docs Occurrences of requests.Response by @shoeffner in #199
- Update tox.ini and pyproject.toml by @shoeffner in #205
- Update DV_VERSION to 6.3 by @shoeffner in #197
- doc(`replace_datafile`): clarify and correct parameter description by @mih in #202
- Update contrib guide by @shoeffner in #206
- Fix `jsonData` not passed correctly by @JR-1991 in #203
- Rework auth/api_token parameters by @shoeffner in #201
- Remove trailing slash of `base_url` upon `Api` initialization by @JR-1991 in #214
- Fix `get_datasets_by_data_location` docstring displaying wrong `GET` url by @JR-1991 in #212
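Why the trailing slash of `base_url` matters can be shown with a small sketch. The helpers `normalize_base_url` and `build_url` are hypothetical names for illustration, not functions from pyDataverse, and the endpoint path is only an example.

```python
def normalize_base_url(base_url: str) -> str:
    """Strip surrounding whitespace and trailing slashes from a base URL."""
    return base_url.strip().rstrip("/")


def build_url(base_url: str, path: str) -> str:
    """Join base URL and path with exactly one slash between them."""
    return f"{normalize_base_url(base_url)}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Without normalization this would yield a double slash after the host.
    print(build_url("https://demo.dataverse.org/", "/api/info/version"))
```

Normalizing once at initialization means every later URL concatenation can assume a base URL without a trailing slash.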
New Contributors
- @shoeffner made their first contribution in #207
- @mih made their first contribution in #202
Full Changelog: v0.3.3...v0.3.4
v0.3.3
What's Changed
- Set `timeout` to `None` to avoid timeout errors by @JR-1991 in #188
- Added documentation of pyDataverse conda installation (again) by @PennyHow in #184
- in README, link to https://py.gdcc.io by @pdurbin in #183
- Update `LICENSE.txt` by @JR-1991 in #185
- typo by @pdurbin in #186
New Contributors
Full Changelog: v0.3.2...v0.3.3
v0.3.2
We are excited to announce the release of the latest patch version of pyDataverse after a significant period of inactivity. This update brings a range of new functionalities and bug fixes, aimed at improving the stability and performance of the pyDataverse library.
The library has been equipped with a CI/CD pipeline to ensure consistent integration with Dataverse. To achieve this, we have utilized the Dataverse Action which uses the progress made by the Dataverse Containerization Working Group to create local instances of Dataverse with ease. This has made contributions safer and made it easier to test pull requests.
PyDataverse has also switched from the `requests` library to HTTPX, a powerful library for performing HTTP requests. The library offers better performance and compatibility and allows async requests, which were previously impossible. For more information on how to use the new async functionality, please refer to PR #175 for now.
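The benefit of async requests can be illustrated without touching the pyDataverse API itself (see PR #175 for the actual usage). In this sketch `fetch_metadata` is a hypothetical stand-in for an HTTP call: it sleeps instead of using the network, and the identifiers are made up.

```python
import asyncio
from typing import Dict, List


async def fetch_metadata(pid: str, delay: float = 0.01) -> Dict[str, str]:
    """Hypothetical stand-in for an HTTP GET; sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return {"pid": pid, "status": "OK"}


async def fetch_all(pids: List[str]) -> List[Dict[str, str]]:
    """Run all lookups concurrently; gather() returns results in input order."""
    return await asyncio.gather(*(fetch_metadata(pid) for pid in pids))


if __name__ == "__main__":
    example_pids = ["doi:10.0000/EXAMPLE1", "doi:10.0000/EXAMPLE2"]
    for record in asyncio.run(fetch_all(example_pids)):
        print(record)
```

With a synchronous client the waiting times add up; here the simulated requests overlap, so the total time stays close to that of the slowest single request.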
Finally, pyDataverse's building and dependency management has been transferred to `pyproject.toml` from `setup.py`. The current de facto standard in packaging Python projects offers numerous advantages over `setup.py` while maintaining compatibility with the `pip` installer.
What's Changed
- Add Zulip channel to README.md by @JR-1991 in #165
- Add CI/CD pipeline and re-establish existing tests by @JR-1991 in #167
- Change repo status to `active` by @JR-1991 in #168
- Requests via `httpx` by @JR-1991 in #174
by @JR-1991 in #174 - Add codespell support (config, workflow) and make it fix some typos by @yarikoptic in #179
- Bump black from 19.10b0 to 24.3.0 in /requirements by @dependabot in #177
- Fix data access and redirects by @JR-1991 in #182
- Provide local testing functionality by @JR-1991 in #172
- Async requests by @JR-1991 in #175
- Switch to `pyproject.toml` and `poetry` by @JR-1991 in #180
Fixes
- problem with replace_datafile by @albenard in #171
- Posting JSON broken on Dataverse 5.9 by @Jeija in #143
New Contributors
- @JR-1991 made their first contribution in #165
- @yarikoptic made their first contribution in #179
Full Changelog: 0.3.1...v0.3.2
Chat with us!
If you are interested in the development of pyDataverse, we invite you to join us for a chat on our Zulip Channel. This is the perfect place to discuss and exchange ideas about the development of pyDataverse. Whether you need help or have ideas to share, feel free to join us!
PyDataverse Working Group
We have formed a pyDataverse working group to exchange ideas and collaborate on pyDataverse. There is a bi-weekly meeting planned for this purpose, and you are welcome to join us by clicking the following WebEx meeting link. For a list of all the scheduled dates, please refer to the Dataverse Community calendar.
0.3.1
Small bugfix of #126.
For help or general questions please have a look in our Docs or email [email protected].
Bugs
- Fix: missing topicClassVocabURI value in Dataset model (#126)
Thanks
Thanks to Karin Faktor for finding the bug.
PyDataverse is supported by AUSSDA and by funding as part of the Horizon2020 project SSHOC.
v0.3.0 - Ruth Wodak
This release brings major changes to many parts of the package. It adds new APIs, refactored models and lots of new documentation.
Overview of the most important changes:
- Re-factored data models: setters, getters, data validation and JSON export and import
- Export and import of metadata to/from pre-formatted CSV templates
- Add User Guides, Use-Cases, Contributor Guide and much more to the documentation
- Add SWORD, Search, Metrics and Data Access API
- Collect the complete data tree of a Dataverse with `get_children()`
- Use JSON schemas for metadata validation (`jsonschema` required)
- Updated Python requirements: Python>=3.6 (no Python 2 support anymore)
- Curl required, only for `update_datafile()`
- Transfer pyDataverse to GDCC - the Global Dataverse Community Consortium (#52)
Version 0.3.0 is named in honor of Ruth Wodak (Wikipedia), an Austrian linguist. Her work is mainly located in discourse studies, more specifically in critical discourse analysis, which looks at discourse as a form of social practice. She was awarded the Wittgenstein-Preis, the highest Austrian science award.
For help or general questions please have a look in our Docs or email [email protected].
Use-Cases
The new functionalities were developed with some specific use-cases in mind:
See more details in our Documentation.
Retrieve data structure and metadata from Dataverse instance (DevOps)
Collect all Dataverses, Datasets and Datafiles of a Dataverse instance, or just a part of it. The results then can be stored in JSON files, which can be used for testing purposes, like checking the completeness of data after a Dataverse upgrade or migration.
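Collecting the three data types from a nested tree can be sketched as follows. This is a minimal illustration, not the pyDataverse implementation: `walk_tree` is a hypothetical helper, and the node layout (a `type` key plus an optional `children` list) is an assumption about what such a tree might look like.

```python
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]

# Assumed example tree: each node has a "type" and optional "children".
TREE: List[Node] = [
    {"type": "dataverse", "alias": "science", "children": [
        {"type": "dataset", "pid": "doi:10.0000/EXAMPLE", "children": [
            {"type": "datafile", "filename": "data.csv"},
            {"type": "datafile", "filename": "readme.txt"},
        ]},
        {"type": "dataverse", "alias": "physics", "children": []},
    ]},
]


def walk_tree(nodes: List[Node]) -> Tuple[List[Node], List[Node], List[Node]]:
    """Return flat lists of dataverses, datasets and datafiles (depth-first)."""
    dataverses: List[Node] = []
    datasets: List[Node] = []
    datafiles: List[Node] = []
    buckets = {"dataverse": dataverses, "dataset": datasets, "datafile": datafiles}
    stack = list(reversed(nodes))
    while stack:
        node = stack.pop()
        # Store the node without its children to keep the flat lists small.
        buckets[node["type"]].append({k: v for k, v in node.items() if k != "children"})
        stack.extend(reversed(node.get("children", [])))
    return dataverses, datasets, datafiles
```

Each of the three lists can then be written to its own JSON file with `json.dump()` and compared before and after an upgrade or migration.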
Upload and removal of test data (DevOps)
For testing, you often have to upload a collection of data and metadata, which should be removed after the test is finished. For this, we offer easy-to-use functionality.
Import data from CSV templates (Data Scientist)
Importing lots of data from data sources outside Dataverse can be done with the CSV templates as a bridge. Fill the CSV templates with your data, by machine or by hand, and import them into pyDataverse for an easy mass upload via the Dataverse API.
Bugs
Features & Enhancements
API
Summary: Add other APIs alongside the Native API and update the Native API.
- add Data Access API:
  - get datafile(s) (`get_datafile()`, `get_datafiles()`, `get_datafile_bundle()`)
  - request datafile access (`request_access()`, `allow_access_request()`, `grant_file_access()`, `list_file_access_requests()`)
- add Metrics API: `total()`, `past_days()`, `get_dataverses_by_subject()`, `get_dataverses_by_category()`, `get_datasets_by_subject()`, `get_datasets_by_data_location()`
- add SWORD API: `get_service_document()`
- add Search API: `search()`
- Native API:
  - Get all children data-types of a Dataverse or a Dataset in a tree structure (`get_children()`)
  - Convert a Dataverse ID to its alias (`dataverse_id2alias()`)
  - Get contents of a Dataverse (Datasets, Dataverses) (`get_dataverse_contents()`)
  - Get Dataverse assignments (`get_dataverse_assignments()`)
  - Get Dataverse facets (`get_dataverse_facets()`)
  - Edit Dataset metadata (`edit_dataset_metadata()`) (#19)
  - Destroy Dataset (`destroy_dataset()`)
  - Dataset private URL functionalities (`create_dataset_private_url()`, `get_dataset_private_url()`, `delete_dataset_private_url()`)
  - Get Dataset version(s) (`get_dataset_versions()`, `get_dataset_version()`)
  - Get Dataset assignments (`get_dataset_assignments()`)
  - Check if Dataset is locked (`get_dataset_lock()`)
  - Get Datafiles metadata (`get_datafiles_metadata()`)
  - Update datafile metadata (`update_datafile_metadata()`)
  - Redetect Datafile file type (`redetect_file_type()`)
  - Restrict Datafile (`restrict_datafile()`)
  - Ingest Datafiles (`reingest_datafile()`, `uningest_datafile()`)
  - Datafile upload in native Python (no CURL dependency anymore) (`upload_datafile()`)
  - Replace existing Datafile (`replace_datafile()`)
  - Roles functionalities (`get_dataverse_roles()`, `create_role()`, `show_role()`, `delete_role()`)
  - Add API token functionalities (`get_user_api_token_expiration_date()`, `recreate_user_api_token()`, `delete_user_api_token()`)
  - Get current user data (`get_user()`) (#59)
  - Get API ToU (`get_info_api_terms_of_use()`)
  - Add import of existing Dataset in `create_dataset()` (#3)
  - Datafile upload natively in Python (no curl anymore) (`upload_datafile()`)
- Api
  - Set User-Agent for requests to `pydataverse`
  - Change authentication during request functions (get, post, delete, put): If API token is passed, use it. If not, don't set it. No `auth` parameter used anymore.
Models
Summary: Re-factoring of all models (Dataverse, Dataset, Datafile).
New methods:
- `from_json()` imports JSON (like Dataverse's own JSON format) to pyDataverse models object
- `get()` returns a dict of the pyDataverse models object
- `json()` returns a JSON string (like Dataverse's own JSON format) of the pyDataverse models object. Mostly used for API uploads.
- `validate_data()` validates a pyDataverse object with a JSON schema
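The import, inspect, validate and export cycle listed above can be sketched with a toy model. `MiniDataset` is a hypothetical stand-in, not the pyDataverse `Dataset` class: it handles only single-valued primitive fields of the citation block, and its `validate()` method uses a simple required-field check in place of real JSON schema validation.

```python
import json
from typing import Any, Dict


class MiniDataset:
    """Hypothetical toy model; not the pyDataverse Dataset class."""

    REQUIRED = ("title",)  # assumed minimal requirement for this sketch

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def from_json(self, text: str) -> None:
        """Import single-valued primitive citation fields from Dataverse-style JSON."""
        doc = json.loads(text)
        fields = doc["datasetVersion"]["metadataBlocks"]["citation"]["fields"]
        for field in fields:
            if field.get("typeClass") == "primitive" and not field.get("multiple"):
                self._data[field["typeName"]] = field["value"]

    def get(self) -> Dict[str, Any]:
        """Return a copy of the stored attributes as a dict."""
        return dict(self._data)

    def json(self) -> str:
        """Export the attributes back into the nested Dataverse-style layout."""
        fields = [
            {"typeName": name, "typeClass": "primitive", "multiple": False, "value": value}
            for name, value in self._data.items()
        ]
        return json.dumps({"datasetVersion": {"metadataBlocks": {"citation": {"fields": fields}}}})

    def validate(self) -> bool:
        """Simplified check: every required attribute is present and non-empty."""
        return all(self._data.get(name) for name in self.REQUIRED)
```

Keeping a flat dict internally and converting to the nested layout only in `from_json()` and `json()` is one way to keep attribute access simple while still producing upload-ready JSON.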
Utils
- Save list of metadata (Dataverses, Datasets or Datafiles) to a CSV file (`write_dicts_as_csv()`) (#11)
- Walk through the data tree from `get_children()` and extract Dataverses, Datasets and Datafiles (`dataverse_tree_walker()`)
- Store the results from `dataverse_tree_walker()` in separate JSON files (`save_tree_data()`)
- Validate any data model dictionary (Dataverse, Dataset, Datafile) against a JSON schema (`validate_data()`)
- Clean strings (trim whitespace) (`clean_string()`)
- Create URLs from identifier (`create_dataverse_url()`, `create_dataset_url()`, `create_datafile_url()`)
- Update `read_csv_to_dict()`: replace `dv.` prefix, load JSON cells and convert boolean cell strings
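The three conversions named for `read_csv_to_dict()` can be sketched with the standard library. This is not the pyDataverse implementation: `read_csv_rows` is a hypothetical helper, the column names are made up, and treating any cell that starts with `[` or `{` as JSON is an assumption made for this sketch.

```python
import csv
import io
import json
from typing import Any, Dict, List

# Assumed sample: one row with a boolean cell and a JSON cell.
SAMPLE = (
    'dv.title,dv.restrict,dv.author\n'
    'My dataset,TRUE,"[{""authorName"": ""Doe, Jane""}]"\n'
)


def read_csv_rows(text: str, prefix: str = "dv.") -> List[Dict[str, Any]]:
    """Parse CSV text into dicts, converting prefix, booleans and JSON cells."""
    rows: List[Dict[str, Any]] = []
    for raw in csv.DictReader(io.StringIO(text)):
        row: Dict[str, Any] = {}
        for key, value in raw.items():
            # Drop the template prefix from the column name.
            if key.startswith(prefix):
                key = key[len(prefix):]
            # Convert boolean strings, then JSON-looking cells.
            if value in ("TRUE", "True", "true"):
                value = True
            elif value in ("FALSE", "False", "false"):
                value = False
            elif value and value[0] in "[{":
                value = json.loads(value)
            row[key] = value
        rows.append(row)
    return rows
```

Calling `read_csv_rows(SAMPLE)` yields one dict whose `restrict` value is a real boolean and whose `author` value is a list of dicts, which is the shape a metadata model can consume directly.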
Docs
Many new pages and tutorials:
- Add User Guide - Basic
- Add User Guide - Advanced
- Add User Guide - Use-Cases
- Add Contributor Guide
- Add Installation
- Add CSV templates
- Add FAQ
- Add Resources
- Improve docstrings
- Fix typo (#40)
- Update Homepage
Tests
- Add tests for new functions
- Re-factor existing tests
- Create fixtures
- Create test data
Miscellaneous
- Add Python 3.8 support; remove Python 2.7, 3.4 and 3.5 (Python>=3.6 required now)
- Add jsonschema as requirement
- Add JSON schemas for Dataverse upload, Dataset upload, Datafile upload and DSpace to package
- Add CSV templates for Dataverses, Datasets and Datafiles from pyDataverse_templates
- Transfer pyDataverse to GDCC - the Global Dataverse Community Consortium (#52)
- Improve code formatting: black, isort, pylint, mypy, pre-commit
- Add pylint linter
- Add mypy type checker
- Add pre-commit for managing pre-commit hooks.
- Add radon code metrics
- Add GitHub templates (PR, issues, commit) (#57)
- Re-structure requirements
- Get DOI:10.5281/zenodo.4470151 for GitHub repository
Other
Thanks to Daniel Melichar (@dmelichar), Vyacheslav Tykhonov (Slava), GDCC, @ecowan, @BPeuch, @j-n-c and @ambhudia for their support for this release. Special thanks to the Pandas project for their great blueprint for the Contributor Guide.
PyDataverse is supported by funding as part of the Horizon2020 project SSHOC.
v0.2.1
This release fixes a bug in the `Dataset.dict()` generation.
For help or general questions please have a look in our Docs or email [email protected].
Bug Fixes
- FIXED: calling the attributes `series`, `socialScienceNotes` and `targetSampleSize` caused an error in `Dataset.dict()`, because the contained sub-values were stored directly in their own class attributes.
Contribute
To find out how you can contribute, please have a look at the Contributor Guide. No contribution is too small!
The most important contribution you can make right now is to use the module. It would be great if you could install it, run some code on your PC and access your own Dataverse instance if possible, and then give feedback (contact).
About pyDataverse
pyDataverse includes a collection of functionalities to import, export and manipulate data and its metadata via the Dataverse API.
-- Greetz, Stefan Kasberger
v0.2.0 - Ida Pfeiffer
This release adds functionalities to import, manipulate and export the metadata of Dataverses, Datasets and Datafiles.
Version 0.2.0 is named in honor of Ida Pfeiffer (Wikipedia), an Austrian traveler and travel book author. She went on several journeys around the world, during which she collected plants, insects, mollusks, marine life and mineral specimens, and brought most of them back home to the Natural History Museum of Vienna.
For help or general questions please have a look in our Docs or email [email protected].
Features
- add Dataverse Api metadata functionalities:
  - set allowed attributes via a list of `dict()`
  - import of Dataverse and Dataset metadata from Dataverse Api JSON
  - validity check of Dataverse, Dataset and Datafile attributes necessary for Dataverse Api upload
  - export Dataverse, Dataset and Datafile attributes as `dict()` and JSON
  - export Dataverse and Dataset metadata JSON necessary for Dataverse Api upload
  - tests for Dataverse, Dataset and Datafile
- add PUT request and edit metadata request to `Api()` (PR #8)
- read in csv files and convert to Dataverse compatible `dict()` for automatic import of datasets into a `Dataset()` object
Improvements
- improved documentation: added docstrings where missing, cleaned them up and added examples
- added PyPI test to `tox.ini`
- added test fixtures for frequently used functions inside tests
Dependencies
- fixed requests version: `requests>=2.12.0` or newer needed
Contribute
From 18th to 22nd of June 2019, pyDataverse's main developer Stefan Kasberger will be at the Dataverse Community Conference in Cambridge, MA to exchange ideas with others about pyDataverse and develop it further. If you are interested and around, drop by and join us. If you cannot attend, you can connect with us via Dataverse Chat.
To find out how you can contribute, please have a look at the Contributor Guide. No contribution is too small!
The most important contribution you can make right now is to use the module. It would be great if you could install it, run some code on your PC and access your own Dataverse instance if possible, and then give feedback (contact).
Another way is to share this release with others who could be interested (e.g. retweet my Tweet, or send an email).
About pyDataverse
pyDataverse includes a collection of functionalities to import, export and manipulate data and its metadata via the Dataverse API.
https://twitter.com/stefankasberger/status/1140832352517668864
Thanks to Ajax23 for the PR #8. Great contribution, and it's always amazing to see the idea of Open Source in action. :)
-- Greetz, Stefan Kasberger
v0.1.1
This release is a quick bugfix. It adds `requests` to `install_requires` and updates the packaging and testing configuration.
For help or general questions please have a look in our Docs or email [email protected].
Bugfixes
- https://github.com/AUSSDA/pyDataverse/issues/7: fix pip install error: add `requests` to the `install_requires` in `setup.py`
Improvements
- cleaned `setup.py`
- add badges to index.rst
- cleaned `tools/tests-requirements.txt`
- `tox.ini`: add python versions, add dist test, add pypitest test, clean up and re-structure configuration
- update docs
Contribute
To find out how you can contribute, please have a look at the Contributor Guide. No contribution is too small!
The most important contribution right now is simply to use the module. It would be great if you could install it, run some code on your PC and access your own Dataverse instance if possible, and then give feedback (contact).
About pyDataverse
pyDataverse includes the most basic data operations to import and export data via the Dataverse API. The functionality will be expanded in the next weeks with more requests and a class-based data model for the metadata. This will make it easy to import and export metadata, and to upload it directly to the API.
Thanks to @moumenuisawe for mentioning this bug.
-- Greetz, Stefan Kasberger
v0.1.0 - Marietta Blau
This is the initial release of pyDataverse. It offers basic features to access the Dataverse API via Python, to create, retrieve, publish and delete Dataverses, Datasets and Datafiles.
Version 0.1.0 is named in honor of Marietta Blau (Wikipedia), an Austrian researcher in the field of particle physics. In 1950, she was nominated for the Nobel prize for her contributions.
For help or general questions please have a look in our Docs or email [email protected].
Features
- `api.py`:
  - Make GET, POST and DELETE requests.
  - Create, retrieve, publish and delete Dataverses via the API
  - Create, retrieve, publish and delete Datasets via the API
  - Upload and retrieve Datafiles via the API
  - Retrieve server information and metadata via the API
- `utils.py`: File IO and data conversion functionalities to support API operations
- `exceptions.py`: Custom exceptions
- `tests/*.py`: Tests with test data in pytest, tested with tox on Travis CI.
- Documentation with Sphinx, published on Read the Docs
- Package on PyPI
- Open Source (MIT)
Contribute
To find out how you can contribute, please have a look at the Contributor Guide. No contribution is too small!
The most important contribution right now is simply to use the module. It would be great if you could install it, run some code on your PC and access your own Dataverse instance if possible, and then give feedback (contact).
Another way is to share this release with others who could be interested (e.g. retweet my Tweet, or send an email).
About pyDataverse
pyDataverse includes the most basic data operations to import and export data via the Dataverse API. The functionality will be expanded in the next weeks with more requests and a class-based data model for the metadata. This will make it easy to import and export metadata, and to upload it directly to the API.
Thanks to dataverse-client-python, for being the main orientation and input for the start of pyDataverse. Also thanks to @kaczmirek, @pdurbin, @djbrooke and @4tikhonov for their support on this.
-- Greetz, Stefan Kasberger