Tags: huggingface/dataset-viewer

0.21.0

Version 0.21.0

It has been a long time since the last version. Future tags and releases
will follow a shorter cycle.

0.20.2

Upgrade datasets (#209)

* feat: 🎸 upgrade datasets to 2.1.0

* test: 💍 remove test because the dataset does not exist anymore

0.20.1

fix: 🐛 allow streaming=False in get_rows (#207)

it fixes #206.
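
For illustration, a minimal sketch of what a get_rows helper with such a flag might look like (the function name matches the commit; the signature and body here are assumptions, not the service's actual code):

```python
from itertools import islice

from datasets import load_dataset


def get_rows(dataset: str, config: str, split: str, num_rows: int, streaming: bool = True) -> list:
    if streaming:
        # streaming=True returns an IterableDataset: rows are fetched lazily,
        # without downloading the whole dataset
        iterable = load_dataset(dataset, name=config, split=split, streaming=True)
        return list(islice(iterable, num_rows))
    # streaming=False falls back to downloading and loading the full split,
    # which works for datasets that do not support streaming
    ds = load_dataset(dataset, name=config, split=split)
    return [ds[i] for i in range(min(num_rows, len(ds)))]
```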

0.20.0

Simplify cache by dropping two collections (#202)

* docs: ✏️ add backup/restore to migration instructions

* feat: 🎸 pass the max number of rows to the worker

* feat: 🎸 delete the 'rows' and 'columns' collections

Instead of keeping a large collection of rows and columns and computing
the response on every endpoint call (possibly truncating it each time),
we now pre-compute the response and store it in the cache. We lose the
ability to get the original data back, but we don't need it. It fixes
#197. See
#197 (comment). A sketch of this approach follows the list below.

BREAKING CHANGE: 🧨 the cache database structure has been modified. Run
20220408_cache_remove_dbrow_dbcolumn.py to migrate the database.

* style: 💄 fix types and style

* docs: ✏️ add parameter to avoid error in mongodump

* docs: ✏️ mark ROWS_MAX_BYTES and ROWS_MIN_NUMBER as worker vars
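
A hedged sketch of the idea, with illustrative collection and field names rather than the actual schema (assuming pymongo and a local MongoDB):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local MongoDB instance
db = client["cache"]


def refresh_split_cache(dataset: str, config: str, split: str, response: dict) -> None:
    # The worker stores the final, possibly truncated response once...
    key = {"dataset": dataset, "config": config, "split": split}
    db.split_responses.replace_one(key, {**key, "response": response}, upsert=True)


def get_rows_response(dataset: str, config: str, split: str):
    # ...and the endpoint becomes a plain lookup, with no per-request truncation
    doc = db.split_responses.find_one({"dataset": dataset, "config": config, "split": split})
    return doc["response"] if doc else None
```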

0.19.1

give reason in error if dataset/split cache is refreshing (#193)

* feat: 🎸 give reason in error if dataset/split cache is refreshing

fixes #186

* style: 💄 fix style
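
A Starlette-style sketch of the behavior (the message, reason code, and status code are assumptions, not the actual API):

```python
from starlette.responses import JSONResponse


def rows_response(cache_entry):
    if cache_entry is None:
        # The cache is still being refreshed: return an explicit reason
        # instead of a generic error
        return JSONResponse(
            {
                "message": "The split is being processed. Retry later.",
                "reason": "in_process",  # hypothetical reason code
            },
            status_code=500,
        )
    return JSONResponse(cache_entry["response"], status_code=200)
```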

0.19.0

remove "gated datasets unlock" logic (#189)

* refactor: 💡 move gated datasets "unlock" code to models/

also: add two tests to ensure the gated datasets can be accessed

* test: 💍 adapt to new version of dummy_gated dataset

I changed severo/dummy_gated
(https://huggingface.co/datasets/severo/dummy_gated/commit/99194748bed3625a941aaf785740df02ca5762c9)
to a simpler dataset, without a Python script, to avoid unrelated
errors. Also in this commit: load the HF_TOKEN from a secret in
https://github.com/huggingface/datasets-preview-backend/settings/secrets/actions
so that the unit tests can run.

* test: 💍 fix wrong hardcoded value

* chore: 🤖 ignore safety warning on ujson package

it's a dependency of lm-dataformat, and the latest version still depends
on a vulnerable ujson version

* feat: 🎸 remove the "ask_access" logic for gated datasets

the new "app" tokens on moonlanding can read the gated datasets without
having to accept the conditions first, as it occurs for users.

BREAKING CHANGE: 🧨 HF_TOKEN must be an app token
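
A minimal sketch of what this enables, assuming datasets 2.x (where the authentication parameter was named use_auth_token; the split name is also an assumption):

```python
import os

from datasets import load_dataset

HF_TOKEN = os.environ["HF_TOKEN"]  # must now be an app token

# With an app token, the gated dataset can be read directly, without the
# previous "ask access" step that user tokens required
ds = load_dataset("severo/dummy_gated", split="train", streaming=True, use_auth_token=HF_TOKEN)
print(next(iter(ds)))  # first row, fetched over the network
```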

0.18.3

Update blocked datasets (#187)

* feat: 🎸 block two more datasets

* style: 💄 sort the datasets to make it easier to maintain

0.18.2

feat: 🎸 upgrade to datasets 2.0.0 (#182)

fixes #181

0.18.1

feat: 🎸 revert double limit on the rows size (reverts #162) (#179)

0.18.0

feat: 🎸 truncate cell contents instead of removing rows (#178)

Add a ROWS_MIN_NUMBER environment variable, which defines the minimum
number of rows that should be returned. If the size of these rows
exceeds the ROWS_MAX_BYTES limit, the cells themselves are truncated
(transformed to strings, then cut to 100 bytes, a hardcoded limit). In
that case, the new field "truncated_cells" contains the list of cells
(column names) that were truncated. A sketch of this logic follows
below.

BREAKING CHANGE: 🧨 The /rows response format has changed
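
A minimal sketch of the truncation logic (the response shape and function names are illustrative; only the 100-byte limit and the roles of the two variables come from the commit message):

```python
import json

CELL_MAX_BYTES = 100  # hardcoded per-cell limit from the commit message


def size_in_bytes(obj) -> int:
    return len(json.dumps(obj).encode("utf-8"))


def build_rows_response(rows: list, rows_max_bytes: int, rows_min_number: int) -> dict:
    kept = [dict(row) for row in rows[:rows_min_number]]  # return at least this many rows
    truncated_cells: set = set()
    if size_in_bytes(kept) > rows_max_bytes:
        for row in kept:
            for column, cell in row.items():
                as_string = json.dumps(cell)
                if len(as_string.encode("utf-8")) > CELL_MAX_BYTES:
                    # cast the cell to a string, then cut it to the hardcoded limit
                    row[column] = as_string.encode("utf-8")[:CELL_MAX_BYTES].decode("utf-8", "ignore")
                    truncated_cells.add(column)
    return {"rows": kept, "truncated_cells": sorted(truncated_cells)}
```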