Tags · huggingface/dataset-viewer

0.21.0

Version 0.21.0

It has been a long time since the last version. The next tags and releases
will follow a shorter cycle.

Feb 14, 2023
ded9a8c

0.20.2

188 upgrade datasets (#209)

* feat: 🎸 upgrade datasets to 2.1.0

* test: 💍 remove test because the dataset does not exist anymore

Apr 14, 2022
751053e

0.20.1

fix: 🐛 allow streaming=False in get_rows (#207)

it fixes #206.

Apr 12, 2022
4f940cb

0.20.0

Simplify cache by dropping two collections (#202)

* docs: ✏️ add backup/restore to migration instructions

* feat: 🎸 pass the max number of rows to the worker

* feat: 🎸 delete the 'rows' and 'columns' collections

instead of keeping large collections of rows and columns, then computing
the response on every endpoint call, possibly truncating the response,
we now pre-compute the response and store it in the cache. We lose the
ability to get the original data, but we don't need it. It fixes #197.
See #197 (comment).

BREAKING CHANGE: 🧨 the cache database structure has been modified. Run
20220408_cache_remove_dbrow_dbcolumn.py to migrate the database.

* style: 💄 fix types and style

* docs: ✏️ add parameter to avoid error in mongodump

* docs: ✏️ mark ROWS_MAX_BYTES and ROWS_MIN_NUMBER as worker vars
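
The pre-computation described in the "delete the 'rows' and 'columns' collections" item can be illustrated with a minimal sketch. It is not the project's code: the in-memory dictionary stands in for the cache database, and the names `refresh_split`, `get_rows_response` and `max_rows` are hypothetical. The worker builds the final response once, already limited to the maximum number of rows, and the endpoint only reads it back.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
# Hypothetical in-memory stand-in for the cache database:
# one pre-computed response per (dataset, split) pair.
ResponseCache = Dict[Tuple[str, str], Dict[str, Any]]


def refresh_split(
    cache: ResponseCache, dataset: str, split: str, rows: List[Row], max_rows: int
) -> None:
    """Worker side: compute the response once and store it in the cache.

    Only the first `max_rows` rows are kept, so the original data cannot
    be recovered from the cache afterwards.
    """
    response = {"rows": [dict(row) for row in rows[:max_rows]]}
    cache[(dataset, split)] = response


def get_rows_response(cache: ResponseCache, dataset: str, split: str) -> Dict[str, Any]:
    """Endpoint side: return the stored response without recomputing it."""
    key = (dataset, split)
    if key not in cache:
        raise LookupError(f"no cached response for {dataset}/{split}")
    return cache[key]
```

With this shape, the cost of truncating and formatting is paid once per refresh instead of once per request, at the price of no longer storing the full rows.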

Apr 12, 2022
623606d

0.19.1

give reason in error if dataset/split cache is refreshing (#193)

* feat: 🎸 give reason in error if dataset/split cache is refreshing

fixes #186

* style: 💄 fix style

Apr 4, 2022
4a9bf7a

0.19.0

remove "gated datasets unlock" logic (#189)

* refactor: 💡 move gated datasets "unlock" code to models/

also: add two tests to ensure the gated datasets can be accessed

* test: 💍 adapt to new version of dummy_gated dataset

I changed severo/dummy_gated
(https://huggingface.co/datasets/severo/dummy_gated/commit/99194748bed3625a941aaf785740df02ca5762c9)
to a simpler dataset, without a Python script, to avoid unrelated
errors. Also in the commit: load the HF_TOKEN from a secret in
https://github.com/huggingface/datasets-preview-backend/settings/secrets/actions
to be able to run the unit tests

* test: 💍 fix wrong hardcoded value

* chore: 🤖 ignore safety warning on ujson package

it's a dependency of lm-dataformat, whose latest version still depends on a
vulnerable ujson version

* feat: 🎸 remove the "ask_access" logic for gated datasets

the new "app" tokens on moonlanding can read the gated datasets without
having to accept the conditions first, as users must.

BREAKING CHANGE: 🧨 HF_TOKEN must be an app token

Apr 1, 2022
1a6eb0c

0.18.3

Update blocked datasets (#187)

* feat: 🎸 block two more datasets

* style: 💄 sort the datasets to make it easier to maintain

Mar 25, 2022
de2ff07

0.18.2

feat: 🎸 upgrade to datasets 2.0.0 (#182)

fixes #181

Mar 16, 2022
6f1b609

0.18.1

feat: 🎸 revert double limit on the rows size (reverts #162) (#179)

Mar 14, 2022
155843f

0.18.0

feat: 🎸 truncate cell contents instead of removing rows (#178)

Add a ROWS_MIN_NUMBER environment variable, which defines the minimum
number of rows that should be returned. If the size of these rows is
greater than the ROWS_MAX_BYTES limit, then the cells themselves are
truncated (transformed to strings, then truncated to 100 bytes, which is
a hardcoded limit). In that case, the new field "truncated_cells"
contains the list of cells (column names) that are truncated.

BREAKING CHANGE: 🧨 The /rows response format has changed
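
The cell truncation described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the way the row size is measured (JSON-encoded bytes) and the `{"row": ..., "truncated_cells": ...}` item shape are hypothetical, and here only the cells longer than the limit are shortened. Only the 100-byte limit, the two environment variable names and the "truncated_cells" field come from the commit message.

```python
import json
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
CELL_MAX_BYTES = 100  # hardcoded limit mentioned in the commit message


def truncate_cell(value: Any, max_bytes: int = CELL_MAX_BYTES) -> Tuple[Any, bool]:
    """Return (value, was_truncated); long cells become strings cut to max_bytes."""
    raw = str(value).encode("utf-8")
    if len(raw) <= max_bytes:
        return value, False
    # errors="ignore" drops a multi-byte character split by the cut
    return raw[:max_bytes].decode("utf-8", errors="ignore"), True


def rows_size(rows: List[Row]) -> int:
    """Assumed size measure: byte length of the JSON-encoded rows."""
    return len(json.dumps(rows, default=str).encode("utf-8"))


def build_row_items(
    rows: List[Row], rows_max_bytes: int, rows_min_number: int
) -> List[Dict[str, Any]]:
    """Always keep the first rows_min_number rows, truncating cells if too large."""
    head = rows[:rows_min_number]
    if rows_size(head) <= rows_max_bytes:
        return [{"row": dict(row), "truncated_cells": []} for row in head]
    items = []
    for row in head:
        new_row: Row = {}
        truncated_cells: List[str] = []
        for column, value in row.items():
            new_row[column], was_truncated = truncate_cell(value)
            if was_truncated:
                truncated_cells.append(column)
        items.append({"row": new_row, "truncated_cells": truncated_cells})
    return items
```

For example, calling `build_row_items(rows, rows_max_bytes=200, rows_min_number=2)` on two rows where one "text" cell holds 500 characters keeps both rows and reports "text" in that row's "truncated_cells".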

Mar 14, 2022
f406c0d
