Unable to generate MARC-ja because of 403 Forbidden #10

Open
akeyhero opened this issue Jul 31, 2023 · 6 comments

Comments

@akeyhero

akeyhero commented Jul 31, 2023

Thank you for the great benchmark.

The Amazon Reviews Corpus seems to be inaccessible. A direct download fails:

$ wget https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_multilingual_JP_v1_00.tsv.gz
--2023-07-31 15:22:11--  https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_multilingual_JP_v1_00.tsv.gz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.98.53, 52.216.41.112, 52.216.249.70, ...
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.98.53|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-07-31 15:22:11 ERROR 403: Forbidden.

The command from https://registry.opendata.aws/amazon-reviews-ml/ fails as well:

$ aws s3 ls --no-sign-request s3://amazon-reviews-ml/

An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
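
For anyone who wants to script the same check, here is a minimal sketch using only the Python standard library. It sends a HEAD request to the URL from the wget command above and returns the HTTP status code. The helper name `probe_status` and its injectable `opener` parameter are illustrative assumptions for this sketch; they are not part of JGLUE or any AWS tooling.

```python
import urllib.error
import urllib.request

# URL copied from the wget command above.
MARC_JA_SOURCE = (
    "https://s3.amazonaws.com/amazon-reviews-pds/tsv/"
    "amazon_reviews_multilingual_JP_v1_00.tsv.gz"
)


def probe_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code of a HEAD request to `url`.

    Hypothetical helper for this sketch. `opener` defaults to
    urllib.request.urlopen and can be swapped out for offline testing.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses;
        # the status code is carried on the exception.
        return err.code
```

Calling `probe_status(MARC_JA_SOURCE)` should mirror whatever wget reports for the same URL, so it can be rerun later to see whether access has been restored.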

We might be able to switch to the Hugging Face copy: https://huggingface.co/datasets/amazon_reviews_multi
(I cannot verify that it would produce the same dataset as the original.)
(That dataset is also not available.)

@kaisugi

kaisugi commented Jul 31, 2023

I would suggest https://huggingface.co/datasets/shunk031/JGLUE, which includes exactly the same data as the original MARC-ja:

from datasets import load_dataset
dataset = load_dataset("shunk031/JGLUE", name="MARC-ja")

@shunk031

Thank you for the suggestion. I'm the maintainer of the shunk031/JGLUE repository. Unfortunately, that code also uses https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_multilingual_JP_v1_00.tsv.gz to load the dataset (ref. shunk031/huggingface-datasets_JGLUE#9).

I have personally reported this issue to the AWS representative and am awaiting a response.

@kaisugi

kaisugi commented Jul 31, 2023

Sorry for the confusion! And thank you for your quick follow-up, @shunk031!

@tomohideshibata
Contributor

Thank you for your report. We will wait for the response.

@tomohideshibata
Contributor

The following post says that "Amazon has decided to stop distributing the multilingual reviews dataset." We will wait for an official announcement.
https://huggingface.co/datasets/amazon_reviews_multi/discussions/4#64c3898db63057f1fd3ce1a0

@VariousBuilder

It looks like JGLUE cannot be used at the moment. Is there anywhere else it can be obtained?
