From adeec4b2cf3366b00f5ba0cbfa77fa54a02c3db7 Mon Sep 17 00:00:00 2001
From: Genta Indra Winata
Date: Wed, 14 Oct 2020 14:15:36 +0800
Subject: [PATCH 1/2] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 78b8260..680af4d 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # IndoNLU
 ![Pull Requests Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat) [![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/indobenchmark/indonlu/blob/master/LICENSE) [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg)](code_of_conduct.md)
 
-IndoNLU is a collection of Natural Language Understanding (NLU) resources for Bahasa Indonesia with 12 downstream tasks. We provide the code to reproduce the results and large pre-trained models (IndoBERT and IndoBERT-lite) trained with around 4 billion word corpus (Indo4B), more than 20 GB of text data. This project was initially started by a joint collaboration between universities and industry, such as Institut Teknologi Bandung, Universitas Multimedia Nusantara, The Hong Kong University of Science and Technology, Universitas Indonesia, Gojek, and Prosa.AI.
+IndoNLU is a collection of Natural Language Understanding (NLU) resources for Bahasa Indonesia with 12 downstream tasks. We provide the code to reproduce the results and large pre-trained models (IndoBERT and IndoBERT-lite) trained on a corpus of around 4 billion words (Indo4B), more than 20 GB of text data. This project was initially started as a joint collaboration between universities and industry, including Institut Teknologi Bandung, Universitas Multimedia Nusantara, The Hong Kong University of Science and Technology, Universitas Indonesia, Gojek, and Prosa.AI.
 
 ## Research Paper
 IndoNLU has been accepted by AACL-IJCNLP 2020 and you can find the details in our preprint https://arxiv.org/abs/2009.05387.
@@ -51,7 +51,7 @@ We provide the access to our large pretraining dataset. In this version, we excl
 - Phase 2 [[Link]](https://huggingface.co/indobenchmark/indobert-lite-large-p2)
 
 ## FastText (Indo4B)
-- Will be released soon!
+- Uncased [[Model file]](https://storage.googleapis.com/babert-pretraining/IndoNLU_finals/models/fasttext/fasttext.4B.id.300.epoch5.uncased.bin) [[Vector file]](https://storage.googleapis.com/babert-pretraining/IndoNLU_finals/models/fasttext/fasttext.4B.id.300.epoch5.uncased.vec.zip)
 
 ## Leaderboard
 - Community Portal and Public Leaderboard [[Link]](https://www.indobenchmark.com/leaderboard.html)

From a91815f5c803724d4ed8b536db546967ae660d1c Mon Sep 17 00:00:00 2001
From: Genta Indra Winata
Date: Wed, 14 Oct 2020 14:17:55 +0800
Subject: [PATCH 2/2] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 680af4d..62b4f9e 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,7 @@ We provide the access to our large pretraining dataset. In this version, we excl
 - Phase 2 [[Link]](https://huggingface.co/indobenchmark/indobert-lite-large-p2)
 
 ## FastText (Indo4B)
-- Uncased [[Model file]](https://storage.googleapis.com/babert-pretraining/IndoNLU_finals/models/fasttext/fasttext.4B.id.300.epoch5.uncased.bin) [[Vector file]](https://storage.googleapis.com/babert-pretraining/IndoNLU_finals/models/fasttext/fasttext.4B.id.300.epoch5.uncased.vec.zip)
+- Uncased 11.9 GB Model File [[Link]](https://storage.googleapis.com/babert-pretraining/IndoNLU_finals/models/fasttext/fasttext.4B.id.300.epoch5.uncased.bin) 3.9 GB Vector File [[Link]](https://storage.googleapis.com/babert-pretraining/IndoNLU_finals/models/fasttext/fasttext.4B.id.300.epoch5.uncased.vec.zip)
 
 ## Leaderboard
 - Community Portal and Public Leaderboard [[Link]](https://www.indobenchmark.com/leaderboard.html)
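A note for readers of this patch series: the released `.vec` file (after unzipping) is presumably in the standard fastText/word2vec text format — a header line `<vocab_size> <dim>` followed by one `<word> <v1> ... <v300>` line per word. Under that assumption, a minimal stdlib-only loader could look like the sketch below; the filename and the tiny sample data are illustrative, not part of the patch.

```python
def load_vec(path):
    """Parse a word2vec-text-format .vec file.

    Assumes the standard fastText text layout: a "<vocab_size> <dim>" header
    line, then one "<word> <v1> ... <vdim>" line per word.
    Returns (vocab_size, dim, {word: list-of-floats}).
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        vocab_size, dim = map(int, f.readline().split())
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vocab_size, dim, vectors


# Demonstration with a tiny synthetic file in the same format
# (the real fasttext.4B.id.300.epoch5.uncased.vec is 300-dimensional).
import os
import tempfile

sample = "2 3\nsaya 0.1 0.2 0.3\nkamu 0.4 0.5 0.6\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".vec", delete=False, encoding="utf-8"
) as f:
    f.write(sample)
    path = f.name

n, d, vecs = load_vec(path)
os.remove(path)
print(n, d, vecs["saya"])  # -> 2 3 [0.1, 0.2, 0.3]
```

For the full-size file, a library loader such as gensim's `KeyedVectors.load_word2vec_format` should also work on the same format, though holding all 300-dimensional vectors in memory at once is costly for a ~3.9 GB file.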