Commit

add task data list and download links
ymcui committed Jun 27, 2019
1 parent 388efcf commit 6076022
Showing 15 changed files with 31 additions and 0 deletions.
6 changes: 6 additions & 0 deletions README.md
@@ -70,6 +70,12 @@ chinese_wwm_L-12_H-768_A-12.zip
`bert_config.json` and `vocab.txt` are identical to those in Google's original **`BERT-base, Chinese`**.


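### Evaluation Task Data
We provide data for some of the tasks; see the `data` directory.
Each archive contains the training and test data, and the `README.md` in the same directory states the data source.
Some datasets require authorization from their original authors, so we cannot provide download links for them. Thank you for your understanding.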


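## Chinese Baseline Results
To compare baseline performance, we evaluated on the following Chinese datasets, covering `sentence-level` and `document-level` tasks.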

3 changes: 3 additions & 0 deletions README_EN.md
@@ -72,6 +72,9 @@ chinese_wwm_L-12_H-768_A-12.zip
`bert_config.json` and `vocab.txt` are identical to the original **`BERT-base, Chinese`** by Google.


#### Task Data
We only provide the data that is publicly available; see the `data` directory.

## Baselines
We experiment on several Chinese datasets, ranging from sentence-level to document-level tasks.

3 changes: 3 additions & 0 deletions data/bqcorpus/README.md
@@ -0,0 +1,3 @@
http://icrc.hitsz.edu.cn/info/1037/1162.htm

Due to copyright restrictions, we cannot provide a direct download link; you can search for the dataset on GitHub yourself.
1 change: 1 addition & 0 deletions data/chnsenticorp/README.md
@@ -0,0 +1 @@
https://github.com/pengming617/bert_classification/tree/master/data
Binary file added data/chnsenticorp/chnsenticorp.zip
Binary file not shown.
3 changes: 3 additions & 0 deletions data/cjrc/README.md
@@ -0,0 +1,3 @@
http://cail.cipsc.org.cn

The test set is in-house and cannot be provided for download.
1 change: 1 addition & 0 deletions data/cmrc2018/README.md
@@ -0,0 +1 @@
https://github.com/ymcui/cmrc2018
1 change: 1 addition & 0 deletions data/drcd/README.md
@@ -0,0 +1 @@
https://github.com/DRCKnowledgeTeam/DRCD
3 changes: 3 additions & 0 deletions data/lcqmc/README.md
@@ -0,0 +1,3 @@
http://icrc.hitsz.edu.cn/info/1037/1146.htm

Due to copyright restrictions, we cannot provide a direct download link; you can search for the dataset on GitHub yourself.
1 change: 1 addition & 0 deletions data/msra-ner/README.md
@@ -0,0 +1 @@
https://github.com/OYE93/Chinese-NLP-Corpus
1 change: 1 addition & 0 deletions data/peopledaily/README.md
@@ -0,0 +1 @@
https://github.com/ProHiryu/bert-chinese-ner/tree/master/data
Binary file added data/peopledaily/peopledaily.zip
Binary file not shown.
3 changes: 3 additions & 0 deletions data/thucnews/README.md
@@ -0,0 +1,3 @@
https://github.com/gaussic/text-classification-cnn-rnn

The file is too large to host here; please download it from the original source.
1 change: 1 addition & 0 deletions data/weibo/README.md
@@ -0,0 +1 @@
https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/weibo_senti_100k/intro.ipynb
4 changes: 4 additions & 0 deletions data/xnli/README.md
@@ -0,0 +1,4 @@
https://github.com/facebookresearch/XNLI
https://github.com/google-research/bert/blob/master/multilingual.md#fine-tuning-example

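The first link is the original source; the second is the one used by the official BERT release (we downloaded the data from there).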
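
Two archives are added in this commit (`data/chnsenticorp/chnsenticorp.zip` and `data/peopledaily/peopledaily.zip`). The sketch below shows one way they might be unpacked after cloning the repository. It is only an illustration: the helper name `extract_task_archives` and the output directory are hypothetical, and nothing is assumed about the file names inside each archive beyond what `zipfile` reports.

```python
import zipfile
from pathlib import Path


def extract_task_archives(data_dir, out_dir):
    """Extract every `<task>/*.zip` under data_dir into out_dir/<task>.

    Returns a dict mapping each task directory name to the list of
    member names found in its archive. (Hypothetical helper, not part
    of the repository.)
    """
    extracted = {}
    for archive in sorted(Path(data_dir).glob("*/*.zip")):
        task = archive.parent.name  # e.g. "chnsenticorp"
        target = Path(out_dir) / task
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
            extracted[task] = zf.namelist()
    return extracted


# Example call, assuming the repository's `data` directory is in the
# current working directory and `data_extracted` is a scratch location:
#   extract_task_archives("data", "data_extracted")
```

Directories that contain only a `README.md` (for example `data/lcqmc`) are skipped, since their data has to be obtained from the linked source.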
