Commit 856c2b0: update readme
ymcui committed Sep 10, 2019 (1 parent: 66c9dc3)
Showing 1 changed file with 44 additions and 3 deletions: README_EN.md
For further accelerating Chinese natural language processing, we provide **Chinese pre-trained BERT with Whole Word Masking**.


## News
**Upcoming Event: We are going to release `BERT-wwm-ext`, trained on much larger data. Stay tuned!**
**2019/9/10 We release `RoBERTa-wwm-ext`, check [Download](#Download)**

2019/7/30 We release `BERT-wwm-ext`, which was trained on larger data, check [Download](#Download)

| Section | Description |
|-|-|
| [Introduction](#Introduction) | Introduction to BERT with Whole Word Masking (WWM) |
| [Download](#Download) | Download links for Chinese BERT-wwm |
| [Model Comparison](#Model-Comparison) | Compare the models published in this repository |
| [Baselines](#Baselines) | Baseline results for several Chinese NLP datasets (partial) |
| [Useful Tips](#Useful-Tips) | Provide several useful tips for using Chinese pre-trained models |
| [English BERT-wwm](#English-BERT-wwm) | Download English BERT-wwm (by Google) |
## Download
As all models are 'BERT-base' variants, we do not indicate 'base' in the following model names. A minimal usage sketch follows the download table.

| Model | Data | Google Drive | iFLYTEK Cloud |
| :------- | :--------- | :---------: | :---------: |
| **`RoBERTa-wwm-ext, Chinese`** | **Wikipedia+Extended data<sup>[1]</sup>** | **[TensorFlow](https://drive.google.com/open?id=1jMAKIJmPn7kADgD3yQZhpsqM-IRM1qZt)** | **[TensorFlow (pw: peMe)](https://pan.iflytek.com:443/link/A136858D5F529E7C385C73EEE336F27B)** |
| **`BERT-wwm-ext, Chinese`** | **Wikipedia+Extended data<sup>[1]</sup>** | **[TensorFlow](https://drive.google.com/open?id=1buMLEjdtrXE2c4G1rpsNGWEx7lUQ0RHi)** <br/>**[PyTorch](https://drive.google.com/open?id=1iNeYFhCBJWeUsIlnW_2K6SMwXkM4gLb_)** | **[TensorFlow (pw: thGd)](https://pan.iflytek.com:443/link/8AA4B23D9BCBCBA0187EE58234332B46)** <br/>**[PyTorch (pw: bJns)](https://pan.iflytek.com:443/link/4AB35DEBECB79C578BEC9952F78FB6F2)** |
| **`BERT-wwm, Chinese`** | **Wikipedia** | **[TensorFlow](https://drive.google.com/open?id=1RoTQsXp2hkQ1gSRVylRIJfQxJUgkfJMW)** <br/>**[PyTorch](https://drive.google.com/open?id=1AQitrjbvCWc51SYiLN-cJq4e0WiNN4KY)** | **[TensorFlow (pw: mva8)](https://pan.iflytek.com:443/link/4B172939D5748FB1A3881772BC97A898)** <br/>**[PyTorch (pw: 8fX5)](https://pan.iflytek.com:443/link/8D4E8680433E6AD0F33D521EA920348E)** |
| `BERT-base, Chinese`<sup>Google</sup> | Wikipedia | [Google Cloud](https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip) | - |
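
For reference, the snippet below is a minimal sketch (not part of the official release) of how the extracted weights can be loaded with the Hugging Face `transformers` library. It assumes the archive has been unpacked into a local directory, hypothetically named `./chinese-bert-wwm-ext`, containing `config.json`, `vocab.txt` and PyTorch weights; TensorFlow-only checkpoints would first need to be converted.

```python
# Minimal usage sketch. Assumptions: `transformers` and `torch` are installed,
# and the downloaded archive was extracted to ./chinese-bert-wwm-ext (a
# hypothetical local path with config.json, vocab.txt and pytorch_model.bin).
import torch
from transformers import BertTokenizer, BertModel

model_dir = "./chinese-bert-wwm-ext"
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir)
model.eval()

# Encode a Chinese sentence and inspect the hidden states.
inputs = tokenizer("使用语言模型来预测下一个词的概率。", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # [1, sequence_length, 768] for a BERT-base variant
```

The same pattern applies to `BERT-wwm-ext`; for `RoBERTa-wwm-ext`, see the FAQ below.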
#### Task Data
We only provide data that is publicly available; check the `data` directory.


## Model Comparison
We compare the models released in this project below; a toy sketch of whole word masking follows the comparison table.
`~BERT` means the attribute is inherited from Google's original BERT.

| - | BERT-wwm | BERT-wwm-ext | RoBERTa-wwm-ext |
| :------- | :---------: | :---------: | :---------: |
| Masking | whole word | whole word | whole word |
| Data | wiki | wiki+extended data | wiki+extended data |
| Device | TPU v3 | TPU v3 | TPU v3 |
| Training Steps | 100K (MAX128) <br/>+100K (MAX512) | 1M (MAX128) <br/>+400K (MAX512) | 1M (MAX512) |
| Batch Size | 2,560 / 384 | 2,560 / 384 | 384 |
| Optimizer | LAMB | LAMB | AdamW |
| Vocabulary | ~BERT vocab | ~BERT vocab | ~BERT vocab |
| Init Checkpoint | ~BERT weight | ~BERT weight | ~BERT weight |
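
To make the "whole word" masking row concrete, here is a toy Python sketch of the idea (not the actual data-generation code): word boundaries come from a Chinese word segmenter such as LTP, and whenever a word is selected for masking, all of its characters are masked together.

```python
import random

def whole_word_mask(words, mask_prob=0.15, mask_token="[MASK]"):
    """Toy whole-word masking sketch.

    `words` is a list of segmented words, each given as a list of its
    character-level tokens, e.g. [["使", "用"], ["语", "言"], ["模", "型"]].
    When a word is selected, every token of that word is replaced by
    [MASK] and the original tokens become the prediction targets.
    """
    tokens, targets = [], []
    for pieces in words:
        if random.random() < mask_prob:
            tokens.extend([mask_token] * len(pieces))
            targets.extend(pieces)                # predict the original characters
        else:
            tokens.extend(pieces)
            targets.extend([None] * len(pieces))  # not a prediction target
    return tokens, targets

# Word boundaries would come from a Chinese word segmenter (e.g. LTP).
words = [["使", "用"], ["语", "言"], ["模", "型"], ["来"], ["预", "测"]]
print(whole_word_mask(words, mask_prob=0.5))
```

In contrast, character-level masking may mask `语` while leaving `言` visible, which leaks part of the word; whole word masking removes this shortcut.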


## Baselines
We experiment on several Chinese datasets, ranging from sentence-level to document-level tasks.

**We only list partial results here and kindly advise the readers to read our [technical report](https://arxiv.org/abs/1906.08101).**

- [**CMRC 2018**: Span-Extraction Machine Reading Comprehension (Simplified Chinese)](https://github.com/ymcui/cmrc2018)
- [**DRCD**: Span-Extraction Machine Reading Comprehension (Traditional Chinese)](https://github.com/DRCSolutionService/DRCD)
- [**CJRC**: Chinese Judiciary Reading Comprehension](http://cail.cipsc.org.cn)
- [**XNLI**: Natural Language Inference](https://github.com/google-research/bert/blob/master/multilingual.md)
- [**NER**: Chinese Named Entity Recognition](http://sighan.cs.uchicago.edu/bakeoff2006/)
- [**THUCNews**: Document-level Text Classification](http://thuctc.thunlp.org)

### [CMRC 2018](https://github.com/ymcui/cmrc2018)
The model should answer the questions based on the given passage, which is identical to SQuAD.

| Model | Development | Test | Challenge |
| :------- | :---------: | :---------: | :---------: |
| BERT | 65.5 (64.4) / 84.5 (84.0) | 70.0 (68.7) / 87.0 (86.3) | 18.6 (17.0) / 43.3 (41.3) |
| ERNIE | 65.4 (64.3) / 84.7 (84.2) | 69.4 (68.2) / 86.6 (86.1) | 19.6 (17.0) / 44.3 (42.8) |
| **BERT-wwm** | 66.3 (65.0) / 85.6 (84.7) | 70.5 (69.1) / 87.4 (86.7) | 21.0 (19.3) / 47.0 (43.9) |
| **BERT-wwm-ext** | 67.1 (65.6) / 85.7 (85.0) | 71.4 (70.0) / 87.7 (87.0) | 24.0 (20.0) / 47.3 (44.6) |
| **RoBERTa-wwm-ext** | **67.4 (66.5) / 87.2 (86.5)** | **72.6 (71.4) / 89.4 (88.8)** | **26.2 (24.6) / 51.0 (49.1)** |


### [DRCD](https://github.com/DRCKnowledgeTeam/DRCD)
DRCD is also a span-extraction machine reading comprehension dataset, released by Delta Research Center.

| Model | Development | Test |
| :------- | :---------: | :---------: |
| BERT | 83.1 (82.7) / 89.9 (89.6) | 82.2 (81.6) / 89.2 (88.8) |
| ERNIE | 73.2 (73.0) / 83.9 (83.8) | 71.9 (71.4) / 82.5 (82.3) |
| **BERT-wwm** | 84.3 (83.4) / 90.5 (90.2) | 82.8 (81.8) / 89.7 (89.0) |
| **BERT-wwm-ext** | 85.0 (84.5) / 91.2 (90.9) | 83.6 (83.0) / 90.4 (89.9) |
| **RoBERTa-wwm-ext** | **86.6 (85.9) / 92.5 (92.2)** | **85.6 (85.2) / 92.0 (91.7)** |

### CJRC
[**CJRC**](http://cail.cipsc.org.cn) is a Chinese judiciary reading comprehension dataset, released by the Joint Laboratory of HIT and iFLYTEK Research. Note that the data used in these experiments is NOT identical to the official release.

| Model | Development | Test |
| :------- | :---------: | :---------: |
| BERT | 54.6 (54.0) / 75.4 (74.5) | 55.1 (54.1) / 75.2 (74.3) |
| ERNIE | 54.3 (53.9) / 75.3 (74.6) | 55.0 (53.9) / 75.0 (73.9) |
| **BERT-wwm** | 54.7 (54.0) / 75.2 (74.8) | 55.1 (54.1) / 75.4 (74.4) |
| **BERT-wwm-ext** | 55.6 (54.8) / 76.0 (75.3) | 55.6 (54.9) / 75.8 (75.0) |
| **RoBERTa-wwm-ext** | **58.7 (57.6) / 79.1 (78.3)** | **59.0 (57.8) / 79.0 (78.0)** |


### XNLI
We use the XNLI data for testing the NLI task.

| Model | Development | Test |
| :------- | :---------: | :---------: |
| ERNIE | **79.7 (79.4)** | 78.6 (78.2) |
| **BERT-wwm** | 79.0 (78.4) | 78.2 (78.0) |
| **BERT-wwm-ext** | 79.4 (78.6) | **78.7 (78.3)** |
| **RoBERTa-wwm-ext** | **80.0** (79.2) | **78.8 (78.3)** |

### NER
We use People's Daily and MSRA-NER data for testing Chinese NER.

## FAQ

**Q: Any comments on the name of the next generation of the pre-trained model?**
A: Maybe ZOE: Zero-shOt Embeddings from language model.

**Q: Tell me a little bit more about `RoBERTa-wwm-ext`.**
A: We integrate whole word masking (wwm) into the RoBERTa model; a minimal loading sketch follows this list. Specifically, we:
1) use whole word masking (but not dynamic masking);
2) remove the Next Sentence Prediction (NSP) task;
3) directly use data generated with `max_len=512` (rather than training with `max_len=128` for several steps and then switching to `max_len=512`);
4) extend training to 1M steps.
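
As a practical note, `RoBERTa-wwm-ext` keeps BERT's architecture, vocabulary and input format (only the training recipe above changes), so downstream code typically loads it with the BERT classes rather than the RoBERTa classes. The snippet below is an illustrative sketch using the Hugging Face `transformers` library; the local path is hypothetical and assumes the TensorFlow checkpoint has already been converted to PyTorch.

```python
# Illustrative sketch: load RoBERTa-wwm-ext with the Bert* classes.
# Assumption: ./chinese-roberta-wwm-ext is a hypothetical local directory
# holding converted PyTorch weights, config.json and vocab.txt.
import torch
from transformers import BertTokenizer, BertForMaskedLM

model_dir = "./chinese-roberta-wwm-ext"
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertForMaskedLM.from_pretrained(model_dir)
model.eval()

inputs = tokenizer("哈尔滨是[MASK]龙江的省会。", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Predict the token at the [MASK] position.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))
```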

## Reference
If you find the technical report or resources useful, please cite our [technical report](https://arxiv.org/abs/1906.08101) in your paper.
