
Commit e1ee19d
update desc for rbt3/rbtl3 in EN
ymcui committed Jan 21, 2020
1 parent a5e28ae commit e1ee19d
Showing 1 changed file with 37 additions and 2 deletions.
39 changes: 37 additions & 2 deletions README_EN.md
@@ -9,7 +9,9 @@ For further accelerating Chinese natural language processing, we provide **Chinese…


## News
**2020/1/20 Happy Chinese New Year! We've released RBT3 and RBTL3 (3-layer RoBERTa-wwm-ext-base/large), check [Small Models](#Small-Models)**

2019/12/19 The models in this repository can now be easily accessed through [Huggingface-Transformers](https://github.com/huggingface/transformers), check [Quick Load](#Quick-Load)

2019/10/14 We release `RoBERTa-wwm-ext-large`, check [Download](#Download)

@@ -27,6 +29,7 @@ For further accelerating Chinese natural language processing, we provide **Chinese…
| [Download](#Download) | Download links for Chinese BERT-wwm |
| [Model Comparison](#Model-Comparison) | Compare the models published in this repository |
| [Baselines](#Baselines) | Baseline results for several Chinese NLP datasets (partial) |
| [Small Models](#Small-Models) | 3-layer Transformer models |
| [Useful Tips](#Useful-Tips) | Provide several useful tips for using Chinese pre-trained models |
| [English BERT-wwm](#English-BERT-wwm) | Download English BERT-wwm (by Google) |
| [FAQ](#FAQ) | Frequently Asked Questions |
@@ -67,6 +70,8 @@ As all models are 'BERT-base' variants, we do not indicate 'base' in the following…

| Model | Data | Google Drive | iFLYTEK Cloud |
| :------- | :--------- | :---------: | :---------: |
| **`RBTL3, Chinese`** | **Wikipedia+Extended data<sup>[1]</sup>** | **[TensorFlow](https://drive.google.com/open?id=1Jzn1hYwmv0kXkfTeIvNT61Rn1IbRc-o8)**<br/>**[PyTorch](https://drive.google.com/open?id=1qs5OasLXXjOnR2XuGUh12NanUl0pkjEv)** | **[TensorFlow (pw: vySW)](https://pan.iflytek.com:443/link/0DD18FAC080BAF75DBA28FB5C0047760)**<br/>**[PyTorch (pw: rgCs)](https://pan.iflytek.com:443/link/7C6A513BED2D42170B6DBEE5A866FB3F)** |
| **`RBT3, Chinese`** | **Wikipedia+Extended data<sup>[1]</sup>** | **[TensorFlow](https://drive.google.com/open?id=1-rvV0nBDvRCASbRz8M9Decc3_8Aw-2yi)**<br/>**[PyTorch](https://drive.google.com/open?id=1_LqmIxm8Nz1Abvlqb8QFZaxYo-TInOed)** | **[TensorFlow (pw: b9nx)](https://pan.iflytek.com:443/link/275E5B46185C982D4AF5AC295E1651B6)**<br/>**[PyTorch (pw: Yoep)](https://pan.iflytek.com:443/link/A094EB0A73B1E7209FEBC6C5CF7AEF27)** |
| **`RoBERTa-wwm-ext-large, Chinese`** | **Wikipedia+Extended data<sup>[1]</sup>** | **[TensorFlow](https://drive.google.com/open?id=1dtad0FFzG11CBsawu8hvwwzU2R0FDI94)**<br/>**[PyTorch](https://drive.google.com/open?id=1-2vEZfIFCdM1-vJ3GD6DlSyKT4eVXMKq)** | **[TensorFlow (pw: u6gC)](https://pan.iflytek.com:443/link/AC056611607108F33A744A0F56D0F6BE)**<br/>**[PyTorch (pw: 43eH)](https://pan.iflytek.com:443/link/9B46A0ABA70C568AAAFCD004B9A2C773)** |
| **`RoBERTa-wwm-ext, Chinese`** | **Wikipedia+Extended data<sup>[1]</sup>** | **[TensorFlow](https://drive.google.com/open?id=1jMAKIJmPn7kADgD3yQZhpsqM-IRM1qZt)**<br/>**[PyTorch](https://drive.google.com/open?id=1eHM3l4fMo6DsQYGmey7UZGiTmQquHw25)** | **[TensorFlow (pw: Xe1p)](https://pan.iflytek.com:443/link/98D11FAAF0F0DBCB094EE19CCDBC98BF)**<br/>**[PyTorch (pw: waV5)](https://pan.iflytek.com:443/link/92ADD2C34C91F3B44E0EC97F101F89D8)** |
| **`BERT-wwm-ext, Chinese`** | **Wikipedia+Extended data<sup>[1]</sup>** | **[TensorFlow](https://drive.google.com/open?id=1buMLEjdtrXE2c4G1rpsNGWEx7lUQ0RHi)**<br/>**[PyTorch](https://drive.google.com/open?id=1iNeYFhCBJWeUsIlnW_2K6SMwXkM4gLb_)** | **[TensorFlow (pw: 4cMG)](https://pan.iflytek.com:443/link/653637473FFF242C3869D77026C9BDB5)**<br/>**[PyTorch (pw: XHu4)](https://pan.iflytek.com:443/link/B9ACE1C9F228A0F42242672EF6CE1721)** |
@@ -105,6 +110,9 @@ The actual model and its corresponding `MODEL_NAME` are listed below; a minimal loading example is shown after the table.
| RoBERTa-wwm-ext | hfl/chinese-roberta-wwm-ext |
| BERT-wwm-ext | hfl/chinese-bert-wwm-ext |
| BERT-wwm | hfl/chinese-bert-wwm |
| RBT3 | hfl/rbt3 |
| RBTL3 | hfl/rbtl3 |
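
As a quick sanity check, the snippet below loads RBT3 through Huggingface-Transformers using its `MODEL_NAME` from the table above. This is a minimal sketch, assuming a recent `transformers` release and PyTorch are installed; as with the other models in this repository, the BERT classes (not the RoBERTa ones) are used to load the RoBERTa-wwm family.

```python
# Minimal sketch: load RBT3 via Huggingface-Transformers (assumes transformers + PyTorch).
# Note: models in this repository are loaded with the BERT classes, not the RoBERTa classes.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/rbt3")
model = BertModel.from_pretrained("hfl/rbt3")

inputs = tokenizer("哈尔滨是黑龙江的省会", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# First output is the last hidden states: (batch_size, seq_len, hidden_size)
print(outputs[0].shape)
```

Replacing `hfl/rbt3` with any other `MODEL_NAME` from the table loads the corresponding checkpoint in the same way.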


### Task Data
We only provide the data that is publicly available; check the `data` directory.
@@ -271,7 +279,34 @@ Evaluation Metrics: Accuracy
| **RoBERTa-wwm-ext** | 98.3 (97.9) | 97.7 (97.5) |
| **RoBERTa-wwm-ext-large** | 98.3 (97.7) | 97.8 (97.6) |

</details>
### Small Models
We list RBT3 and RBTL3 results on several NLP tasks. Note that we only list test set results.

| Model | CMRC 2018 | DRCD | XNLI | CSC | LCQMC | BQ | Average | Params |
| :------- | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: |
| RoBERTa-wwm-ext-large | 74.2 / 90.6 | 89.6 / 94.5 | 81.2 | 95.8 | 87.0 | 85.8 | 87.335 | 325M |
| RoBERTa-wwm-ext | 72.6 / 89.4 | 85.6 / 92.0 | 78.8 | 95.6 | 86.4 | 85.0 | 85.675 | 102M |
| RBTL3 | 63.3 / 83.4 | 77.2 / 85.6 | 74.0 | 94.2 | 85.1 | 83.6 | 80.800 | 61M (59.8%) |
| RBT3 | 62.2 / 81.8 | 75.0 / 83.9 | 72.3 | 92.8 | 85.1 | 83.3 | 79.550 | 38M (37.3%) |

Relative performance (with RoBERTa-wwm-ext as the 100% baseline):

| Model | CMRC 2018 | DRCD | XNLI | CSC | LCQMC | BQ | Average | AVG-C |
| :------- | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: |
| RoBERTa-wwm-ext-large | 102.2% / 101.3% | 104.7% / 102.7% | 103.0% | 100.2% | 100.7% | 100.9% | 101.9% | 101.2% |
| RoBERTa-wwm-ext | 100% / 100% | 100% / 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| RBTL3 | 87.2% / 93.3% | 90.2% / 93.0% | 93.9% | 98.5% | 98.5% | 98.4% | 94.3% | 97.35% |
| RBT3 | 85.7% / 91.5% | 87.6% / 91.2% | 91.8% | 97.1% | 98.5% | 98.0% | 92.9% | 96.35% |

* AVG-C: the average score over the classification tasks (XNLI, CSC, LCQMC, BQ); see the sketch after this list for how the Average and AVG-C columns are computed.
* The parameter counts are calculated based on the XNLI classification task.
* The relative parameter percentages are calculated with respect to the RoBERTa-wwm-ext model.
* RBT3: initialized from the first three layers of RoBERTa-wwm-ext, then pre-training is continued for 1M steps.
* RBTL3: initialized from the first three layers of RoBERTa-wwm-ext-large, then pre-training is continued for 1M steps.
* The name 'RBT' is formed from the syllables of 'RoBERTa' (Ro-BER-Ta), and the 'L' stands for the large model.
* Directly fine-tuning downstream tasks on the first three layers of RoBERTa-wwm-ext-large, without further pre-training, performs much worse: on CMRC 2018 it reaches only 42.9/65.3, while RBTL3 reaches 63.3/83.4.
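
The summary columns are not spelled out in the tables themselves, so here is a small sketch that reproduces the RBT3 row under two assumptions: 'Average' is the plain mean of the eight sub-scores (EM and F1 counted separately for CMRC 2018 and DRCD), and 'AVG-C' averages the relative classification scores after rounding them to one decimal place, as displayed in the relative table.

```python
# Hypothetical reconstruction of the Average / relative / AVG-C columns for RBT3,
# assuming simple arithmetic means over the scores listed in the tables above.

baseline = {  # RoBERTa-wwm-ext, the 100% reference row
    "CMRC2018_EM": 72.6, "CMRC2018_F1": 89.4,
    "DRCD_EM": 85.6, "DRCD_F1": 92.0,
    "XNLI": 78.8, "CSC": 95.6, "LCQMC": 86.4, "BQ": 85.0,
}
rbt3 = {
    "CMRC2018_EM": 62.2, "CMRC2018_F1": 81.8,
    "DRCD_EM": 75.0, "DRCD_F1": 83.9,
    "XNLI": 72.3, "CSC": 92.8, "LCQMC": 85.1, "BQ": 83.3,
}

# Mean of the eight sub-scores -> the 'Average' column
average = sum(rbt3.values()) / len(rbt3)

# Score relative to the baseline, rounded to one decimal -> the relative table row
relative = {k: round(100.0 * rbt3[k] / baseline[k], 1) for k in rbt3}

# Mean of the relative scores on the four classification tasks -> the 'AVG-C' column
classification = ["XNLI", "CSC", "LCQMC", "BQ"]
avg_c = sum(relative[k] for k in classification) / len(classification)

print(f"Average: {average:.3f}")                                    # 79.550
print(f"CMRC 2018 EM (relative): {relative['CMRC2018_EM']:.1f}%")   # 85.7%
print(f"AVG-C: {avg_c:.2f}%")                                       # 96.35%
```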


## Useful Tips
* The initial learning rate is the most important hyper-parameter (for BERT as well as other neural networks) and should ALWAYS be tuned for better performance; a minimal sweep is sketched below.
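
To illustrate this tip, below is a hedged sketch of trying a small grid of initial learning rates when fine-tuning one of these checkpoints with Huggingface-Transformers. The grid values are common defaults for BERT-style fine-tuning, not values prescribed by this repository, and the training loop itself is left as a placeholder.

```python
# Hedged sketch: compare a few initial learning rates when fine-tuning a checkpoint
# such as hfl/rbt3. The grid below is a common default, not prescribed by this repo;
# the actual fine-tuning loop is intentionally omitted.
import torch
from transformers import BertForSequenceClassification

candidate_lrs = [2e-5, 3e-5, 5e-5]  # fine-tune once per value, compare dev-set scores

for lr in candidate_lrs:
    model = BertForSequenceClassification.from_pretrained("hfl/rbt3", num_labels=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.01)
    # ... run your usual fine-tuning loop with this optimizer and record the dev metric ...
```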
