
Commit e1ee19d
update desc for rbt3/rbtl3 in EN
ymcui committed Jan 21, 2020
1 parent a5e28ae commit e1ee19d
Showing 1 changed file with 37 additions and 2 deletions.
39 changes: 37 additions & 2 deletions README_EN.md
@@ -9,7 +9,9 @@ For further accelerating Chinese natural language processing, we provide **Chinese…


## News
**2020/1/20 Happy Chinese New Year! We've released RBT3 and RBTL3 (3-layer RoBERTa-wwm-ext-base/large), check [Small Models](#Small-Models)**

2019/12/19 The models in this repository can now be easily accessed through [Huggingface-Transformers](https://github.com/huggingface/transformers), check [Quick Load](#Quick-Load)

2019/10/14 We release `RoBERTa-wwm-ext-large`, check [Download](#Download)

@@ -27,6 +29,7 @@ For further accelerating Chinese natural language processing, we provide **Chinese…
| [Download](#Download) | Download links for Chinese BERT-wwm |
| [Model Comparison](#Model-Comparison) | Compare the models published in this repository |
| [Baselines](#Baselines) | Baseline results for several Chinese NLP datasets (partial) |
| [Small Models](#Small-Models) | 3-layer Transformer models |
| [Useful Tips](#Useful-Tips) | Provide several useful tips for using Chinese pre-trained models |
| [English BERT-wwm](#English-BERT-wwm) | Download English BERT-wwm (by Google) |
| [FAQ](#FAQ) | Frequently Asked Questions |
@@ -67,6 +70,8 @@ As all models are 'BERT-base' variants, we do not indicate 'base' in the following…

| Model | Data | Google Drive | iFLYTEK Cloud |
| :------- | :--------- | :---------: | :---------: |
| **`RBTL3, Chinese`** | **Wikipedia+Extended data<sup>[1]</sup>** | **[TensorFlow](https://drive.google.com/open?id=1Jzn1hYwmv0kXkfTeIvNT61Rn1IbRc-o8)**<br/>**[PyTorch](https://drive.google.com/open?id=1qs5OasLXXjOnR2XuGUh12NanUl0pkjEv)** | **[TensorFlow (pw: vySW)](https://pan.iflytek.com:443/link/0DD18FAC080BAF75DBA28FB5C0047760)**<br/>**[PyTorch (pw: rgCs)](https://pan.iflytek.com:443/link/7C6A513BED2D42170B6DBEE5A866FB3F)** |
| **`RBT3, Chinese`** | **Wikipedia+Extended data<sup>[1]</sup>** | **[TensorFlow](https://drive.google.com/open?id=1-rvV0nBDvRCASbRz8M9Decc3_8Aw-2yi)**<br/>**[PyTorch](https://drive.google.com/open?id=1_LqmIxm8Nz1Abvlqb8QFZaxYo-TInOed)** | **[TensorFlow (pw: b9nx)](https://pan.iflytek.com:443/link/275E5B46185C982D4AF5AC295E1651B6)**<br/>**[PyTorch (pw: Yoep)](https://pan.iflytek.com:443/link/A094EB0A73B1E7209FEBC6C5CF7AEF27)** |
| **`RoBERTa-wwm-ext-large, Chinese`** | **Wikipedia+Extended data<sup>[1]</sup>** | **[TensorFlow](https://drive.google.com/open?id=1dtad0FFzG11CBsawu8hvwwzU2R0FDI94)**<br/>**[PyTorch](https://drive.google.com/open?id=1-2vEZfIFCdM1-vJ3GD6DlSyKT4eVXMKq)** | **[TensorFlow (pw: u6gC)](https://pan.iflytek.com:443/link/AC056611607108F33A744A0F56D0F6BE)**<br/>**[PyTorch (pw: 43eH)](https://pan.iflytek.com:443/link/9B46A0ABA70C568AAAFCD004B9A2C773)** |
| **`RoBERTa-wwm-ext, Chinese`** | **Wikipedia+Extended data<sup>[1]</sup>** | **[TensorFlow](https://drive.google.com/open?id=1jMAKIJmPn7kADgD3yQZhpsqM-IRM1qZt)**<br/>**[PyTorch](https://drive.google.com/open?id=1eHM3l4fMo6DsQYGmey7UZGiTmQquHw25)** | **[TensorFlow (pw: Xe1p)](https://pan.iflytek.com:443/link/98D11FAAF0F0DBCB094EE19CCDBC98BF)**<br/>**[PyTorch (pw: waV5)](https://pan.iflytek.com:443/link/92ADD2C34C91F3B44E0EC97F101F89D8)** |
| **`BERT-wwm-ext, Chinese`** | **Wikipedia+Extended data<sup>[1]</sup>** | **[TensorFlow](https://drive.google.com/open?id=1buMLEjdtrXE2c4G1rpsNGWEx7lUQ0RHi)**<br/>**[PyTorch](https://drive.google.com/open?id=1iNeYFhCBJWeUsIlnW_2K6SMwXkM4gLb_)** | **[TensorFlow (pw: 4cMG)](https://pan.iflytek.com:443/link/653637473FFF242C3869D77026C9BDB5)**<br/>**[PyTorch (pw: XHu4)](https://pan.iflytek.com:443/link/B9ACE1C9F228A0F42242672EF6CE1721)** |
@@ -105,6 +110,9 @@ The actual model and its corresponding `MODEL_NAME` are listed below; a minimal loading example is shown after the table.
| RoBERTa-wwm-ext | hfl/chinese-roberta-wwm-ext |
| BERT-wwm-ext | hfl/chinese-bert-wwm-ext |
| BERT-wwm | hfl/chinese-bert-wwm |
| RBT3 | hfl/rbt3 |
| RBTL3 | hfl/rbtl3 |
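
As a quick sanity check, the snippet below loads RBT3 through Huggingface-Transformers using its `MODEL_NAME` from the table above. This is a minimal sketch, assuming a recent `transformers` release and PyTorch are installed; as with the other models in this repository, the BERT classes (not the RoBERTa ones) are used to load the RoBERTa-wwm family.

```python
# Minimal sketch: load RBT3 via Huggingface-Transformers (assumes transformers + PyTorch).
# Note: models in this repository are loaded with the BERT classes, not the RoBERTa classes.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/rbt3")
model = BertModel.from_pretrained("hfl/rbt3")

inputs = tokenizer("哈尔滨是黑龙江的省会", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# First output is the last hidden states: (batch_size, seq_len, hidden_size)
print(outputs[0].shape)
```

Replacing `hfl/rbt3` with any other `MODEL_NAME` from the table loads the corresponding checkpoint in the same way.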


### Task Data
We only provide the data that is publicly available; check the `data` directory.
@@ -271,7 +279,34 @@ Evaluation Metrics: Accuracy
| **RoBERTa-wwm-ext** | 98.3 (97.9) | 97.7 (97.5) |
| **RoBERTa-wwm-ext-large** | 98.3 (97.7) | 97.8 (97.6) |

</details>
### Small Models
We list RBT3 and RBTL3 results on several NLP tasks. Note that we only list test set results.

| Model | CMRC 2018 | DRCD | XNLI | CSC | LCQMC | BQ | Average | Params |
| :------- | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: |
| RoBERTa-wwm-ext-large | 74.2 / 90.6 | 89.6 / 94.5 | 81.2 | 95.8 | 87.0 | 85.8 | 87.335 | 325M |
| RoBERTa-wwm-ext | 72.6 / 89.4 | 85.6 / 92.0 | 78.8 | 95.6 | 86.4 | 85.0 | 85.675 | 102M |
| RBTL3 | 63.3 / 83.4 | 77.2 / 85.6 | 74.0 | 94.2 | 85.1 | 83.6 | 80.800 | 61M (59.8%) |
| RBT3 | 62.2 / 81.8 | 75.0 / 83.9 | 72.3 | 92.8 | 85.1 | 83.3 | 79.550 | 38M (37.3%) |

Relative performance (with RoBERTa-wwm-ext as the 100% baseline):

| Model | CMRC 2018 | DRCD | XNLI | CSC | LCQMC | BQ | Average | AVG-C |
| :------- | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: | :---------: |
| RoBERTa-wwm-ext-large | 102.2% / 101.3% | 104.7% / 102.7% | 103.0% | 100.2% | 100.7% | 100.9% | 101.9% | 101.2% |
| RoBERTa-wwm-ext | 100% / 100% | 100% / 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| RBTL3 | 87.2% / 93.3% | 90.2% / 93.0% | 93.9% | 98.5% | 98.5% | 98.4% | 94.3% | 97.35% |
| RBT3 | 85.7% / 91.5% | 87.6% / 91.2% | 91.8% | 97.1% | 98.5% | 98.0% | 92.9% | 96.35% |

* AVG-C: the average score over the classification tasks (XNLI, CSC, LCQMC, BQ); see the sketch after this list for how the Average and AVG-C columns are computed.
* The parameter counts are calculated based on the XNLI classification task.
* The relative parameter percentages are calculated with respect to the RoBERTa-wwm-ext model.
* RBT3: initialized from the first three layers of RoBERTa-wwm-ext, then pre-training is continued for 1M steps.
* RBTL3: initialized from the first three layers of RoBERTa-wwm-ext-large, then pre-training is continued for 1M steps.
* The name 'RBT' is formed from the syllables of 'RoBERTa' (Ro-BER-Ta), and the 'L' stands for the large model.
* Directly fine-tuning downstream tasks on the first three layers of RoBERTa-wwm-ext-large, without further pre-training, performs much worse: on CMRC 2018 it reaches only 42.9/65.3, while RBTL3 reaches 63.3/83.4.
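
The summary columns are not spelled out in the tables themselves, so here is a small sketch that reproduces the RBT3 row under two assumptions: 'Average' is the plain mean of the eight sub-scores (EM and F1 counted separately for CMRC 2018 and DRCD), and 'AVG-C' averages the relative classification scores after rounding them to one decimal place, as displayed in the relative table.

```python
# Hypothetical reconstruction of the Average / relative / AVG-C columns for RBT3,
# assuming simple arithmetic means over the scores listed in the tables above.

baseline = {  # RoBERTa-wwm-ext, the 100% reference row
    "CMRC2018_EM": 72.6, "CMRC2018_F1": 89.4,
    "DRCD_EM": 85.6, "DRCD_F1": 92.0,
    "XNLI": 78.8, "CSC": 95.6, "LCQMC": 86.4, "BQ": 85.0,
}
rbt3 = {
    "CMRC2018_EM": 62.2, "CMRC2018_F1": 81.8,
    "DRCD_EM": 75.0, "DRCD_F1": 83.9,
    "XNLI": 72.3, "CSC": 92.8, "LCQMC": 85.1, "BQ": 83.3,
}

# Mean of the eight sub-scores -> the 'Average' column
average = sum(rbt3.values()) / len(rbt3)

# Score relative to the baseline, rounded to one decimal -> the relative table row
relative = {k: round(100.0 * rbt3[k] / baseline[k], 1) for k in rbt3}

# Mean of the relative scores on the four classification tasks -> the 'AVG-C' column
classification = ["XNLI", "CSC", "LCQMC", "BQ"]
avg_c = sum(relative[k] for k in classification) / len(classification)

print(f"Average: {average:.3f}")                                    # 79.550
print(f"CMRC 2018 EM (relative): {relative['CMRC2018_EM']:.1f}%")   # 85.7%
print(f"AVG-C: {avg_c:.2f}%")                                       # 96.35%
```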


## Useful Tips
* The initial learning rate is the most important hyper-parameter (for BERT as well as other neural networks) and should ALWAYS be tuned for better performance; a minimal sweep is sketched below.
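
To illustrate this tip, below is a hedged sketch of trying a small grid of initial learning rates when fine-tuning one of these checkpoints with Huggingface-Transformers. The grid values are common defaults for BERT-style fine-tuning, not values prescribed by this repository, and the training loop itself is left as a placeholder.

```python
# Hedged sketch: compare a few initial learning rates when fine-tuning a checkpoint
# such as hfl/rbt3. The grid below is a common default, not prescribed by this repo;
# the actual fine-tuning loop is intentionally omitted.
import torch
from transformers import BertForSequenceClassification

candidate_lrs = [2e-5, 3e-5, 5e-5]  # fine-tune once per value, compare dev-set scores

for lr in candidate_lrs:
    model = BertForSequenceClassification.from_pretrained("hfl/rbt3", num_labels=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.01)
    # ... run your usual fine-tuning loop with this optimizer and record the dev metric ...
```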
