Merge pull request airaria#47 from airaria/update_exmaple
update mnli_example and bert-emd
Showing 44 changed files with 1,143 additions and 10,778 deletions.

@@ -1,18 +1,44 @@
[**中文说明**](README_ZH.md) | [**English**](README.md)

This example demonstrates distillation on the MNLI task and **how to write a new distiller**.

* run_mnli_train.sh : trains a teacher model (bert-base) on MNLI.
* run_mnli_distill_T4tiny.sh : distills the teacher to T4tiny.
* run_mnli_distill_T4tiny_emd.sh : distills the teacher to T4tiny with many-to-many intermediate matches using EMD, so there is no need to specify the matching scheme. This example also demonstrates how to write a custom distiller (see below for details).
* run_mnli_distill_multiteacher.sh : runs multi-teacher distillation, distilling several teacher models into a student model.

The examples have been tested with **PyTorch==1.2.0, transformers==3.0.2**.

## Run

1. Set the following variables in the bash scripts before running:

   * BERT_DIR : the directory where BERT-base-cased is stored, including vocab.txt, pytorch_model.bin and bert_config.json
   * OUTPUT_ROOT_DIR : the directory that stores logs and trained model weights
   * DATA_ROOT_DIR : the directory that contains the MNLI dataset:
     * \$\{DATA_ROOT_DIR\}/MNLI/train.tsv
     * \$\{DATA_ROOT_DIR\}/MNLI/dev_matched.tsv
     * \$\{DATA_ROOT_DIR\}/MNLI/dev_mismatched.tsv
   * The trained teacher weights file *trained_teacher_model* has to be specified when running run_mnli_distill_T4tiny.sh
   * The teacher weights files *trained_teacher_model_1, trained_teacher_model_2, trained_teacher_model_3* have to be specified when running run_mnli_distill_multiteacher.sh

2. Set the path to BERT:
   * If you are running run_mnli_train.sh: open jsons/TrainBertTeacher.json and set "vocab_file", "config_file" and "checkpoint" under the key "student".
   * If you are running run_mnli_distill_T4tiny.sh or run_mnli_distill_T4tiny_emd.sh: open jsons/DistillBertToTiny.json and set "vocab_file", "config_file" and "checkpoint" under the key "teachers" (a sketch of filling in these paths programmatically is shown after this list).
   * If you are running run_mnli_distill_multiteacher.sh: open jsons/DistillMultiBert.json and set all the "vocab_file", "config_file" and "checkpoint" entries under the key "teachers". You can also add more teachers to the json.

3. Run the bash script and have fun.
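
For reference, the snippet below sketches one way to fill in the BERT paths of step 2 programmatically rather than by hand. It is only an illustration: the exact layout of jsons/DistillBertToTiny.json (for example, whether "teachers" holds a list of entries) is an assumption here, so check the actual file before relying on it.

```python
import json

# Hypothetical location of BERT-base-cased; replace with your own path.
BERT_DIR = "/path/to/bert-base-cased"

# Assumption: the file has a top-level "teachers" key holding a list of
# entries, each with "vocab_file", "config_file" and "checkpoint" fields.
with open("jsons/DistillBertToTiny.json") as f:
    cfg = json.load(f)

for teacher in cfg["teachers"]:
    teacher["vocab_file"] = f"{BERT_DIR}/vocab.txt"
    teacher["config_file"] = f"{BERT_DIR}/bert_config.json"
    teacher["checkpoint"] = f"{BERT_DIR}/pytorch_model.bin"

with open("jsons/DistillBertToTiny.json", "w") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
```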

## BERT-EMD and custom distiller
[BERT-EMD](https://www.aclweb.org/anthology/2020.emnlp-main.242/) allows each intermediate student layer to learn from any intermediate teacher layer adaptively, based on optimizing the Earth Mover's Distance, so there is no need to specify the matching scheme.

Based on the [original implementation](https://github.com/lxk00/BERT-EMD), we have written a new distiller (EMDDistiller) that implements a simplified version of BERT-EMD (it ignores the mappings between attentions). The code of the algorithm is in distiller_emd.py. EMDDistiller is used much like the other distillers:
```python
from distiller_emd import EMDDistiller
distiller = EMDDistiller(...)
with distiller:
    distiller.train(...)
```
See main.emd.py for detailed usage.
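
For orientation, here is a slightly fuller sketch of such a training run. The constructor and train() arguments are assumptions modelled on TextBrewer's other distillers (such as GeneralDistiller), and the models, adaptors, optimizer and dataloader are placeholders; the actual interface is defined in distiller_emd.py and demonstrated in main.emd.py.

```python
from textbrewer import TrainingConfig, DistillationConfig
from distiller_emd import EMDDistiller

# Placeholders: build these as in main.emd.py (teacher/student BERT models,
# adaptors mapping model outputs to logits and hidden states, an optimizer
# and a training dataloader).
teacher_model, student_model = ..., ...
adaptor_T = adaptor_S = ...
optimizer, train_dataloader = ..., ...

# Assumed to mirror GeneralDistiller's arguments; check distiller_emd.py
# for the real signature.
distiller = EMDDistiller(
    train_config=TrainingConfig(output_dir="outputs"),
    distill_config=DistillationConfig(temperature=8),
    model_T=teacher_model,
    model_S=student_model,
    adaptor_T=adaptor_T,
    adaptor_S=adaptor_S,
)

with distiller:
    distiller.train(
        optimizer=optimizer,
        dataloader=train_dataloader,
        num_epochs=3,
    )
```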

EMDDistiller requires the pyemd package:
```bash
pip install pyemd
```
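
As a quick illustration of the quantity involved (not of how EMDDistiller calls it internally), pyemd computes the Earth Mover's Distance between two histograms given a ground-distance matrix; the toy values below are made up for the example.

```python
import numpy as np
from pyemd import emd

# Two toy weight distributions over three "layers" (pyemd expects float64).
first = np.array([0.5, 0.3, 0.2], dtype=np.float64)
second = np.array([0.2, 0.3, 0.5], dtype=np.float64)

# Ground-distance matrix between bins (here: absolute index distance).
distance = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(np.float64)

# Minimal cost of transforming one distribution into the other.
print(emd(first, second, distance))
```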

@@ -1,18 +1,46 @@
[**中文说明**](README_ZH.md) | [**English**](README.md)

This example demonstrates distillation on the MNLI sentence-pair classification task, and also provides an example of a **custom distiller**.

* run_mnli_train.sh : trains the teacher model (bert-base) on MNLI.
* run_mnli_distill_T4tiny.sh : distills the teacher model to T4Tiny on MNLI.
* run_mnli_distill_T4tiny_emd.sh : uses EMD to compute the matching between hidden layers automatically, so it does not need to be specified by hand. This example also shows how to write a custom distiller (see details below).
* run_mnli_distill_multiteacher.sh : multi-teacher distillation, compressing several teacher models into one student model.

Tested with **PyTorch==1.2.0, transformers==3.0.2**.

## Run

1. Before running any of the scripts above, set the corresponding variables in the sh files according to your environment:

   * BERT_DIR : the directory storing the BERT-base-cased model, containing vocab.txt, pytorch_model.bin, bert_config.json
   * OUTPUT_ROOT_DIR : stores the trained models and logs
   * DATA_ROOT_DIR : contains the MNLI dataset:
     * \$\{DATA_ROOT_DIR\}/MNLI/train.tsv
     * \$\{DATA_ROOT_DIR\}/MNLI/dev_matched.tsv
     * \$\{DATA_ROOT_DIR\}/MNLI/dev_mismatched.tsv
   * If you run run_mnli_distill_T4tiny.sh, you also need to specify the trained teacher weights file trained_teacher_model
   * If you run run_mnli_distill_multiteacher.sh, you need to specify several trained teacher weights files trained_teacher_model_1, trained_teacher_model_2, trained_teacher_model_3

2. Set the BERT model paths:
   * If you run run_mnli_train.sh, set the "vocab_file", "config_file" and "checkpoint" paths under the "student" key in jsons/TrainBertTeacher.json.
   * If you run run_mnli_distill_T4tiny.sh or run_mnli_distill_T4tiny_emd.sh, set the "vocab_file", "config_file" and "checkpoint" paths under the "teachers" key in jsons/DistillBertToTiny.json.
   * If you run run_mnli_distill_multiteacher.sh, set all the "vocab_file", "config_file" and "checkpoint" paths under the "teachers" key in jsons/DistillMultiBert.json. You can add more teachers as needed.

3. Once everything is set, run the sh file to start training.

## BERT-EMD and custom distiller

[BERT-EMD](https://www.aclweb.org/anthology/2020.emnlp-main.242/) adaptively adjusts the matching between teacher and student intermediate layers by optimizing the Earth Mover's Distance between them.

Following the [original implementation](https://github.com/lxk00/BERT-EMD), we implemented a simplified version of it as a distiller, EMDDistiller (the mapping between attentions is ignored).
The BERT-EMD code is in distiller_emd.py. EMDDistiller is used in much the same way as the other distillers:
```python
from distiller_emd import EMDDistiller
distiller = EMDDistiller(...)
with distiller:
    distiller.train(...)
```
See main.emd.py for detailed usage.

EMDDistiller requires the pyemd package:
```bash
pip install pyemd
```