diff --git a/README.md b/README.md
index 90d2522..be838dc 100644
--- a/README.md
+++ b/README.md
@@ -100,16 +100,16 @@ The PyTorch version contains the `pytorch_model.bin`, `bert_config.json`, and `vocab.txt` files

 ## Model Comparison
 Details of the models that people ask about most are summarized below. `~BERT` means the attribute is **inherited** from Google's original Chinese BERT.

-| - | BERT-wwm | BERT-wwm-ext | RoBERTa-wwm-ext |
-| :------- | :---------: | :---------: | :---------: |
-| Masking | whole word | whole word | whole word |
-| Data | wiki | wiki+extended data | wiki+extended data |
-| Device | TPU v3 | TPU v3 | TPU v3 |
-| Training Steps | 100K (MAX128)<br/>+100K (MAX512) | 1M (MAX128)<br/>+400K (MAX512) | 1M (MAX512) |
-| Batch Size | 2,560 / 384 | 2,560 / 384 | 384 |
-| Optimizer | LAMB | LAMB | AdamW |
-| Vocabulary | ~BERT vocab | ~BERT vocab | ~BERT vocab |
-| Init Checkpoint | ~BERT weight | ~BERT weight | ~BERT weight |
+| - | BERTGoogle | BERT-wwm | BERT-wwm-ext | RoBERTa-wwm-ext |
+| :------- | :---------: | :---------: | :---------: | :---------: |
+| Masking | WordPiece | whole word | whole word | whole word |
+| Data | wiki | wiki | wiki+extended data | wiki+extended data |
+| Device | TPU v2 Pod | TPU v3 | TPU v3 | TPU v3 |
+| Training Steps | ? | 100K (MAX128)<br/>+100K (MAX512) | 1M (MAX128)<br/>+400K (MAX512) | 1M (MAX512) |
+| Batch Size | ? | 2,560 / 384 | 2,560 / 384 | 384 |
+| Optimizer | AdamW | LAMB | LAMB | AdamW |
+| Vocabulary | 21128 | ~BERT vocab | ~BERT vocab | ~BERT vocab |
+| Init Checkpoint | RandomInit | ~BERT weight | ~BERT weight | ~BERT weight |

 ## Results on Chinese Baseline Systems

diff --git a/README_EN.md b/README_EN.md
index f16ada1..22055e9 100644
--- a/README_EN.md
+++ b/README_EN.md
@@ -92,16 +92,16 @@ We only provide the data that is publicly available, check `data` directory.

 We list comparisons of the models released in this project. `~BERT` means the attribute is inherited from Google's original BERT.

-| - | BERT-wwm | BERT-wwm-ext | RoBERTa-wwm-ext |
-| :------- | :---------: | :---------: | :---------: |
-| Masking | whole word | whole word | whole word |
-| Data | wiki | wiki+extended data | wiki+extended data |
-| Device | TPU v3 | TPU v3 | TPU v3 |
-| Training Steps | 100K (MAX128)<br/>+100K (MAX512) | 1M (MAX128)<br/>+400K (MAX512) | 1M (MAX512) |
-| Batch Size | 2,560 / 384 | 2,560 / 384 | 384 |
-| Optimizer | LAMB | LAMB | AdamW |
-| Vocabulary | ~BERT vocab | ~BERT vocab | ~BERT vocab |
-| Init Checkpoint | ~BERT weight | ~BERT weight | ~BERT weight |
+| - | BERTGoogle | BERT-wwm | BERT-wwm-ext | RoBERTa-wwm-ext |
+| :------- | :---------: | :---------: | :---------: | :---------: |
+| Masking | WordPiece | whole word | whole word | whole word |
+| Data | wiki | wiki | wiki+extended data | wiki+extended data |
+| Device | TPU v2 Pod | TPU v3 | TPU v3 | TPU v3 |
+| Training Steps | ? | 100K (MAX128)<br/>+100K (MAX512) | 1M (MAX128)<br/>+400K (MAX512) | 1M (MAX512) |
+| Batch Size | ? | 2,560 / 384 | 2,560 / 384 | 384 |
+| Optimizer | AdamW | LAMB | LAMB | AdamW |
+| Vocabulary | 21128 | ~BERT vocab | ~BERT vocab | ~BERT vocab |
+| Init Checkpoint | RandomInit | ~BERT weight | ~BERT weight | ~BERT weight |

 ## Baselines
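
The Masking row is the main distinction the new column surfaces: Google's original BERT masks individual WordPiece tokens, while the `-wwm` models mask all pieces of a word together. A toy sketch of the difference, with a contrived character grouping standing in for a real Chinese word segmenter:

```python
import random

# Toy contrast between WordPiece masking and whole word masking (wwm).
# For Chinese BERT each character is a token, so "words" are groups of
# characters produced by a word segmenter; the grouping below is contrived.
words = [["使", "用"], ["语", "言"], ["模", "型"]]

def wordpiece_masking(words, p=0.15):
    """Mask each token independently, possibly splitting a word."""
    return [tok if random.random() > p else "[MASK]"
            for word in words for tok in word]

def whole_word_masking(words, p=0.15):
    """When a word is selected, mask every token belonging to it."""
    out = []
    for word in words:
        out.extend(["[MASK]"] * len(word) if random.random() <= p else word)
    return out

print(wordpiece_masking(words))   # e.g. ['使', '[MASK]', '语', '言', '模', '型']
print(whole_word_masking(words))  # e.g. ['使', '用', '[MASK]', '[MASK]', '模', '型']
```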
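Since the tables compare checkpoints that ship as `pytorch_model.bin`, `bert_config.json`, and `vocab.txt`, here is a minimal sketch of loading one with the HuggingFace `transformers` library. The local directory path is hypothetical, and it assumes `bert_config.json` has been renamed to `config.json` as `transformers` expects; the BERT classes are used because these checkpoints keep BERT's architecture and vocabulary.

```python
# Minimal sketch: load a released checkpoint with HuggingFace transformers.
# "./chinese-roberta-wwm-ext" is a hypothetical local directory holding the
# files named above (with bert_config.json renamed to config.json).
from transformers import BertModel, BertTokenizer

model_dir = "./chinese-roberta-wwm-ext"
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir)

inputs = tokenizer("使用语言模型来预测下一个词的概率。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # [1, sequence_length, 768] for a base model
```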