# Model

## Model List

To keep pace with the rapid progress of PLMs in text generation, TextBox 2.0 incorporates 47 models/modules. The following table lists the name and reference of each model/module; click a model/module name for detailed usage instructions.

| Category | Model Name | Reference |
| --- | --- | --- |
| General CLM | OpenAI-GPT | (Radford et al., 2018) |
| | GPT2 | (Radford et al., 2019) |
| | GPT_Neo | (Gao et al., 2021) |
| | OPT | (Artetxe et al., 2022) |
| Seq2Seq | BART | (Lewis et al., 2020) |
| | T5 | (Raffel et al., 2020) |
| | UniLM | (Dong et al., 2019) |
| | MASS | (Song et al., 2019) |
| | Pegasus | (Zhang et al., 2019) |
| | ProphetNet | (Qi et al., 2020) |
| | MVP | (Tang et al., 2022) |
| | BERT2BERT | (Rothe et al., 2020) |
| | BigBird-Pegasus | (Zaheer et al., 2020) |
| | LED | (Beltagy et al., 2020) |
| | LongT5 | (Guo et al., 2021) |
| | PegasusX | (Phang et al., 2022) |
| Multilingual Models | mBART | (Liu et al., 2020) |
| | mT5 | (Xue et al., 2020) |
| | Marian | (Tiedemann et al., 2020) |
| | M2M_100 | (Fan et al., 2020) |
| | NLLB | (NLLB Team, 2022) |
| | XLM | (Lample et al., 2019) |
| | XLM-RoBERTa | (Conneau et al., 2019) |
| | XLM-ProphetNet | (Qi et al., 2020) |
| Chinese Models | CPM | (Zhang et al., 2020) |
| | CPT | (Shao et al., 2021) |
| | Chinese-BART | |
| | Chinese-GPT2 | (Zhao et al., 2019) |
| | Chinese-T5 | |
| | Chinese-Pegasus | |
| Dialogue Models | Blenderbot | (Roller et al., 2020) |
| | Blenderbot-Small | |
| | DialoGPT | (Zhang et al., 2019) |
| Conditional Models | CTRL | (Keskar et al., 2019) |
| | PPLM | (Dathathri et al., 2019) |
| Distilled Models | DistilGPT2 | (Sanh et al., 2019) |
| | DistilBART | (Shleifer et al., 2020) |
| Prompting Models | PTG | (Li et al., 2022a) |
| | Context-Tuning | (Tang et al., 2022) |
| Lightweight Modules | Adapter | (Houlsby et al., 2019) |
| | Prefix-tuning | (Li and Liang, 2021) |
| | Prompt tuning | (Lester et al., 2021) |
| | LoRA | (Hu et al., 2021) |
| | BitFit | (Ben-Zaken et al., 2021) |
| | P-Tuning v2 | (Liu et al., 2021a) |
| Non-Pre-training Models | RNN | (Sutskever et al., 2014) |
| | Transformer | (Vaswani et al., 2017b) |

## Pre-trained Model Parameters

TextBox 2.0 is compatible with Hugging Face, so `model_path` accepts either the name of a model on the Hugging Face Hub, such as `facebook/bart-base`, or a local path. `config_path` and `tokenizer_path` (set to the same value as `model_path` by default) likewise accept a Hugging Face model name or a local path.
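To make the Hugging Face compatibility concrete, here is a minimal sketch of what these three options resolve to; the checkpoint `facebook/bart-base` is purely illustrative, and the exact loading code inside TextBox may differ:

```python
# A minimal sketch, assuming model_path / config_path / tokenizer_path are
# resolved through the standard Hugging Face from_pretrained() interface.
from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer

model_path = "facebook/bart-base"   # Hub name, or a local directory
config_path = model_path            # defaults to model_path
tokenizer_path = model_path         # defaults to model_path

config = AutoConfig.from_pretrained(config_path)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path, config=config)
```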

In addition, `config_kwargs` and `tokenizer_kwargs` are useful when extra parameters need to be passed to the configuration or the tokenizer.

For example, when building a task-oriented dialogue system, special tokens can be added with `additional_special_tokens`, and fast tokenization can be toggled with `use_fast`:

```yaml
config_kwargs: {}
tokenizer_kwargs: {'use_fast': False, 'additional_special_tokens': ['[db_0]', '[db_1]', '[db_2]']}
```
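These dictionaries amount to extra keyword arguments for the Hugging Face `from_pretrained()` calls. The sketch below shows the direct equivalent of the `tokenizer_kwargs` above; treating them as a straight pass-through is an assumption for illustration, and the checkpoint name is a placeholder:

```python
# A hedged sketch of what the tokenizer_kwargs above correspond to when
# passed directly to Hugging Face Transformers.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "facebook/bart-base",  # illustrative checkpoint
    use_fast=False,
    additional_special_tokens=["[db_0]", "[db_1]", "[db_2]"],
)
print(tokenizer.additional_special_tokens)  # ['[db_0]', '[db_1]', '[db_2]']
```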

Other commonly used parameters include `label_smoothing: <smooth-loss-weight>`.

The full set of keyword arguments can be found in the documentation of `PreTrainedTokenizer` or of the corresponding tokenizer.

## Generation Parameters

Pre-trained models can generate text with various decoding strategies by combining different parameters. By default, beam search is adopted:

```yaml
generation_kwargs: {'num_beams': 5, 'early_stopping': True}
```

Nucleus sampling is also supported by pre-trained models:

```yaml
generation_kwargs: {'do_sample': True, 'top_k': 10, 'top_p': 0.9}
```
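As a point of reference, the sketch below shows the direct Hugging Face `generate()` calls these two settings correspond to; it assumes `generation_kwargs` is forwarded unchanged, and the checkpoint and input text are only placeholders:

```python
# A minimal sketch mapping generation_kwargs onto model.generate().
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
inputs = tokenizer("TextBox 2.0 supports many pre-trained models.", return_tensors="pt")

# Beam search: {'num_beams': 5, 'early_stopping': True}
beam_ids = model.generate(**inputs, num_beams=5, early_stopping=True)

# Nucleus sampling: {'do_sample': True, 'top_k': 10, 'top_p': 0.9}
sample_ids = model.generate(**inputs, do_sample=True, top_k=10, top_p=0.9)

print(tokenizer.decode(beam_ids[0], skip_special_tokens=True))
print(tokenizer.decode(sample_ids[0], skip_special_tokens=True))
```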