This repository provides the latest pretrained language models and their related optimization techniques developed by Huawei Noah's Ark Lab.
- PanGu-α is a 200B-parameter autoregressive pretrained Chinese language model.
- NEZHA-TensorFlow is a pretrained Chinese language model implemented in TensorFlow that achieves state-of-the-art performance on several Chinese NLP tasks.
- NEZHA-PyTorch is the PyTorch version of NEZHA.
- NEZHA-Gen-TensorFlow provides two GPT models: Yuefu (乐府), a Chinese classical-poetry generation model, and a general-purpose Chinese GPT model.
- TinyBERT is a compressed BERT model that is 7.5x smaller and 9.4x faster at inference.
- TinyBERT-MindSpore is a MindSpore version of TinyBERT.
- DynaBERT is a dynamic BERT model with adaptive width and depth (see the width-slicing sketch after this list).
- BBPE provides a byte-level vocabulary building tool and its corresponding tokenizer (see the byte-level encoding sketch below).
- PMLM is a probabilistically masked language model. Trained without the complex two-stream self-attention, it can be treated as a simple approximation of XLNet (see the probabilistic-masking sketch below).
- TernaryBERT is a weight quantization method for BERT, implemented in PyTorch (see the ternarization sketch below).
- HyperText is an efficient text classification model based on hyperbolic geometry (see the Poincaré-distance sketch below).
- SumTitles is a summarization corpus with low extractivity.
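
To make the adaptive-width idea behind DynaBERT concrete, here is a minimal PyTorch sketch of a feed-forward block whose intermediate width can be scaled at inference time by keeping only the first k neurons. It is an illustration only, not DynaBERT's actual code; the real method also adapts depth and reorders neurons by importance before slicing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFFN(nn.Module):
    """Toy feed-forward block whose intermediate width can be scaled at run time."""
    def __init__(self, hidden: int = 256, intermediate: int = 1024):
        super().__init__()
        self.fc1 = nn.Linear(hidden, intermediate)
        self.fc2 = nn.Linear(intermediate, hidden)

    def forward(self, x: torch.Tensor, width_mult: float = 1.0) -> torch.Tensor:
        # Keep only the first k intermediate neurons; a smaller width_mult
        # means fewer multiply-adds at the cost of some accuracy.
        k = max(1, round(self.fc1.out_features * width_mult))
        h = F.relu(F.linear(x, self.fc1.weight[:k], self.fc1.bias[:k]))
        return F.linear(h, self.fc2.weight[:, :k], self.fc2.bias)

ffn = AdaptiveFFN()
x = torch.randn(2, 256)
print(ffn(x, width_mult=0.5).shape)  # same output shape, roughly half the FFN compute
```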
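The byte-level idea behind BBPE can be sketched in a few lines: text is first decomposed into UTF-8 bytes, so a base vocabulary of only 256 symbols covers any string (including Chinese), and BPE merges are then learned over byte sequences. The helper names below are illustrative, not the BBPE tool's actual API.

```python
from collections import Counter

def to_byte_tokens(text: str) -> list:
    """Map a string to its UTF-8 byte values (base vocabulary of 256 symbols)."""
    return list(text.encode("utf-8"))

def count_pairs(corpus: list) -> Counter:
    """Count adjacent symbol pairs across the corpus; the most frequent
    pair becomes the next BPE merge."""
    pairs = Counter()
    for seq in corpus:
        pairs.update(zip(seq, seq[1:]))
    return pairs

corpus = [to_byte_tokens(s) for s in ["hello world", "你好，世界"]]
print(corpus[1])                           # each Chinese character becomes 3 bytes
print(count_pairs(corpus).most_common(3))  # candidate merges
```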
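A minimal sketch of probabilistic masking, the idea that gives PMLM its name: rather than masking a fixed fraction of tokens as BERT does, the masking ratio itself is sampled from a prior for every sequence, so training covers the whole range from almost-unmasked to almost-fully-masked inputs. The token IDs and the uniform prior below are assumptions for illustration.

```python
import random

MASK_ID = 103  # assumed [MASK] token id

def probabilistic_mask(tokens, rng):
    """Sample a masking ratio r ~ U(0, 1), then mask each position with probability r."""
    r = rng.random()
    masked, targets = [], []
    for t in tokens:
        if rng.random() < r:
            masked.append(MASK_ID)
            targets.append(t)     # model must predict the original token here
        else:
            masked.append(t)
            targets.append(-100)  # conventionally ignored by the loss
    return masked, targets

rng = random.Random(0)
print(probabilistic_mask([7, 12, 56, 9, 31], rng))
```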
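The quantizer at the heart of TernaryBERT maps each weight to one of three values, {-alpha, 0, +alpha}. The sketch below uses the classic TWN-style rule (threshold delta = 0.7 × mean(|W|), with alpha fit over the surviving weights) to illustrate the idea; TernaryBERT's full recipe additionally uses distillation-aware training, which is omitted here.

```python
import torch

def ternarize(w: torch.Tensor) -> torch.Tensor:
    """Quantize a weight tensor to {-alpha, 0, +alpha} (TWN-style rule)."""
    delta = 0.7 * w.abs().mean()                                # threshold: small weights snap to 0
    mask = (w.abs() > delta).float()                            # positions that stay nonzero
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)  # per-tensor scaling factor
    return alpha * torch.sign(w) * mask

w = torch.randn(4, 4)
print(ternarize(w))  # only three distinct values appear: -alpha, 0, +alpha
```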
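For intuition about why hyperbolic geometry helps models like HyperText: the Poincaré ball embeds tree-like, hierarchical structures (such as label taxonomies) with far less distortion than Euclidean space. Below is the standard Poincaré-ball distance, shown for illustration; it is not taken from HyperText's code.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball."""
    sq = lambda x: sum(xi * xi for xi in x)
    diff = sq([ui - vi for ui, vi in zip(u, v)])
    denom = (1.0 - sq(u)) * (1.0 - sq(v))
    return math.acosh(1.0 + 2.0 * diff / denom)

# Points near the boundary are "far" from everything else, which mirrors
# the exponential growth of leaves in a tree.
print(poincare_distance([0.1, 0.2], [0.4, -0.3]))
print(poincare_distance([0.1, 0.2], [0.9, 0.3]))
```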