Pretrained Language Model

This repository provides the latest pretrained language models and their related optimization techniques developed by Huawei Noah's Ark Lab.

Directory structure

  • PanGu-α is a 200B-parameter autoregressive pretrained Chinese language model.
  • NEZHA-TensorFlow is a pretrained Chinese language model developed with TensorFlow that achieves state-of-the-art performance on several Chinese NLP tasks.
  • NEZHA-PyTorch is the PyTorch version of NEZHA.
  • NEZHA-Gen-TensorFlow provides two GPT models. One is Yuefu (乐府), a Chinese classical poetry generation model; the other is a general Chinese GPT model.
  • TinyBERT is a compressed BERT model that is 7.5x smaller and 9.4x faster at inference.
  • TinyBERT-MindSpore is a MindSpore version of TinyBERT.
  • DynaBERT is a dynamic BERT model with adaptive width and depth.
  • BBPE provides a byte-level vocabulary building tool and its corresponding tokenizer (see the sketch after this list).
  • PMLM is an improved pretraining method for language models. Trained without the complex two-stream self-attention, PMLM can be treated as a simple approximation of XLNet.
  • TernaryBERT is a quantization method for the BERT model, implemented in PyTorch.
  • HyperText is an efficient text classification model based on hyperbolic geometry.
  • SumTitles is a summarization corpus with low extractivity.
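Since BBPE centers on byte-level vocabulary construction, the sketch below illustrates only the underlying idea, not the interface of the BBPE tool in this repository: text is first mapped to raw UTF-8 bytes, so a base vocabulary of at most 256 symbols covers any string (including rare Chinese characters), and BPE merges are then learned over those byte sequences. All function names and the symbol format here are hypothetical.

```python
# Minimal, illustrative sketch of the byte-level BPE idea (not the BBPE tool's actual API).
from collections import Counter

def to_byte_symbols(text: str) -> list[str]:
    """Represent a string as a sequence of byte-valued symbols (at most 256 distinct)."""
    return [f"<0x{b:02X}>" for b in text.encode("utf-8")]

def count_adjacent_pairs(corpus: list[list[str]]) -> Counter:
    """Count adjacent symbol pairs; the most frequent pair becomes the next BPE merge."""
    pairs = Counter()
    for seq in corpus:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs

corpus = [to_byte_symbols(s) for s in ["language model", "language models"]]
print(corpus[0][:4])                            # ['<0x6C>', '<0x61>', '<0x6E>', '<0x67>']
print(count_adjacent_pairs(corpus).most_common(1))  # most frequent byte pair to merge first
```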
