Skip to content

chenxingphh/text-classification-pytorch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

text-classification-pytorch

基于Pytorch实现大多数常见的文本分类模型

实现的分类模型

  • FastText: Bag of Tricks for Efficient Text Classification
  • DPCNN: Deep Pyramid Convolutional Neural Networks for Text Categorization
  • TextCNN: Convolutional Neural Networks for Sentence Classification
  • RCNN: Recurrent Convolutional Neural Networks for Text Classification
  • HAN: Hierarchical Attention Networks for Document Classification
  • Bi-GRU: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
  • Bi-LSTM: Long Short-term Memory
  • Transformer-Encoder: Attention is all you need
  • Transformer-XL-Encoder: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

数据集

IMDB是一个电影评论的二分类(pos, neg)数据集,专门用于情绪分析。
IMDB的训练集数目为25000,测试集数目为25000,没有额外提供验证集。

超参数

seed: 512
epoch: 20
optimizer: Adam
warm_up_proportion: 0.1
lr: 0.0005
batch_size: 128
pretrain_embedding: Glove-6B

模型效果

  • 对于非预训练模型,这里使用了不同维度的Glove-6B预训练词向量,维度包括50, 100, 200
Model Test(dim=50) Test(dim=100) Test(dim=200)
FastText 0.8809 0.8830 0.8836
DPCNN 0.8648 0.8722 0.8798
TextCNN 0.8680 0.8689 0.8808
RCNN 0.8716 0.8864 0.9006
HAN 0.8852 0.8904 0.8973
Bi-GRU 0.8535 0.8715 0.8873
Bi-LSTM 0.8611 0.8708 0.8871
Transformer-Encoder 0.8650 0.8670 0.8774
Transformer-XL-Encoder 0.8751 0.8844 0.8791

License

MIT LICENSE

About

Most common text classification models implemented by pytorch

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages