PrideLee/CRF-bi-LSTM-sequence-tagging-Chinese-characters-

CRF-BiLSTM NER

  We use Python 3.6 and TensorFlow 1.12.0 to implement a bi-LSTM+CRF network for sequence tagging of Chinese characters.

  (1) Raw data preprocessing. Call the vocab_build() function to convert the data from .txt format to .pkl format, and initialize the word vectors (the vector dimension is set to 300).
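  The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the repository's vocab_build(): the function names `build_vocab_sketch` and `random_embeddings`, the one-character-and-tag-per-line corpus layout, the `<PAD>`/`<UNK>` entries, and the uniform initialization range are all assumptions made for the example.

```python
import pickle
import random
from collections import Counter


def build_vocab_sketch(corpus_path, vocab_path, min_count=1):
    """Build a character-to-id mapping from a .txt corpus and pickle it.

    Assumed corpus layout (hypothetical): one "<char> <tag>" pair per line,
    with blank lines separating sentences.
    """
    counts = Counter()
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # sentence boundary
                continue
            counts[line.split()[0]] += 1

    # Reserve ids for padding and unknown characters (an assumption).
    word2id = {"<PAD>": 0, "<UNK>": 1}
    for char, n in counts.most_common():
        if n >= min_count:
            word2id[char] = len(word2id)

    with open(vocab_path, "wb") as f:
        pickle.dump(word2id, f)
    return word2id


def random_embeddings(vocab_size, dim=300, seed=0):
    """Return a vocab_size x dim table of small random vectors."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.25, 0.25) for _ in range(dim)]
            for _ in range(vocab_size)]
```

  The pickled mapping can then be loaded once at training time, and the embedding table indexed by character id.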

  (2) Design the network structure and set the hyper-parameters, then run main.py to train and test the model.

  • batch_size=64;
  • epoch=25;
  • learning_rate=0.001;
  • dropout=0.5;
  • gradient_clipping=5.0;
  • LSTM_num(forward)=300;
  • LSTM_num(backward)=300;
  • optimizer=Adam;
  • ...
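  For reference, the settings listed above can be collected in one place. The sketch below is only an illustration: the `TrainConfig` class, its field names, and the `steps_per_epoch` helper are hypothetical and are not taken from main.py, and the combined output width assumes the forward and backward LSTM outputs are concatenated.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyper-parameters from the list above (field names are hypothetical)."""
    batch_size: int = 64
    epochs: int = 25
    learning_rate: float = 0.001
    dropout: float = 0.5
    gradient_clipping: float = 5.0
    lstm_units_forward: int = 300
    lstm_units_backward: int = 300
    optimizer: str = "Adam"
    embedding_dim: int = 300

    @property
    def bilstm_output_dim(self) -> int:
        # Assumes forward and backward outputs are concatenated per character.
        return self.lstm_units_forward + self.lstm_units_backward


def steps_per_epoch(num_sentences: int, batch_size: int) -> int:
    """Number of mini-batches needed to cover the training set once."""
    return math.ceil(num_sentences / batch_size)


cfg = TrainConfig()
print(cfg.bilstm_output_dim)                   # 600
print(steps_per_epoch(1000, cfg.batch_size))   # 16
```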

   After 25 epochs, the precision, recall, and F1 values on the test set are shown below.

Results
Fig. precision, recall, F1 values

  • precision=0.951266;
  • recall=0.908613;
  • f1=0.929270 (after 25 epochs).
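  As an illustration of how such metrics are commonly computed for sequence tagging, the sketch below scores predictions at the entity level. It assumes a BIO tagging scheme and exact span matching; both are assumptions for the example, and the repository's own evaluation script may count matches differently. The function names are hypothetical.

```python
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO tag list."""
    spans = set()
    start, ent_type = None, None
    # A trailing "O" acts as a sentinel that closes any open span.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and ent_type == tag[2:]
        if not continues:
            if ent_type is not None:
                spans.add((start, i, ent_type))
            start, ent_type = None, None
            if tag.startswith("B-"):
                start, ent_type = i, tag[2:]
    return spans


def precision_recall_f1(gold_seqs, pred_seqs):
    """Entity-level precision, recall, and F1 over paired tag sequences."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)      # spans with identical boundaries and type
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(precision_recall_f1(gold, pred))
```

  In this toy case one of the two gold entities is found and nothing spurious is predicted, so precision is 1.0 and recall is 0.5.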

  To learn more about the LSTM+CRF model, please read sequence labelling.md or click here.
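  As a brief illustration of what the CRF layer adds on top of the bi-LSTM, the sketch below shows Viterbi decoding in plain Python: per-character tag scores (emissions) are combined with tag-to-tag transition scores to find the best-scoring tag sequence. This is a generic textbook sketch, not the code used in this repository; the function name and the list-of-lists inputs are assumptions, and it expects at least one time step.

```python
def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag path.

    emissions:   T lists of K scores, one list per character (assumed to
                 come from the bi-LSTM output layer).
    transitions: K x K matrix; transitions[i][j] scores moving from tag i
                 to tag j.
    Returns (best_path, best_score).
    """
    num_tags = len(transitions)
    score = list(emissions[0])   # best score of any path ending in each tag
    backpointers = []

    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for landing on tag j at this step.
            best_i = max(range(num_tags),
                         key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final tag.
    best_last = max(range(num_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


emissions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(viterbi_decode(emissions, [[0.0, 0.0], [0.0, 0.0]]))
print(viterbi_decode(emissions, [[0.0, -5.0], [-5.0, 0.0]]))
```

  With zero transition scores the decoder simply follows the highest emission at each step; with a penalty on switching tags it prefers to stay on one tag, which is how a CRF discourages invalid tag sequences.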

About

This project implements sequence tagging with a CRF+BiLSTM model.
