A Joint Chinese segmentation and POS tagger based on bidirectional GRU-CRF
Add instructions on how to use the tagger as a word segmenter (without performing joint POS tagging).
TimeDistributed has been completely removed. Modified the CNNs for graphical feature extraction. (2018.1.8)
Add instructions on how to tag raw sentences with trained models. (2017.12.9)
Integrated the feedforward neural network model introduced in Zheng et al. (2013) (2017.11.25)
Updated HiddenLayer for efficiency. TimeDistributed is not applied for output inference anymore. (2017.11.25)
The code is updated to TensorFlow 1.2.0 (2017.7.14)
A dynamic bidirectional RNN is now employed; it requires drastically less memory for both training and tagging. (2017.7.14)
The tagger now supports a bucket model for efficiently tagging very large files.
Python 2.7
TensorFlow 1.2.0 or newer
Pygame (used to convert Chinese characters into pictures)
Yan Shao, Christian Hardmeier, Jörg Tiedemann and Joakim Nivre. "Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF", Proceedings of the 8th International Joint Conference on Natural Language Processing, pages 173–183, Taipei, Taiwan, 2017
http://aclweb.org/anthology/I/I17/I17-1018.pdf
python tagger.py train -p ud1 -t train.txt -d dev.txt -wv -cp -rd -gru -m model_ud1 -emb Embeddings/glove.txt
python tagger.py test -p ud1 -e test.txt -m model_ud1 -emb Embeddings/glove.txt
python tagger.py train -p ud1 -t train.txt -d dev.txt -wv -cp -rd -gru -m model_ud1_1 -emb Embeddings/glove.txt
python tagger.py train -p ud1 -t train.txt -d dev.txt -wv -cp -rd -gru -m model_ud1_2 -emb Embeddings/glove.txt
python tagger.py train -p ud1 -t train.txt -d dev.txt -wv -cp -rd -gru -m model_ud1_3 -emb Embeddings/glove.txt
python tagger.py train -p ud1 -t train.txt -d dev.txt -wv -cp -rd -gru -m model_ud1_4 -emb Embeddings/glove.txt
python tagger.py test -ens -p ud1 -e test.txt -m model_ud1 -emb Embeddings/glove.txt
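The four ensemble training commands above differ only in the model name suffix (model_ud1_1 to model_ud1_4), so they can be generated in a loop. The following is a minimal sketch, not part of the tagger itself: the DRY_RUN variable and the train_member function are hypothetical helpers introduced here for illustration, and the options are copied unchanged from the commands above.

```shell
# Sketch only: DRY_RUN is a hypothetical variable of this snippet, not a tagger.py option.
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 runs it.
DRY_RUN="${DRY_RUN:-1}"

train_member() {
    # $1 is the ensemble member index, used as the model name suffix.
    cmd="python tagger.py train -p ud1 -t train.txt -d dev.txt -wv -cp -rd -gru -m model_ud1_$1 -emb Embeddings/glove.txt"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        # Unquoted on purpose so the shell splits the string into arguments (sh/bash).
        $cmd
    fi
}

for i in 1 2 3 4; do
    train_member "$i"
done
```

With the default DRY_RUN=1 the loop only prints the four commands, which makes it easy to check them before starting the actual training runs with DRY_RUN=0.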
(simple)
python tagger.py tag -p ud1 -r raw.txt -m model_ud1 -emb Embeddings/glove.txt -opth tagged_file.txt
(ensemble)
python tagger.py tag -ens -p ud1 -r raw.txt -m model_ud1 -emb Embeddings/glove.txt -opth tagged_file.txt
(simple)
python tagger.py tag -p ud1 -r raw.txt -m model_ud1 -emb Embeddings/glove.txt -opth tagged_file.txt -tl
(ensemble)
python tagger.py tag -ens -p ud1 -r raw.txt -m model_ud1 -emb Embeddings/glove.txt -opth tagged_file.txt -tl
To use the tagger as a word segmenter only (without joint POS tagging), add -tg seg to the training command.
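As an illustration, a segmentation-only training command might look like the sketch below. It takes the joint training command shown earlier and appends -tg seg. The model name model_ud1_seg is a hypothetical choice, and it is an assumption that the remaining options stay valid in segmentation-only mode and that -tg seg may be placed at the end of the command.

```shell
# Sketch only: MODEL is a hypothetical model name; the other options are copied
# from the joint training command and are assumed to remain valid with -tg seg.
MODEL="model_ud1_seg"
CMD="python tagger.py train -p ud1 -t train.txt -d dev.txt -wv -cp -rd -gru -m $MODEL -emb Embeddings/glove.txt -tg seg"

# Print the command for inspection.
echo "$CMD"
# To start training, run the printed command, or uncomment the next line (sh/bash).
# $CMD
```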