Implementation of a language model for contextual Chinese stroke embeddings with PyTorch

This repository contains a PyTorch implementation of the sequence model presented in the paper "Contextual String Embeddings for Sequence Labeling" by Alan Akbik et al. (2018).

source code: Flair
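As a rough illustration of the idea from the paper, a contextual embedding is read off the hidden states of a character-level (here, stroke-level) language model. The sketch below is illustrative only: the class and variable names are hypothetical and the dimensions are arbitrary, not the actual classes in this repository (which live under `pyLM/model`).

```python
import torch
import torch.nn as nn

class StrokeLM(nn.Module):
    """Hypothetical stroke-level LM sketch in the spirit of Akbik et al. (2018);
    names and sizes are illustrative, not this repository's actual code."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, stroke_ids):
        h, _ = self.lstm(self.embed(stroke_ids))   # (batch, seq_len, hidden_dim)
        return self.decoder(h), h                  # next-stroke logits + hidden states

# A word's contextual embedding is taken from the hidden state at its boundary:
# a forward LM contributes the state after the word's last stroke (a backward LM
# would contribute the state before its first stroke).
model = StrokeLM(vocab_size=100)
ids = torch.randint(0, 100, (1, 20))   # one sequence of 20 stroke ids
logits, states = model(ids)
word_end = 7                           # position of a word's last stroke (example)
embedding = states[0, word_end]        # (hidden_dim,) contextual vector
```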
At the root of the project, you will see:
├── pyLM
│   ├── callback
│   │   ├── lrscheduler.py
│   │   ├── trainingmonitor.py
│   │   └── ...
│   ├── config
│   │   └── basic_config.py  # configuration file for storing model parameters
│   ├── dataset
│   ├── io
│   │   ├── dataset.py
│   │   └── data_transformer.py
│   ├── model
│   │   ├── nn
│   │   └── layers
│   ├── output  # saves the output of the model
│   ├── preprocessing  # text preprocessing
│   ├── train  # used for training a model
│   │   ├── trainer.py
│   │   └── ...
│   ├── test
│   │   └── embedding.py
│   └── utils  # a set of utility functions
├── obtain_word_embedding.py
└── train_stroke_lm.py
Dependencies:

- csv
- tqdm
- numpy
- pickle
- scikit-learn
- PyTorch 1.0
- matplotlib
- pandas
How to use:

1. Prepare the data: you can modify `io.data_transformer.py` to adapt it to your data.
2. Modify the configuration information in `pyLM/config/basic_config.py` (the path of the data, ...).
3. Run `train_stroke_lm.py` to train the language model.
4. Run `obtain_word_embedding.py` to obtain the word embeddings.
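Conceptually, the training script optimizes a next-stroke prediction objective. The fragment below is a minimal sketch of that training step, not the repository's actual loop (which lives in `pyLM/train/trainer.py`); all sizes and names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative next-stroke prediction step (assumed setup, arbitrary sizes).
vocab_size, batch, seq_len = 50, 4, 12
embed = nn.Embedding(vocab_size, 32)
lm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
head = nn.Linear(64, vocab_size)
criterion = nn.CrossEntropyLoss()

ids = torch.randint(0, vocab_size, (batch, seq_len))
inputs, targets = ids[:, :-1], ids[:, 1:]    # predict each stroke from its left context
hidden, _ = lm(embed(inputs))
logits = head(hidden)                        # (batch, seq_len - 1, vocab_size)
loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                              # gradients for an optimizer step
```

After training, `obtain_word_embedding.py` would then read word vectors off the trained model's hidden states, as described in the paper.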