punctuator2/scripts at 9442bd1c9b5dc81650106e83f4a6949a21b6a162 · bugbakery/punctuator2

README.md
convert_europarl.py
convert_europarl.sh
convert_ted.py
convert_ted.sh
run.sh

README.md

This example produces the preprocessed Europarl English corpus, which can then be used to train a model.

Requires nltk

Usage example: ./run.sh

cd ..

python data.py ./example/out

python main.py ep 256 0.02

python play_with_model.py Model_ep_h256_lr0.02.pcl

The input text to play_with_model.py should be similar to the contents of the preprocessed files in ./example/out (i.e. lowercased, with numeric tokens replaced by a placeholder token), but it should not contain punctuation tokens.
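As a rough illustration of that input format, here is a minimal sketch of a normalizer that lowercases text, strips punctuation, and replaces numeric tokens. It is not part of this repository: the function name `normalize` is hypothetical, and the placeholder `<NUM>` is an assumption, so check the files in ./example/out for the token the preprocessing scripts actually emit.

```python
import re

# Assumed placeholder for numeric tokens; verify against ./example/out.
NUM_TOKEN = "<NUM>"

# Punctuation stripped from the edges of each whitespace-separated token.
EDGE_PUNCTUATION = ".,;:!?\"()-"

# A token counts as numeric if it is digits with optional . or , groups.
NUMERIC = re.compile(r"\d+([.,]\d+)*")


def normalize(text, num_token=NUM_TOKEN):
    """Lowercase, drop punctuation, and replace numeric tokens."""
    out = []
    for tok in text.lower().split():
        tok = tok.strip(EDGE_PUNCTUATION)
        if not tok:
            continue  # the token was punctuation only
        out.append(num_token if NUMERIC.fullmatch(tok) else tok)
    return " ".join(out)


if __name__ == "__main__":
    print(normalize("Hello, World! It costs 3.50 dollars in 2015."))
```

The output of such a function could be pasted into play_with_model.py; the exact tokenization the model expects may differ, since the preprocessing here relies on nltk.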

Training time on this dataset with an Nvidia Tesla K20 GPU was about 15 hours (~3500 samples per second).
