This repository is used for a language modelling Pareto competition at TTIC. The two objectives are (time ratio, perplexity), where time ratio = training time / base model training time. (The base model is trained with default parameters on a single CPU and takes roughly 1 hour.)
Note: your training time must be measured on a single CPU.
I implemented a sampled softmax output layer on top of the original RNN language model. In addition, main.py includes support for pre-trained GloVe word embeddings of size 200 and 300. The model is also trained with the Adagrad optimizer plus L2 weight decay.
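For illustration, here is a minimal sketch of the sampled-softmax idea (not the exact code in main.py; the sampling-correction terms are omitted): each target word is scored only against a small set of randomly sampled negative words instead of the whole vocabulary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SampledSoftmaxLoss(nn.Module):
    """Simplified sampled softmax: compare each target word against a small
    set of uniformly sampled negatives instead of the full vocabulary."""

    def __init__(self, hidden_size, vocab_size, n_sampled=100):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(vocab_size, hidden_size) * 0.05)
        self.bias = nn.Parameter(torch.zeros(vocab_size))
        self.vocab_size = vocab_size
        self.n_sampled = n_sampled

    def forward(self, hidden, targets):
        # hidden: (batch, hidden_size), targets: (batch,)
        negatives = torch.randint(0, self.vocab_size, (self.n_sampled,),
                                  device=hidden.device)
        classes = torch.cat([targets, negatives])          # candidate classes
        logits = hidden @ self.weight[classes].t() + self.bias[classes]
        # The true class for row i sits in column i of the candidate set.
        labels = torch.arange(targets.size(0), device=hidden.device)
        return F.cross_entropy(logits, labels)
```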
This codebase requires Python 3.5 and PyTorch.
Please download the GloVe embeddings from word2vec-api, or download them directly: Wikipedia+Gigaword 5.
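Exactly how main.py reads these files may differ, but loading GloVe text vectors into a PyTorch embedding layer generally looks like the sketch below (glove_path and word2idx are placeholder names for the downloaded file and the corpus vocabulary).

```python
import numpy as np
import torch
import torch.nn as nn

def load_glove_embeddings(glove_path, word2idx, emsize=300):
    """Build an embedding matrix from a GloVe text file; words missing
    from GloVe keep a small random vector."""
    weights = np.random.uniform(-0.1, 0.1, (len(word2idx), emsize)).astype("float32")
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in word2idx and len(values) == emsize:
                weights[word2idx[word]] = np.asarray(values, dtype="float32")
    return nn.Embedding.from_pretrained(torch.from_numpy(weights), freeze=False)
```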
python main.py --soft --adagrad --lr 0.01 # Train an LSTM on PTB with sampled softmax and the Adagrad optimizer (lr = 0.01); see the optimizer sketch below
python main.py --pre --emsize 300 # Train an LSTM on PTB with pre-trained embeddings of size 300
python generate.py # Generate samples from the trained LSTM model.
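For reference, the setup implied by the --adagrad --lr 0.01 flags together with the L2 weight decay mentioned above amounts to roughly the following (the decay coefficient and model sizes here are only illustrative values, not the repository's defaults).

```python
import torch
import torch.nn as nn

# Stand-in model; in the repository this would be the LSTM language model.
model = nn.LSTM(input_size=200, hidden_size=200, num_layers=2)

# Adagrad with L2 regularisation via weight_decay (1e-5 is an assumed value).
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01, weight_decay=1e-5)
```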
This repository contains code originally forked from the Word-level language modeling RNN example, modified to add an attention layer to the model.
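The exact attention variant added in this fork is not described here; as a rough, hypothetical sketch, a dot-product-style attention in which the current hidden state attends over earlier LSTM outputs could look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    """Hypothetical sketch: the current hidden state (query) attends over a
    window of earlier LSTM outputs (memory) via a bilinear score."""

    def __init__(self, hidden_size):
        super().__init__()
        self.score = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, query, memory):
        # query: (batch, hidden), memory: (batch, seq_len, hidden)
        scores = torch.bmm(memory, self.score(query).unsqueeze(2)).squeeze(2)
        weights = F.softmax(scores, dim=1)                    # (batch, seq_len)
        context = torch.bmm(weights.unsqueeze(1), memory).squeeze(1)
        return context, weights
```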