add t12 results
ramadistra committed Aug 16, 2018
1 parent 9649f83 commit 5bcd7cd
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions language_modeling.md
@@ -51,6 +51,7 @@ Within these 100 million bytes are 205 unique tokens.
| ---------------- | :-----: | :-----: | --- |
| T64 (Al-Rfou et al., 2018) | 1.06 | 235M | [Character-Level Language Modeling with Deeper Self-Attention](https://arxiv.org/abs/1808.04444)
| mLSTM + dynamic eval (Krause et al., 2017)* | 1.08 | 46M | [Dynamic Evaluation of Neural Sequence Models](https://arxiv.org/abs/1709.07432)
+| T12 (Al-Rfou et al., 2018) | 1.11 | 44M | [Character-Level Language Modeling with Deeper Self-Attention](https://arxiv.org/abs/1808.04444)
| 3 layer AWD-LSTM (Merity et al., 2018) | 1.232 | 47M | [An Analysis of Neural Language Modeling at Multiple Scales](https://arxiv.org/abs/1803.08240) |
| Large FS-LSTM-4 (Mujika et al., 2017) | 1.245 | 47M | [Fast-Slow Recurrent Neural Networks](https://arxiv.org/abs/1705.08639) |
| Large mLSTM +emb +WN +VD (Krause et al., 2017) | 1.24 | 46M | [Multiplicative LSTM for sequence modelling](https://arxiv.org/abs/1609.07959)
@@ -64,6 +65,7 @@ Within these 100 million bytes are 205 unique tokens.
| Model | Bit per Character (BPC) | Number of params | Paper / Source |
| ---------------- | :-----: | :-----: | --- |
| T64 (Al-Rfou et al., 2018) | 1.13 | 235M | [Character-Level Language Modeling with Deeper Self-Attention](https://arxiv.org/abs/1808.04444)
+| T12 (Al-Rfou et al., 2018) | 1.18 | 44M | [Character-Level Language Modeling with Deeper Self-Attention](https://arxiv.org/abs/1808.04444)
| mLSTM + dynamic eval (Krause et al., 2017)* | 1.19 | 45M | [Dynamic Evaluation of Neural Sequence Models](https://arxiv.org/abs/1709.07432)
| Large mLSTM +emb +WN +VD (Krause et al., 2016) | 1.27 | 45M | [Multiplicative LSTM for sequence modelling](https://arxiv.org/abs/1609.07959)
| Large RHN (Zilly et al., 2016) | 1.27 | 46M | [Recurrent Highway Networks](https://arxiv.org/abs/1607.03474)