Split single file into separate files
sebastianruder committed Jun 24, 2018
1 parent 54331ac commit 1979ba0
Showing 21 changed files with 787 additions and 810 deletions.
830 changes: 20 additions & 810 deletions README.md

Large diffs are not rendered by default.

22 changes: 22 additions & 0 deletions chunking.md
@@ -0,0 +1,22 @@
## Chunking

Chunking is a shallow form of parsing that identifies contiguous spans of tokens that form syntactic units such as noun phrases or verb phrases.

Example:

| Vinken | , | 61 | years | old |
| --- | ---| --- | --- | --- |
| B-NP | I-NP | I-NP | I-NP | I-NP |

### Penn Treebank—chunking

The [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) is typically used for evaluating chunking.
Sections 15-18 are used for training, section 19 for development, and section 20
for testing. Models are evaluated based on F1.
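
Both the tagging scheme and the metric are easy to make concrete. Below is a minimal sketch (plain Python; the function names are illustrative) that reads chunks off a BIO tag sequence and scores predictions with the exact-match span F1 used here:

```python
from typing import List, Set, Tuple

def bio_to_spans(tags: List[str]) -> Set[Tuple[str, int, int]]:
    """Read (type, start, end) chunks off a BIO tag sequence.
    Assumes well-formed tags; `end` is exclusive."""
    spans, start = set(), None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the final chunk
        if start is not None and not tag.startswith("I-"):
            spans.add((tags[start][2:], start, i))
            start = None
        if tag.startswith("B-"):
            start = i
    return spans

def chunk_f1(gold: List[str], pred: List[str]) -> float:
    """Span-level F1: a predicted chunk is correct only if its type
    and both boundaries match a gold chunk exactly."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)

gold = ["B-NP", "I-NP", "I-NP", "I-NP", "I-NP"]   # the example above
pred = ["B-NP", "I-NP", "I-NP", "B-NP", "I-NP"]   # wrong chunk boundary
print(chunk_f1(gold, gold), chunk_f1(gold, pred))  # 1.0 0.0
```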

| Model | F1 score | Paper / Source |
| ------------- | :-----:| --- |
| Low supervision (Søgaard and Goldberg, 2016) | 95.57 | [Deep multi-task learning with low level tasks supervised at lower layers](http://anthology.aclweb.org/P16-2038) |
| Suzuki and Isozaki (2008) | 95.15 | [Semi-Supervised Sequential Labeling and Segmentation using Giga-word Scale Unlabeled Data](https://aclanthology.info/pdf/P/P08/P08-1076.pdf) |

[Go back to the README](README.md)
44 changes: 44 additions & 0 deletions constituency_parsing.md
@@ -0,0 +1,44 @@
## Constituency parsing

Constituency parsing aims to extract a constituency-based parse tree from a sentence that
represents its syntactic structure according to a [phrase structure grammar](https://en.wikipedia.org/wiki/Phrase_structure_grammar).

Example:

```
               Sentence (S)
                    |
      +-------------+------------+
      |                          |
   Noun (N)               Verb Phrase (VP)
      |                          |
    John                 +-------+--------+
                         |                |
                      Verb (V)         Noun (N)
                         |                |
                       sees              Bill
```

[Recent approaches](https://papers.nips.cc/paper/5635-grammar-as-a-foreign-language.pdf)
convert the parse tree into a sequence following a depth-first traversal so that
sequence-to-sequence models can be applied to it. The linearized version of the
above parse tree looks as follows: (S (N) (VP V N)).
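
As a rough sketch of such a linearization, the function below walks a small (label, children) tuple tree depth-first, drops the words, and brackets every nonterminal. This is one of several conventions; the exact output format (e.g. whether preterminals keep their brackets, as in the string above) varies by paper.

```python
def linearize(tree) -> str:
    """Depth-first linearization of a parse tree into a bracketed sequence.
    A tree is (label, children); leaf strings (the words) are dropped so
    that only the syntactic skeleton remains."""
    label, children = tree
    subtrees = [linearize(c) for c in children if isinstance(c, tuple)]
    return f"({label}{''.join(' ' + s for s in subtrees)})"

tree = ("S",
        [("N", ["John"]),
         ("VP", [("V", ["sees"]), ("N", ["Bill"])])])
print(linearize(tree))  # (S (N) (VP (V) (N))) -- the skeleton of the tree above
```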

### Penn Treebank—constituency parsing

The Wall Street Journal section of the [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) is used for
evaluating constituency parsers. Section 22 is used for development and Section 23 is used for evaluation.
Models are evaluated based on F1. Most of the below models incorporate external data or features.
For a comparison of single models trained only on WSJ, refer to [Kitaev and Klein (2018)](https://arxiv.org/abs/1805.01052).
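
The metric is labeled bracketing F1, conventionally computed with the evalb tool. Stripped to its core and ignoring evalb's extra parameters (e.g. punctuation handling), the comparison treats a parse as a multiset of (label, start, end) constituents:

```python
from collections import Counter

def bracket_f1(gold_spans, pred_spans):
    """Labeled bracketing F1 over (label, start, end) constituents,
    compared as multisets. This is only the core of the metric;
    evalb adds further parameters."""
    g, p = Counter(gold_spans), Counter(pred_spans)
    correct = sum((g & p).values())  # multiset intersection
    if correct == 0:
        return 0.0
    precision, recall = correct / sum(p.values()), correct / sum(g.values())
    return 2 * precision * recall / (precision + recall)

gold = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
pred = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("PP", 2, 3)]
print(bracket_f1(gold, pred))  # 0.75: one mislabeled constituent
```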

| Model | F1 score | Paper / Source |
| ------------- | :-----:| --- |
| Self-attentive encoder + ELMo (Kitaev and Klein, 2018) | 95.13 | [Constituency Parsing with a Self-Attentive Encoder](https://arxiv.org/abs/1805.01052) |
| Model combination (Fried et al., 2017) | 94.66 | [Improving Neural Parsing by Disentangling Model Combination and Reranking Effects](https://arxiv.org/abs/1707.03058) |
| In-order (Liu and Zhang, 2017) | 94.2 | [In-Order Transition-based Constituent Parsing](http://aclweb.org/anthology/Q17-1029) |
| Semi-supervised LSTM-LM (Choe and Charniak, 2016) | 93.8 | [Parsing as Language Modeling](http://www.aclweb.org/anthology/D16-1257) |
| Stack-only RNNG (Kuncoro et al., 2017) | 93.6 | [What Do Recurrent Neural Network Grammars Learn About Syntax?](https://arxiv.org/abs/1611.05774) |
| RNN Grammar (Dyer et al., 2016) | 93.3 | [Recurrent Neural Network Grammars](https://www.aclweb.org/anthology/N16-1024) |
| Transformer (Vaswani et al., 2017) | 92.7 | [Attention Is All You Need](https://arxiv.org/abs/1706.03762) |
| Semi-supervised LSTM (Vinyals et al., 2015) | 92.1 | [Grammar as a Foreign Language](https://papers.nips.cc/paper/5635-grammar-as-a-foreign-language.pdf) |
| Self-trained parser (McClosky et al., 2006) | 92.1 | [Effective Self-Training for Parsing](https://pdfs.semanticscholar.org/6f0f/64f0dab74295e5eb139c160ed79ff262558a.pdf) |

[Go back to the README](README.md)
30 changes: 30 additions & 0 deletions coreference_resolution.md
@@ -0,0 +1,30 @@
## Coreference resolution

Coreference resolution is the task of clustering mentions in text that refer to the same underlying real-world entities.

Example:

```
             +-------------+
             |             |
"I voted for Obama because he was most aligned with my values," she said.
 |                                                   |           |
 +---------------------------------------------------+-----------+
```

"I", "my", and "she" belong to the same cluster and "Obama" and "he" belong to the same cluster.

### CoNLL 2012

Experiments are conducted on the data of the [CoNLL-2012 shared task](http://www.aclweb.org/anthology/W12-4501), which
uses OntoNotes coreference annotations. Papers report the precision, recall, and F1
of the MUC, B³, and CEAFφ4 metrics using the official
CoNLL-2012 evaluation scripts. The main evaluation metric is the average F1 of the three metrics.
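
The headline number is thus just the unweighted mean of three F1 scores. A minimal sketch with toy precision/recall values (not taken from any paper):

```python
def f1(p: float, r: float) -> float:
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

# Toy per-metric precision/recall, as the CoNLL scorer would report them.
muc_p, muc_r   = 0.78, 0.70
b3_p, b3_r     = 0.68, 0.60
ceaf_p, ceaf_r = 0.62, 0.58

avg_f1 = (f1(muc_p, muc_r) + f1(b3_p, b3_r) + f1(ceaf_p, ceaf_r)) / 3
print(f"{100 * avg_f1:.1f}")  # 65.8 -- the headline "Avg F1"
```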

| Model | Avg F1 | Paper / Source |
| ------------- | :-----:| --- |
| (Lee et al., 2017)+ELMo (Peters et al., 2018)+coarse-to-fine & second-order inference (Lee et al., 2018) | 73.0 | [Higher-order Coreference Resolution with Coarse-to-fine Inference](http://aclweb.org/anthology/N18-2108) |
| (Lee et al., 2017)+ELMo (Peters et al., 2018) | 70.4 | [Deep contextualized word representations](https://arxiv.org/abs/1802.05365) |
| Lee et al. (2017) | 67.2 | [End-to-end Neural Coreference Resolution](https://arxiv.org/abs/1707.07045) |

[Go back to the README](README.md)
41 changes: 41 additions & 0 deletions dependency_parsing.md
@@ -0,0 +1,41 @@
## Dependency parsing

Dependency parsing is the task of extracting a dependency parse of a sentence that represents its grammatical
structure and defines the relationships between "head" words and the words that modify those heads.

Example:

```
    root
      |
      | +--------dobj--------+
      | |                    |
nsubj | |    +-----det-----+ |+-----nmod------+
+---+ | |    |             | ||               |
|   | | |    |     +-nmod-+| ||      +-case-+ |
+   | | |    +     +      || +|      +      | +
I   prefer  the morning   flight  through  Denver
```

Relations among the words are illustrated above the sentence with directed, labeled
arcs from heads to dependents (+ indicates the dependent).

### Penn Treebank—dependency parsing

Models are evaluated on the [Stanford Dependency](https://nlp.stanford.edu/software/dependencies_manual.pdf)
conversion of the Penn Treebank with predicted POS tags. Punctuation symbols
are excluded from the evaluation. Evaluation metrics are unlabeled attachment score (UAS) and
labeled attachment score (LAS).
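
A bare-bones scorer for the two metrics might look as follows. Identifying punctuation by surface form, as done here, is a simplification of the standard setup (which excludes tokens by gold POS tag); the toy analysis reuses the example sentence above:

```python
def attachment_scores(gold, pred, punct=frozenset({",", ".", ":", "``", "''"})):
    """UAS/LAS over aligned token lists. Each token is
    (form, head, label); heads are indices into the sentence (0 = root).
    Punctuation is excluded from the denominator."""
    total = uas = las = 0
    for (form, gh, gl), (_, ph, pl) in zip(gold, pred):
        if form in punct:
            continue
        total += 1
        if gh == ph:            # correct head: counts for UAS
            uas += 1
            if gl == pl:        # correct head AND label: counts for LAS
                las += 1
    return uas / total, las / total

gold = [("I", 2, "nsubj"), ("prefer", 0, "root"), ("the", 5, "det"),
        ("morning", 5, "nmod"), ("flight", 2, "dobj"),
        ("through", 7, "case"), ("Denver", 5, "nmod")]
pred = [("I", 2, "nsubj"), ("prefer", 0, "root"), ("the", 5, "det"),
        ("morning", 5, "amod"), ("flight", 2, "dobj"),
        ("through", 7, "case"), ("Denver", 2, "nmod")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=0.86 LAS=0.71
```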

| Model | UAS | LAS | Paper / Source |
| ------------- | :-----:| :-----:| --- |
| Stack-only RNNG (Kuncoro et al., 2017) | 95.8 | 94.6 | [What Do Recurrent Neural Network Grammars Learn About Syntax?](https://arxiv.org/abs/1611.05774) |
| Semi-supervised LSTM-LM (Choe and Charniak, 2016) | 95.9 | 94.1 | [Parsing as Language Modeling](http://www.aclweb.org/anthology/D16-1257) |
| Deep Biaffine (Dozat and Manning, 2017) | 95.66 | 94.03 | [Deep Biaffine Attention for Neural Dependency Parsing](https://arxiv.org/abs/1611.01734) |
| Andor et al. (2016) | 94.61 | 92.79 | [Globally Normalized Transition-Based Neural Networks](https://www.aclweb.org/anthology/P16-1231) |
| Distilled neural FOG (Kuncoro et al., 2016) | 94.26 | 92.06 | [Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser](https://arxiv.org/abs/1609.07561) |
| Weiss et al. (2015) | 94.0 | 92.0 | [Structured Training for Neural Network Transition-Based Parsing](http://anthology.aclweb.org/P/P15/P15-1032.pdf) |
| Arc-hybrid (Ballesteros et al., 2016) | 93.56 | 91.42 | [Training with Exploration Improves a Greedy Stack-LSTM Parser](https://arxiv.org/abs/1603.03793) |
| BIST parser (Kiperwasser and Goldberg, 2016) | 93.2 | 91.2 | [Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations](https://aclweb.org/anthology/Q16-1023) |

[Go back to the README](README.md)
20 changes: 20 additions & 0 deletions dialog.md
@@ -0,0 +1,20 @@
## Dialog

Dialogue is notoriously hard to evaluate. Past approaches have used human evaluation.

### Second dialog state tracking challenge

For goal-oriented dialogue, the dataset of the [second dialog state tracking challenge](http://www.aclweb.org/anthology/W14-4337)
(DSTC2) is a common evaluation dataset. Dialog state tracking consists of determining,
at each turn of a dialog, the full representation of what the user wants at that point
in the dialog, which comprises a goal constraint, a set of requested slots, and
the user's dialog act. DSTC2 focuses on the restaurant search domain. Models are
evaluated based on accuracy on both individual and joint slot tracking.
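
A sketch of this scoring, using an illustrative dict-per-turn format (the slot names follow the restaurant domain, but this is not the actual DSTC2 label schema):

```python
def dstc2_accuracies(gold_turns, pred_turns):
    """Per-slot and joint accuracy over dialog turns. Each turn is a dict
    such as {"area": "north", "food": "thai", "pricerange": "cheap"};
    the joint goal counts only if every slot matches."""
    slots = ("area", "food", "pricerange")
    hits = {s: 0 for s in slots}
    joint = 0
    for g, p in zip(gold_turns, pred_turns):
        matches = {s: g.get(s) == p.get(s) for s in slots}
        for s in slots:
            hits[s] += matches[s]
        joint += all(matches.values())
    n = len(gold_turns)
    return {s: hits[s] / n for s in slots}, joint / n

gold = [{"area": "north", "food": "thai", "pricerange": "cheap"}]
pred = [{"area": "north", "food": "indian", "pricerange": "cheap"}]
print(dstc2_accuracies(gold, pred))
# ({'area': 1.0, 'food': 0.0, 'pricerange': 1.0}, 0.0)
```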

| Model | Area | Food | Price | Joint | Paper / Source |
| ------------- | :-----:| :-----:| :-----:| :-----:| --- |
| Liu et al. (2018) | 90 | 84 | 92 | 72 | [Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems](https://arxiv.org/abs/1804.06512) |
| Neural belief tracker (Mrkšić et al., 2017) | 90 | 84 | 94 | 72 | [Neural Belief Tracker: Data-Driven Dialogue State Tracking](https://arxiv.org/abs/1606.03777) |
| RNN (Henderson et al., 2014) | 92 | 86 | 86 | 69 | [Robust dialog state tracking using delexicalised recurrent neural networks and unsupervised adaptation](http://svr-ftp.eng.cam.ac.uk/~sjy/papers/htyo14.pdf) |

[Go back to the README](README.md)
20 changes: 20 additions & 0 deletions domain_adaptation.md
@@ -0,0 +1,20 @@
## Domain adaptation

### Multi-Domain Sentiment Dataset

The [Multi-Domain Sentiment Dataset](https://www.cs.jhu.edu/~mdredze/datasets/sentiment/) is a common
evaluation dataset for domain adaptation for sentiment analysis. It contains product reviews from
Amazon.com from different product categories, which are treated as distinct domains.
Reviews contain star ratings (1 to 5 stars) that are generally converted into binary labels. Models are
typically evaluated on a target domain that is different from the source domain they were trained on, while only
having access to unlabeled examples of the target domain (unsupervised domain adaptation). The evaluation
metric is accuracy, and scores are averaged across domains.
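
As a sketch of the protocol, assuming the usual construction of the dataset (ratings above 3 stars positive, below 3 negative, 3-star reviews discarded; treat the cutoff as an assumption here):

```python
def stars_to_binary(stars: int):
    """Map a star rating to a binary sentiment label; 3-star reviews
    are discarded as ambiguous (a common convention, assumed here)."""
    if stars == 3:
        return None
    return 1 if stars > 3 else 0

def average_accuracy(per_domain_acc: dict) -> float:
    """Headline number: target-domain accuracy averaged over domains."""
    return sum(per_domain_acc.values()) / len(per_domain_acc)

acc = {"dvd": 0.7814, "books": 0.7486, "electronics": 0.8145, "kitchen": 0.8214}
print(f"{average_accuracy(acc) * 100:.2f}")  # 79.15, cf. the first row below
```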

| Model | DVD | Books | Electronics | Kitchen | Average | Paper / Source |
| ------------- | :-----:| :-----:| :-----:| :-----:| :-----:| --- |
| Multi-task tri-training (Ruder and Plank, 2018) | 78.14 | 74.86 | 81.45 | 82.14 | 79.15 | [Strong Baselines for Neural Semi-supervised Learning under Domain Shift](https://arxiv.org/abs/1804.09530) |
| Asymmetric tri-training (Saito et al., 2017) | 76.17 | 72.97 | 80.47 | 83.97 | 78.39 | [Asymmetric Tri-training for Unsupervised Domain Adaptation](https://arxiv.org/abs/1702.08400) |
| VFAE (Louizos et al., 2015) | 76.57 | 73.40 | 80.53 | 82.93 | 78.36 | [The Variational Fair Autoencoder](https://arxiv.org/abs/1511.00830) |
| DANN (Ganin et al., 2016) | 75.40 | 71.43 | 77.67 | 80.53 | 76.26 | [Domain-Adversarial Training of Neural Networks](https://arxiv.org/abs/1505.07818) |

[Go back to the README](README.md)
39 changes: 39 additions & 0 deletions language_modeling.md
@@ -0,0 +1,39 @@
## Language modeling

Language modeling is the task of predicting the next word in a document. * indicates models using dynamic evaluation.

### Penn Treebank—language modeling

A common evaluation dataset for language modeling is the Penn Treebank,
as pre-processed by [Mikolov et al. (2010)](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf).
The dataset consists of 929k training words, 73k validation words, and
82k test words. As part of the pre-processing, words were lower-cased, numbers
were replaced with `N`, newlines were replaced with `<eos>`,
and all other punctuation was removed. The vocabulary is
the 10k most frequent words, with the remaining tokens replaced by an `<unk>` token.
Models are evaluated based on perplexity, the exponentiated average negative
log-likelihood per word (lower is better).
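
Concretely, once a model has assigned a (natural-log) probability to every token in the test stream, perplexity falls out of a few lines:

```python
import math

def perplexity(log_probs):
    """Perplexity = exp of the average negative log-likelihood per token.
    `log_probs` are natural-log probabilities the model assigned to each
    word in the test stream (including the `<eos>` tokens)."""
    nll = -sum(log_probs) / len(log_probs)
    return math.exp(nll)

# A model assigning every word probability 1/100 has perplexity 100.
print(round(perplexity([math.log(1 / 100)] * 1000), 1))  # 100.0
```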

| Model | Validation perplexity | Test perplexity | Paper / Source |
| ------------- | :-----:| :-----:| --- |
| AWD-LSTM-MoS + dynamic eval (Yang et al., 2018)* | 48.33 | 47.69 | [Breaking the Softmax Bottleneck: A High-Rank RNN Language Model](https://arxiv.org/abs/1711.03953) |
| AWD-LSTM + dynamic eval (Krause et al., 2017)* | 51.6 | 51.1 | [Dynamic Evaluation of Neural Sequence Models](https://arxiv.org/abs/1709.07432) |
| AWD-LSTM + continuous cache pointer (Merity et al., 2017)* | 53.9 | 52.8 | [Regularizing and Optimizing LSTM Language Models](https://arxiv.org/abs/1708.02182) |
| AWD-LSTM-MoS (Yang et al., 2018) | 56.54 | 54.44 | [Breaking the Softmax Bottleneck: A High-Rank RNN Language Model](https://arxiv.org/abs/1711.03953) |
| AWD-LSTM (Merity et al., 2017) | 60.0 | 57.3 | [Regularizing and Optimizing LSTM Language Models](https://arxiv.org/abs/1708.02182) |

### WikiText-2

[WikiText-2](https://arxiv.org/abs/1609.07843) has been proposed as a more realistic
benchmark for language modeling than the pre-processed Penn Treebank. WikiText-2
consists of around 2 million words extracted from Wikipedia articles.

| Model | Validation perplexity | Test perplexity | Paper / Source |
| ------------- | :-----:| :-----:| --- |
| AWD-LSTM-MoS + dynamic eval (Yang et al., 2018)* | 42.41 | 40.68 | [Breaking the Softmax Bottleneck: A High-Rank RNN Language Model](https://arxiv.org/abs/1711.03953) |
| AWD-LSTM + dynamic eval (Krause et al., 2017)* | 46.4 | 44.3 | [Dynamic Evaluation of Neural Sequence Models](https://arxiv.org/abs/1709.07432) |
| AWD-LSTM + continuous cache pointer (Merity et al., 2017)* | 53.8 | 52.0 | [Regularizing and Optimizing LSTM Language Models](https://arxiv.org/abs/1708.02182) |
| AWD-LSTM-MoS (Yang et al., 2018) | 63.88 | 61.45 | [Breaking the Softmax Bottleneck: A High-Rank RNN Language Model](https://arxiv.org/abs/1711.03953) |
| AWD-LSTM (Merity et al., 2017) | 68.6 | 65.8 | [Regularizing and Optimizing LSTM Language Models](https://arxiv.org/abs/1708.02182) |

[Go back to the README](README.md)
34 changes: 34 additions & 0 deletions machine_translation.md
@@ -0,0 +1,34 @@
## Machine translation

Machine translation is the task of translating a sentence in a source language to a different target language.

Results with a * indicate that the mean test score over the best window of 21 consecutive evaluations
(selected by average dev-set BLEU score) is reported, as in [Chen et al. (2018)](https://arxiv.org/abs/1804.09849).
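
A sketch of that reporting scheme, assuming parallel lists of dev and test BLEU scores from successive checkpoint evaluations:

```python
def best_window_test_score(dev_bleu, test_bleu, window=21):
    """Slide a window of `window` consecutive checkpoint evaluations, pick
    the window with the highest mean dev BLEU, and report the mean test
    BLEU over that same window (the reporting scheme described above)."""
    best_start = max(range(len(dev_bleu) - window + 1),
                     key=lambda i: sum(dev_bleu[i:i + window]))
    return sum(test_bleu[best_start:best_start + window]) / window
```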

### WMT 2014 EN-DE

Models are evaluated on the English-German dataset of the Ninth Workshop on Statistical Machine Translation (WMT 2014) based
on BLEU.
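
For offline scoring, corpus BLEU can be computed, for example, with NLTK's implementation as below; note that reported scores are sensitive to tokenization and other details, so numbers from different papers are not always strictly comparable.

```python
from nltk.translate.bleu_score import corpus_bleu

# One list of (tokenized) reference translations per hypothesis.
references = [[["the", "cat", "sat", "on", "the", "mat"]]]
hypotheses = [["the", "cat", "sat", "on", "the", "mat"]]
print(corpus_bleu(references, hypotheses))  # 1.0 for an exact match
```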

| Model | BLEU | Paper / Source |
| ------------- | :-----:| --- |
| RNMT+ (Chen et al., 2018) | 28.5* | [The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation](https://arxiv.org/abs/1804.09849) |
| Transformer Big (Vaswani et al., 2017) | 28.4 | [Attention Is All You Need](https://arxiv.org/abs/1706.03762) |
| Transformer Base (Vaswani et al., 2017) | 27.3 | [Attention Is All You Need](https://arxiv.org/abs/1706.03762) |
| MoE (Shazeer et al., 2017) | 26.03 | [Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer](https://arxiv.org/abs/1701.06538) |
| ConvS2S (Gehring et al., 2017) | 25.16 | [Convolutional Sequence to Sequence Learning](https://arxiv.org/abs/1705.03122) |

### WMT 2014 EN-FR

Similarly, models are evaluated on the English-French dataset of the Ninth Workshop on Statistical Machine Translation (WMT 2014) based
on BLEU.

| Model | BLEU | Paper / Source |
| ------------- | :-----:| --- |
| RNMT+ (Chen et al., 2018) | 41.0* | [The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation](https://arxiv.org/abs/1804.09849) |
| Transformer Big (Vaswani et al., 2017) | 41.0 | [Attention Is All You Need](https://arxiv.org/abs/1706.03762) |
| MoE (Shazeer et al., 2017) | 40.56 | [Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer](https://arxiv.org/abs/1701.06538) |
| ConvS2S (Gehring et al., 2017) | 40.46 | [Convolutional Sequence to Sequence Learning](https://arxiv.org/abs/1705.03122) |
| Transformer Base (Vaswani et al., 2017) | 38.1 | [Attention Is All You Need](https://arxiv.org/abs/1706.03762) |

[Go back to the README](README.md)
15 changes: 15 additions & 0 deletions multi-task_learning.md
@@ -0,0 +1,15 @@
## Multi-task learning

Multi-task learning aims to learn multiple different tasks simultaneously while maximizing
performance on one or all of the tasks.

### GLUE

The [General Language Understanding Evaluation benchmark](https://arxiv.org/abs/1804.07461) (GLUE)
is a tool for evaluating and analyzing the performance of models across a diverse
range of existing natural language understanding tasks. Models are evaluated based on their
average score across all tasks.

The state-of-the-art results can be seen on the public [GLUE leaderboard](https://gluebenchmark.com/leaderboard).

[Go back to the README](README.md)
31 changes: 31 additions & 0 deletions multimodal.md
@@ -0,0 +1,31 @@
## Multimodal Sentiment Analysis

### MOSI
The MOSI dataset ([Zadeh et al., 2016](https://arxiv.org/pdf/1606.06259.pdf)) is a dataset rich in sentiment expressions in which 93 people review topics in English. The videos are segmented into utterances, and each segment is annotated with a sentiment score from -3 (strongly negative) to +3 (strongly positive) by 5 annotators.
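
Accuracy in the table below is computed over discretized labels. A common binarization of the raw scores is sketched here; how the neutral score 0 is handled varies across papers, so treat this as an assumption:

```python
def mosi_binary_label(score: float):
    """Collapse a sentiment score in [-3, 3] to a binary label.
    Dropping the neutral score 0 is an assumption; papers differ."""
    if score == 0:
        return None
    return "positive" if score > 0 else "negative"

print([mosi_binary_label(s) for s in (-2.4, 0.0, 1.8)])
# ['negative', None, 'positive']
```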

| Model | Accuracy | Paper / Source |
| ------------- | :-----:| --- |
| bc-LSTM (Poria et al., 2017) | 80.3% | [Context-Dependent Sentiment Analysis in User-Generated Videos](http://sentic.net/context-dependent-sentiment-analysis-in-user-generated-videos.pdf) |
| MARN (Zadeh et al., 2018) | 77.1% | [Multi-attention Recurrent Network for Human Communication Comprehension](https://arxiv.org/pdf/1802.00923.pdf) |

## Multimodal Emotion Recognition

### IEMOCAP
The IEMOCAP dataset ([Busso et al., 2008](https://link.springer.com/article/10.1007/s10579-008-9076-6)) contains recordings of 10 speakers in two-way conversations, segmented into utterances. All conversations are in English. The database contains the following categorical labels: anger, happiness, sadness, neutral, excitement, frustration, fear, surprise, and other.

**Monologue:**

| Model | Accuracy | Paper / Source |
| ------------- | :-----:| --- |
| CHFusion (Poria et al., 2017) | 76.5% | [Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling](https://arxiv.org/pdf/1806.06228.pdf) |
| bc-LSTM (Poria et al., 2017) | 74.10% | [Context-Dependent Sentiment Analysis in User-Generated Videos](http://sentic.net/context-dependent-sentiment-analysis-in-user-generated-videos.pdf) |

**Conversational:**
The conversational setting lets models capture emotions expressed by the speakers over the course of a conversation, taking inter-speaker dependencies into account.

| Model | Weighted Accuracy (WAA) | Paper / Source |
| ------------- | :-----:| --- |
| CMN (Hazarika et al., 2018) | 77.62% | [Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos](http://aclweb.org/anthology/N18-1193) |
| MemN2N | 75.08% | [Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos](http://aclweb.org/anthology/N18-1193) |

[Go back to the README](README.md)