Adapted headlines to highlight high-level tasks and subtasks

DanSanz · Jun 24, 2018 · 34b01fe
1 parent 1979ba0

Showing 21 changed files with 75 additions and 58 deletions.
diff --git a/ccg_supertagging.md b/ccg_supertagging.md
@@ -1,4 +1,4 @@
-## CCG supertagging
+# CCG supertagging
 
 Combinatory Categorial Grammar (CCG; [Steedman, 2000](http://www.citeulike.org/group/14833/article/8971002)) is a
 highly lexicalized formalism. The standard parsing model of [Clark and Curran (2007)](https://www.mitpressjournals.org/doi/abs/10.1162/coli.2007.33.4.493)

diff --git a/chunking.md b/chunking.md
@@ -1,4 +1,4 @@
-## Chunking
+# Chunking
 
 Chunking is a shallow form of parsing that identifies continuous spans of tokens that form syntactic units such as noun phrases or verb phrases.
 
@@ -8,7 +8,7 @@ Example:
 | --- | ---| --- | --- | --- |
 | B-NP| I-NP | I-NP | I-NP | I-NP |
 
-### Penn Treebank&mdash;chunking
+### Penn Treebank
 
 The [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) is typically used for evaluating chunking.
 Sections 15-18 are used for training, section 19 for development, and section 20

diff --git a/constituency_parsing.md b/constituency_parsing.md
@@ -1,4 +1,4 @@
-## Constituency parsing
+# Constituency parsing
 
 Constituency parsing aims to extract a constituency-based parse tree from a sentence that 
 represents its syntactic structure according to a [phrase structure grammar](https://en.wikipedia.org/wiki/Phrase_structure_grammar).
@@ -22,7 +22,7 @@ convert the parse tree into a sequence following a depth-first traversal in orde
 be able to apply sequence-to-sequence models to it. The linearized version of the
 above parse tree looks as follows: (S (N) (VP V N)).
 
-### Penn Treebank&mdash;constituency parsing
+### Penn Treebank
 
 The Wall Street Journal section of the [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) is used for 
 evaluating constituency parsers. Section 22 is used for development and Section 23 is used for evaluation.
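
The constituency parsing hunk above mentions converting a parse tree into a sequence by depth-first traversal. A minimal sketch of that idea follows, assuming a tree stored as nested `(label, children)` tuples with plain strings as words; the function name `linearize`, the tree encoding, and the example sentence are illustrative assumptions, and the bracketing it prints keeps the words, unlike the compact one-line example in the hunk.

```python
def linearize(tree):
    """Bracket a (label, children) tree by depth-first, left-to-right traversal."""
    if isinstance(tree, str):
        # A leaf is a plain word and is emitted unchanged.
        return tree
    label, children = tree
    parts = [label] + [linearize(child) for child in children]
    return "(" + " ".join(parts) + ")"


# Hypothetical toy tree, not taken from the Penn Treebank.
tree = ("S", [("N", ["John"]), ("VP", [("V", ["hit"]), ("N", ["ball"])])])
print(linearize(tree))  # (S (N John) (VP (V hit) (N ball)))
```

The resulting string can be split on whitespace to obtain a target token sequence for a sequence-to-sequence model.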

diff --git a/coreference_resolution.md b/coreference_resolution.md
@@ -1,4 +1,4 @@
-## Coreference resolution
+# Coreference resolution
 
 Coreference resolution is the task of clustering mentions in text that refer to the same underlying real world entities.
 

diff --git a/dependency_parsing.md b/dependency_parsing.md
@@ -1,4 +1,4 @@
-## Dependency parsing
+# Dependency parsing
 
 Dependency parsing is the task of extracting a dependency parse of a sentence that represents its grammatical
 structure and defines the relationships between "head" words and the words that modify those heads.
@@ -20,7 +20,7 @@ I  prefer  the  morning   flight  through  Denver
 Relations among the words are illustrated above the sentence with directed, labeled
 arcs from heads to dependents (+ indicates the dependent).
 
-### Penn Treebank&mdash;dependency parsing
+### Penn Treebank
 
 Models are evaluated on the [Stanford Dependency](https://nlp.stanford.edu/software/dependencies_manual.pdf)
 conversion of the Penn Treebank with predicted POS-tags. Punctuation symbols

diff --git a/dialog.md b/dialog.md
@@ -1,14 +1,17 @@
-## Dialog
+# Dialog
 
 Dialogue is notoriously hard to evaluate. Past approaches have used human evaluation.
 
+## Dialog state tracking
+
+Dialogue state tracking consists of determining at each turn of a dialog the 
+full representation of what the user wants at that point in the dialog, 
+which contains a goal constraint, a set of requested slots, and the user's dialog act. 
+
 ### Second dialog state tracking challenge
 
 For goal-oriented dialogue, the dataset of the [second dialog state tracking challenge](http://www.aclweb.org/anthology/W14-4337)
-(DSTC2) is a common evaluation dataset. Dialogue state tacking consists of determining
-at each turn of a dialog the full representation of what the user wants at that point 
-in the dialog, which contains a goal constraint, a set of requested slots, and
-the user's dialog act. The DSTC2 focuses on the restaurant search domain. Models are
+(DSTC2) is a common evaluation dataset. The DSTC2 focuses on the restaurant search domain. Models are
 evaluated based on accuracy on both individual and joint slot tracking.
 
 | Model           | Area  |  Food  |  Price  |  Joint  |  Paper / Source |

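The DSTC2 table in the dialog hunk above reports accuracy for individual slots (Area, Food, Price) and for joint slot tracking. As a rough illustration of how the two differ, the sketch below scores hypothetical per-turn predictions. The function name `slot_accuracies`, the dict layout, and the example values are assumptions made for illustration; this is not the official DSTC2 scoring script.

```python
def slot_accuracies(gold_turns, pred_turns, slots=("area", "food", "price")):
    """Return (per-slot accuracy dict, joint accuracy) over aligned turns."""
    if len(gold_turns) != len(pred_turns) or not gold_turns:
        raise ValueError("need two non-empty, equally long lists of turns")
    n = len(gold_turns)
    pairs = list(zip(gold_turns, pred_turns))
    # A slot counts as correct in a turn if its predicted value equals the gold value.
    per_slot = {
        s: sum(g.get(s) == p.get(s) for g, p in pairs) / n for s in slots
    }
    # A turn counts toward joint accuracy only if every slot is correct.
    joint = sum(all(g.get(s) == p.get(s) for s in slots) for g, p in pairs) / n
    return per_slot, joint


# Hypothetical gold and predicted states for four turns.
gold = [
    {"area": "north", "food": "thai", "price": "cheap"},
    {"area": "north", "food": "thai", "price": "cheap"},
    {"area": "south", "food": "thai", "price": "cheap"},
    {"area": "south", "food": "indian", "price": "cheap"},
]
pred = [
    {"area": "north", "food": "thai", "price": "cheap"},     # all slots correct
    {"area": "north", "food": "chinese", "price": "cheap"},  # food wrong
    {"area": "north", "food": "thai", "price": "cheap"},     # area wrong
    {"area": "south", "food": "indian", "price": "cheap"},   # all slots correct
]
per_slot, joint = slot_accuracies(gold, pred)
print(per_slot)  # {'area': 0.75, 'food': 0.75, 'price': 1.0}
print(joint)     # 0.5
```

Joint accuracy can never exceed the lowest per-slot accuracy, because a turn is counted only when all of its slots match.
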
diff --git a/domain_adaptation.md b/domain_adaptation.md
@@ -1,4 +1,6 @@
-## Domain adaptation
+# Domain adaptation
+
+## Sentiment analysis 
 
 ### Multi-Domain Sentiment Dataset
 

diff --git a/language_modeling.md b/language_modeling.md
@@ -1,8 +1,8 @@
-## Language modeling
+# Language modeling
 
 Language modeling is the task of predicting the next word in a document. * indicates models using dynamic evaluation.
 
-### Penn Treebank&mdash;language modeling
+### Penn Treebank
 
 A common evaluation dataset for language modeling is the Penn Treebank,
 as pre-processed by [Mikolov et al. (2010)](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf).

diff --git a/machine_translation.md b/machine_translation.md
@@ -1,4 +1,4 @@
-## Machine translation
+# Machine translation
 
 Machine translation is the task of translating a sentence in a source language to a different target language. 
 

diff --git a/multi-task_learning.md b/multi-task_learning.md
@@ -1,4 +1,4 @@
-## Multi-task learning
+# Multi-task learning
 
 Multi-task learning aims to learn multiple different tasks simultaneously while maximizing
 performance on one or all of the tasks. 

diff --git a/multimodal.md b/multimodal.md
@@ -1,16 +1,9 @@
-## Multimodal Sentiment Analysis
-
-### MOSI
-The MOSI dataset ([Zadeh et al., 2016](https://arxiv.org/pdf/1606.06259.pdf)) is a dataset rich in sentimental expressions where 93 people review topics in English. The videos are segmented with each segments sentiment label scored between +3 (strong positive) to -3 (strong negative)  by  5  annotators.
-
-| Model           | Accuracy  |  Paper / Source |
-| ------------- | :-----:| --- |
-| bc-LSTM (Poria et al., 2017) | 80.3%  | [Context-Dependent Sentiment Analysis in User-Generated Videos](http://sentic.net/context-dependent-sentiment-analysis-in-user-generated-videos.pdf) |
-| MARN (Zadeh et al., 2018) | 77.1%  | [Multi-attention Recurrent Network for Human Communication Comprehension](https://arxiv.org/pdf/1802.00923.pdf) |
+# Multimodal
 
 ## Multimodal Emotion Recognition 
 
 ### IEMOCAP
+
 The  IEMOCAP ([Busso  et  al., 2008](https://link.springer.com/article/10.1007/s10579-008-9076-6)) contains the acts of 10 speakers in a two-way conversation segmented into utterances. The medium of the conversations in all the videos is English. The database contains the following categorical labels: anger, happiness, sadness, neutral, excitement, frustration, fear, surprise,  and other.
 
 **Monologue:**
@@ -28,4 +21,14 @@ Conversational setting enables the models to capture emotions expressed by the s
 | CMN (Hazarika et al., 2018) |  77.62%  | [Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos](http://aclweb.org/anthology/N18-1193) |
 | Memn2n | 75.08 | [Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos](http://aclweb.org/anthology/N18-1193)
 
+## Multimodal Sentiment Analysis
+
+### MOSI
+The MOSI dataset ([Zadeh et al., 2016](https://arxiv.org/pdf/1606.06259.pdf)) is a dataset rich in sentimental expressions where 93 people review topics in English. The videos are segmented, with each segment's sentiment label scored from +3 (strong positive) to -3 (strong negative) by 5 annotators.
+
+| Model           | Accuracy  |  Paper / Source |
+| ------------- | :-----:| --- |
+| bc-LSTM (Poria et al., 2017) | 80.3%  | [Context-Dependent Sentiment Analysis in User-Generated Videos](http://sentic.net/context-dependent-sentiment-analysis-in-user-generated-videos.pdf) |
+| MARN (Zadeh et al., 2018) | 77.1%  | [Multi-attention Recurrent Network for Human Communication Comprehension](https://arxiv.org/pdf/1802.00923.pdf) |
+
 [Go back to the README](README.md)
diff --git a/named_entity_recognition.md b/named_entity_recognition.md
@@ -1,4 +1,4 @@
-## Named entity recognition
+# Named entity recognition
 
 Named entity recognition (NER) is the task of tagging entities in text with their corresponding type.
 Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities.
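
The BIO notation described in the named entity recognition hunk above can be sketched as follows: labeled token spans are turned into one tag per token, with B- marking the first token of an entity, I- the remaining tokens, and O everything else. The function name `spans_to_bio`, the half-open `(start, end, label)` span format, and the example sentence are assumptions made for illustration.

```python
def spans_to_bio(num_tokens, spans):
    """Convert (start, end, label) spans, end exclusive, into BIO tags."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label      # tokens inside the entity
    return tags


# Hypothetical example sentence and entity spans.
tokens = ["Michael", "Jordan", "visited", "Chicago"]
spans = [(0, 2, "PER"), (3, 4, "LOC")]
print(list(zip(tokens, spans_to_bio(len(tokens), spans))))
```

The same scheme applies to chunking, where the labels are phrase types such as NP or VP rather than entity types.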

diff --git a/natural_language_inference.md b/natural_language_inference.md
@@ -1,4 +1,4 @@
-## Natural language inference
+# Natural language inference
 
 Natural language inference is the task of determining whether a "hypothesis" is 
 true (entailment), false (contradiction), or undetermined (neutral) given a "premise".

diff --git a/part-of-speech_tagging.md b/part-of-speech_tagging.md
@@ -1,4 +1,4 @@
-## Part-of-speech tagging
+# Part-of-speech tagging
 
 Part-of-speech tagging (POS tagging) is the task of tagging a word in a text with its part of speech.
 A part of speech is a category of words with similar grammatical properties. Common English
@@ -22,7 +22,7 @@ Models are typically evaluated based on the average test accuracy across 28 lang
 | Bi-LSTM (Plank et al., 2016) | 96.40 | [Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss](https://arxiv.org/abs/1604.05529) | 
 | Joint Bi-LSTM (Nguyen et al., 2017) | 95.55 | [A Novel Neural Network Model for Joint POS Tagging and Graph-based Dependency Parsing](https://arxiv.org/abs/1705.05952) |
 
-### Penn Treebank&mdash;POS tagging
+### Penn Treebank
 
 A standard dataset for POS tagging is the Wall Street Journal (WSJ) portion of the Penn Treebank, containing 45 
 different POS tags. Sections 0-18 are used for training, sections 19-21 for development, and sections 

diff --git a/question_answering.md b/question_answering.md
@@ -1,9 +1,6 @@
-## Question answering / reading comprehension
+# Question answering
 
-Question answering is the task of answering a question. Most current datasets
-frame this task as reading comprehension where the question is about a paragraph
-or document and the answer often is a span in the document. The Machine Reading group
-at UCL also provides an [overview of reading comprehension tasks](https://uclmr.github.io/ai4exams/data.html).
+Question answering is the task of answering a question.
 
 ### ARC
 
@@ -15,7 +12,13 @@ based on accuracy.
 
 A public leaderboard is available on the [ARC website](http://data.allenai.org/arc/).
 
-### CNN / Daily Mail&mdash;reading comprehension
+## Reading comprehension
+
+Most current question answering datasets frame the task as reading comprehension where the question is about a paragraph
+or document and the answer often is a span in the document. The Machine Reading group
+at UCL also provides an [overview of reading comprehension tasks](https://uclmr.github.io/ai4exams/data.html).
+
+### CNN / Daily Mail
 
 The [CNN / Daily Mail dataset](https://arxiv.org/abs/1506.03340) is a Cloze-style reading comprehension dataset
 created from CNN and Daily Mail news articles using heuristics. [Cloze-style](https://en.wikipedia.org/wiki/Cloze_test)

diff --git a/semantic_parsing.md b/semantic_parsing.md
@@ -1,9 +1,11 @@
-## Semantic parsing
+# Semantic parsing
 
 Semantic parsing is the task of translating natural language into a formal meaning
 representation on which a machine can act. Representations may be an executable language
 such as SQL or more abstract representations such as [Abstract Meaning Representation (AMR)](https://en.wikipedia.org/wiki/Abstract_Meaning_Representation).
 
+## SQL parsing
+
 ### WikiSQL
 
 The [WikiSQL dataset](https://arxiv.org/abs/1709.00103) consists of 87,673 

diff --git a/semantic_role_labeling.md b/semantic_role_labeling.md
@@ -1,4 +1,4 @@
-## Semantic role labeling
+# Semantic role labeling
 
 Semantic role labeling aims to model the predicate-argument structure of a sentence
 and is often described as answering "Who did what to whom". BIO notation is typically
@@ -10,7 +10,7 @@ Example:
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
 | B-ARG1 | I-ARG1 | O |  O  |  O  |   V  | B-ARG2 | I-ARG2 | B-ARG3 | I-ARG3 | I-ARG3 |    
 
-### OntoNotes&mdash;semantic role labeling
+### OntoNotes
 
 Models are typically evaluated on the [OntoNotes benchmark](http://www.aclweb.org/anthology/W13-3516) based on F1.
 

diff --git a/semantic_textual_similarity.md b/semantic_textual_similarity.md
@@ -1,4 +1,4 @@
-## Semantic textual similarity
+# Semantic textual similarity
 
 Semantic textual similarity deals with determining how similar two pieces of texts are.
 This can take the form of assigning a score from 1 to 5. Related tasks are paraphrase or duplicate identification.
@@ -24,6 +24,8 @@ The data can be downloaded from [here](https://github.com/facebookresearch/SentE
 | GenSen (Subramanian et al., 2018) | 78.6/84.4 | 0.888 | 87.8 | 78.9/78.6 | [Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning](https://arxiv.org/abs/1804.00079) | |
 | InferSent (Conneau et al., 2017) | 76.2/83.1 | 0.884 | 86.3 | 75.8/75.5 | [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://arxiv.org/abs/1705.02364) |
 
+## Paraphrase identification
+
 ### Quora Question Pairs
 
 The [Quora Question Pairs dataset](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs)

diff --git a/sentiment_analysis.md b/sentiment_analysis.md
@@ -1,4 +1,4 @@
-## Sentiment analysis
+# Sentiment analysis
 
 Sentiment analysis is the task of classifying the polarity of a given text.
 
@@ -15,19 +15,6 @@ negative. Models are evaluated based on accuracy.
 | Virtual adversarial training (Miyato et al., 2016) | 94.1 | [Adversarial Training Methods for Semi-Supervised Text Classification](https://arxiv.org/abs/1605.07725) |
 | BCN+Char+CoVe (McCann et al., 2017) | 91.8 | [Learned in Translation: Contextualized Word Vectors](https://arxiv.org/abs/1708.00107) |
 
-### Sentihood
-
-[Sentihood](http://www.aclweb.org/anthology/C16-1146) is a dataset for targeted aspect-based sentiment analysis (TABSA), which aims
-to identify fine-grained polarity towards a specific aspect. The dataset consists of 5,215 sentences,
-3,862 of which contain a single target, and the remainder multiple targets. F1 is used as evaluation metric
-for aspect detection and accuracy as evaluation metric for sentiment analysis.
-
-| Model           | Aspect  | Sentiment |  Paper / Source |
-| ------------- | :-----:| :-----:| --- |
-| Liu et al. (2018) | 78.5 | 91.0 | [Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-based Sentiment Analysis](http://aclweb.org/anthology/N18-2045) |
-| SenticLSTM (Ma et al., 2018) | 78.2 | 89.3 | [Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM](http://sentic.net/sentic-lstm.pdf) | 
-| LSTM-LOC (Saeidi et al., 2016) | 69.3 | 81.9 | [Sentihood: Targeted aspect based sentiment analysis dataset for urban neighbourhoods](http://www.aclweb.org/anthology/C16-1146) |
-
 ### SST
 
 The [Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/index.html) 
@@ -74,3 +61,18 @@ Binary classification:
 | DPCNN (Johnson and Zhang, 2017) | 2.64 | [Deep Pyramid Convolutional Neural Networks for Text Categorization](http://aclweb.org/anthology/P17-1052) |
 | CNN (Johnson and Zhang, 2016) | 2.90 | [Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings](https://arxiv.org/abs/1602.02373) |
 | Char-level CNN (Zhang et al., 2015) | 4.88 | [Character-level Convolutional Networks for Text Classification](https://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf) |
+
+## Aspect-based sentiment analysis
+
+### Sentihood
+
+[Sentihood](http://www.aclweb.org/anthology/C16-1146) is a dataset for targeted aspect-based sentiment analysis (TABSA), which aims
+to identify fine-grained polarity towards a specific aspect. The dataset consists of 5,215 sentences,
+3,862 of which contain a single target, and the remainder multiple targets. F1 is used as evaluation metric
+for aspect detection and accuracy as evaluation metric for sentiment analysis.
+
+| Model           | Aspect  | Sentiment |  Paper / Source |
+| ------------- | :-----:| :-----:| --- |
+| Liu et al. (2018) | 78.5 | 91.0 | [Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-based Sentiment Analysis](http://aclweb.org/anthology/N18-2045) |
+| SenticLSTM (Ma et al., 2018) | 78.2 | 89.3 | [Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM](http://sentic.net/sentic-lstm.pdf) | 
+| LSTM-LOC (Saeidi et al., 2016) | 69.3 | 81.9 | [Sentihood: Targeted aspect based sentiment analysis dataset for urban neighbourhoods](http://www.aclweb.org/anthology/C16-1146) |
diff --git a/summarization.md b/summarization.md
@@ -1,9 +1,9 @@
-## Summarization
+# Summarization
 
 Summarization is the task of producing a shorter version of a document that preserves most of the
 original document's meaning.
 
-### CNN / Daily Mail&mdash;summarization
+### CNN / Daily Mail
 
 The [CNN / Daily Mail dataset](https://arxiv.org/abs/1506.03340) as processed by 
 [Nallapati et al. (2016)](http://www.aclweb.org/anthology/K16-1028) has been used

diff --git a/text_classification.md b/text_classification.md
@@ -1,4 +1,4 @@
-## Text classification
+# Text classification
 
 Text classification is the task of assigning a sentence or document an appropriate category.
 The categories depend on the chosen dataset and can range from topics.