Commit

Adapted headlines to highlight high-level tasks and subtasks
sebastianruder committed Jun 24, 2018
1 parent 1979ba0 commit 34b01fe
Showing 21 changed files with 75 additions and 58 deletions.
2 changes: 1 addition & 1 deletion ccg_supertagging.md
@@ -1,4 +1,4 @@
## CCG supertagging
# CCG supertagging

Combinatory Categorial Grammar (CCG; [Steedman, 2000](http://www.citeulike.org/group/14833/article/8971002)) is a
highly lexicalized formalism. The standard parsing model of [Clark and Curran (2007)](https://www.mitpressjournals.org/doi/abs/10.1162/coli.2007.33.4.493)
4 changes: 2 additions & 2 deletions chunking.md
@@ -1,4 +1,4 @@
## Chunking
# Chunking

Chunking is a shallow form of parsing that identifies continuous spans of tokens that form syntactic units such as noun phrases or verb phrases.

@@ -8,7 +8,7 @@ Example:
| --- | ---| --- | --- | --- |
| B-NP | I-NP | I-NP | I-NP | I-NP |

### Penn Treebank—chunking
### Penn Treebank

The [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) is typically used for evaluating chunking.
Sections 15-18 are used for training, section 19 for development, and section 20
4 changes: 2 additions & 2 deletions constituency_parsing.md
@@ -1,4 +1,4 @@
## Constituency parsing
# Constituency parsing

Constituency parsing aims to extract a constituency-based parse tree from a sentence that
represents its syntactic structure according to a [phrase structure grammar](https://en.wikipedia.org/wiki/Phrase_structure_grammar).
@@ -22,7 +22,7 @@ convert the parse tree into a sequence following a depth-first traversal in orde
be able to apply sequence-to-sequence models to it. The linearized version of the
above parse tree looks as follows: (S (N) (VP V N)).
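To make the depth-first linearization concrete, here is a minimal Python sketch. It assumes a tree is encoded as nested `(label, children)` tuples with leaves as plain strings; this encoding and the `linearize` helper are hypothetical and not taken from any particular parser or paper. It emits one bracketed group per constituent, in the spirit of the example above, but is not guaranteed to reproduce that exact string.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a leaf is a label string, an internal node is (label, children).
Tree = Union[str, Tuple[str, List["Tree"]]]


def linearize(tree: Tree) -> str:
    """Return a bracketed, depth-first (pre-order) linearization of `tree`."""
    if isinstance(tree, str):
        # Leaves are emitted as-is.
        return tree
    label, children = tree
    # Visit children left to right, fully expanding each before its next sibling.
    inner = " ".join(linearize(child) for child in children)
    return f"({label} {inner})"


if __name__ == "__main__":
    tree = ("S", [("NP", ["N"]), ("VP", ["V", ("NP", ["N"])])])
    print(linearize(tree))  # (S (NP N) (VP V (NP N)))
```

A sequence-to-sequence parser would then be trained to emit such a string token by token.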

### Penn Treebank—constituency parsing
### Penn Treebank

The Wall Street Journal section of the [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) is used for
evaluating constituency parsers. Section 22 is used for development and Section 23 is used for evaluation.
2 changes: 1 addition & 1 deletion coreference_resolution.md
@@ -1,4 +1,4 @@
## Coreference resolution
# Coreference resolution

Coreference resolution is the task of clustering mentions in text that refer to the same underlying real-world entities.

4 changes: 2 additions & 2 deletions dependency_parsing.md
@@ -1,4 +1,4 @@
## Dependency parsing
# Dependency parsing

Dependency parsing is the task of extracting a dependency parse of a sentence that represents its grammatical
structure and defines the relationships between "head" words and the words that modify those heads.
@@ -20,7 +20,7 @@ I prefer the morning flight through Denver
Relations among the words are illustrated above the sentence with directed, labeled
arcs from heads to dependents (+ indicates the dependent).

### Penn Treebank—dependency parsing
### Penn Treebank

Models are evaluated on the [Stanford Dependency](https://nlp.stanford.edu/software/dependencies_manual.pdf)
conversion of the Penn Treebank with predicted POS-tags. Punctuation symbols
13 changes: 8 additions & 5 deletions dialog.md
@@ -1,14 +1,17 @@
## Dialog
# Dialog

Dialogue is notoriously hard to evaluate. Past approaches have used human evaluation.

## Dialog state tracking

Dialogue state tracking consists of determining at each turn of a dialog the
full representation of what the user wants at that point in the dialog,
which contains a goal constraint, a set of requested slots, and the user's dialog act.

### Second dialog state tracking challenge

For goal-oriented dialogue, the dataset of the [second dialog state tracking challenge](http://www.aclweb.org/anthology/W14-4337)
(DSTC2) is a common evaluation dataset. Dialogue state tracking consists of determining
at each turn of a dialog the full representation of what the user wants at that point
in the dialog, which contains a goal constraint, a set of requested slots, and
the user's dialog act. The DSTC2 focuses on the restaurant search domain. Models are
(DSTC2) is a common evaluation dataset. The DSTC2 focuses on the restaurant search domain. Models are
evaluated based on accuracy on both individual and joint slot tracking.
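To illustrate the difference between individual and joint slot tracking, the following Python sketch computes per-slot accuracy and joint accuracy, where a turn counts towards joint accuracy only if every tracked slot is correct. It assumes each turn's goal labels are a dictionary from slot name (e.g. `area`, `food`, `price`, matching the table columns) to value, and it treats a slot missing from both prediction and gold as agreeing. This is a simplified sketch, not the official DSTC2 scoring script, and the example turns are made up.

```python
from typing import Dict, List

# Assumed format: one dict per dialog turn, mapping slot name -> value.
Turn = Dict[str, str]


def slot_accuracies(preds: List[Turn], golds: List[Turn], slots: List[str]) -> Dict[str, float]:
    """Per-slot accuracy for each slot in `slots`, plus joint accuracy."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("need the same, non-zero number of predicted and gold turns")
    n = len(golds)
    scores: Dict[str, float] = {}
    for slot in slots:
        # A slot is correct in a turn if the predicted value equals the gold value.
        correct = sum(p.get(slot) == g.get(slot) for p, g in zip(preds, golds))
        scores[slot] = correct / n
    # Joint accuracy: all slots in the turn must be correct at the same time.
    joint = sum(all(p.get(s) == g.get(s) for s in slots) for p, g in zip(preds, golds))
    scores["joint"] = joint / n
    return scores


if __name__ == "__main__":
    golds = [
        {"area": "north", "food": "thai", "price": "cheap"},
        {"area": "south", "food": "indian", "price": "moderate"},
    ]
    preds = [
        {"area": "north", "food": "thai", "price": "cheap"},
        {"area": "south", "food": "chinese", "price": "moderate"},
    ]
    print(slot_accuracies(preds, golds, ["area", "food", "price"]))
    # {'area': 1.0, 'food': 0.5, 'price': 1.0, 'joint': 0.5}
```

Because a turn that is jointly correct is also correct on every individual slot, joint accuracy is bounded above by the lowest per-slot accuracy.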

| Model | Area | Food | Price | Joint | Paper / Source |
4 changes: 3 additions & 1 deletion domain_adaptation.md
@@ -1,4 +1,6 @@
## Domain adaptation
# Domain adaptation

## Sentiment analysis

### Multi-Domain Sentiment Dataset

4 changes: 2 additions & 2 deletions language_modeling.md
@@ -1,8 +1,8 @@
## Language modeling
# Language modeling

Language modeling is the task of predicting the next word in a document. * indicates models using dynamic evaluation.

### Penn Treebank—language modeling
### Penn Treebank

A common evaluation dataset for language modeling is the Penn Treebank,
as pre-processed by [Mikolov et al. (2010)](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf).
2 changes: 1 addition & 1 deletion machine_translation.md
@@ -1,4 +1,4 @@
## Machine translation
# Machine translation

Machine translation is the task of translating a sentence in a source language to a different target language.

2 changes: 1 addition & 1 deletion multi-task_learning.md
@@ -1,4 +1,4 @@
## Multi-task learning
# Multi-task learning

Multi-task learning aims to learn multiple different tasks simultaneously while maximizing
performance on one or all of the tasks.
21 changes: 12 additions & 9 deletions multimodal.md
@@ -1,16 +1,9 @@
## Multimodal Sentiment Analysis

### MOSI
The MOSI dataset ([Zadeh et al., 2016](https://arxiv.org/pdf/1606.06259.pdf)) is rich in sentiment expressions: 93 people review topics in English. The videos are segmented, and each segment's sentiment label is scored by 5 annotators on a scale from +3 (strong positive) to -3 (strong negative).

| Model | Accuracy | Paper / Source |
| ------------- | :-----:| --- |
| bc-LSTM (Poria et al., 2017) | 80.3% | [Context-Dependent Sentiment Analysis in User-Generated Videos](http://sentic.net/context-dependent-sentiment-analysis-in-user-generated-videos.pdf) |
| MARN (Zadeh et al., 2018) | 77.1% | [Multi-attention Recurrent Network for Human Communication Comprehension](https://arxiv.org/pdf/1802.00923.pdf) |
# Multimodal

## Multimodal Emotion Recognition

### IEMOCAP

The IEMOCAP database ([Busso et al., 2008](https://link.springer.com/article/10.1007/s10579-008-9076-6)) contains the acts of 10 speakers in two-way conversations segmented into utterances. All conversations in the videos are in English. The database contains the following categorical labels: anger, happiness, sadness, neutral, excitement, frustration, fear, surprise, and other.

**Monologue:**
@@ -28,4 +21,14 @@ Conversational setting enables the models to capture emotions expressed by the s
| CMN (Hazarika et al., 2018) | 77.62% | [Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos](http://aclweb.org/anthology/N18-1193) |
| Memn2n | 75.08% | [Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos](http://aclweb.org/anthology/N18-1193) |

## Multimodal Sentiment Analysis

### MOSI
The MOSI dataset ([Zadeh et al., 2016](https://arxiv.org/pdf/1606.06259.pdf)) is rich in sentiment expressions: 93 people review topics in English. The videos are segmented, and each segment's sentiment label is scored by 5 annotators on a scale from +3 (strong positive) to -3 (strong negative).

| Model | Accuracy | Paper / Source |
| ------------- | :-----:| --- |
| bc-LSTM (Poria et al., 2017) | 80.3% | [Context-Dependent Sentiment Analysis in User-Generated Videos](http://sentic.net/context-dependent-sentiment-analysis-in-user-generated-videos.pdf) |
| MARN (Zadeh et al., 2018) | 77.1% | [Multi-attention Recurrent Network for Human Communication Comprehension](https://arxiv.org/pdf/1802.00923.pdf) |

[Go back to the README](README.md)
2 changes: 1 addition & 1 deletion named_entity_recognition.md
@@ -1,4 +1,4 @@
## Named entity recognition
# Named entity recognition

Named entity recognition (NER) is the task of tagging entities in text with their corresponding type.
Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities.
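For entity-level evaluation, BIO-tagged output has to be converted back into entity spans. A minimal Python sketch of that decoding step follows; the `bio_to_spans` name and the `(type, start, end)` tuple format (end exclusive) are assumptions for illustration. It treats an `I-` tag that does not continue an open entity of the same type as the start of a new entity, which is one common lenient convention rather than the only one.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (entity type, start token index, end token index, exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a sequence of BIO tags into typed entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and tag[2:] == label:
            # Inside the currently open entity: it simply keeps extending.
            continue
        # Any other tag (B-, O, or an I- of a different type) closes the open entity.
        if label is not None:
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith(("B-", "I-")):
            # Open a new entity; a stray I- tag is treated like B-.
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


if __name__ == "__main__":
    tags = ["B-PER", "I-PER", "O", "B-LOC", "I-ORG"]
    print(bio_to_spans(tags))  # [('PER', 0, 2), ('LOC', 3, 4), ('ORG', 4, 5)]
```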
2 changes: 1 addition & 1 deletion natural_language_inference.md
@@ -1,4 +1,4 @@
## Natural language inference
# Natural language inference

Natural language inference is the task of determining whether a "hypothesis" is
true (entailment), false (contradiction), or undetermined (neutral) given a "premise".
4 changes: 2 additions & 2 deletions part-of-speech_tagging.md
@@ -1,4 +1,4 @@
## Part-of-speech tagging
# Part-of-speech tagging

Part-of-speech tagging (POS tagging) is the task of tagging a word in a text with its part of speech.
A part of speech is a category of words with similar grammatical properties. Common English
@@ -22,7 +22,7 @@ Models are typically evaluated based on the average test accuracy across 28 lang
| Bi-LSTM (Plank et al., 2016) | 96.40 | [Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss](https://arxiv.org/abs/1604.05529) |
| Joint Bi-LSTM (Nguyen et al., 2017) | 95.55 | [A Novel Neural Network Model for Joint POS Tagging and Graph-based Dependency Parsing](https://arxiv.org/abs/1705.05952) |

### Penn Treebank—POS tagging
### Penn Treebank

A standard dataset for POS tagging is the Wall Street Journal (WSJ) portion of the Penn Treebank, containing 45
different POS tags. Sections 0-18 are used for training, sections 19-21 for development, and sections
15 changes: 9 additions & 6 deletions question_answering.md
@@ -1,9 +1,6 @@
## Question answering / reading comprehension
# Question answering

Question answering is the task of answering a question. Most current datasets
frame this task as reading comprehension where the question is about a paragraph
or document and the answer often is a span in the document. The Machine Reading group
at UCL also provides an [overview of reading comprehension tasks](https://uclmr.github.io/ai4exams/data.html).
Question answering is the task of answering a question.

### ARC

@@ -15,7 +12,13 @@ based on accuracy.

A public leaderboard is available on the [ARC website](http://data.allenai.org/arc/).

### CNN / Daily Mail—reading comprehension
## Reading comprehension

Most current question answering datasets frame the task as reading comprehension where the question is about a paragraph
or document and the answer often is a span in the document. The Machine Reading group
at UCL also provides an [overview of reading comprehension tasks](https://uclmr.github.io/ai4exams/data.html).

### CNN / Daily Mail

The [CNN / Daily Mail dataset](https://arxiv.org/abs/1506.03340) is a Cloze-style reading comprehension dataset
created from CNN and Daily Mail news articles using heuristics. [Cloze-style](https://en.wikipedia.org/wiki/Cloze_test)
4 changes: 3 additions & 1 deletion semantic_parsing.md
@@ -1,9 +1,11 @@
## Semantic parsing
# Semantic parsing

Semantic parsing is the task of translating natural language into a formal meaning
representation on which a machine can act. Representations may be an executable language
such as SQL or more abstract representations such as [Abstract Meaning Representation (AMR)](https://en.wikipedia.org/wiki/Abstract_Meaning_Representation).

## SQL parsing

### WikiSQL

The [WikiSQL dataset](https://arxiv.org/abs/1709.00103) consists of 87,673
4 changes: 2 additions & 2 deletions semantic_role_labeling.md
@@ -1,4 +1,4 @@
## Semantic role labeling
# Semantic role labeling

Semantic role labeling aims to model the predicate-argument structure of a sentence
and is often described as answering "Who did what to whom". BIO notation is typically
@@ -10,7 +10,7 @@ Example:
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| B-ARG1 | I-ARG1 | O | O | O | V | B-ARG2 | I-ARG2 | B-ARG3 | I-ARG3 | I-ARG3 |

### OntoNotes—semantic role labeling
### OntoNotes

Models are typically evaluated on the [OntoNotes benchmark](http://www.aclweb.org/anthology/W13-3516) based on F1.

4 changes: 3 additions & 1 deletion semantic_textual_similarity.md
@@ -1,4 +1,4 @@
## Semantic textual similarity
# Semantic textual similarity

Semantic textual similarity deals with determining how similar two pieces of text are.
This can take the form of assigning a score from 1 to 5. Related tasks are paraphrase or duplicate identification.
@@ -24,6 +24,8 @@ The data can be downloaded from [here](https://github.com/facebookresearch/SentE
| GenSen (Subramanian et al., 2018) | 78.6/84.4 | 0.888 | 87.8 | 78.9/78.6 | [Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning](https://arxiv.org/abs/1804.00079) | |
| InferSent (Conneau et al., 2017) | 76.2/83.1 | 0.884 | 86.3 | 75.8/75.5 | [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://arxiv.org/abs/1705.02364) |

## Paraphrase identification

### Quora Question Pairs

The [Quora Question Pairs dataset](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs)
30 changes: 16 additions & 14 deletions sentiment_analysis.md
@@ -1,4 +1,4 @@
## Sentiment analysis
# Sentiment analysis

Sentiment analysis is the task of classifying the polarity of a given text.

@@ -15,19 +15,6 @@ negative. Models are evaluated based on accuracy.
| Virtual adversarial training (Miyato et al., 2016) | 94.1 | [Adversarial Training Methods for Semi-Supervised Text Classification](https://arxiv.org/abs/1605.07725) |
| BCN+Char+CoVe (McCann et al., 2017) | 91.8 | [Learned in Translation: Contextualized Word Vectors](https://arxiv.org/abs/1708.00107) |

### Sentihood

[Sentihood](http://www.aclweb.org/anthology/C16-1146) is a dataset for targeted aspect-based sentiment analysis (TABSA), which aims
to identify fine-grained polarity towards a specific aspect. The dataset consists of 5,215 sentences,
3,862 of which contain a single target, and the remainder multiple targets. F1 is used as the evaluation metric
for aspect detection and accuracy as the evaluation metric for sentiment analysis.

| Model | Aspect | Sentiment | Paper / Source |
| ------------- | :-----:| :-----:| --- |
| Liu et al. (2018) | 78.5 | 91.0 | [Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-based Sentiment Analysis](http://aclweb.org/anthology/N18-2045) |
| SenticLSTM (Ma et al., 2018) | 78.2 | 89.3 | [Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM](http://sentic.net/sentic-lstm.pdf) |
| LSTM-LOC (Saeidi et al., 2016) | 69.3 | 81.9 | [Sentihood: Targeted aspect based sentiment analysis dataset for urban neighbourhoods](http://www.aclweb.org/anthology/C16-1146) |

### SST

The [Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/index.html)
@@ -74,3 +61,18 @@ Binary classification:
| DPCNN (Johnson and Zhang, 2017) | 2.64 | [Deep Pyramid Convolutional Neural Networks for Text Categorization](http://aclweb.org/anthology/P17-1052) |
| CNN (Johnson and Zhang, 2016) | 2.90 | [Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings](https://arxiv.org/abs/1602.02373) |
| Char-level CNN (Zhang et al., 2015) | 4.88 | [Character-level Convolutional Networks for Text Classification](https://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf) |

## Aspect-based sentiment analysis

### Sentihood

[Sentihood](http://www.aclweb.org/anthology/C16-1146) is a dataset for targeted aspect-based sentiment analysis (TABSA), which aims
to identify fine-grained polarity towards a specific aspect. The dataset consists of 5,215 sentences,
3,862 of which contain a single target, and the remainder multiple targets. F1 is used as the evaluation metric
for aspect detection and accuracy as the evaluation metric for sentiment analysis.

| Model | Aspect | Sentiment | Paper / Source |
| ------------- | :-----:| :-----:| --- |
| Liu et al. (2018) | 78.5 | 91.0 | [Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-based Sentiment Analysis](http://aclweb.org/anthology/N18-2045) |
| SenticLSTM (Ma et al., 2018) | 78.2 | 89.3 | [Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM](http://sentic.net/sentic-lstm.pdf) |
| LSTM-LOC (Saeidi et al., 2016) | 69.3 | 81.9 | [Sentihood: Targeted aspect based sentiment analysis dataset for urban neighbourhoods](http://www.aclweb.org/anthology/C16-1146) |
4 changes: 2 additions & 2 deletions summarization.md
@@ -1,9 +1,9 @@
## Summarization
# Summarization

Summarization is the task of producing a shorter version of a document that preserves most of the
original document's meaning.

### CNN / Daily Mail—summarization
### CNN / Daily Mail

The [CNN / Daily Mail dataset](https://arxiv.org/abs/1506.03340) as processed by
[Nallapati et al. (2016)](http://www.aclweb.org/anthology/K16-1028) has been used
2 changes: 1 addition & 1 deletion text_classification.md
@@ -1,4 +1,4 @@
## Text classification
# Text classification

Text classification is the task of assigning a sentence or document an appropriate category.
The categories depend on the chosen dataset and can range from topics.