Converted remaining YAML files to Markdown tables, streamlined contribution instructions, moved task files to language folders (sebastianruder#131)
sebastianruder authored Oct 25, 2018
1 parent ae4990b commit 1cd46b3
Showing 45 changed files with 421 additions and 1,196 deletions.
166 changes: 74 additions & 92 deletions README.md
@@ -4,56 +4,56 @@

### English

- [ASR](english/asr.md)
- [Automatic speech recognition](english/automatic_speech_recognition.md)
- [CCG supertagging](english/ccg_supertagging.md)
- [Chunking](english/chunking.md)
- [Constituency parsing](english/constituency_parsing.md)
- [Coreference resolution](english/coreference_resolution.md)
- [Dependency parsing](english/dependency_parsing.md)
- [Dialog](english/dialog.md)
- [Dialogue](english/dialogue.md)
- [Domain adaptation](english/domain_adaptation.md)
- [Entity Linking](english/entity_linking.md)
- [Grammatical Error Correction](english/grammatical_error_correction.md)
- [Information Extraction](english/information_extraction.md)
- [Entity linking](english/entity_linking.md)
- [Grammatical error correction](english/grammatical_error_correction.md)
- [Information extraction](english/information_extraction.md)
- [Language modeling](english/language_modeling.md)
- [Lexical Normalization](english/lexical_normalization.md)
- [Lexical normalization](english/lexical_normalization.md)
- [Machine translation](english/machine_translation.md)
- [Multi-task learning](english/multi-task_learning.md)
- [Multimodal](english/multimodal.md)
- [Multi-modal](english/multimodal.md)
- [Named entity recognition](english/named_entity_recognition.md)
- [Natural language inference](english/natural_language_inference.md)
- [Part-of-speech tagging](english/part-of-speech_tagging.md)
- [Question answering](english/question_answering.md)
- [Relation Prediction](english/relation_prediction.md)
- [Relation prediction](english/relation_prediction.md)
- [Relationship extraction](english/relationship_extraction.md)
- [Semantic textual similarity](english/semantic_textual_similarity.md)
- [Sentiment analysis](english/sentiment_analysis.md)
- [Semantic parsing](english/semantic_parsing.md)
- [Semantic role labeling](english/semantic_role_labeling.md)
- [Sentiment analysis](english/sentiment_analysis.md)
- [Stance detection](english/stance_detection.md)
- [Summarization](english/summarization.md)
- [Taxonomy learning](english/taxonomy_learning.md)
- [Temporal Processing](english/temporal_processing.md)
- [Temporal processing](english/temporal_processing.md)
- [Text classification](english/text_classification.md)
- [Word Sense Disambiguation](english/word_sense_disambiguation.md)
- [Word sense disambiguation](english/word_sense_disambiguation.md)

### Korean
### Chinese

- [Chunking](korean/korean.md)
- [Part-of-speech tagging](korean/korean.md)
- [Entity linking](chinese/chinese.md#entity-linking)

### Hindi

- [Chunking](hindi/hindi.md)
- [Machine Translation](hindi/hindi.md)
- [Chunking](hindi/hindi.md#chunking)
- [Part-of-speech tagging](hindi/hindi.md#part-of-speech-tagging)
- [Machine Translation](hindi/hindi.md#machine-translation)

### Vietnamese

- [Word segmentation](vietnamese/vietnamese.md)
- [Part-of-speech tagging](vietnamese/vietnamese.md)
- [Named entity recognition](vietnamese/vietnamese.md)
- [Dependency parsing](vietnamese/vietnamese.md)
- [Machine translation](vietnamese/vietnamese.md)
- [Dependency parsing](vietnamese/vietnamese.md#dependency-parsing)
- [Machine translation](vietnamese/vietnamese.md#machine-translation)
- [Named entity recognition](vietnamese/vietnamese.md#named-entity-recognition)
- [Part-of-speech tagging](vietnamese/vietnamese.md#part-of-speech-tagging)
- [Word segmentation](vietnamese/vietnamese.md#word-segmentation)

This document aims to track the progress in Natural Language Processing (NLP) and give an overview
of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets.
@@ -68,89 +68,71 @@ the reader will be pointed there.
If you want to find this document again in the future, just go to [`nlpprogress.com`](https://nlpprogress.com/)
or [`nlpsota.com`](http://nlpsota.com/) in your browser.

### Wish list

These are tasks and datasets that are still missing.

- Bilingual dictionary induction
- Discourse parsing
- Keyphrase extraction
- Knowledge base population (KBP)
- More dialogue tasks
- Semi-supervised learning

### Contributing

If you would like to add a new result, you can do so with a pull request (PR).
In order to minimize noise and to make maintenance somewhat manageable, results reported
in published papers will be preferred (indicate the venue of publication in your PR);
an exception may be made for influential preprints. The result should include the name
of the method, the citation, the score, and a link to the paper and should be added
so that the table is sorted (with the best result on top).
#### Guidelines

If your pull request contains a new result, please make sure that "new result" appears
somewhere in the title of the PR. This way, we can track which tasks are the most
active and receive the most attention.
**Results**   Results reported in published papers are preferred; an exception may be made for influential preprints.

**Datasets**   Datasets should have been used for evaluation in at least one published paper besides
the one that introduced the dataset.

In order to make reproduction easier, we recommend to add a link to an implementation
to each method if available. You can add a `Code` column (see below) to the table if it does not exist.
**Code**   We recommend to add a link to an implementation
if available. You can add a `Code` column (see below) to the table if it does not exist.
In the `Code` column, indicate an official implementation with [Official](http://link_to_implementation).
If an unofficial implementation is available, use [Link](http://link_to_implementation) (see below).
If no implementation is available, you can leave the cell empty.

| Model | Score | Paper / Source | Code |
| ------------- | :-----:| --- | --- |
| | | | [Official](http://link_to_implementation) |
| | | | [Link](http://link_to_implementation) |
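
The `Code` column convention described above (`Official` for an official implementation, `Link` for an unofficial one, an empty cell otherwise) can be sketched as a small formatter. This is only an illustrative sketch: the helper names `code_cell` and `result_row` are hypothetical and are not part of the repository.

```python
def code_cell(url=None, official=False):
    """Return the Markdown for the Code column of a results table.

    Hypothetical helper: the repository itself ships no such function.
    """
    if not url:
        return ""  # no implementation available: leave the cell empty
    label = "Official" if official else "Link"
    return f"[{label}]({url})"


def result_row(model, score, paper, url=None, official=False):
    """Format one row as: | Model | Score | Paper / Source | Code |"""
    return f"| {model} | {score} | {paper} | {code_cell(url, official)} |"
```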
#### Adding a new result

To add a new dataset or task, follow the below steps. Any new datasets
should have been used for evaluation in at least one published paper besides
the one that introduced the dataset.
If you would like to add a new result, you can just click on the small edit button in the top-right
corner of the file for the respective task (see below).

![Click on the edit button to add a file](img/edit_file.png)

1. Fork the repository.
2. If your task is completely new, create a new file and link to it in the table of contents above.
If not, add your task or dataset to the respective section of the corresponding file (in alphabetical order).
3. Briefly describe the dataset/task and include relevant references.
4. Describe the evaluation setting and evaluation metric.
5. Show how an annotated example of the dataset/task looks like.
6. Add a download link if available.
7. Copy the below table and fill in at least two results (including the state-of-the-art)
for your dataset/task (change Score to the metric of your dataset).
8. Submit your change as a pull request.
This allows you to edit the file in Markdown. Simply add a row to the corresponding table in the
same format. Make sure that the table stays sorted (with the best result on top).
After you've made your change, make sure that the table still looks ok by clicking on the
"Preview changes" tab at the top of the page. If everything looks good, go to the bottom of the page,
where you see the below form.

![Fill out the file change information](img/propose_file_change.png)

Add a name for your proposed change, an optional description, indicate that you would like to
"Create a new branch for this commit and start a pull request", and click on "Propose file change".

#### Adding a new dataset or task

For adding a new dataset or task, you can also follow the steps above. Alternatively, you can fork the repository.
In both cases, follow the steps below:

1. If your task is completely new, create a new file and link to it in the table of contents above.
1. If not, add your task or dataset to the respective section of the corresponding file (in alphabetical order).
1. Briefly describe the dataset/task and include relevant references.
1. Describe the evaluation setting and evaluation metric.
1. Show how an annotated example of the dataset/task looks like.
1. Add a download link if available.
1. Copy the below table and fill in at least two results (including the state-of-the-art)
for your dataset/task (change Score to the metric of your dataset). If your dataset/task
has multiple metrics, add them to the right of `Score`.
1. Submit your change as a pull request.

| Model | Score | Paper / Source | Code |
| ------------- | :-----:| --- | --- |
| | | | |
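
The requirement that a table stays sorted with the best result on top can be checked mechanically. The following is a minimal sketch, assuming the score is a plain number in the second column and that a higher score is better by default; `parse_score` and `add_result` are hypothetical names, not tools provided by the repository.

```python
def parse_score(row):
    """Read the Score column (second cell) of a Markdown table row as a float."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    return float(cells[1])


def add_result(table_lines, new_row, higher_is_better=True):
    """Insert new_row into a Markdown table, keeping the best result on top.

    table_lines holds the header row, the separator row, then result rows.
    """
    header, rows = table_lines[:2], table_lines[2:]
    # Drop empty template rows such as "| | | | |".
    rows = [r for r in rows if r.replace("|", "").strip()]
    rows.append(new_row)
    # Sort by score; set higher_is_better=False for error-rate style metrics.
    rows.sort(key=parse_score, reverse=higher_is_better)
    return header + rows
```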

**Important note:** We are currently transitioning from storing results in tables (as above) to using
[YAML](https://en.wikipedia.org/wiki/YAML) files for their greater flexibility. This will allow us to
highlight additional attributes and have interesting visualizations of results down the line.

If the results for your task are already stored in a YAML file, you can simply extend the YAML file
using the same fields as the existing entries. To check that the resulting table looks as expected,
you can build the site locally using Jekyll by following the steps detailed
[here](https://help.github.com/articles/setting-up-your-github-pages-site-locally-with-jekyll/#requirements):

1. Check whether you have Ruby 2.1.0 or higher installed with `ruby --version`, otherwise [install it](https://www.ruby-lang.org/en/downloads/).
On OS X for instance, this can be done with `brew install ruby`. Make sure you also have `ruby-dev` and `zlib1g-dev` installed.
1. Install Bundler `gem install bundler`. If you run into issues with installing bundler on OS X, have a look
[here](https://bundler.io/v1.16/guides/rubygems_tls_ssl_troubleshooting_guide.html) for troubleshooting tips. Also try refreshing
the terminal.
1. Clone the repo locally: `git clone https://github.com/sebastianruder/NLP-progress`
1. Navigate to the repo with `cd NLP-progress`
1. Install Jekyll: `bundle install`
1. Run the Jekyll site locally: `bundle exec jekyll serve`
1. You can now preview the local Jekyll site in your browser at `http://localhost:4000`.

### Things to do

- Add a column for code (see above) to each table and a link to the source code to each method.
- Add pointers on how to retrieve data.
- Provide more details regarding the evaluation setup of each task.
- Add an example to every task/dataset.
- Add statistics to every dataset.
- Provide a description and details for every task / dataset.
- Add a table of contents to every file (particularly the large ones).
- We could potentially use [readthedocs](https://github.com/rtfd/readthedocs.org) to provide a clearer structure.
- All current datasets in this list are for the English language (except for [UD](#ud)). In a separate section, we could add
datasets for other languages.

### Wish list

These are tasks and datasets that are still missing:

- Bilingual dictionary induction
- Discourse parsing
- Keyphrase extraction
- Knowledge base population (KBP)
- More dialogue tasks
- Semi-supervised learning

### Instructions for building the site locally

Instructions for building the website locally using Jekyll can be found [here](jekyll_instructions.md).
24 changes: 0 additions & 24 deletions _data/ccg_supertagging.yaml

This file was deleted.

12 changes: 0 additions & 12 deletions _data/chunking.yaml

This file was deleted.

63 changes: 0 additions & 63 deletions _data/constituency_parsing.yaml

This file was deleted.
