diff --git a/README.md b/README.md index f5cd17af..098ac5a8 100644 --- a/README.md +++ b/README.md @@ -4,56 +4,56 @@ ### English -- [ASR](english/asr.md) +- [Automatic speech recognition](english/automatic_speech_recognition.md) - [CCG supertagging](english/ccg_supertagging.md) - [Chunking](english/chunking.md) - [Constituency parsing](english/constituency_parsing.md) - [Coreference resolution](english/coreference_resolution.md) - [Dependency parsing](english/dependency_parsing.md) -- [Dialog](english/dialog.md) +- [Dialogue](english/dialogue.md) - [Domain adaptation](english/domain_adaptation.md) -- [Entity Linking](english/entity_linking.md) -- [Grammatical Error Correction](english/grammatical_error_correction.md) -- [Information Extraction](english/information_extraction.md) +- [Entity linking](english/entity_linking.md) +- [Grammatical error correction](english/grammatical_error_correction.md) +- [Information extraction](english/information_extraction.md) - [Language modeling](english/language_modeling.md) -- [Lexical Normalization](english/lexical_normalization.md) +- [Lexical normalization](english/lexical_normalization.md) - [Machine translation](english/machine_translation.md) - [Multi-task learning](english/multi-task_learning.md) -- [Multimodal](english/multimodal.md) +- [Multi-modal](english/multimodal.md) - [Named entity recognition](english/named_entity_recognition.md) - [Natural language inference](english/natural_language_inference.md) - [Part-of-speech tagging](english/part-of-speech_tagging.md) - [Question answering](english/question_answering.md) -- [Relation Prediction](english/relation_prediction.md) +- [Relation prediction](english/relation_prediction.md) - [Relationship extraction](english/relationship_extraction.md) - [Semantic textual similarity](english/semantic_textual_similarity.md) -- [Sentiment analysis](english/sentiment_analysis.md) - [Semantic parsing](english/semantic_parsing.md) - [Semantic role 
labeling](english/semantic_role_labeling.md) +- [Sentiment analysis](english/sentiment_analysis.md) - [Stance detection](english/stance_detection.md) - [Summarization](english/summarization.md) - [Taxonomy learning](english/taxonomy_learning.md) -- [Temporal Processing](english/temporal_processing.md) +- [Temporal processing](english/temporal_processing.md) - [Text classification](english/text_classification.md) -- [Word Sense Disambiguation](english/word_sense_disambiguation.md) +- [Word sense disambiguation](english/word_sense_disambiguation.md) -### Korean +### Chinese -- [Chunking](korean/korean.md) -- [Part-of-speech tagging](korean/korean.md) +- [Entity linking](chinese/chinese.md#entity-linking) ### Hindi -- [Chunking](hindi/hindi.md) -- [Machine Translation](hindi/hindi.md) +- [Chunking](hindi/hindi.md#chunking) +- [Machine translation](hindi/hindi.md#machine-translation) +- [Part-of-speech tagging](hindi/hindi.md#part-of-speech-tagging) ### Vietnamese -- [Word segmentation](vietnamese/vietnamese.md) -- [Part-of-speech tagging](vietnamese/vietnamese.md) -- [Named entity recognition](vietnamese/vietnamese.md) -- [Dependency parsing](vietnamese/vietnamese.md) -- [Machine translation](vietnamese/vietnamese.md) +- [Dependency parsing](vietnamese/vietnamese.md#dependency-parsing) +- [Machine translation](vietnamese/vietnamese.md#machine-translation) +- [Named entity recognition](vietnamese/vietnamese.md#named-entity-recognition) +- [Part-of-speech tagging](vietnamese/vietnamese.md#part-of-speech-tagging) +- [Word segmentation](vietnamese/vietnamese.md#word-segmentation) This document aims to track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets. @@ -68,89 +68,71 @@ the reader will be pointed there.
If you want to find this document again in the future, just go to [`nlpprogress.com`](https://nlpprogress.com/) or [`nlpsota.com`](http://nlpsota.com/) in your browser. -### Wish list - -These are tasks and datasets that are still missing. - -- Bilingual dictionary induction -- Discourse parsing -- Keyphrase extraction -- Knowledge base population (KBP) -- More dialogue tasks -- Semi-supervised learning - ### Contributing -If you would like to add a new result, you can do so with a pull request (PR). -In order to minimize noise and to make maintenance somewhat manageable, results reported -in published papers will be preferred (indicate the venue of publication in your PR); -an exception may be made for influential preprints. The result should include the name -of the method, the citation, the score, and a link to the paper and should be added -so that the table is sorted (with the best result on top). +#### Guidelines -If your pull request contains a new result, please make sure that "new result" appears -somewhere in the title of the PR. This way, we can track which tasks are the most -active and receive the most attention. +**Results**   Results reported in published papers are preferred; an exception may be made for influential preprints. + +**Datasets**   Datasets should have been used for evaluation in at least one published paper besides +the one that introduced the dataset. -In order to make reproduction easier, we recommend to add a link to an implementation -to each method if available. You can add a `Code` column (see below) to the table if it does not exist. +**Code**   We recommend adding a link to an implementation +if available. You can add a `Code` column (see below) to the table if it does not exist. In the `Code` column, indicate an official implementation with [Official](http://link_to_implementation). If an unofficial implementation is available, use [Link](http://link_to_implementation) (see below).
If no implementation is available, you can leave the cell empty. -| Model | Score | Paper / Source | Code | -| ------------- | :-----:| --- | --- | -| | | | [Official](http://link_to_implementation) | -| | | | [Link](http://link_to_implementation) | +#### Adding a new result -To add a new dataset or task, follow the below steps. Any new datasets -should have been used for evaluation in at least one published paper besides -the one that introduced the dataset. +If you would like to add a new result, you can just click on the small edit button in the top-right +corner of the file for the respective task (see below). + +![Click on the edit button to add a file](img/edit_file.png) -1. Fork the repository. -2. If your task is completely new, create a new file and link to it in the table of contents above. -If not, add your task or dataset to the respective section of the corresponding file (in alphabetical order). -3. Briefly describe the dataset/task and include relevant references. -4. Describe the evaluation setting and evaluation metric. -5. Show how an annotated example of the dataset/task looks like. -6. Add a download link if available. -7. Copy the below table and fill in at least two results (including the state-of-the-art) - for your dataset/task (change Score to the metric of your dataset). -8. Submit your change as a pull request. +This allows you to edit the file in Markdown. Simply add a row to the corresponding table in the +same format. Make sure that the table stays sorted (with the best result on top). +After you've made your change, make sure that the table still looks OK by clicking on the +"Preview changes" tab at the top of the page. If everything looks good, go to the bottom of the page, +where you will see the form below.
+ +![Fill out the file change information](img/propose_file_change.png) + +Add a name for your proposed change, an optional description, indicate that you would like to +"Create a new branch for this commit and start a pull request", and click on "Propose file change". + +#### Adding a new dataset or task + +To add a new dataset or task, you can also follow the steps above. Alternatively, you can fork the repository. +In both cases, follow the steps below: + +1. If your task is completely new, create a new file and link to it in the table of contents above. +1. If not, add your task or dataset to the respective section of the corresponding file (in alphabetical order). +1. Briefly describe the dataset/task and include relevant references. +1. Describe the evaluation setting and evaluation metric. +1. Show what an annotated example of the dataset/task looks like. +1. Add a download link if available. +1. Copy the below table and fill in at least two results (including the state-of-the-art) + for your dataset/task (change Score to the metric of your dataset). If your dataset/task + has multiple metrics, add them to the right of `Score`. +1. Submit your change as a pull request. | Model | Score | Paper / Source | Code | | ------------- | :-----:| --- | --- | | | | | | -**Important note:** We are currently transitioning from storing results in tables (as above) to using -[YAML](https://en.wikipedia.org/wiki/YAML) files for their greater flexibility. This will allow us to -highlight additional attributes and have interesting visualizations of results down the line. - -If the results for your task are already stored in a YAML file, you can simply extend the YAML file -using the same fields as the existing entries. To check that the resulting table looks as expected, -you can build the site locally using Jekyll by following the steps detailed -[here](https://help.github.com/articles/setting-up-your-github-pages-site-locally-with-jekyll/#requirements): -1.
Check whether you have Ruby 2.1.0 or higher installed with `ruby --version`, otherwise [install it](https://www.ruby-lang.org/en/downloads/). -On OS X for instance, this can be done with `brew install ruby`. Make sure you also have `ruby-dev` and `zlib1g-dev` installed. -1. Install Bundler `gem install bundler`. If you run into issues with installing bundler on OS X, have a look -[here](https://bundler.io/v1.16/guides/rubygems_tls_ssl_troubleshooting_guide.html) for troubleshooting tips. Also try refreshing -the terminal. -1. Clone the repo locally: `git clone https://github.com/sebastianruder/NLP-progress` -1. Navigate to the repo with `cd NLP-progress` -1. Install Jekyll: `bundle install` -1. Run the Jekyll site locally: `bundle exec jekyll serve` -1. You can now preview the local Jekyll site in your browser at `http://localhost:4000`. - -### Things to do - -- Add a column for code (see above) to each table and a link to the source code to each method. -- Add pointers on how to retrieve data. -- Provide more details regarding the evaluation setup of each task. -- Add an example to every task/dataset. -- Add statistics to every dataset. -- Provide a description and details for every task / dataset. -- Add a table of contents to every file (particularly the large ones). -- We could potentially use [readthedocs](https://github.com/rtfd/readthedocs.org) to provide a clearer structure. -- All current datasets in this list are for the English language (except for [UD](#ud)). In a separate section, we could add -datasets for other languages. + +### Wish list + +These are tasks and datasets that are still missing: + +- Bilingual dictionary induction +- Discourse parsing +- Keyphrase extraction +- Knowledge base population (KBP) +- More dialogue tasks +- Semi-supervised learning + +### Instructions for building the site locally + +Instructions for building the website locally using Jekyll can be found [here](jekyll_instructions.md). 
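The contributing guidelines added above require every results table to stay sorted with the best result on top. As a rough sketch of how that invariant could be checked before proposing a change (the helper names and the sample table below are illustrative, not part of this repository):

```python
def table_scores(markdown_table: str) -> list[float]:
    """Extract the Score column (second cell) from each data row of a
    Markdown results table in the format used by the guidelines."""
    scores = []
    # Skip the header row and the divider row, then parse each data row.
    for line in markdown_table.strip().splitlines()[2:]:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        scores.append(float(cells[1]))
    return scores

def is_sorted_best_first(markdown_table: str) -> bool:
    """True if the Score column is in descending order (best result on top)."""
    scores = table_scores(markdown_table)
    return scores == sorted(scores, reverse=True)

# Illustrative table in the guideline format (models and scores are made up).
table = """
| Model | Score | Paper / Source | Code |
| ------------- | :-----:| --- | --- |
| Model B | 95.13 | Paper B | |
| Model A | 94.66 | Paper A | |
"""
```

A check like this assumes a single numeric `Score` column; tables that add extra metrics to the right of `Score` would need the same comparison applied per column.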
diff --git a/_data/ccg_supertagging.yaml b/_data/ccg_supertagging.yaml deleted file mode 100644 index 2380b3cf..00000000 --- a/_data/ccg_supertagging.yaml +++ /dev/null @@ -1,24 +0,0 @@ -- year: 2016 - authors: Lewis et al. - accuracy: 94.7 - paper: LSTM CCG Parsing - url: https://aclweb.org/anthology/N/N16/N16-1026.pdf - -- year: 2016 - authors: Vaswani et al. - accuracy: 94.24 - paper: Supertagging with LSTMs - url: https://aclweb.org/anthology/N/N16/N16-1027.pdf - -- model: Low supervision - year: 2016 - authors: Søgaard and Goldberg - accuracy: 93.26 - paper: Deep multi-task learning with low level tasks supervised at lower layers - url: http://anthology.aclweb.org/P16-2038 - -- year: 2015 - authors: Xu et al. - accuracy: 93.00 - paper: CCG Supertagging with a Recurrent Neural Network - url: http://www.aclweb.org/anthology/P15-2041 diff --git a/_data/chunking.yaml b/_data/chunking.yaml deleted file mode 100644 index d0740a9c..00000000 --- a/_data/chunking.yaml +++ /dev/null @@ -1,12 +0,0 @@ -- model: Low supervision - authors: Søgaard and Goldberg - year: 2016 - F1 score: 95.57 - paper: Deep multi-task learning with low level tasks supervised at lower layers - url: http://anthology.aclweb.org/P16-2038 - -- authors: Suzuki and Isozaki - year: 2008 - F1 score: 95.15 - paper: Semi-Supervised Sequential Labeling and Segmentation using Giga-word Scale Unlabeled Data - url: https://aclanthology.info/pdf/P/P08/P08-1076.pdf diff --git a/_data/constituency_parsing.yaml b/_data/constituency_parsing.yaml deleted file mode 100644 index 4012e25c..00000000 --- a/_data/constituency_parsing.yaml +++ /dev/null @@ -1,63 +0,0 @@ -- - model: Self-attentive encoder + ELMo - authors: Kitaev and Klein - year: 2018 - F1 score: 95.13 - paper: Constituency Parsing with a Self-Attentive Encoder - url: https://arxiv.org/abs/1805.01052 -- - model: Model combination - authors: Fried et al. 
- year: 2017 - F1 score: 94.66 - paper: Improving Neural Parsing by Disentangling Model Combination and Reranking Effects - url: https://arxiv.org/abs/1707.03058 -- - model: In-order - authors: Liu and Zhang - year: 2017 - F1 score: 94.2 - paper: In-Order Transition-based Constituent Parsing - url: http://aclweb.org/anthology/Q17-1029 -- - model: Semi-supervised LSTM-LM - authors: Choe and Charniak - year: 2016 - F1 score: 93.8 - paper: Parsing as Language Modeling - url: http://www.aclweb.org/anthology/D16-1257 -- - model: Stack-only RNNG - authors: Kuncoro et al. - year: 2017 - F1 score: 93.6 - paper: What Do Recurrent Neural Network Grammars Learn About Syntax? - url: https://arxiv.org/abs/1611.05774 -- - model: RNN Grammar - authors: Dyer et al. - year: 2016 - F1 score: 93.3 - paper: Recurrent Neural Network Grammars - url: https://www.aclweb.org/anthology/N16-1024 -- - model: Transformer - authors: Vaswani et al. - year: 2017 - F1 score: 92.7 - paper: Attention Is All You Need - url: https://arxiv.org/abs/1706.03762 -- - model: Semi-supervised LSTM - authors: Vinyals et al. - year: 2015 - F1 score: 92.1 - paper: Grammar as a Foreign Language - url: https://papers.nips.cc/paper/5635-grammar-as-a-foreign-language.pdf -- - model: Self-trained parser - authors: McClosky et al. 
- year: 2006 - F1 score: 92.1 - paper: Effective Self-Training for Parsing - url: https://pdfs.semanticscholar.org/6f0f/64f0dab74295e5eb139c160ed79ff262558a.pdf diff --git a/_data/dependency_parsing.yaml b/_data/dependency_parsing.yaml deleted file mode 100644 index dacf58a8..00000000 --- a/_data/dependency_parsing.yaml +++ /dev/null @@ -1,151 +0,0 @@ -Penn_Treebank: -- &DozatManning2017 - model: Deep Biaffine - authors: Dozat and Manning - year: 2017 - POS: 97.3 - UAS: 95.44 - LAS: 93.76 - paper: Deep Biaffine Attention for Neural Dependency Parsing - url: https://arxiv.org/abs/1611.01734 - code: - - name: Official - url: https://github.com/tdozat/Parser-v1 -- - model: jPTDP - authors: Nguyen and Verspoor - year: 2018 - POS: 97.97 - UAS: 94.51 - LAS: 92.87 - paper: An improved neural network model for joint POS tagging and dependency parsing - url: https://arxiv.org/abs/1807.03955 - code: - - name: Official - url: https://github.com/datquocnguyen/jPTDP -- - model: '' - authors: Andor et al. - year: 2016 - POS: 97.44 - UAS: 94.61 - LAS: 92.79 - paper: Globally Normalized Transition-Based Neural Networks - url: https://www.aclweb.org/anthology/P16-1231 - code: [] -- model: Distilled neural FOG - authors: Kuncoro et al. - year: 2016 - POS: 97.3 - UAS: 94.26 - LAS: 92.06 - paper: Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser - url: https://arxiv.org/abs/1609.07561 - code: [] -- - model: '' - authors: Weiss et al. 
- year: 2015 - POS: 97.44 - UAS: 93.99 - LAS: 92.05 - paper: Structured Training for Neural Network Transition-Based Parsing - url: http://anthology.aclweb.org/P/P15/P15-1032.pdf - code: [] -- - model: BIST transition-based parser - authors: Kiperwasser and Goldberg - year: 2016 - POS: 97.3 - UAS: 93.9 - LAS: 91.9 - paper: Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations - url: https://aclweb.org/anthology/Q16-1023 - code: - - name: Official - url: https://github.com/elikip/bist-parser/tree/master/barchybrid/src -- - model: Arc-hybrid - authors: Ballesteros et al. - year: 2016 - POS: 97.3 - UAS: 93.56 - LAS: 91.42 - paper: Training with Exploration Improves a Greedy Stack-LSTM Parser - url: https://arxiv.org/abs/1603.03793 - code: [] -- - model: BIST graph-based parser - authors: Kiperwasser and Goldberg - year: 2016 - POS: 97.3 - UAS: 93.1 - LAS: 91.0 - paper: Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations - url: https://aclweb.org/anthology/Q16-1023 - code: - - name: Official - url: https://github.com/elikip/bist-parser/tree/master/bmstparser/src - -Reference: -- - model: Stack-only RNNG - authors: Kuncoro et al. - year: 2017 - UAS: 95.8 - LAS: 94.6 - paper: What Do Recurrent Neural Network Grammars Learn About Syntax? 
- url: https://arxiv.org/abs/1611.05774 - code: [] - comment: Constituent parser -- - model: Semi-supervised LSTM-LM - authors: Choe and Charniak - year: 2016 - UAS: 95.9 - LAS: 94.1 - paper: Parsing as Language Modeling - url: http://www.aclweb.org/anthology/D16-1257 - code: [] - comment: Constituent parser -- <<: *DozatManning2017 - UAS: 95.66 - LAS: 94.03 - comment: Stanford conversion **v3.5.0** - -Unsupervised_Penn_Treebank: -- - model: Iterative reranking - authors: Le & Zuidema - year: 2015 - UAS: 66.2 - paper: Unsupervised Dependency Parsing - Let’s Use Supervised Parsers - url: http://www.aclweb.org/anthology/N15-1067 -- - model: Combined System - authors: Spitkovsky et al - year: 2013 - UAS: 64.4 - paper: Breaking Out of Local Optima with Count Transforms and Model Recombination - A Study in Grammar Induction - url: http://www.aclweb.org/anthology/D13-1204 -- - model: Tree Substitution Grammar DMV - authors: Blunsom & Cohn - year: 2010 - UAS: 55.7 - paper: Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing - url: http://www.aclweb.org/anthology/D10-1117 -- - model: Shared Logistic Normal DMV - authors: Cohen & Smith - year: 2009 - UAS: 41.4 - paper: Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction - url: http://www.aclweb.org/anthology/N09-1009 -- - model: DMV - authors: Klein & Manning - year: 2004 - UAS: 35.9 - paper: Corpus-Based Induction of Syntactic Structure - Models of Dependency and Constituency - url: http://www.aclweb.org/anthology/P04-1061 diff --git a/_data/dialog.yaml b/_data/dialog.yaml deleted file mode 100644 index 439d5e1d..00000000 --- a/_data/dialog.yaml +++ /dev/null @@ -1,30 +0,0 @@ -- - model: - authors: Liu et al. 
- year: 2018 - Area: 90 - Food: 84 - Price: 92 - Joint: 72 - paper: Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems - url: https://arxiv.org/abs/1804.06512 -- - model: Neural belief tracker - authors: Mrkšić et al. - year: 2017 - Area: 90 - Food: 84 - Price: 94 - Joint: 72 - paper: "Neural Belief Tracker: Data-Driven Dialogue State Tracking" - url: https://arxiv.org/abs/1606.03777 -- - model: RNN - authors: Henderson et al. - year: 2014 - Area: 92 - Food: 86 - Price: 86 - Joint: 69 - paper: Robust dialog state tracking using delexicalised recurrent neural networks and unsupervised gate - url: http://svr-ftp.eng.cam.ac.uk/~sjy/papers/htyo14.pdf diff --git a/_data/domain_adaptation.yaml b/_data/domain_adaptation.yaml deleted file mode 100644 index c97a19ef..00000000 --- a/_data/domain_adaptation.yaml +++ /dev/null @@ -1,48 +0,0 @@ -- - model: Multi-task tri-training - authors: Ruder and Plank - year: 2018 - DVD: 78.14 - Books: 74.86 - Electronics: 81.45 - Kitchen: 82.14 - Average: 79.15 - paper: Strong Baselines for Neural Semi-supervised Learning under Domain Shift - url: https://arxiv.org/abs/1804.09530 - code: [] -- - model: Asymmetric tri-training - authors: Saito et al. - year: 2017 - DVD: 76.17 - Books: 72.97 - Electronics: 80.47 - Kitchen: 83.97 - Average: 78.39 - paper: Asymmetric Tri-training for Unsupervised Domain Adaptation - url: https://arxiv.org/abs/1702.08400 - code: [] -- - model: VFAE - authors: Louizos et al. - year: 2015 - DVD: 76.57 - Books: 73.4 - Electronics: 80.53 - Kitchen: 82.93 - Average: 78.36 - paper: The Variational Fair Autoencoder - url: https://arxiv.org/abs/1511.00830 - code: [] -- - model: DANN - authors: Ganin et al. 
- year: 2016 - DVD: 75.4 - Books: 71.43 - Electronics: 77.67 - Kitchen: 80.53 - Average: 76.26 - paper: Domain-Adversarial Training of Neural Networks - url: https://arxiv.org/abs/1505.07818 - code: [] diff --git a/_data/entity_linking_disambiguation_only.yaml b/_data/entity_linking_disambiguation_only.yaml deleted file mode 100644 index 15782802..00000000 --- a/_data/entity_linking_disambiguation_only.yaml +++ /dev/null @@ -1,51 +0,0 @@ -- - model: "" - authors: Sil et al. - year: 2018 - Micro-Precision: 94.0 - Macro-Precision: - paper: "Neural Cross-Lingual Entity Linking" - url: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16501/16101 - code: [] -- - model: "" - authors: Radhakrishnan et al. - year: 2018 - Micro-Precision: 93.0 - Macro-Precision: 93.7 - paper: "ELDEN: Improved Entity Linking using Densified Knowledge Graphs" - url: http://aclweb.org/anthology/N18-1167 - code: - - name: Official - url: https://github.com/priyaradhakrishnan0/ELDEN -- - model: "" - authors: Le et al. - year: 2018 - Micro-Precision: 93.07 - Macro-Precision: - paper: "Improving Entity Linking by Modeling Latent Relations between Mentions" - url: http://aclweb.org/anthology/P18-1148 - code: - - name: Official - url: https://github.com/lephong/mulrel-nel -- - model: "" - authors: Ganea and Hofmann - year: 2017 - Micro-Precision: 92.22 - Macro-Precision: - paper: "Deep Joint Entity Disambiguation with Local Neural Attention" - url: https://www.aclweb.org/anthology/D17-1277 - code: - - name: Official - url: https://github.com/dalab/deep-ed -- - model: "" - authors: Hoffart et al. 
- year: 2011 - Micro-Precision: 82.29 - Macro-Precision: 82.02 - paper: "Robust Disambiguation of Named Entities in Text" - url: http://www.aclweb.org/anthology/D11-1072 - code: [] diff --git a/_data/entity_linking_disambiguation_only_chinese.yaml b/_data/entity_linking_disambiguation_only_chinese.yaml deleted file mode 100644 index 15039bf1..00000000 --- a/_data/entity_linking_disambiguation_only_chinese.yaml +++ /dev/null @@ -1,19 +0,0 @@ -- - model: "" - authors: Sil et al. - year: 2018 - Micro-Precision: 84.4 - Macro-Precision: - paper: "Neural Cross-Lingual Entity Linking" - url: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16501/16101 - code: [] - -- - model: "" - authors: Tsai & Roth. - year: 2016 - Micro-Precision: 83.6 - Macro-Precision: - paper: "Cross-lingual wikification using multilingual embeddings" - url: http://cogcomp.org/papers/TsaiRo16b.pdf - code: [] diff --git a/_data/entity_linking_disambiguation_only_spanish.yaml b/_data/entity_linking_disambiguation_only_spanish.yaml deleted file mode 100644 index 72d09f21..00000000 --- a/_data/entity_linking_disambiguation_only_spanish.yaml +++ /dev/null @@ -1,19 +0,0 @@ -- - model: "" - authors: Sil et al. - year: 2018 - Micro-Precision: 82.3 - Macro-Precision: - paper: "Neural Cross-Lingual Entity Linking" - url: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16501/16101 - code: [] - -- - model: "" - authors: Tsai & Roth. - year: 2016 - Micro-Precision: 80.9 - Macro-Precision: - paper: "Cross-lingual wikification using multilingual embeddings" - url: http://cogcomp.org/papers/TsaiRo16b.pdf - code: [] diff --git a/_data/entity_linking_end_to_end.yaml b/_data/entity_linking_end_to_end.yaml deleted file mode 100644 index b0dca209..00000000 --- a/_data/entity_linking_end_to_end.yaml +++ /dev/null @@ -1,29 +0,0 @@ -- - model: "" - authors: Kolitsas et al. 
- year: 2018 - Micro-F1-strong: 86.6 - Macro-F1-strong: 89.4 - paper: "End-to-End Neural Entity Linking, CoNLL 2018" - url: https://arxiv.org/pdf/1808.07699.pdf - code: - - name: Official - url: https://github.com/dalab/end2end_neural_el -- - model: "" - authors: Piccinno et al. - year: 2014 - Micro-F1-strong: 69.2 - Macro-F1-strong: 72.8 - paper: "From TagME to WAT: a new entity annotator" - url: https://dl.acm.org/citation.cfm?id=2634350 - code: [] -- - model: "" - authors: Hoffart et al. - year: 2011 - Micro-F1-strong: 68.8 - Macro-F1-strong: 72.4 - paper: "Robust Disambiguation of Named Entities in Text" - url: http://www.aclweb.org/anthology/D11-1072 - code: [] diff --git a/_data/grammatical_error_correction.yaml b/_data/grammatical_error_correction.yaml deleted file mode 100644 index 58725fe1..00000000 --- a/_data/grammatical_error_correction.yaml +++ /dev/null @@ -1,54 +0,0 @@ -CoNLL_2014: -- &Ge2018 - model: CNN Seq2Seq + Fluency Boost - authors: Ge et al. - year: 2018 - F0.5: 61.34 - paper: "Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study" - url: https://arxiv.org/abs/1807.01270 - code: [] -- &Grundkiewicz2018 - model: SMT + BiGRU - authors: Grundkiewicz et al. - year: 2018 - F0.5: 56.25 - paper: Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation - url: https://arxiv.org/abs/1804.05945 - code: [] -- &Junczys2018 - model: Transformer - authors: Junczys-Dowmunt et al. 
- year: 2018 - F0.5: 55.8 - paper: Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task - url: http://aclweb.org/anthology/N18-1055 - code: [] -- &Chollampatt2018 - model: CNN Seq2Seq - authors: Chollampatt & Ng - year: 2018 - F0.5: 54.79 - paper: A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction - url: https://arxiv.org/abs/1801.08831 - code: - - name: Official - url: https://github.com/nusnlp/mlconvgec2018 - -CoNLL_2014_10_Annotators: -- <<: *Ge2018 - F0.5: 76.88 -- <<: *Grundkiewicz2018 - F0.5: 72.04 -- <<: *Chollampatt2018 - F0.5: 70.14 - comment: measured by Ge et al., 2018 - -JFLEG: -- <<: *Ge2018 - GLEU: 62.37 -- <<: *Grundkiewicz2018 - GLEU: 61.50 -- <<: *Junczys2018 - GLEU: 59.9 -- <<: *Chollampatt2018 - GLEU: 57.47 diff --git a/_data/hindi_basic.yml b/_data/hindi_basic.yml deleted file mode 100644 index c27b91c3..00000000 --- a/_data/hindi_basic.yml +++ /dev/null @@ -1,6 +0,0 @@ -- - authors: Dalal, Aniket & Nagaraj, Kumar & Sawant, Uma & Shelke, Sandeep - year: 2006 - Accuracy: 89.346 - paper: "Hindi Part-of-Speech Tagging and Chunking : A Maximum Entropy Approach" - url: https://www.researchgate.net/publication/241211496_Hindi_Part-of-Speech_Tagging_and_Chunking_A_Maximum_Entropy_Approach \ No newline at end of file diff --git a/_data/hindi_machine_translation.yaml b/_data/hindi_machine_translation.yaml deleted file mode 100644 index f0b2a517..00000000 --- a/_data/hindi_machine_translation.yaml +++ /dev/null @@ -1,10 +0,0 @@ -- - authors: Anoop Kunchukuttan, Pratik Mehta, Pushpak Bhattacharyya - year: 2018 - BLEU: 12.83 - METEOR: 0.308 - paper: "The IIT Bombay English-Hindi Parallel Corpus" - url: http://www.lrec-conf.org/proceedings/lrec2018/pdf/847.pdf - data: - - name: The IIT Bombay English-Hindi Parallel Corpus - url: http://www.cfilt.iitb.ac.in/iitb_parallel/ diff --git a/_data/korean.yml b/_data/korean.yml deleted file mode 100644 index 9f321a30..00000000 --- 
a/_data/korean.yml +++ /dev/null @@ -1,9 +0,0 @@ -- - model: "KoNLPy: Korean natural language processing in Python" - authors: Eunjeong L. Park and Sungzoon Cho - year: 2014 - paper: "KoNLPy: Korean natural language processing in Python" - url: http://dmlab.snu.ac.kr/~lucypark/docs/2014-10-10-hclt.pdf - code: - - name: (Github) Python package for Korean natural language processing - url: https://github.com/konlpy/konlpy \ No newline at end of file diff --git a/_data/language_modeling.yaml b/_data/language_modeling.yaml deleted file mode 100644 index 48af5d2c..00000000 --- a/_data/language_modeling.yaml +++ /dev/null @@ -1,246 +0,0 @@ -Word_Level: - Penn_Treebank: - - &Yang2018d - model: AWD-LSTM-MoS + dynamic eval* - authors: Yang et al. - year: 2018 - Validation perplexity: 48.33 - Test perplexity: 47.69 - paper: 'Breaking the Softmax Bottleneck: A High-Rank RNN Language Model' - url: https://arxiv.org/abs/1711.03953 - code: [] - - &Krause2017d - model: AWD-LSTM + dynamic eval* - authors: Krause et al. - year: 2017 - Validation perplexity: 51.6 - Test perplexity: 51.1 - paper: Dynamic Evaluation of Neural Sequence Models - url: https://arxiv.org/abs/1709.07432 - code: [] - - &Merity2017d - model: AWD-LSTM + continuous cache pointer* - authors: Merity et al. - year: 2017 - Validation perplexity: 53.9 - Test perplexity: 52.8 - paper: Regularizing and Optimizing LSTM Language Models - url: https://arxiv.org/abs/1708.02182 - code: [] - - &Yang2018 - model: AWD-LSTM-MoS - authors: Yang et al. - year: 2018 - Validation perplexity: 56.54 - Test perplexity: 54.44 - paper: 'Breaking the Softmax Bottleneck: A High-Rank RNN Language Model' - url: https://arxiv.org/abs/1711.03953 - code: [] - - &Merity2017 - model: AWD-LSTM - authors: Merity et al. 
- year: 2017 - Validation perplexity: 60.0 - Test perplexity: 57.3 - paper: Regularizing and Optimizing LSTM Language Models - url: https://arxiv.org/abs/1708.02182 - code: [] - - WikiText_2: - - <<: *Yang2018d - Validation perplexity: 42.41 - Test perplexity: 40.68 - - <<: *Krause2017d - Validation perplexity: 46.4 - Test perplexity: 44.3 - - <<: *Merity2017d - Validation perplexity: 53.8 - Test perplexity: 52.0 - - <<: *Yang2018 - Validation perplexity: 63.88 - Test perplexity: 61.45 - - <<: *Merity2017 - Validation perplexity: 68.6 - Test perplexity: 65.8 - - WikiText_103: - - &Rae2018 - model: LSTM + Hebbian + Cache + MbPA - authors: Rae et al. - year: 2018 - Validation perplexity: 29.0 - Test perplexity: 29.2 - paper: Fast Parametric Learning with Activation Memorization - url: http://arxiv.org/abs/1803.10049 - code: [] - - <<: *Rae2018 - model: LSTM + Hebbian - Validation perplexity: 34.1 - Test perplexity: 34.3 - - <<: *Rae2018 - model: LSTM - Validation perplexity: 36.0 - Test perplexity: 36.4 - - &Dauphin2016 - model: Gated CNN - authors: Dauphin et al. - year: 2016 - Validation perplexity: - Test perplexity: 37.2 - paper: Language modeling with gated convolutional networks - url: https://arxiv.org/abs/1612.08083 - code: [] - - &Bai2018 - model: Temporal CNN - authors: Bai et al. - year: 2018 - Validation perplexity: - Test perplexity: 45.2 - paper: Convolutional sequence modeling revisited - url: https://openreview.net/forum?id=rk8wKk-R-. - code: [] - - &Graves2014 - model: LSTM - authors: Graves et al. - year: 2014 - Validation perplexity: - Test perplexity: 48.7 - paper: Neural turing machines - url: https://arxiv.org/abs/1410.5401 - code: [] - -Char_Level: - Hutter_Prize: - - &Al-Rfou2018_T64 - model: T64 - authors: Al-Rfou et al. 
- year: 2018 - Bits per Character (BPC): 1.06 - Number of params (M): 235 - paper: Character-Level Language Modeling with Deeper Self-Attention - url: https://arxiv.org/abs/1808.04444 - code: [] - - <<: *Krause2017d - Bits per Character (BPC): 1.08 - Number of params (M): 46 - - &Al-Rfou2018_T12 - model: T12 - authors: Al-Rfou et al. - year: 2018 - Bits per Character (BPC): 1.11 - Number of params (M): 44 - paper: Character-Level Language Modeling with Deeper Self-Attention - url: https://arxiv.org/abs/1808.04444 - code: [] - - &Merity2018 - model: 3 layer AWD-LSTM - authors: Merity et al. - year: 2018 - Bits per Character (BPC): 1.232 - Number of params (M): 47 - paper: An Analysis of Neural Language Modeling at Multiple Scales - url: https://arxiv.org/abs/1803.08240 - code: [] - - &Mujika2017 - model: Large FS-LSTM-4 - authors: Mujika et al. - year: 2017 - Bits per Character (BPC): 1.245 - Number of params (M): 47 - paper: Fast-Slow Recurrent Neural Networks - url: https://arxiv.org/abs/1705.08639 - code: [] - - &Krause2016 - model: Large mLSTM +emb +WN +VD - authors: Krause et al. - year: 2016 - Bits per Character (BPC): 1.24 - Number of params (M): 46 - paper: Multiplicative LSTM for sequence modelling - url: https://arxiv.org/abs/1609.07959 - code: [] - - <<: *Mujika2017 - model: FS-LSTM-4 - Bits per Character (BPC): 1.277 - Number of params (M): 27 - - &Zilly2016 - model: Large RHN - authors: Zilly et al. 
- year: 2016 - Bits per Character (BPC): 1.27 - Number of params (M): 46 - paper: Recurrent Highway Networks - url: https://arxiv.org/abs/1607.03474 - code: [] - - Text8: - - <<: *Al-Rfou2018_T64 - Bits per Character (BPC): 1.13 - - <<: *Al-Rfou2018_T12 - Bits per Character (BPC): 1.18 - - <<: *Krause2017d - Bits per Character (BPC): 1.19 - Number of params (M): 45 - - <<: *Krause2016 - Bits per Character (BPC): 1.27 - Number of params (M): 45 - - <<: *Zilly2016 - Bits per Character (BPC): 1.27 - Number of params (M): 46 - - - model: LayerNorm HM-LSTM - authors: Chung et al. - year: 2017 - Bits per Character (BPC): 1.29 - Number of params (M): 35 - paper: Hierarchical Multiscale Recurrent Neural Networks - url: https://arxiv.org/abs/1609.01704 - code: [] - - - model: BN LSTM - authors: Cooijmans et al. - year: 2016 - Bits per Character (BPC): 1.36 - Number of params (M): 16 - paper: Recurrent Batch Normalization - url: https://arxiv.org/abs/1603.09025 - code: [] - - <<: *Krause2016 - model: Unregularised mLSTM - Bits per Character (BPC): 1.4 - Number of params (M): 45 - - Penn_Treebank: - - <<: *Merity2018 - Bits per Character (BPC): 1.175 - Number of params (M): 13.8 - - <<: *Merity2018 - model: 6 layer QRNN - Bits per Character (BPC): 1.187 - Number of params (M): 13.8 - - <<: *Mujika2017 - model: FS-LSTM-4 - Bits per Character (BPC): 1.19 - Number of params (M): 27.0 - - <<: *Mujika2017 - model: FS-LSTM-2 - Bits per Character (BPC): 1.193 - Number of params (M): 27.0 - - - model: NASCell - authors: Zoph & Le - year: 2016 - Bits per Character (BPC): 1.214 - Number of params (M): 16.3 - paper: Neural Architecture Search with Reinforcement Learning - url: https://arxiv.org/abs/1611.01578 - code: [] - - - model: 2-Layer Norm HyperLSTM - authors: Ha et al. 
- year: 2016 - Bits per Character (BPC): 1.219 - Number of params (M): 14.4 - paper: HyperNetworks - url: https://arxiv.org/abs/1609.09106 - code: [] diff --git a/_data/lexical_normalization_lexnorm.yaml b/_data/lexical_normalization_lexnorm.yaml deleted file mode 100644 index 7b6efce9..00000000 --- a/_data/lexical_normalization_lexnorm.yaml +++ /dev/null @@ -1,33 +0,0 @@ -- - model: MoNoise - authors: Rob van der Goot and Gertjan van Noord - year: 2017 - accuracy: 87.63 - paper: "MoNoise: Modeling Noise Using a Modular Normalization System." - url: http://www.let.rug.nl/rob/doc/clin27.paper.pdf - code: https://bitbucket.org/robvanderg/monoise/ -- - model: Joint POS + Norm in a Viterbi decoding* - authors: Chen Li and Yang Liu - year: 2015 - accuracy: 87.58* - paper: Joint POS Tagging and Text Normalization for Informal Text. - url: http://www.aaai.org/ocs/index.php/IJCAI/IJCAI15/paper/download/10839/10838 - code: [] -- - model: Syllable based - authors: Ke Xu, Yunqing Xia and Chin-Hui Lee - year: 2015 - accuracy: 86.08 - paper: Tweet Normalization with Syllables - url: http://www.aclweb.org/anthology/P15-1089 - code: [] -- - model: unLOL - authors: Yi Yang and Jacob Eisenstein - year: 2013 - accuracy: 82.06 - paper: A Log-Linear Model for Unsupervised Text Normalization. - url: http://www.aclweb.org/anthology/D13-1007 - code: [] - diff --git a/_data/lexical_normalization_lexnorm2015.yaml b/_data/lexical_normalization_lexnorm2015.yaml deleted file mode 100644 index bf181aa0..00000000 --- a/_data/lexical_normalization_lexnorm2015.yaml +++ /dev/null @@ -1,21 +0,0 @@ -- - model: MoNoise - authors: Rob van der Goot and Gertjan van Noord - year: 2017 - precision: 93.53 - recall: 80.26 - F1: 86.39 - paper: "MoNoise: Modeling Noise Using a Modular Normalization System." 
- url: http://www.let.rug.nl/rob/doc/clin27.paper.pdf - code: https://bitbucket.org/robvanderg/monoise/ -- - model: Random Forest + novel similarity metric - authors: Ning Jin - year: 2017 - precision: 90.61 - recall: 78.65 - F1: 84.21 - paper: "NCSU-SAS-Ning: Candidate Generation and Feature Engineering for Supervised Lexical Normalization" - url: http://www.aclweb.org/anthology/W15-4313 - code: [] - diff --git a/_data/relation_prediction.yaml b/_data/relation_prediction.yaml deleted file mode 100644 index 468d9e1d..00000000 --- a/_data/relation_prediction.yaml +++ /dev/null @@ -1,74 +0,0 @@ -WN18RR: -- - model: Max-Margin Markov Graph Models (M3GM) - authors: Pinter and Eisenstein - year: 2018 - H@10: 59.02 - H@1: 45.37 - MRR: 49.83 - paper: Predicting Semantic Relations using Global Graph Properties - url: https://arxiv.org/abs/1808.08644 - code: - - name: Official - url: http://www.github.com/yuvalpinter/m3gm -- - model: TransE (reimplementation) - authors: Pinter and Eisenstein - year: 2018 - H@10: 55.55 - H@1: 42.26 - MRR: 46.59 - paper: Predicting Semantic Relations using Global Graph Properties - url: https://arxiv.org/abs/1808.08644 - code: - - name: Reimplementation of Bordes et al. 2013. Translating embeddings for modeling multi-relational data. - url: http://www.github.com/yuvalpinter/m3gm -- - model: ConvKB - authors: Nguyen et al. - year: 2018 - H@10: 52.50 - H@1: N/A - MRR: 24.80 - paper: A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network - url: http://www.aclweb.org/anthology/N18-2053 - code: - - name: Official - url: https://github.com/daiquocnguyen/ConvKB -- - model: ConvE (v6) - authors: Dettmers et al. - year: 2018 - H@10: 52.00 - H@1: 40.00 - MRR: 43.00 - paper: Convolutional 2D Knowledge Graph Embeddings - url: https://arxiv.org/abs/1707.01476 - code: - - name: Official - url: https://github.com/TimDettmers/ConvE -- - model: ComplEx - authors: Trouillon et al. 
- year: 2016 - H@10: 51.00 - H@1: 41.00 - MRR: 44.00 - paper: Complex Embeddings for Simple Link Prediction - url: http://www.jmlr.org/proceedings/papers/v48/trouillon16.pdf - code: - - name: Official - url: https://github.com/ttrouill/complex -- - model: DistMult (reimplementation) - authors: Dettmers et al. - year: 2017 - H@10: 49.00 - H@1: 40.00 - MRR: 43.00 - paper: Convolutional 2D Knowledge Graph Embeddings - url: https://arxiv.org/abs/1412.6575 - code: - - name: Reimplementation of Yang et al. 2013. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. - url: https://github.com/uclmr/inferbeddings - diff --git a/chinese/chinese.md b/chinese/chinese.md new file mode 100644 index 00000000..91921f80 --- /dev/null +++ b/chinese/chinese.md @@ -0,0 +1,18 @@ +# Chinese NLP tasks + +## Entity linking + +See [here](../english/entity_linking.md) for more information about the task. + +### Datasets + +#### AIDA CoNLL-YAGO Dataset + +##### Disambiguation-Only Models + +| Model | Micro-Precision | Paper / Source | Code | +| ------------- | :-----:| :----: | :----: | +| Sil et al. (2018) | 84.4 | [Neural Cross-Lingual Entity Linking](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16501/16101) | | +| Tsai & Roth (2016) | 83.6 | [Cross-lingual wikification using multilingual embeddings](http://cogcomp.org/papers/TsaiRo16b.pdf) | | + +[Go back to the README](../README.md) diff --git a/english/asr.md b/english/automatic_speech_recognition.md similarity index 100% rename from english/asr.md rename to english/automatic_speech_recognition.md diff --git a/english/ccg_supertagging.md b/english/ccg_supertagging.md index 1743c323..7989db23 100644 --- a/english/ccg_supertagging.md +++ b/english/ccg_supertagging.md @@ -17,8 +17,11 @@ The CCGBank is a corpus of CCG derivations and dependency structures extracted f section 00 for development, and section 23 as in-domain test set. Performance is only calculated on the 425 most frequent labels.
Models are evaluated based on accuracy. -{% include table.html results=site.data.ccg_supertagging scores='accuracy' %} - -{% include chart.html results=site.data.ccg_supertagging score='accuracy' %} +| Model | Accuracy | Paper / Source | +| ------------- | :-----:| --- | +| Lewis et al. (2016) | 94.7 | [LSTM CCG Parsing](https://aclweb.org/anthology/N/N16/N16-1026.pdf) | +| Vaswani et al. (2016) | 94.24 | [Supertagging with LSTMs](https://aclweb.org/anthology/N/N16/N16-1027.pdf) | +| Low supervision (Søgaard and Goldberg, 2016) | 93.26 | [Deep multi-task learning with low level tasks supervised at lower layers](http://anthology.aclweb.org/P16-2038) | +| Xu et al. (2015) | 93.00 | [CCG Supertagging with a Recurrent Neural Network](http://www.aclweb.org/anthology/P15-2041) | [Go back to the README](../README.md) diff --git a/english/chunking.md b/english/chunking.md index 1bd7bc75..18bff328 100644 --- a/english/chunking.md +++ b/english/chunking.md @@ -14,8 +14,9 @@ The [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) is typically used fo Sections 15-18 are used for training, section 19 for development, and section 20 for testing. Models are evaluated based on F1.
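As a rough illustration of the chunking metric, span-level F1 counts a predicted chunk as correct only if both its boundaries and its label match a gold chunk. A minimal sketch follows (the spans are made up, and this simplifies the official CoNLL evaluation script):

```python
def chunk_f1(gold, pred):
    """Exact-match F1 over labeled chunk spans (start, end, label)."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # spans predicted exactly right
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold vs. predicted chunks: the last span has a boundary error.
gold = [(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP")]
pred = [(0, 2, "NP"), (2, 3, "VP"), (3, 4, "NP")]
print(round(chunk_f1(gold, pred), 4))  # → 0.6667
```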
-{% include table.html results=site.data.chunking scores='F1 score' %} - -{% include chart.html results=site.data.chunking score='F1 score' %} +| Model | F1 score | Paper / Source | +| ------------- | :-----:| --- | +| Low supervision (Søgaard and Goldberg, 2016) | 95.57 | [Deep multi-task learning with low level tasks supervised at lower layers](http://anthology.aclweb.org/P16-2038) | +| Suzuki and Isozaki (2008) | 95.15 | [Semi-Supervised Sequential Labeling and Segmentation using Giga-word Scale Unlabeled Data](https://aclanthology.info/pdf/P/P08/P08-1076.pdf) | [Go back to the README](../README.md) diff --git a/english/constituency_parsing.md b/english/constituency_parsing.md index 2512f308..46a05b5a 100644 --- a/english/constituency_parsing.md +++ b/english/constituency_parsing.md @@ -29,8 +29,16 @@ evaluating constituency parsers. Section 22 is used for development and Section Models are evaluated based on F1. Most of the below models incorporate external data or features. For a comparison of single models trained only on WSJ, refer to [Kitaev and Klein (2018)](https://arxiv.org/abs/1805.01052). 
-{% include table.html results=site.data.constituency_parsing scores='F1 score' %} - -{% include chart.html results=site.data.constituency_parsing score='F1 score' %} +| Model | F1 score | Paper / Source | +| ------------- | :-----:| --- | +| Self-attentive encoder + ELMo (Kitaev and Klein, 2018) | 95.13 | [Constituency Parsing with a Self-Attentive Encoder](https://arxiv.org/abs/1805.01052) | +| Model combination (Fried et al., 2017) | 94.66 | [Improving Neural Parsing by Disentangling Model Combination and Reranking Effects](https://arxiv.org/abs/1707.03058) | +| In-order (Liu and Zhang, 2017) | 94.2 | [In-Order Transition-based Constituent Parsing](http://aclweb.org/anthology/Q17-1029) | +| Semi-supervised LSTM-LM (Choe and Charniak, 2016) | 93.8 | [Parsing as Language Modeling](http://www.aclweb.org/anthology/D16-1257) | +| Stack-only RNNG (Kuncoro et al., 2017) | 93.6 | [What Do Recurrent Neural Network Grammars Learn About Syntax?](https://arxiv.org/abs/1611.05774) | +| RNN Grammar (Dyer et al., 2016) | 93.3 | [Recurrent Neural Network Grammars](https://www.aclweb.org/anthology/N16-1024) | +| Transformer (Vaswani et al., 2017) | 92.7 | [Attention Is All You Need](https://arxiv.org/abs/1706.03762) | +| Semi-supervised LSTM (Vinyals et al., 2015) | 92.1 | [Grammar as a Foreign Language](https://papers.nips.cc/paper/5635-grammar-as-a-foreign-language.pdf) | +| Self-trained parser (McClosky et al., 2006) | 92.1 | [Effective Self-Training for Parsing](https://pdfs.semanticscholar.org/6f0f/64f0dab74295e5eb139c160ed79ff262558a.pdf) | [Go back to the README](../README.md) diff --git a/english/dependency_parsing.md b/english/dependency_parsing.md index 292ea300..dabde301 100644 --- a/english/dependency_parsing.md +++ b/english/dependency_parsing.md @@ -27,15 +27,24 @@ conversion (**v3.3.0**) of the Penn Treebank with __predicted__ POS-tags. Punctuations are excluded from the evaluation.
Evaluation metrics are unlabeled attachment score (UAS) and labeled attachment score (LAS). Here, we also mention the predicted POS tagging accuracy. -{% include table.html - results=site.data.dependency_parsing.Penn_Treebank - scores='POS,UAS,LAS' %} +| Model | POS | UAS | LAS | Paper / Source | Code | +| ------------- | :-----: | :-----:| :-----:| --- | --- | +| Deep Biaffine (Dozat and Manning, 2017) | 97.3 | 95.44 | 93.76 | [Deep Biaffine Attention for Neural Dependency Parsing](https://arxiv.org/abs/1611.01734) | [Official](https://github.com/tdozat/Parser-v1) | +| jPTDP (Nguyen and Verspoor, 2018) | 97.97 | 94.51 | 92.87 | [An improved neural network model for joint POS tagging and dependency parsing](https://arxiv.org/abs/1807.03955) | [Official](https://github.com/datquocnguyen/jPTDP) | +| Andor et al. (2016) | 97.44 | 94.61 | 92.79 | [Globally Normalized Transition-Based Neural Networks](https://www.aclweb.org/anthology/P16-1231) | | +| Distilled neural FOG (Kuncoro et al., 2016) | 97.3 | 94.26 | 92.06 | [Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser](https://arxiv.org/abs/1609.07561) | | +| Weiss et al. 
(2015) | 97.44 | 93.99 | 92.05 | [Structured Training for Neural Network Transition-Based Parsing](http://anthology.aclweb.org/P/P15/P15-1032.pdf) | | +| BIST transition-based parser (Kiperwasser and Goldberg, 2016) | 97.3 | 93.9 | 91.9 | [Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations](https://aclweb.org/anthology/Q16-1023) | [Official](https://github.com/elikip/bist-parser/tree/master/barchybrid/src) | +| Arc-hybrid (Ballesteros et al., 2016) | 97.3 | 93.56 | 91.42 | [Training with Exploration Improves a Greedy Stack-LSTM Parser](https://arxiv.org/abs/1603.03793) | | +| BIST graph-based parser (Kiperwasser and Goldberg, 2016) | 97.3 | 93.1 | 91.0 | [Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations](https://aclweb.org/anthology/Q16-1023) | [Official](https://github.com/elikip/bist-parser/tree/master/bmstparser/src) | The following results are just for reference: -{% include table.html - results=site.data.dependency_parsing.Reference - scores='POS,UAS,comment' %} +| Model | UAS | LAS | Note | Paper / Source | +| ------------- | :-----:| :-----:| --- | --- | +| Stack-only RNNG (Kuncoro et al., 2017) | 95.8 | 94.6 | Constituent parser | [What Do Recurrent Neural Network Grammars Learn About Syntax?](https://arxiv.org/abs/1611.05774) | +| Semi-supervised LSTM-LM (Choe and Charniak, 2016) | 95.9 | 94.1 | Constituent parser | [Parsing as Language Modeling](http://www.aclweb.org/anthology/D16-1257) | +| Deep Biaffine (Dozat and Manning, 2017) | 95.66 | 94.03 | Stanford conversion **v3.5.0** | [Deep Biaffine Attention for Neural Dependency Parsing](https://arxiv.org/abs/1611.01734) | # Unsupervised dependency parsing @@ -46,9 +55,13 @@ Unsupervised dependency parsing is the task of inferring the dependency parse of As with supervised parsing, models are evaluated against the Penn Treebank.
The most common evaluation setup is to use gold POS-tags as input and to evaluate systems using the unlabeled attachment score (also called 'directed dependency accuracy'). - -{% include table.html - results=site.data.dependency_parsing.Unsupervised_Penn_Treebank - scores='UAS' %} + +| Model | UAS | Paper / Source | +| ------------- | :-----:| ---- | +| Iterative reranking (Le & Zuidema, 2015) | 66.2 | [Unsupervised Dependency Parsing - Let’s Use Supervised Parsers](http://www.aclweb.org/anthology/N15-1067) | +| Combined System (Spitkovsky et al., 2013) | 64.4 | [Breaking Out of Local Optima with Count Transforms and Model Recombination - A Study in Grammar Induction](http://www.aclweb.org/anthology/D13-1204) | +| Tree Substitution Grammar DMV (Blunsom & Cohn, 2010) | 55.7 | [Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing](http://www.aclweb.org/anthology/D10-1117) | +| Shared Logistic Normal DMV (Cohen & Smith, 2009) | 41.4 | [Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction](http://www.aclweb.org/anthology/N09-1009) | +| DMV (Klein & Manning, 2004) | 35.9 | [Corpus-Based Induction of Syntactic Structure - Models of Dependency and Constituency](http://www.aclweb.org/anthology/P04-1061) | [Go back to the README](../README.md) diff --git a/english/dialog.md b/english/dialog.md deleted file mode 100644 index 88e037a2..00000000 --- a/english/dialog.md +++ /dev/null @@ -1,21 +0,0 @@ -# Dialog - -Dialogue is notoriously hard to evaluate. Past approaches have used human evaluation. - -## Dialog state tracking - -Dialogue state tacking consists of determining at each turn of a dialog the -full representation of what the user wants at that point in the dialog, -which contains a goal constraint, a set of requested slots, and the user's dialog act. 
- -### Second dialog state tracking challenge - -For goal-oriented dialogue, the dataset of the [second dialog state tracking challenge](http://www.aclweb.org/anthology/W14-4337) -(DSTC2) is a common evaluation dataset. The DSTC2 focuses on the restaurant search domain. Models are -evaluated based on accuracy on both individual and joint slot tracking. - -{% include table.html results=site.data.dialog scores='Area,Food,Price,Joint' %} - -{% include chart.html results=site.data.dialog score='Joint' %} - -[Go back to the README](../README.md) diff --git a/english/dialogue.md b/english/dialogue.md new file mode 100644 index 00000000..4bc01945 --- /dev/null +++ b/english/dialogue.md @@ -0,0 +1,23 @@ +# Dialogue + +Dialogue is notoriously hard to evaluate. Past approaches have used human evaluation. + +## Dialogue state tracking + +Dialogue state tracking consists of determining at each turn of a dialogue the +full representation of what the user wants at that point in the dialogue, +which contains a goal constraint, a set of requested slots, and the user's dialogue act. + +### Second dialogue state tracking challenge + +For goal-oriented dialogue, the dataset of the [second dialogue state tracking challenge](http://www.aclweb.org/anthology/W14-4337) +(DSTC2) is a common evaluation dataset. The DSTC2 focuses on the restaurant search domain. Models are +evaluated based on accuracy on both individual and joint slot tracking. + +| Model | Area | Food | Price | Joint | Paper / Source | +| ------------- | :-----:| :-----:| :-----:| :-----:| --- | +| Liu et al.
(2018) | 90 | 84 | 92 | 72 | [Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems](https://arxiv.org/abs/1804.06512) | +| Neural belief tracker (Mrkšić et al., 2017) | 90 | 84 | 94 | 72 | [Neural Belief Tracker: Data-Driven Dialogue State Tracking](https://arxiv.org/abs/1606.03777) | +| RNN (Henderson et al., 2014) | 92 | 86 | 86 | 69 | [Robust dialog state tracking using delexicalised recurrent neural networks and unsupervised adaptation](http://svr-ftp.eng.cam.ac.uk/~sjy/papers/htyo14.pdf) | + +[Go back to the README](../README.md) diff --git a/english/domain_adaptation.md b/english/domain_adaptation.md index 3b03a9cb..b1e51cbe 100644 --- a/english/domain_adaptation.md +++ b/english/domain_adaptation.md @@ -12,8 +12,11 @@ typically evaluated on a target domain that is different from the source domain having access to unlabeled examples of the target domain (unsupervised domain adaptation). The evaluation metric is accuracy and scores are averaged across each domain.
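The reported averages are plain macro-averages over the per-domain accuracies; a minimal sketch (the counts below are made up):

```python
def macro_average_accuracy(per_domain):
    """per_domain maps a domain name to (num_correct, num_examples)."""
    accuracies = [100.0 * correct / total for correct, total in per_domain.values()]
    return sum(accuracies) / len(accuracies)

# Hypothetical per-domain results on the four Amazon review domains.
per_domain = {"DVD": (781, 1000), "Books": (749, 1000),
              "Electronics": (814, 1000), "Kitchen": (821, 1000)}
print(round(macro_average_accuracy(per_domain), 2))  # → 79.12
```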
-{% include table.html - results=site.data.domain_adaptation - scores='DVD,Books,Electronics,Kitchen,Average' %} +| Model | DVD | Books | Electronics | Kitchen | Average | Paper / Source | +| ------------- | :-----:| :-----:| :-----:| :-----:| :-----:| --- | +| Multi-task tri-training (Ruder and Plank, 2018) | 78.14 | 74.86 | 81.45 | 82.14 | 79.15 | [Strong Baselines for Neural Semi-supervised Learning under Domain Shift](https://arxiv.org/abs/1804.09530) | +| Asymmetric tri-training (Saito et al., 2017) | 76.17 | 72.97 | 80.47 | 83.97 | 78.39 | [Asymmetric Tri-training for Unsupervised Domain Adaptation](https://arxiv.org/abs/1702.08400) | +| VFAE (Louizos et al., 2015) | 76.57 | 73.40 | 80.53 | 82.93 | 78.36 | [The Variational Fair Autoencoder](https://arxiv.org/abs/1511.00830) | +| DANN (Ganin et al., 2016) | 75.40 | 71.43 | 77.67 | 80.53 | 76.26 | [Domain-Adversarial Training of Neural Networks](https://arxiv.org/abs/1505.07818) | [Go back to the README](../README.md) diff --git a/english/entity_linking.md b/english/entity_linking.md index a283fc40..9bb9b640 100644 --- a/english/entity_linking.md +++ b/english/entity_linking.md @@ -40,24 +40,22 @@ More in details can be found in this [survey](http://dbgroup.cs.tsinghua.edu.cn/ The [AIDA CoNLL-YAGO][AIDACoNLLYAGO] Dataset by [[Hoffart]](http://www.aclweb.org/anthology/D11-1072) contains assignments of entities to the mentions of named entities annotated for the original [[CoNLL]](http://www.aclweb.org/anthology/W03-0419.pdf) 2003 NER task. The entities are identified by [YAGO2](http://yago-knowledge.org/) entity identifier, by [Wikipedia URL](https://en.wikipedia.org/), or by [Freebase mid](http://wiki.freebase.com/wiki/Machine_ID). 
##### Disambiguation-Only Models -{% include table.html - results=site.data.entity_linking_disambiguation_only - scores='Micro-Precision,Macro-Precision' %} -##### Disambiguation-Only Models (Spanish) -{% include table.html - results=site.data.entity_linking_disambiguation_only_spanish - scores='Micro-Precision' %} - -##### Disambiguation-Only Models (Chinese) -{% include table.html - results=site.data.entity_linking_disambiguation_only_chinese - scores='Micro-Precision' %} +| Model | Micro-Precision | Macro-Precision | Paper / Source | Code | +| ------------- | :-----:| :----: | :----: | --- | +| Sil et al. (2018) | 94.0 | - | [Neural Cross-Lingual Entity Linking](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16501/16101) | | +| Le et al. (2018) | 93.07 | - | [Improving Entity Linking by Modeling Latent Relations between Mentions](http://aclweb.org/anthology/P18-1148) | | +| Radhakrishnan et al. (2018) | 93.0 | 93.7 | [ELDEN: Improved Entity Linking using Densified Knowledge Graphs](http://aclweb.org/anthology/N18-1167) | | +| Ganea and Hofmann (2017) | 92.22 | - | [Deep Joint Entity Disambiguation with Local Neural Attention](https://www.aclweb.org/anthology/D17-1277) | [Link](https://github.com/dalab/deep-ed) | +| Hoffart et al. (2011) | 82.29 | 82.02 | [Robust Disambiguation of Named Entities in Text](http://www.aclweb.org/anthology/D11-1072) | | ##### End-to-End Models + +| Model | Micro-F1-strong | Macro-F1-strong | Paper / Source | Code | +| ------------- | :-----:| :----: | :----: | --- | +| Kolitsas et al. (2018) | 86.6 | 89.4 | [End-to-End Neural Entity Linking](https://arxiv.org/pdf/1808.07699.pdf) | [Official](https://github.com/dalab/end2end_neural_el) | +| Piccinno et al. (2014) | 69.32 | 72.8 | [From TagME to WAT: a new entity annotator](https://dl.acm.org/citation.cfm?id=2634350) | | +| Hoffart et al.
(2011) | 68.8 | 72.4 | [Robust Disambiguation of Named Entities in Text](http://www.aclweb.org/anthology/D11-1072) | | ### Platforms diff --git a/english/grammatical_error_correction.md b/english/grammatical_error_correction.md index b9479d28..7759b984 100644 --- a/english/grammatical_error_correction.md +++ b/english/grammatical_error_correction.md @@ -12,25 +12,30 @@ Grammatical Error Correction (GEC) is the task of correcting grammatical mistake CoNLL-14 benchmark is done on the [test split](https://www.comp.nus.edu.sg/~nlp/conll14st/conll14st-test-data.tar.gz) of [NUS Corpus of Learner English/NUCLE](https://www.comp.nus.edu.sg/~nlp/corpora.html) dataset. CoNLL-2014 test set contains 1,312 English sentences with grammatical error correction annotations by 2 annotators. Models are evaluated with [F-score](https://en.wikipedia.org/wiki/F1_score) with β=0.5 which weighs precision twice as much as recall. - -{% include table.html results=site.data.grammatical_error_correction.CoNLL_2014 scores='F0.5' %} - -{% include chart.html results=site.data.grammatical_error_correction.CoNLL_2014 score='F0.5' %} - +| Model | F0.5 | Paper / Source | Code | +| ------------- | :-----:| --- | :-----: | +| CNN Seq2Seq + Fluency Boost (Ge et al., 2018) | 61.34 | [Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study](https://arxiv.org/abs/1807.01270)| NA | +| SMT + BiGRU (Grundkiewicz et al., 2018) | 56.25 | [Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation](https://arxiv.org/abs/1804.05945)| NA | +| Transformer (Junczys-Dowmunt et al., 2018) | 55.8 | [Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task](http://aclweb.org/anthology/N18-1055)| NA | +| CNN Seq2Seq (Chollampatt & Ng, 2018)| 54.79 | [A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction](https://arxiv.org/abs/1801.08831)|
[Official](https://github.com/nusnlp/mlconvgec2018) | ### CoNLL-2014 10 Annotators [Bryant and Ng 2015](https://pdfs.semanticscholar.org/f76f/fd242c3dc88e52d1f427cdd0f5dccd814937.pdf) used 10 annotators to do grammatical error correction on CoNLL-14's [1312 sentences](http://www.comp.nus.edu.sg/~nlp/sw/10gec_annotations.zip). -{% include table.html results=site.data.grammatical_error_correction.CoNLL_2014_10_Annotators scores='F0.5' %} - -{% include chart.html results=site.data.grammatical_error_correction.CoNLL_2014_10_Annotators score='F0.5' %} - +| Model | F0.5 | Paper / Source | Code | +| ------------- | :-----:| --- | :-----: | +| CNN Seq2Seq + Fluency Boost (Ge et al., 2018) | 76.88 | [Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study](https://arxiv.org/abs/1807.01270)| NA | +| SMT + BiGRU (Grundkiewicz et al., 2018) | 72.04 | [Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation](https://arxiv.org/abs/1804.05945)| NA | +| CNN Seq2Seq (Chollampatt & Ng, 2018)| 70.14 (measured by Ge et al., 2018) | [A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction](https://arxiv.org/abs/1801.08831)| [Official](https://github.com/nusnlp/mlconvgec2018) | ### JFLEG [JFLEG corpus](https://github.com/keisks/jfleg) by [Napoles et al., 2017](https://arxiv.org/abs/1702.04066) consists of 1,511 English sentences with annotations. Models are evaluated with [GLEU metric](https://arxiv.org/abs/1609.08144).
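The F0.5 measure used in the CoNLL-2014 tables above is the general Fβ score with β=0.5; a minimal sketch:

```python
def f_beta(precision, recall, beta=0.5):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# With beta=0.5 a precise-but-conservative system scores higher than under F1.
print(round(f_beta(0.9, 0.6), 3))          # → 0.818
print(round(f_beta(0.9, 0.6, beta=1), 3))  # → 0.72
```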
-{% include table.html results=site.data.grammatical_error_correction.JFLEG scores='GLEU' %} - -{% include chart.html results=site.data.grammatical_error_correction.JFLEG score='GLEU' %} +| Model | GLEU | Paper / Source | Code | +| ------------- | :-----:| --- | :-----: | +| CNN Seq2Seq + Fluency Boost and inference (Ge et al., 2018) | 62.37 | [Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study](https://arxiv.org/abs/1807.01270)| NA | +| SMT + BiGRU (Grundkiewicz et al., 2018) | 61.50 | [Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation](https://arxiv.org/abs/1804.05945)| NA | +| Transformer (Junczys-Dowmunt et al., 2018) | 59.9 | [Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task](http://aclweb.org/anthology/N18-1055)| NA | +| CNN Seq2Seq (Chollampatt & Ng, 2018)| 57.47 | [A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction](https://arxiv.org/abs/1801.08831)| [Official](https://github.com/nusnlp/mlconvgec2018) | diff --git a/english/language_modeling.md b/english/language_modeling.md index e598b293..4a0fe756 100644 --- a/english/language_modeling.md +++ b/english/language_modeling.md @@ -18,9 +18,13 @@ the most frequent 10k words with the rest of the tokens replaced by an `<unk>` token. Models are evaluated based on perplexity, which is the exponentiated average negative per-word log-probability (lower is better).
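Perplexity can be sketched as the exponential of the mean negative log-probability over the evaluation tokens (the probabilities below are made up):

```python
import math

def perplexity(word_probs):
    """exp of the average negative natural-log probability per word."""
    neg_log_likelihood = -sum(math.log(p) for p in word_probs)
    return math.exp(neg_log_likelihood / len(word_probs))

# Geometric-mean view: these three words have geometric mean probability 0.1,
# so the model is as uncertain as a uniform choice over 10 words.
print(round(perplexity([0.1, 0.2, 0.05]), 6))  # → 10.0
```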
-{% include table.html - results=site.data.language_modeling.Word_Level.Penn_Treebank - scores='Validation perplexity,Test perplexity' %} +| Model | Validation perplexity | Test perplexity | Paper / Source | +| ------------- | :-----:| :-----:| --- | +| AWD-LSTM-MoS + dynamic eval (Yang et al., 2018)* | 48.33 | 47.69 | [Breaking the Softmax Bottleneck: A High-Rank RNN Language Model](https://arxiv.org/abs/1711.03953) | +| AWD-LSTM + dynamic eval (Krause et al., 2017)* | 51.6 | 51.1 | [Dynamic Evaluation of Neural Sequence Models](https://arxiv.org/abs/1709.07432) | +| AWD-LSTM + continuous cache pointer (Merity et al., 2017)* | 53.9 | 52.8 | [Regularizing and Optimizing LSTM Language Models](https://arxiv.org/abs/1708.02182) | +| AWD-LSTM-MoS (Yang et al., 2018) | 56.54 | 54.44 | [Breaking the Softmax Bottleneck: A High-Rank RNN Language Model](https://arxiv.org/abs/1711.03953) | +| AWD-LSTM (Merity et al., 2017) | 60.0 | 57.3 | [Regularizing and Optimizing LSTM Language Models](https://arxiv.org/abs/1708.02182) | ### WikiText-2 @@ -28,9 +32,13 @@ per-word log-probability (lower is better). benchmark for language modeling than the pre-processed Penn Treebank. WikiText-2 consists of around 2 million words extracted from Wikipedia articles. 
-{% include table.html - results=site.data.language_modeling.Word_Level.WikiText_2 - scores='Validation perplexity,Test perplexity' %} +| Model | Validation perplexity | Test perplexity | Paper / Source | +| ------------- | :-----:| :-----:| --- | +| AWD-LSTM-MoS + dynamic eval (Yang et al., 2018)* | 42.41 | 40.68 | [Breaking the Softmax Bottleneck: A High-Rank RNN Language Model](https://arxiv.org/abs/1711.03953) | +| AWD-LSTM + dynamic eval (Krause et al., 2017)* | 46.4 | 44.3 | [Dynamic Evaluation of Neural Sequence Models](https://arxiv.org/abs/1709.07432) | +| AWD-LSTM + continuous cache pointer (Merity et al., 2017)* | 53.8 | 52.0 | [Regularizing and Optimizing LSTM Language Models](https://arxiv.org/abs/1708.02182) | +| AWD-LSTM-MoS (Yang et al., 2018) | 63.88 | 61.45 | [Breaking the Softmax Bottleneck: A High-Rank RNN Language Model](https://arxiv.org/abs/1711.03953) | +| AWD-LSTM (Merity et al., 2017) | 68.6 | 65.8 | [Regularizing and Optimizing LSTM Language Models](https://arxiv.org/abs/1708.02182) | ### WikiText-103 @@ -39,6 +47,15 @@ consists of around 2 million words extracted from Wikipedia articles. 
-{% include table.html results=site.data.language_modeling.Word_Level.WikiText_103 scores='Validation perplexity,Test perplexity' %} + +| Model | Validation perplexity | Test perplexity | Paper / Source | Code | +| ------------- | :-----:| :-----:| --- | --- | +| LSTM + Hebbian + Cache + MbPA (Rae et al., 2018) | 29.0 | 29.2 | [Fast Parametric Learning with Activation Memorization](http://arxiv.org/abs/1803.10049) | | +| LSTM + Hebbian (Rae et al., 2018) | 34.1 | 34.3 | [Fast Parametric Learning with Activation Memorization](http://arxiv.org/abs/1803.10049) | | +| LSTM (Rae et al., 2018) | 36.0 | 36.4 | [Fast Parametric Learning with Activation Memorization](http://arxiv.org/abs/1803.10049) | | +| Gated CNN (Dauphin et al., 2016) | - | 37.2 | [Language modeling with gated convolutional networks](https://arxiv.org/abs/1612.08083) | | +| Temporal CNN (Bai et al., 2018) | - | 45.2 | [Convolutional sequence modeling revisited](https://openreview.net/forum?id=BJEX-H1Pf) | | +| LSTM (Graves et al., 2014) | - | 48.7 | [Neural turing machines](https://arxiv.org/abs/1410.5401) | | ## Character Level Models @@ -48,22 +65,37 @@ first 100 million bytes of a Wikipedia XML dump. For simplicity we shall refer to it as a character-level dataset. Within these 100 million bytes are 205 unique tokens.
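Bits per character (BPC) is the mean negative log2-probability the model assigns to each character; a minimal sketch with hypothetical probabilities:

```python
import math

def bits_per_character(char_probs):
    """Average negative log2 probability per character; lower is better."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)

# A model that always assigns probability 0.5 needs exactly 1 bit per character.
print(bits_per_character([0.5, 0.5, 0.5]))  # → 1.0
print(bits_per_character([0.5, 0.25]))      # → 1.5
```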
-{% include table.html - results=site.data.language_modeling.Char_Level.Hutter_Prize - scores='Bits per Character (BPC),Number of params (M)' %} +| Model | Bits per Character (BPC) | Number of params | Paper / Source | +| ---------------- | :-----: | :-----: | --- | +| T64 (Al-Rfou et al., 2018) | 1.06 | 235M | [Character-Level Language Modeling with Deeper Self-Attention](https://arxiv.org/abs/1808.04444) | +| mLSTM + dynamic eval (Krause et al., 2017)* | 1.08 | 46M | [Dynamic Evaluation of Neural Sequence Models](https://arxiv.org/abs/1709.07432) | +| T12 (Al-Rfou et al., 2018) | 1.11 | 44M | [Character-Level Language Modeling with Deeper Self-Attention](https://arxiv.org/abs/1808.04444) | +| 3 layer AWD-LSTM (Merity et al., 2018) | 1.232 | 47M | [An Analysis of Neural Language Modeling at Multiple Scales](https://arxiv.org/abs/1803.08240) | +| Large mLSTM +emb +WN +VD (Krause et al., 2016) | 1.24 | 46M | [Multiplicative LSTM for sequence modelling](https://arxiv.org/abs/1609.07959) | +| Large FS-LSTM-4 (Mujika et al., 2017) | 1.245 | 47M | [Fast-Slow Recurrent Neural Networks](https://arxiv.org/abs/1705.08639) | +| Large RHN (Zilly et al., 2016) | 1.27 | 46M | [Recurrent Highway Networks](https://arxiv.org/abs/1607.03474) | +| FS-LSTM-4 (Mujika et al., 2017) | 1.277 | 27M | [Fast-Slow Recurrent Neural Networks](https://arxiv.org/abs/1705.08639) | ### Text8 [The text8 dataset](http://mattmahoney.net/dc/textdata.html) is also derived from Wikipedia text, but has all XML removed, and is lower cased to only have 26 characters of English text plus spaces.
-{% include table.html
- results=site.data.language_modeling.Char_Level.Text8
- scores='Bits per Character (BPC),Number of params (M)' %}
+| Model | Bits per Character (BPC) | Number of params | Paper / Source |
+| ---------------- | :-----: | :-----: | --- |
+| mLSTM + dynamic eval (Krause et al., 2017)* | 1.19 | 45M | [Dynamic Evaluation of Neural Sequence Models](https://arxiv.org/abs/1709.07432) |
+| Large mLSTM +emb +WN +VD (Krause et al., 2016) | 1.27 | 45M | [Multiplicative LSTM for sequence modelling](https://arxiv.org/abs/1609.07959) |
+| Large RHN (Zilly et al., 2016) | 1.27 | 46M | [Recurrent Highway Networks](https://arxiv.org/abs/1607.03474) |
+| LayerNorm HM-LSTM (Chung et al., 2017) | 1.29 | 35M | [Hierarchical Multiscale Recurrent Neural Networks](https://arxiv.org/abs/1609.01704) |
+| BN LSTM (Cooijmans et al., 2016) | 1.36 | 16M | [Recurrent Batch Normalization](https://arxiv.org/abs/1603.09025) |
+| Unregularised mLSTM (Krause et al., 2016) | 1.40 | 45M | [Multiplicative LSTM for sequence modelling](https://arxiv.org/abs/1609.07959) |

### Penn Treebank

The vocabulary of the words in the character-level dataset is limited to 10,000 - the same vocabulary as used in the word level dataset. This vastly simplifies the task of character-level language modeling as character transitions will be limited to those found within the limited word level vocabulary.
-{% include table.html
- results=site.data.language_modeling.Char_Level.Penn_Treebank
- scores='Bits per Character (BPC),Number of params (M)' %}
+| Model | Bits per Character (BPC) | Number of params | Paper / Source |
+| ---------------- | :-----: | :-----: | --- |
+| 3 layer AWD-LSTM (Merity et al., 2018) | 1.175 | 13.8M | [An Analysis of Neural Language Modeling at Multiple Scales](https://arxiv.org/abs/1803.08240) |
+| 6 layer QRNN (Merity et al., 2018) | 1.187 | 13.8M | [An Analysis of Neural Language Modeling at Multiple Scales](https://arxiv.org/abs/1803.08240) |
+| FS-LSTM-4 (Mujika et al., 2017) | 1.190 | 27M | [Fast-Slow Recurrent Neural Networks](https://arxiv.org/abs/1705.08639) |
+| FS-LSTM-2 (Mujika et al., 2017) | 1.193 | 27M | [Fast-Slow Recurrent Neural Networks](https://arxiv.org/abs/1705.08639) |
+| NASCell (Zoph & Le, 2016) | 1.214 | 16.3M | [Neural Architecture Search with Reinforcement Learning](https://arxiv.org/abs/1611.01578) |
+| 2-Layer Norm HyperLSTM (Ha et al., 2016) | 1.219 | 14.4M | [HyperNetworks](https://arxiv.org/abs/1609.09106) |

[Go back to the README](../README.md)

diff --git a/english/lexical_normalization.md b/english/lexical_normalization.md
index 0a6a6e6d..4e41458e 100644
--- a/english/lexical_normalization.md
+++ b/english/lexical_normalization.md
@@ -26,8 +26,12 @@ used as training data, because of its similar annotation style.

This dataset is commonly evaluated with accuracy on the non-standard words. This means that the system knows in advance which words are in need of normalization.
- -{% include table.html results=site.data.lexical_normalization_lexnorm scores='accuracy' %} +| Model | Accuracy | Paper / Source | Code | +| ------------- | :-----:| --- | --- | +| MoNoise (van der Goot & van Noord, 2017) | 87.63 | [MoNoise: Modeling Noise Using a Modular Normalization System](http://www.let.rug.nl/rob/doc/clin27.paper.pdf) | [Official](https://bitbucket.org/robvanderg/monoise/) | +| Joint POS + Norm in a Viterbi decoding (Li & Liu, 2015) | 87.58* | [Joint POS Tagging and Text Normalization for Informal Text](http://www.aaai.org/ocs/index.php/IJCAI/IJCAI15/paper/download/10839/10838) | | +| Syllable based (Xu et al., 2015) | 86.08 | [Tweet Normalization with Syllables](http://www.aclweb.org/anthology/P15-1089) | | +| unLOL (Yang & Eisenstein, 2013) | 82.06 | [A Log-Linear Model for Unsupervised Text Normalization](http://www.aclweb.org/anthology/D13-1007) | | \* used a slightly different version of the data @@ -48,7 +52,9 @@ Recall: out of all normalization by system, how many correct This means that if the system replaces a word which is in need of normalization, but chooses the wrong normalization, it is penalized twice. 
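The double penalty described above can be made concrete: a word that needed normalization but received the wrong replacement counts against both precision and recall. A rough sketch of such a scorer, over invented token triples (the official evaluation script may differ in details such as capitalization and 1-to-N replacements):

```python
def normalization_prf(pairs):
    """pairs: list of (original, gold, system) tokens.

    Precision: of the words the system changed, how many match gold.
    Recall: of the words that needed changing, how many the system got right.
    """
    sys_changed = [(o, s) for o, g, s in pairs if s != o]
    gold_changed = [(o, g) for o, g, s in pairs if g != o]
    correct = sum(1 for o, g, s in pairs if s != o and s == g)
    p = correct / len(sys_changed) if sys_changed else 0.0
    r = correct / len(gold_changed) if gold_changed else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# "u" -> gold "you", but the system outputs "your": this single mistake
# lowers precision AND recall, hence the double penalty.
pairs = [("u", "you", "your"),
         ("2morrow", "tomorrow", "tomorrow"),
         ("the", "the", "the")]
p, r, f1 = normalization_prf(pairs)
print(p, r, f1)  # 0.5 0.5 0.5
```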
-{% include table.html results=site.data.lexical_normalization_lexnorm2015 scores='F1' %}
+| Model | F1 | Precision | Recall | Paper / Source | Code |
+| ------------- | :-----:| :-----:| :-----:| --- | --- |
+| MoNoise (van der Goot & van Noord, 2017) | 86.39 | 93.53 | 80.26 | [MoNoise: Modeling Noise Using a Modular Normalization System](http://www.let.rug.nl/rob/doc/clin27.paper.pdf) | [Official](https://bitbucket.org/robvanderg/monoise/) |
+| Random Forest + novel similarity metric (Jin, 2015) | 84.21 | 90.61 | 78.65 | [NCSU-SAS-Ning: Candidate Generation and Feature Engineering for Supervised Lexical Normalization](http://www.aclweb.org/anthology/W15-4313) | |

[Go back to the README](../README.md)
-

diff --git a/english/natural_language_inference.md b/english/natural_language_inference.md
index 19e28cd3..c0bd1032 100644
--- a/english/natural_language_inference.md
+++ b/english/natural_language_inference.md
@@ -29,10 +29,10 @@ Public leaderboards for [in-genre (matched)](https://www.kaggle.com/c/multinli-m
and [cross-genre (mismatched)](https://www.kaggle.com/c/multinli-mismatched-open-evaluation/leaderboard)
evaluation are available, but entries do not correspond to published models.
-| Model | Matched | Mismatched | Paper / Source | -| ------------- | :-----:| :-----:| --- | -| Finetuned Transformer LM (Radford et al., 2018) | 82.1 | 81.4 | [Improving Language Understanding by Generative Pre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf) | -| Multi-task BiLSTM + Attn (Wang et al., 2018) | 72.2 | 72.1 | [GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding](https://arxiv.org/abs/1804.07461) | +| Model | Matched | Mismatched | Paper / Source | Code | +| ------------- | :-----:| :-----:| --- | --- | +| Finetuned Transformer LM (Radford et al., 2018) | 82.1 | 81.4 | [Improving Language Understanding by Generative Pre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf) | | +| Multi-task BiLSTM + Attn (Wang et al., 2018) | 72.2 | 72.1 | [GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding](https://arxiv.org/abs/1804.07461) | | | GenSen (Subramanian et al., 2018) | 71.4 | 71.3 | [Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning](https://arxiv.org/abs/1804.00079) | | ### SciTail diff --git a/english/part-of-speech_tagging.md b/english/part-of-speech_tagging.md index 1563ea00..01874b96 100644 --- a/english/part-of-speech_tagging.md +++ b/english/part-of-speech_tagging.md @@ -16,17 +16,17 @@ A standard dataset for POS tagging is the Wall Street Journal (WSJ) portion of t different POS tags. Sections 0-18 are used for training, sections 19-21 for development, and sections 22-24 for testing. Models are evaluated based on accuracy. 
-| Model | Accuracy | Paper / Source | -| ------------- | :-----:| --- | -| Meta BiLSTM (Bohnet et al., 2018) | 97.96 | [Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings](https://arxiv.org/abs/1805.08237) | -| Char Bi-LSTM (Ling et al., 2015) | 97.78 | [Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation](https://www.aclweb.org/anthology/D/D15/D15-1176.pdf) | -| Adversarial Bi-LSTM (Yasunaga et al., 2018) | 97.59 | [Robust Multilingual Part-of-Speech Tagging via Adversarial Training](https://arxiv.org/abs/1711.04903) | -| Yang et al. (2017) | 97.55 | [Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks](https://arxiv.org/abs/1703.06345) | -| Ma and Hovy (2016) | 97.55 | [End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF](https://arxiv.org/abs/1603.01354) | -| LM-LSTM-CRF (Liu et al., 2018)| 97.53 | [Empowering Character-aware Sequence Labeling with Task-Aware Neural Language Model](https://arxiv.org/pdf/1709.04109.pdf) | -| Feed Forward (Vaswani et a. 2016) | 97.4 | [Supertagging with LSTMs](https://aclweb.org/anthology/N/N16/N16-1027.pdf) | +| Model | Accuracy | Paper / Source | Code | +| ------------- | :-----:| --- | --- | +| Meta BiLSTM (Bohnet et al., 2018) | 97.96 | [Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings](https://arxiv.org/abs/1805.08237) | | +| Char Bi-LSTM (Ling et al., 2015) | 97.78 | [Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation](https://www.aclweb.org/anthology/D/D15/D15-1176.pdf) | | +| Adversarial Bi-LSTM (Yasunaga et al., 2018) | 97.59 | [Robust Multilingual Part-of-Speech Tagging via Adversarial Training](https://arxiv.org/abs/1711.04903) | | +| Yang et al. 
(2017) | 97.55 | [Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks](https://arxiv.org/abs/1703.06345) | |
+| Ma and Hovy (2016) | 97.55 | [End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF](https://arxiv.org/abs/1603.01354) | |
+| LM-LSTM-CRF (Liu et al., 2018) | 97.53 | [Empowering Character-aware Sequence Labeling with Task-Aware Neural Language Model](https://arxiv.org/pdf/1709.04109.pdf) | |
+| Feed Forward (Vaswani et al., 2016) | 97.4 | [Supertagging with LSTMs](https://aclweb.org/anthology/N/N16/N16-1027.pdf) | |
| Bi-LSTM (Ling et al., 2017) | 97.36 | [Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation](https://www.aclweb.org/anthology/D/D15/D15-1176.pdf) | |
-| Bi-LSTM (Plank et al., 2016) | 97.22 | [Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss](https://arxiv.org/abs/1604.05529) |
+| Bi-LSTM (Plank et al., 2016) | 97.22 | [Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss](https://arxiv.org/abs/1604.05529) | |

### Social media

diff --git a/english/relation_prediction.md b/english/relation_prediction.md
index 569ebd6c..6c1bccf8 100644
--- a/english/relation_prediction.md
+++ b/english/relation_prediction.md
@@ -35,9 +35,14 @@ The WN18 dataset was introduced in [Bordes et al., 2013](http://papers.nips.cc/p
As a way to overcome this problem, [Dettmers et al. (2018)](https://arxiv.org/abs/1707.01476) introduced the [WN18RR](https://github.com/villmow/datasets_knowledge_embedding) dataset, derived from WN18, which features 11 relations only, no pair of which is reciprocal (but still includes four internally-symmetric relations like *verb_group*, allowing the rule-based system to reach 35 on all three metrics). The test set is composed of triplets, each used to create two test instances, one for each entity to be predicted.
Since each instance is associated with a single true entity, the maximum value for all metrics is 1.00.
-
-{% include table.html
- results=site.data.relation_prediction.WN18RR
- scores='H@10,H@1,MRR' %}
+
+| Model | H@10 | H@1 | MRR | Paper / Source | Code |
+| ------------- | :-----:| :-----:| :-----:| --- | --- |
+| Max-Margin Markov Graph Models (Pinter & Eisenstein, 2018) | 59.02 | 45.37 | 49.83 | [Predicting Semantic Relations using Global Graph Properties](https://arxiv.org/abs/1808.08644) | [Official](http://www.github.com/yuvalpinter/m3gm) |
+| TransE (reimplementation by Pinter & Eisenstein, 2018) | 55.55 | 42.26 | 46.59 | [Predicting Semantic Relations using Global Graph Properties](https://arxiv.org/abs/1808.08644) | [Official](http://www.github.com/yuvalpinter/m3gm) |
+| ConvKB (Nguyen et al., 2018) | 52.50 | - | 24.80 | [A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network](http://www.aclweb.org/anthology/N18-2053) | [Official](https://github.com/daiquocnguyen/ConvKB) |
+| ConvE (v6; Dettmers et al., 2018) | 52.00 | 40.00 | 43.00 | [Convolutional 2D Knowledge Graph Embeddings](https://arxiv.org/abs/1707.01476) | [Official](https://github.com/TimDettmers/ConvE) |
+| ComplEx (Trouillon et al., 2016) | 51.00 | 41.00 | 44.00 | [Complex Embeddings for Simple Link Prediction](http://www.jmlr.org/proceedings/papers/v48/trouillon16.pdf) | [Official](https://github.com/ttrouill/complex) |
+| DistMult (reimplementation by Dettmers et al., 2017) | 49.00 | 40.00 | 43.00 | [Convolutional 2D Knowledge Graph Embeddings](https://arxiv.org/abs/1707.01476) | [Link](https://github.com/uclmr/inferbeddings) |

[Back to README](../README.md)

diff --git a/english/semantic_parsing.md b/english/semantic_parsing.md
index 64e8d89f..264f40d5 100644
--- a/english/semantic_parsing.md
+++ b/english/semantic_parsing.md
@@ -1,5 +1,20 @@
# Semantic parsing

+### Table of contents
+
+- [AMR parsing](#amr-parsing)
+  - [LDC2014T12](#ldc2014t12)
+  - 
[LDC2015E86](#ldc2015e86) + - [LDC2016E25](#ldc2016e25) +- [SQL parsing](#sql-parsing) + - [ATIS](#atis) + - [Advising](#advising) + - [GeoQuery](#geoquery) + - [Scholar](#scholar) + - [Spider](#spider) + - [WikiSQL](#wikisql) + - [Smaller datasets](#smaller-datasets) + Semantic parsing is the task of translating natural language into a formal meaning representation on which a machine can act. Representations may be an executable language such as SQL or more abstract representations such as [Abstract Meaning Representation (AMR)](https://en.wikipedia.org/wiki/Abstract_Meaning_Representation). @@ -71,6 +86,22 @@ Example: | Iyer et al., (2017) | 45 | 17 | [Learning a neural semantic parser from user feedback](http://www.aclweb.org/anthology/P17-1089) | [System](https://github.com/sriniiyer/nl2sql) | | Template Baseline (Finegan-Dollak et al., 2018) | 45 | 0 | [Improving Text-to-SQL Evaluation Methodology](http://arxiv.org/abs/1806.09029) | [Data and System](https://github.com/jkkummerfeld/text2sql-data) | +### Advising + +4,570 user questions about university course advising, with manually annotated SQL [Finegan-Dollak et al., (2018)](http://arxiv.org/abs/1806.09029). + +Example: + +| Question | SQL query | +| ------------- | --- | +| Can undergrads take 550 ? 
| `SELECT DISTINCT COURSEalias0.ADVISORY_REQUIREMENT , COURSEalias0.ENFORCED_REQUIREMENT , COURSEalias0.NAME FROM COURSE AS COURSEalias0 WHERE COURSEalias0.DEPARTMENT = \"department0\" AND COURSEalias0.NUMBER = 550 ;` | + +| Model | Question Split | Query Split | Paper / Source | Code | +| --------------- | ----- | :-----:| --------------- | ---- | +| Template Baseline (Finegan-Dollak et al., 2018) | 80 | 0 | [Improving Text-to-SQL Evaluation Methodology](http://arxiv.org/abs/1806.09029) | [Data and System](https://github.com/jkkummerfeld/text2sql-data) | +| Seq2Seq with copying (Finegan-Dollak et al., 2018) | 70 | 0 | [Improving Text-to-SQL Evaluation Methodology](http://arxiv.org/abs/1806.09029) | [Data and System](https://github.com/jkkummerfeld/text2sql-data) | +| Iyer et al., (2017) | 41 | 1 | [Learning a neural semantic parser from user feedback](http://www.aclweb.org/anthology/P17-1089) | [System](https://github.com/sriniiyer/nl2sql) | + ### GeoQuery 877 user questions about US geography: @@ -111,22 +142,14 @@ Example: | Template Baseline (Finegan-Dollak et al., 2018) | 52 | 0 | [Improving Text-to-SQL Evaluation Methodology](http://arxiv.org/abs/1806.09029) | [Data and System](https://github.com/jkkummerfeld/text2sql-data) | | Iyer et al., (2017) | 44 | 3 | [Learning a neural semantic parser from user feedback](http://www.aclweb.org/anthology/P17-1089) | [System](https://github.com/sriniiyer/nl2sql) | -### Advising - -4,570 user questions about university course advising, with manually annotated SQL [Finegan-Dollak et al., (2018)](http://arxiv.org/abs/1806.09029). - -Example: - -| Question | SQL query | -| ------------- | --- | -| Can undergrads take 550 ? 
| `SELECT DISTINCT COURSEalias0.ADVISORY_REQUIREMENT , COURSEalias0.ENFORCED_REQUIREMENT , COURSEalias0.NAME FROM COURSE AS COURSEalias0 WHERE COURSEalias0.DEPARTMENT = \"department0\" AND COURSEalias0.NUMBER = 550 ;` |

+### Spider

-| Model | Question Split | Query Split | Paper / Source | Code |
-| --------------- | ----- | :-----:| --------------- | ---- |
-| Template Baseline (Finegan-Dollak et al., 2018) | 80 | 0 | [Improving Text-to-SQL Evaluation Methodology](http://arxiv.org/abs/1806.09029) | [Data and System](https://github.com/jkkummerfeld/text2sql-data) |
-| Seq2Seq with copying (Finegan-Dollak et al., 2018) | 70 | 0 | [Improving Text-to-SQL Evaluation Methodology](http://arxiv.org/abs/1806.09029) | [Data and System](https://github.com/jkkummerfeld/text2sql-data) |
-| Iyer et al., (2017) | 41 | 1 | [Learning a neural semantic parser from user feedback](http://www.aclweb.org/anthology/P17-1089) | [System](https://github.com/sriniiyer/nl2sql) |
+Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL
+dataset. It consists of 10,181 questions and 5,693 unique complex SQL queries on
+200 databases with multiple tables covering 138 different domains. In Spider 1.0,
+different complex SQL queries and databases appear in train and test sets.
+The Spider dataset and leaderboard can be accessed [here](https://yale-lily.github.io/spider).

### WikiSQL

@@ -147,16 +170,6 @@ Example:
| SQLNet (Xu et al., 2017) | 68.0 | [Sqlnet: Generating structured queries from natural language without reinforcement learning](https://arxiv.org/abs/1711.04436) |
| Seq2SQL (Zhong et al., 2017) | 59.4 | [Seq2sql: Generating structured queries from natural language using reinforcement learning](https://arxiv.org/abs/1709.00103) |
It consists of 10,181 questions and 5,693 unique complex SQL queries on
-200 databases with multiple tables covering 138 different domains. In Spider 1.0,
-different complex SQL queries and databases appear in train and test sets.
-
-The Spider dataset can be accessed and leaderboard can be accessed [here](https://yale-lily.github.io/spider).
-

### Smaller Datasets

Restaurants - 378 questions about restaurants, their cuisine and locations, collected by [Tang and Mooney (2000)](http://www.aclweb.org/anthology/W/W00/W00-1317.pdf), converted to SQL by [Popescu et al., (2003)](http://doi.acm.org/10.1145/604045.604070) and [Giordani and Moschitti (2012)](https://doi.org/10.1007/978-3-642-45260-4_5), improved and converted to canonical style by [Finegan-Dollak et al., (2018)](http://arxiv.org/abs/1806.09029)

diff --git a/english/stance_detection.md b/english/stance_detection.md
index bd6a3aea..8f61bac9 100644
--- a/english/stance_detection.md
+++ b/english/stance_detection.md
@@ -18,6 +18,5 @@ This dataset subsumes the large [PHEME collection of rumors and stance](http://j
| ------------- | ----- | --- |
| Kochkina et al. 2017 | 0.784 | [Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM](http://www.aclweb.org/anthology/S/S17/S17-2083.pdf)|
| Bahuleyan and Vechtomova 2017| 0.780 | [UWaterloo at SemEval-2017 Task 8: Detecting Stance towards Rumours with Topic Independent Features](http://www.aclweb.org/anthology/S/S17/S17-2080.pdf) |
-|

[Go back to the README](../README.md)

diff --git a/english/word_sense_disambiguation.md b/english/word_sense_disambiguation.md
index 3a5a308c..81769ca3 100644
--- a/english/word_sense_disambiguation.md
+++ b/english/word_sense_disambiguation.md
@@ -23,36 +23,33 @@
The main evaluation measure is F1-score.
### Supervised:

-| Model | Senseval 2 |Senseval 3 |SemEval 2007 |SemEval 2013 |SemEval 2015 | Paper / Source |
+| Model | Senseval 2 |Senseval 3 |SemEval 2007 |SemEval 2013 |SemEval 2015 | Paper / Source |
| ------------- | :-----:|:-----:|:-----:|:-----:|:-----:| --- |
-|MFS baseline | 65.6 | 66.0 | 54.5 | 63.8 | 67.1 | [1] |
-|Bi-LSTMatt+LEX | 72.0 | 69.4 |63.7* | 66.4 | 72.4 | [2] |
-|Bi-LSTMatt+LEX+POS | 72.0 | 69.1|64.8* | 66.9 | 71.5 | [2] |
-|context2vec | 71.8 | 69.1 |61.3 | 65.6 | 71.9 | [3] |
-|ELMo | 71.6 | 69.6 | 62.2 | 66.2 | 71.3 | [4] |
-|GAS (Linear) | 72.0 | 70.0 | --* | 66.7 | 71.6 | [5] |
-|GAS (Concatenation) | 72.1 | 70.2 | --* | 67 | 71.8 | [5] |
-|GASext (Linear) | 72.4 | 70.1 | --* | 67.1 | 72.1 |[5] |
-|GASext (Concatenation) | 72.2 | 70.5 | --* | 67.2 | 72.6 | [5] |
-|supWSD | 71.3 | 68.8 | 60.2 | 65.8 | 70.0 | [6] [11] |
-|supWSDemb | 72.7 | 70.6 | 63.1 | 66.8 | 71.8 | [7] [11] |
+|MFS baseline | 65.6 | 66.0 | 54.5 | 63.8 | 67.1 | [[1]](http://aclweb.org/anthology/E/E17/E17-1010.pdf) |
+|Bi-LSTMatt+LEX | 72.0 | 69.4 |63.7* | 66.4 | 72.4 | [[2]](http://aclweb.org/anthology/D17-1120) |
+|Bi-LSTMatt+LEX+POS | 72.0 | 69.1|64.8* | 66.9 | 71.5 | [[2]](http://aclweb.org/anthology/D17-1120) |
+|context2vec | 71.8 | 69.1 |61.3 | 65.6 | 71.9 | [[3]](http://www.aclweb.org/anthology/K16-1006) |
+|ELMo | 71.6 | 69.6 | 62.2 | 66.2 | 71.3 | [[4]](http://aclweb.org/anthology/N18-1202) |
+|GAS (Linear) | 72.0 | 70.0 | --* | 66.7 | 71.6 | [[5]](http://aclweb.org/anthology/P18-1230) |
+|GAS (Concatenation) | 72.1 | 70.2 | --* | 67 | 71.8 | [[5]](http://aclweb.org/anthology/P18-1230) |
+|GASext (Linear) | 72.4 | 70.1 | --* | 67.1 | 72.1 |[[5]](http://aclweb.org/anthology/P18-1230) |
+|GASext (Concatenation) | 72.2 | 70.5 | --* | 67.2 | 72.6 | [[5]](http://aclweb.org/anthology/P18-1230) |
+|supWSD | 71.3 | 68.8 | 60.2 | 65.8 | 70.0 | [[6]](https://aclanthology.info/pdf/P/P10/P10-4014.pdf) [[11]](http://aclweb.org/anthology/D17-2018) |
+|supWSDemb | 
72.7 | 70.6 | 63.1 | 66.8 | 71.8 | [[7]](http://www.aclweb.org/anthology/P16-1085) [[11]](http://aclweb.org/anthology/D17-2018) | ### Knowledge-based: | Model | Senseval 2 |Senseval 3 |SemEval 2007 |SemEval 2013 |SemEval 2015 | Paper / Source | | ------------- | :-----:|:-----:|:-----:|:-----:|:-----:| --- | -|WN 1st sense baseline | 66.8 | 66.2 | 55.2 | 63.0 | 67.8 | [1] | -|Babelfy| 67.0 | 63.5 | 51.6 | 66.4 | 70.3 | [8] | -|UKBppr_w2w-nf | 64.2 | 54.8 | 40.0 | 64.5 | 64.5 | [9] [12] | -|UKBppr_w2w| 68.8 | 66.1 | 53.0 | 68.8 | 70.3 | [9] [12] | -|WSD-TM | 69.0 | 66.9 | 55.6 | 65.3 | 69.6 | [10] | +|WN 1st sense baseline | 66.8 | 66.2 | 55.2 | 63.0 | 67.8 | [[1]](http://aclweb.org/anthology/E/E17/E17-1010.pdf) | +|Babelfy| 67.0 | 63.5 | 51.6 | 66.4 | 70.3 | [[8]](http://aclweb.org/anthology/Q14-1019) | +|UKBppr_w2w-nf | 64.2 | 54.8 | 40.0 | 64.5 | 64.5 | [[9]](https://www.mitpressjournals.org/doi/full/10.1162/COLI_a_00164) [[12]](http://aclweb.org/anthology/W18-2505) | +|UKBppr_w2w| 68.8 | 66.1 | 53.0 | 68.8 | 70.3 | [[9]](https://www.mitpressjournals.org/doi/full/10.1162/COLI_a_00164) [[12]](http://aclweb.org/anthology/W18-2505) | +|WSD-TM | 69.0 | 66.9 | 55.6 | 65.3 | 69.6 | [[10]](https://arxiv.org/pdf/1801.01900.pdf) | Note: The scores of [6,7] and [9] are not taken from the original papers but from the results of the implementations of [11] and [12], respectively. 
- - - [1] [Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison](http://aclweb.org/anthology/E/E17/E17-1010.pdf) [2] [Neural Sequence Learning Models for Word Sense Disambiguation](http://aclweb.org/anthology/D17-1120) diff --git a/hindi/hindi.md b/hindi/hindi.md index 7c1171bb..bcc2fe90 100644 --- a/hindi/hindi.md +++ b/hindi/hindi.md @@ -1,9 +1,21 @@ # Hindi -## Chunking and Part of Speech Tagging +## Chunking -{% include table.html results=site.data.hindi_basic scores='Accuracy' %} +| Model | Dev accuracy | Test F1 | Paper / Source | Code | +| ------------- | :-----:| :-----:| --- | --- | +| Dalal et al. (2006) | 87.40 | 82.40 | [Hindi Part-of-Speech Tagging and Chunking: A Maximum Entropy Approach](https://www.researchgate.net/publication/241211496_Hindi_Part-of-Speech_Tagging_and_Chunking_A_Maximum_Entropy_Approach) | | + +## Part-of-speech tagging + +| Model | Dev accuracy | Test F1 | Paper / Source | Code | +| ------------- | :-----:| :-----:| --- | --- | +| Dalal et al. (2006) | 89.35 | 82.22 | [Hindi Part-of-Speech Tagging and Chunking: A Maximum Entropy Approach](https://www.researchgate.net/publication/241211496_Hindi_Part-of-Speech_Tagging_and_Chunking_A_Maximum_Entropy_Approach) | | ## Machine Translation -{% include table.html results=site.data.hindi_machine_translation scores='BLEU' %} \ No newline at end of file +The IIT Bombay English-Hindi Parallel Corpus used by Kunchukuttan et al. (2018) can be accessed [here](http://www.cfilt.iitb.ac.in/iitb_parallel/). + +| Model | BLEU | METEOR | Paper / Source | Code | +| ------------- | :-----:| :-----:| --- | --- | +| Kunchukuttan et al. 
(2018) | 89.35 | 0.308 | [The IIT Bombay English-Hindi Parallel Corpus](http://www.lrec-conf.org/proceedings/lrec2018/pdf/847.pdf) | | diff --git a/img/edit_file.png b/img/edit_file.png new file mode 100644 index 00000000..16fc3084 Binary files /dev/null and b/img/edit_file.png differ diff --git a/img/propose_file_change.png b/img/propose_file_change.png new file mode 100644 index 00000000..30b4ecea Binary files /dev/null and b/img/propose_file_change.png differ diff --git a/jekyll_instructions.md b/jekyll_instructions.md new file mode 100644 index 00000000..e7c6bb29 --- /dev/null +++ b/jekyll_instructions.md @@ -0,0 +1,15 @@ +# Instructions for building the site locally + +You can build the site locally using Jekyll by following the steps detailed +[here](https://help.github.com/articles/setting-up-your-github-pages-site-locally-with-jekyll/#requirements): + +1. Check whether you have Ruby 2.1.0 or higher installed with `ruby --version`, otherwise [install it](https://www.ruby-lang.org/en/downloads/). +On OS X for instance, this can be done with `brew install ruby`. Make sure you also have `ruby-dev` and `zlib1g-dev` installed. +1. Install Bundler `gem install bundler`. If you run into issues with installing bundler on OS X, have a look +[here](https://bundler.io/v1.16/guides/rubygems_tls_ssl_troubleshooting_guide.html) for troubleshooting tips. Also try refreshing +the terminal. +1. Clone the repo locally: `git clone https://github.com/sebastianruder/NLP-progress` +1. Navigate to the repo with `cd NLP-progress` +1. Install Jekyll: `bundle install` +1. Run the Jekyll site locally: `bundle exec jekyll serve` +1. You can now preview the local Jekyll site in your browser at `http://localhost:4000`. 
diff --git a/korean/korean.md b/korean/korean.md
deleted file mode 100644
index 9a1d7aa3..00000000
--- a/korean/korean.md
+++ /dev/null
@@ -1,3 +0,0 @@
-# Korean
-
-{% include table.html results=site.data.korean %}

diff --git a/spanish/entity_linking.md b/spanish/entity_linking.md
new file mode 100644
index 00000000..a61ebfd2
--- /dev/null
+++ b/spanish/entity_linking.md
@@ -0,0 +1,16 @@
+# Entity Linking
+
+See [here](../english/entity_linking.md) for more information about the task.
+
+### Datasets
+
+#### AIDA CoNLL-YAGO Dataset
+
+##### Disambiguation-Only Models
+
+| Model | Micro-Precision | Paper / Source | Code |
+| ------------- | :-----:| --- | --- |
+| Sil et al. (2018) | 82.3 | [Neural Cross-Lingual Entity Linking](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16501/16101) | |
+| Tsai & Roth (2016) | 80.9 | [Cross-lingual wikification using multilingual embeddings](http://cogcomp.org/papers/TsaiRo16b.pdf) | |
+
+[Go back to the README](../README.md)

diff --git a/vietnamese/vietnamese.md b/vietnamese/vietnamese.md
index 3a99df15..ac3d174b 100644
--- a/vietnamese/vietnamese.md
+++ b/vietnamese/vietnamese.md
@@ -1,36 +1,36 @@
# Vietnamese NLP tasks

-## Word segmentation
+## Dependency parsing

-* Training data: 75k manually word-segmented training sentences from the [VLSP](http://vlsp.org.vn/) 2013 word segmentation shared task.
-* Test data: 2120 test sentences from the VLSP 2013 POS tagging shared task.
+* The last 1020 sentences of the [benchmark Vietnamese dependency treebank VnDT](http://vndp.sourceforge.net) are used for test, while the remaining 9k+ sentences are used for training & development. LAS and UAS scores are computed on all
+tokens (i.e. including punctuation).
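The scoring convention above can be sketched as a small scorer over per-token (head, label) predictions: UAS counts correctly attached heads, LAS additionally requires the correct dependency label, and punctuation tokens are not excluded. The toy trees below are invented for illustration:

```python
def attachment_scores(gold, pred):
    """gold, pred: per-token (head_index, deprel) pairs, punctuation included."""
    assert len(gold) == len(pred)
    n = len(gold)
    # UAS: the predicted head index matches the gold head index.
    uas_hits = sum(1 for (gh, _), (ph, _) in zip(gold, pred) if gh == ph)
    # LAS: both the head index and the dependency label match.
    las_hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * las_hits / n, 100.0 * uas_hits / n

# 4-token toy sentence: token 3 gets the right head but the wrong label
# (counts for UAS only); token 4 gets the wrong head (counts for neither).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (3, "punct")]
las, uas = attachment_scores(gold, pred)
print(las, uas)  # 50.0 75.0
```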
-| Model | F1 | Paper | Code | -| ------------- | :-----:| --- | --- | -| VnCoreNLP-RDRsegmenter (2018) | 97.90 | [A Fast and Accurate Vietnamese Word Segmenter](http://www.lrec-conf.org/proceedings/lrec2018/pdf/55.pdf) | [Official](https://github.com/datquocnguyen/RDRsegmenter) | -| UETsegmenter (2016) | 97.87 | [A hybrid approach to Vietnamese word segmentation](http://doi.org/10.1109/RIVF.2016.7800279) | [Official](https://github.com/phongnt570/UETsegmenter) | -| vnTokenizer (2008) | 97.33 | [A Hybrid Approach to Word Segmentation of Vietnamese Texts](https://link.springer.com/chapter/10.1007/978-3-540-88282-4_23) | | -| JVnSegmenter (2006) | 97.06 | [Vietnamese Word Segmentation with CRFs and SVMs: An Investigation](http://www.aclweb.org/anthology/Y06-1028) | | -| DongDu (2012) | 96.90 | Ứng dụng phương pháp Pointwise vào bài toán tách từ cho tiếng Việt | | -* Results for VnTokenizer, JVnSegmenter and DongDu are reported in "[A hybrid approach to Vietnamese word segmentation](http://doi.org/10.1109/RIVF.2016.7800279)." 
+| | Model | LAS | UAS | Paper | Code |
+| ----- | ------------- | :-----:| :-----:| --- | --- |
+| **Predicted POS** | VnCoreNLP (2018) | 70.23 | 76.93 | [VnCoreNLP: A Vietnamese Natural Language Processing Toolkit](http://aclweb.org/anthology/N18-5012) | [Official](https://github.com/vncorenlp/VnCoreNLP) |
+| Gold POS | VnCoreNLP (2018) | 73.39 | 79.02 | [VnCoreNLP: A Vietnamese Natural Language Processing Toolkit](http://aclweb.org/anthology/N18-5012) | [Official](https://github.com/vncorenlp/VnCoreNLP) |
+| Gold POS | BiLSTM graph-based parser (2016) | 73.17 | 79.39 | [Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations](https://aclweb.org/anthology/Q16-1023) | [Official](https://github.com/elikip/bist-parser/tree/master/bmstparser/src) |
+| Gold POS | BiLSTM transition-based parser (2016) | 72.53 | 79.33 | [Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations](https://aclweb.org/anthology/Q16-1023) | [Official](https://github.com/elikip/bist-parser/tree/master/barchybrid/src) |
+| Gold POS | MSTparser (2006) | 70.29 | 76.47 | [Online large-margin training of dependency parsers](http://www.aclweb.org/anthology/P05-1012) | |
+| Gold POS | MaltParser (2007) | 69.10 | 74.91 | [MaltParser: A language-independent system for data-driven dependency parsing](https://stp.lingfil.uu.se/~nivre/docs/nle07.pdf) | |
+
+* Predicted POS tags are generated using VnCoreNLP-VnMarMoT. Results for the BiLSTM graph/transition-based parsers, MSTparser and MaltParser are reported in "[An empirical study for Vietnamese dependency parsing](http://www.aclweb.org/anthology/U16-1017)."

-## POS tagging
+## Machine translation

-* 27,870 sentences for training and development from the VLSP 2013 POS tagging shared task:
-  * 27k sentences are used for training.
-  * 870 sentences are used for development.
-* Test data: 2120 test sentences from the VLSP 2013 POS tagging shared task.
+### English-to-Vietnamese translation

+* The dataset is from [The IWSLT 2015 Evaluation Campaign](http://workshop2015.iwslt.org/downloads/proceeding.pdf), and can also be obtained from [https://github.com/tensorflow/nmt](https://github.com/tensorflow/nmt): `tst2012` is used for development while `tst2013` is used for test. Scores are computed for single models.

-| Model | Accuracy | Paper | Code |
+| Model | BLEU | Paper | Code |
| ------------- | :-----:| --- | --- |
-| VnCoreNLP-VnMarMoT (2017) | 95.88 | [From Word Segmentation to POS Tagging for Vietnamese](http://aclweb.org/anthology/U17-1013) | [Official](https://github.com/datquocnguyen/vnmarmot) |
-| BiLSTM-CRF + CNN-char (2016) | 95.40 | [End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF](http://www.aclweb.org/anthology/P16-1101) | [Official](https://github.com/XuezheMax/LasagneNLP) / [Link](https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf/) |
-| BiLSTM-CRF + LSTM-char (2016) | 95.31 | [Neural Architectures for Named Entity Recognition](http://www.aclweb.org/anthology/N16-1030) | [Link](https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf/) |
-| BiLSTM-CRF (2015) | 95.06 | [Bidirectional LSTM-CRF Models for Sequence Tagging](https://arxiv.org/abs/1508.01991) | [Link](https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf/) |
-| RDRPOSTagger (2014) | 95.11 | [RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger](http://www.aclweb.org/anthology/E14-2005) | [Official](https://github.com/datquocnguyen/rdrpostagger) |
+| CVT (2018) | 29.6 | [Semi-Supervised Sequence Modeling with Cross-View Training](https://arxiv.org/abs/1809.08370) | |
+| ELMo (2018) | 29.3 | [Deep contextualized word representations](http://aclweb.org/anthology/N18-1202) | |
+| Transformer (2017) | 28.9 | [Attention is all you need](http://papers.nips.cc/paper/7181-attention-is-all-you-need) | [Link](https://github.com/duyvuleo/Transformer-DyNet) |
+| Google (2017) | 26.1 | [Neural machine translation (seq2seq)
tutorial](https://github.com/tensorflow/nmt) | [Official](https://github.com/tensorflow/nmt) | +| Stanford (2015) |23.3 | [Stanford Neural Machine Translation Systems for Spoken Language Domains](https://nlp.stanford.edu/pubs/luong-manning-iwslt15.pdf) | | -* Results for BiLSTM-CRF-based models and RDRPOSTagger are reported in "[From Word Segmentation to POS Tagging for Vietnamese](http://aclweb.org/anthology/U17-1013)." +* The ELMo score is reported in [Semi-Supervised Sequence Modeling with Cross-View Training](https://arxiv.org/abs/1809.08370). The Transformer score is available at [https://github.com/duyvuleo/Transformer-DyNet](https://github.com/duyvuleo/Transformer-DyNet). ## Named entity recognition * 16,861 sentences for training and development from the VLSP 2016 NER shared task: @@ -51,35 +51,34 @@ * BiLSTM-CRF-based scores are reported in "[VnCoreNLP: A Vietnamese Natural Language Processing Toolkit](http://aclweb.org/anthology/N18-5012)." -## Dependency parsing - -* The last 1020 sentences of the [benchmark Vietnamese dependency treebank VnDT](http://vndp.sourceforge.net) are used for test, while the remaining 9k+ sentences are used for training & development. LAS and UAS scores are computed on all -tokens (i.e. including punctuation). +## Part-of-speech tagging +* 27,870 sentences for training and development from the VLSP 2013 POS tagging shared task: + * 27k sentences are used for training. + * 870 sentences are used for development. +* Test data: 2120 test sentences from the VLSP 2013 POS tagging shared task. 
-| | Model | LAS | UAS | Paper | Code | -| ----- | ------------- | :-----:| --- | --- | --- | -| **Predicted POS** | VnCoreNLP (2018) | 70.23 | 76.93 | [VnCoreNLP: A Vietnamese Natural Language Processing Toolkit](http://aclweb.org/anthology/N18-5012) | [Official](https://github.com/vncorenlp/VnCoreNLP) | -| Gold POS | VnCoreNLP (2018) |73.39 |79.02 | [VnCoreNLP: A Vietnamese Natural Language Processing Toolkit](http://aclweb.org/anthology/N18-5012) | [Official](https://github.com/vncorenlp/VnCoreNLP) | -| Gold POS | BiLSTM graph-based parser (2016) | 73.17|79.39 | [Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations](https://aclweb.org/anthology/Q16-1023) | [Official](https://github.com/elikip/bist-parser/tree/master/bmstparser/src) | -| Gold POS | BiLSTM transition-based parser (2016) | 72.53| 79.33 | [Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations](https://aclweb.org/anthology/Q16-1023) | [Official](https://github.com/elikip/bist-parser/tree/master/barchybrid/src) | -| Gold POS | MSTparser (2006) | 70.29 | 76.47 | [Online large-margin training of dependency parsers](http://www.aclweb.org/anthology/P05-1012) | | -| Gold POS | MaltParser (2007) | 69.10 | 74.91 | [MaltParser: A language-independent system for datadriven dependency parsing](https://stp.lingfil.uu.se/~nivre/docs/nle07.pdf) | | +| Model | Accuracy | Paper | Code | +| ------------- | :-----:| --- | --- | +| VnCoreNLP-VnMarMoT (2017) | 95.88 | [From Word Segmentation to POS Tagging for Vietnamese](http://aclweb.org/anthology/U17-1013) | [Official](https://github.com/datquocnguyen/vnmarmot) | +| BiLSTM-CRF + CNN-char (2016) | 95.40 | [End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF](http://www.aclweb.org/anthology/P16-1101) | [Official](https://github.com/XuezheMax/LasagneNLP) / [Link](https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf/) | +| BiLSTM-CRF + LSTM-char (2016) | 95.31 | [Neural Architectures for Named 
Entity Recognition](http://www.aclweb.org/anthology/N16-1030) | [Link](https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf/) | +| BiLSTM-CRF (2015) | 95.06 | [Bidirectional LSTM-CRF Models for Sequence Tagging](https://arxiv.org/abs/1508.01991) | [Link](https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf/) | +| RDRPOSTagger (2014) | 95.11 | [RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger](http://www.aclweb.org/anthology/E14-2005) | [Official](https://github.com/datquocnguyen/rdrpostagger) | -* Predicted POS tags are generated by using VnCoreNLP-VnMarMoT. Results for the BiLSTM graph/transition-based parsers, MSTparser and MaltParser are reported in "[An empirical study for Vietnamese dependency parsing](http://www.aclweb.org/anthology/U16-1017)." +* Results for BiLSTM-CRF-based models and RDRPOSTagger are reported in "[From Word Segmentation to POS Tagging for Vietnamese](http://aclweb.org/anthology/U17-1013)." -## Machine translation +## Word segmentation -### English-to-Vietnamese translation -* Dataset is from [The IWSLT 2015 Evaluation Campaign](http://workshop2015.iwslt.org/downloads/proceeding.pdf), also be obtained from [https://github.com/tensorflow/nmt](https://github.com/tensorflow/nmt): `tst2012` is used for development while `tst2013` is used for test. Scores are computed for single models. +* Training data: 75k manually word-segmented training sentences from the [VLSP](http://vlsp.org.vn/) 2013 word segmentation shared task. +* Test data: 2120 test sentences from the VLSP 2013 POS tagging shared task. 
-| Model | BLEU | Paper | Code | +| Model | F1 | Paper | Code | | ------------- | :-----:| --- | --- | -| CVT (2018) | 29.6 | [Semi-Supervised Sequence Modeling with Cross-View Training](https://arxiv.org/abs/1809.08370) | | -| ELMo (2018) | 29.3 | [Deep contextualized word representations](http://aclweb.org/anthology/N18-1202)| | -| Transformer (2017) | 28.9 | [Attention is all you need](http://papers.nips.cc/paper/7181-attention-is-all-you-need) | [Link](https://github.com/duyvuleo/Transformer-DyNet) | -| Google (2017) | 26.1 | [Neural machine translation (seq2seq) tutorial](https://github.com/tensorflow/nmt) | [Official](https://github.com/tensorflow/nmt) | -| Stanford (2015) |23.3 | [Stanford Neural Machine Translation Systems for Spoken Language Domains](https://nlp.stanford.edu/pubs/luong-manning-iwslt15.pdf) | | - -* The ELMo score is reported in [Semi-Supervised Sequence Modeling with Cross-View Training](https://arxiv.org/abs/1809.08370). The Transformer score is available at [https://github.com/duyvuleo/Transformer-DyNet](https://github.com/duyvuleo/Transformer-DyNet). 
+| VnCoreNLP-RDRsegmenter (2018) | 97.90 | [A Fast and Accurate Vietnamese Word Segmenter](http://www.lrec-conf.org/proceedings/lrec2018/pdf/55.pdf) | [Official](https://github.com/datquocnguyen/RDRsegmenter) |
+| UETsegmenter (2016) | 97.87 | [A hybrid approach to Vietnamese word segmentation](http://doi.org/10.1109/RIVF.2016.7800279) | [Official](https://github.com/phongnt570/UETsegmenter) |
+| vnTokenizer (2008) | 97.33 | [A Hybrid Approach to Word Segmentation of Vietnamese Texts](https://link.springer.com/chapter/10.1007/978-3-540-88282-4_23) | |
+| JVnSegmenter (2006) | 97.06 | [Vietnamese Word Segmentation with CRFs and SVMs: An Investigation](http://www.aclweb.org/anthology/Y06-1028) | |
+| DongDu (2012) | 96.90 | Ứng dụng phương pháp Pointwise vào bài toán tách từ cho tiếng Việt (Applying the Pointwise method to Vietnamese word segmentation) | |
+
+* Results for vnTokenizer, JVnSegmenter and DongDu are reported in "[A hybrid approach to Vietnamese word segmentation](http://doi.org/10.1109/RIVF.2016.7800279)."
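The F1 scores above compare predicted word boundaries against the gold segmentation. A minimal span-based sketch (the toy syllable groupings are invented; the exact VLSP evaluation tooling may differ in details):

```python
# Hedged sketch of span-based word-segmentation F1: a predicted word is
# correct iff its (start, end) syllable span exactly matches a gold word.
# The toy segmentations below are invented examples.

def spans(segmentation):
    """Map a list of words (each a list of syllables) to (start, end) spans."""
    out, pos = set(), 0
    for word in segmentation:
        out.add((pos, pos + len(word)))
        pos += len(word)
    return out

def seg_f1(gold, pred):
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    prec, rec = correct / len(p), correct / len(g)
    return 2 * prec * rec / (prec + rec)

gold = [["học", "sinh"], ["học"], ["sinh", "học"]]    # "học sinh | học | sinh học"
pred = [["học"], ["sinh"], ["học"], ["sinh", "học"]]  # over-segmented prediction
print(round(seg_f1(gold, pred), 4))  # 0.5714
```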