Named entity recognition (NER) is the task of tagging entities in text with their corresponding type. Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities. O is used for non-entity tokens.
Example:
Mark | Watney | visited | Mars |
---|---|---|---|
B-PER | I-PER | O | B-LOC |
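As a minimal sketch of how BIO tags map back to entities, the Python snippet below groups a tag sequence into typed spans. The function name `bio_to_spans`, the `(start, end, type)` span layout and the handling of a stray `I-` tag are assumptions made for illustration, not part of any task definition.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, exclusive end index, entity type)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect entity spans from a sequence of BIO tags."""
    spans: List[Span] = []
    start, current_type = None, None
    for i, tag in enumerate(tags):
        # Split "B-PER" into prefix "B" and type "PER"; "O" has no type.
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and entity_type == current_type:
            continue  # still inside the currently open span
        # Any other tag closes the span that is currently open, if any.
        if current_type is not None:
            spans.append((start, i, current_type))
            start, current_type = None, None
        # B always opens a span; a stray I is treated like B (assumption).
        if prefix in ("B", "I"):
            start, current_type = i, entity_type
    if current_type is not None:
        spans.append((start, len(tags), current_type))
    return spans


tokens = ["Mark", "Watney", "visited", "Mars"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
for start, end, entity_type in bio_to_spans(tags):
    print(entity_type, " ".join(tokens[start:end]))
```

For the example sentence this prints `PER Mark Watney` followed by `LOC Mars`.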
The CoNLL 2003 NER task consists of newswire text from the Reuters RCV1 corpus tagged with four different entity types (PER, LOC, ORG, MISC). Models are evaluated based on span-based F1.
Model | F1 | Paper / Source | Code |
---|---|---|---|
Flair embeddings (Akbik et al., 2018) | 93.09 | Contextual String Embeddings for Sequence Labeling | Flair framework |
BiLSTM-CRF+ELMo (Peters et al., 2018) | 92.22 | Deep contextualized word representations | AllenNLP Project, AllenNLP GitHub |
Peters et al. (2017) | 91.93 | Semi-supervised sequence tagging with bidirectional language models | |
LM-LSTM-CRF (Liu et al., 2018) | 91.71 | Empowering Character-aware Sequence Labeling with Task-Aware Neural Language Model | LM-LSTM-CRF |
Yang et al. (2017) | 91.26 | Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks | |
Ma and Hovy (2016) | 91.21 | End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF | |
LSTM-CRF (Lample et al., 2016) | 90.94 | Neural Architectures for Named Entity Recognition | |
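The span-based F1 used for the CoNLL 2003 results can be sketched roughly as follows: a predicted entity counts as correct only when both its boundaries and its type match a gold entity exactly. The function name `span_f1` and the `(sentence id, start, end, type)` tuple layout are assumptions for illustration; the official evaluation script remains the reference. Note that the table reports F1 as a percentage, while this sketch returns values in the range 0 to 1.

```python
from typing import Set, Tuple

Span = Tuple[int, int, int, str]  # (sentence id, start, exclusive end, type)


def span_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching spans."""
    # A prediction counts only if boundaries and type both match a gold span.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = {(0, 0, 2, "PER"), (0, 3, 4, "LOC")}
# Predicting "Mark" alone as PER is a boundary error, so it earns no credit.
predicted = {(0, 0, 1, "PER"), (0, 3, 4, "LOC")}
precision, recall, f1 = span_f1(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Because partial overlaps earn nothing, a single boundary mistake costs both precision and recall.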
The WNUT 2017 Emerging Entities task operates over a wide range of English text and focuses on generalisation beyond memorisation in high-variance environments. Scores are given over both entity chunk instances and unique entity surface forms, which normalises the biasing impact of frequently occurring entities.
Feature | Train | Dev | Test |
---|---|---|---|
Posts | 3,395 | 1,009 | 1,287 |
Tokens | 62,729 | 15,733 | 23,394 |
NE tokens | 3,160 | 1,250 | 1,589 |
The data is annotated for six classes: person, location, group, creative work, product and corporation.
Links: WNUT 2017 Emerging Entity task page (including direct download links for data and scoring script)
Model | F1 | F1 (surface form) | Paper / Source |
---|---|---|---|
Aguilar et al. (2018) | 45.55 | | Modeling Noisiness to Recognize Named Entities using Multitask Neural Networks on Social Media |
SpinningBytes | 40.78 | 39.33 | Transfer Learning and Sentence Level Features for Named Entity Recognition on Tweets |
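To illustrate the difference between the two WNUT 2017 scores, the sketch below computes F1 once over individual entity mentions and once over unique (surface form, type) pairs. This is one plausible reading of the task description, not the official scoring script; the helper names, the mention tuple layout and the toy data are all assumptions.

```python
from typing import Set, Tuple

# (post id, start, exclusive end, surface form, entity type)
Mention = Tuple[int, int, int, str, str]


def f1_score(gold: set, predicted: set) -> float:
    """F1 over two sets of hashable items, using exact matching."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def surface_forms(mentions: Set[Mention]) -> Set[Tuple[str, str]]:
    """Collapse mentions to unique (surface form, type) pairs."""
    return {(surface, entity_type) for _, _, _, surface, entity_type in mentions}


# Toy data: "Mars" occurs in three posts, "Mark Watney" in one.
gold = {
    (0, 3, 4, "Mars", "location"),
    (1, 0, 1, "Mars", "location"),
    (2, 5, 6, "Mars", "location"),
    (0, 0, 2, "Mark Watney", "person"),
}
# A system that finds every "Mars" but misses the rarer entity.
predicted = {m for m in gold if m[3] == "Mars"}

entity_f1 = f1_score(gold, predicted)
surface_f1 = f1_score(surface_forms(gold), surface_forms(predicted))
print(f"entity F1={entity_f1:.2f} surface F1={surface_f1:.2f}")
```

In this toy case the frequent entity inflates the mention-level score, while the surface-form score weights each distinct entity once, which is the normalising effect the task description refers to.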