A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Updated Nov 29, 2024 · Python
Data repository for pretrained NLP models and NLP corpora.
WeChat official accounts corpus
A collaborative catalog of NLP resources for Indic languages
Official source for Spanish language models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).
A web-based engine for creating and annotating textual corpora
CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)
Unannotated Spanish 3 Billion Words Corpora
Automatic document categorization consists of assigning a category to a text based on the information it contains. We'll follow different supervised machine learning approaches.
An R package for dynamic exploration of text collections
An advanced, extensible web front-end for the Manatee-open corpus search engine
The Official Repository for 👉 CCAE: A Corpus of Chinese-based Asian Englishes @ NLPCC 2023
Named Entity Recognition for biomedical entities
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
Reading the data from OPIEC - an Open Information Extraction corpus