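Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing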
PhoBERT: Pre-trained language models for Vietnamese (EMNLP-2020 Findings)
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
Unsupervised text tokenizer for Neural Network-based text generation.
A tool for extracting plain text from Wikipedia dumps
Large Scale Chinese Corpus for NLP
MASS: Masked Sequence to Sequence Pre-training for Language Generation
PyTorch original implementation of Cross-lingual Language Model Pretraining.
Phrase-Based & Neural Unsupervised Machine Translation
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
General purpose unsupervised sentence representations
Language-Agnostic SEntence Representations
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
Library for fast text representation and classification.
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
TensorFlow code and pre-trained models for BERT
BERT NLP papers, applications, and GitHub resources, including the newest XLNet papers and projects
A framework to learn cross-lingual word embedding mappings
A machine translation reading list maintained by Tsinghua Natural Language Processing Group
HIT-SCIR / ELMoForManyLangs
Forked from bozheng-hit/ELMo
Pre-trained ELMo Representations for Many Languages
An open-source NLP research library, built on PyTorch.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
A Python tool for evaluating the quality of sentence embeddings.