Stars
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
ChatGLM-6B: An Open Bilingual Dialogue Language Model
TensorFlow code and pre-trained models for BERT
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
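Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and key-phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing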
SoftVC VITS Singing Voice Conversion
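Use ChatGPT to summarize arXiv papers. Speeds up the whole research workflow by using ChatGPT for full-paper summarization, professional translation, polishing, reviewing, and review responses.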
ChatGLM2-6B: An Open Bilingual Chat LLM
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search…
An open-source NLP research library, built on PyTorch.
AKShare is an elegant and simple open-source financial data interface library for Python, built for human beings!
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
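A summary of the knowledge a natural language processing (NLP) engineer needs to build up, including interview questions, fundamentals, and engineering skills, to strengthen core competitiveness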
Google AI 2018 BERT PyTorch implementation
Statistical Learning Methods (统计学习方法), 2nd edition, by Li Hang [notes, code, notebook, references, errata, lihang]
pycorrector is a toolkit for text error correction. It applies models such as KenLM, T5, MacBERT, ChatGLM3, and Qwen2.5 to error correction, ready to use out of the box.
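Detailed notes on the new book 《自然语言处理入门》 (Introduction to Natural Language Processing) by the author of HanLP. A conscientious piece of work: rather than a dry list of formulas, the book explains algorithms and models in plain, accessible language. Starting from basic concepts, it walks through the algorithmic principles and engineering implementation of several popular problems: Chinese word segmentation, part-of-speech tagging, named entity recognition, information extraction, text clustering, text classification, and syntactic parsing.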
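A collection of NLP competition strategy implementations, baselines for various tasks, competition experience write-ups (current, past, and practice competitions), NLP conference dates, popular media channels, GPU recommendations, and more; continuously updated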
Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on PaddlePaddle
Exercise answers for the book "Machine Learning" (《机器学习》) by Zhou Zhihua. Personal solutions to the end-of-chapter exercises; each algorithm is implemented with numpy and pandas.
Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com
Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle
Open-source release of the MuCGEC Chinese error correction dataset and SOTA text correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction"