Stars
Data and software for building the ACL Anthology.
EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learni…
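A chatbot built on large language models that supports integration with WeChat Official Accounts, WeCom (Enterprise WeChat) apps, Feishu, and DingTalk. Selectable models include GPT3.5/GPT-4o/GPT-o1/Claude/ERNIE Bot (文心一言)/iFlytek Spark (讯飞星火)/Tongyi Qianwen (通义千问)/Gemini/GLM-4/Kimi/LinkAI. It can process text, voice, and images, access the operating system and the internet, and supports building customized enterprise customer-service bots on a private knowledge base.
A batch download tool for WeChat Official Account articles; supports downloading images and comments, and saving as html/mhtml/md/pdf/docx files.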
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
Ancient Chinese Corpus with Word Sense Annotation
A program for recognizing metaphors in Chinese sentences, which could help with grading articles.
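Chinese and English sensitive words, language detection, domestic and international mobile/phone number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, continuous English string segmentation, various Chinese word vectors, company name collection, classical poetry database, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, …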
GuwenBERT: A Pre-trained Language Model for Classical Chinese (Literary Chinese)
This is a 25,000 word UD treebank of Old English. The text has been retrieved from Martín Arista, Javier (ed.), et al. 2023. ParCorOEv3 [www.nerthusproject.com]. The treebank is a revised version o…
This is an SQL file of the Oxford English Dictionary. It includes more than 41,000 words! Just import the SQL.
Scrape article metadata from major media outlets' websites, including NYT, WaPo, WSJ. Built on top of the Newspaper Python Library (https://github.com/codelucas/newspaper).
The Washington Post Scraper is an application that allows the user to scrape articles from the Washington Post website and save a reference to them.
A news crawler for BBC News, Reuters and New York Times.
Data journalism research project: women through the lens of the New York Times from 1950 to the present day.
This is a code example repo for the NLP course offered by the Institute of Chinese Information Processing of BNU.
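Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer; based on PyTorch and ready to use out of the box.
TensorFlow solution for the NER task using a BiLSTM-CRF model with Google BERT fine-tuning, plus private server services
100+ Chinese Word Vectors (over a hundred kinds of pre-trained Chinese word vectors)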
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
Linux kernel learning resources: 200+ classic kernel articles, 100+ kernel papers, 50+ kernel projects, 500+ kernel interview questions, 80+ kernel videos
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
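Jiayan (甲言), an NLP toolkit focused on processing Classical Chinese (古汉语/古文/文言文/文言); supports Classical Chinese lexicon construction, word segmentation, POS tagging, sentence segmentation, and punctuation. Jiayan, the 1st NLP toolkit designed for Classical Chinese, supports lexicon construction, tokenizing, POS tagging, sentence segmentation a…
Chinese sentiment analysis library (Chinese Sentiment) that performs emotion analysis and positive/negative sentiment analysis on text. It supports counting the number of different emotional words in the text.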