Stars
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
TensorFlow code and pre-trained models for BERT
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
PyTorch implementations of Generative Adversarial Networks.
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
An Open-Source Framework for Prompt-Learning.
Seamlessly integrate LLMs into scikit-learn.
Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets
PyTorch tutorial for beginners
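Detailed notes on "Introduction to Natural Language Processing" (《自然语言处理入门》), the new book by the author of HanLP. A conscientious piece of work: rather than a dry list of formulas, the book explains algorithms and models in plain, accessible language. Starting from basic concepts, it progressively covers the algorithmic principles and engineering implementation of several popular problems: Chinese word segmentation, part-of-speech tagging, named entity recognition, information extraction, text clustering, text classification, and syntactic parsing.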
A Python tool for evaluating the quality of sentence embeddings.
EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit
One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda
An implementation of the EDA paper for Chinese corpora. EDA data augmentation tool for Chinese corpora. NLP data augmentation. Paper reading notes.
mixup: Beyond Empirical Risk Minimization
High-speed downloads from a mirror site using HuggingFace's official download tool.
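A summary of event extraction methods from recent years, covering Chinese event extraction, open-domain event extraction, event data generation, cross-lingual event extraction, few-shot event extraction, zero-shot event extraction, and other types, with methods such as DMCNN, FramNet, DLRNN, DBRNN, GCN, DAG-GRU, JMEE, and PLMEE.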
Toolbox to integrate optimal transport loss functions using automatic differentiation and Sinkhorn's algorithm
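3,000,000+ semantic understanding and matching dataset. Can be used with unsupervised contrastive learning, semi-supervised learning, and similar methods to build the best-performing pre-trained models for the Chinese domain.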