Stars
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
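text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, ready to use out of the box.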
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
✅ Solutions to LeetCode in Go, 100% test coverage, runtime beats 100%
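Llama Chinese community. An online Llama3 demo and fine-tuned models are now available, the latest Llama3 learning resources are collected in real time, and all code has been updated for Llama3. Aims to build the best Chinese Llama large model, fully open source and available for commercial use.
A curated list of open-source Chinese large language models, focusing on smaller models that can be deployed privately and are cheaper to train. Covers base models, domain-specific fine-tuning and applications, datasets, and tutorials.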
A high-throughput and memory-efficient inference and serving engine for LLMs
Paper List of Pre-trained Foundation Recommender Models
Code examples and resources for DBRX, a large language model developed by Databricks
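OCR software, free, open source, and offline. Supports screenshots and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Ships with built-in libraries for multiple languages.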
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
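This project shares the technical principles behind large models along with hands-on experience (LLM engineering and putting LLM applications into production).
PyTorch tutorials, examples and some books I found [updated irregularly]. A collection of tutorials, examples, and books for the latest version of PyTorch.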
Materials for the Learn PyTorch for Deep Learning: Zero to Mastery course.
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
An unofficial implementation of Poly-encoder (Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring)
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
Platform to experiment with the AI Software Engineer. Terminal based. NOTE: Very different from https://gptengineer.app
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Code and documentation to train Stanford's Alpaca models, and generate the data.