Stars
A custom PyTorch Dataset extension that provides faster iteration and better RAM usage
Prepares a pretraining dataset for the Malaysian context.
A multi-purpose LLM framework for RAG and data creation.
This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty.
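🚀 One-click deployment! A true AI chatbot! Supports ChatGPT, DeepSeek, Claude, Gemini, ChatGLM, ERNIE Bot, iFlytek Spark, multiple accounts, persona tuning, virtual maid, image rendering, and voice messages | Supports QQ, Telegram, Discord, WeChat, and other platforms
INFO-SPIDER is a crawler toolbox 🧰 that brings together many data sources, designed to help users retrieve their own data safely and quickly. The tool's code is open source and its process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs…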
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
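A curated collection of open-source Chinese large language models, focusing on smaller-scale models that can be privately deployed and are inexpensive to train, including base models, vertical-domain fine-tuning and applications, datasets, and tutorials.
[EMNLP 2023] This is the repository of the Harry Potter Dialogue Dataset.
Corpora for training Chinese and English dialogue systems (Datasets for Training Chatbot System)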
A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset
Math textbook. Trial for social textbook writing.
My notes on Analysis I and Analysis II, 3rd edition, written by Terence Tao.
A growing collection of my middle school and high school math, typeset in LaTeX using The Legrand Orange Book template
LaTeX book project for the Philippine Science High School-UPLIFT Project
🦜🔗 Build context-aware reasoning applications
Let ChatGPT teach your own chatbot in hours with a single GPU!
Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023)
A quick guide (especially) for trending instruction finetuning datasets
[LREC] MMChat: Multi-Modal Chat Dataset on Social Media
A framework for cleaning Chinese dialog data
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Code for fine-tuning Platypus family LLMs using LoRA
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.