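INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together in one place. It is designed to help users retrieve their own data safely and quickly, and its code is open source with a transparent process. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook Mail, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments photo album generation, browser history, 12306, Cnblogs, CSDN blogs…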

Python 7,912 1,493 Updated Aug 20, 2024

Plachtaa / VALL-E-X

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/

Python 7,763 770 Updated Feb 11, 2024

HqWu-HITCS / Awesome-Chinese-LLM

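A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are cheap to train. It covers base models, domain-specific fine-tuning and applications, datasets, and tutorials.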

17,893 1,717 Updated Sep 19, 2024

nuochenpku / Harry-Potter-Dialogue-Dataset

[EMNLP 2023] This is the repository of the Harry Potter Dialogue Dataset.

123 4 Updated Oct 19, 2024

candlewill / Dialog_Corpus

Corpora for training Chinese and English dialogue systems (Datasets for Training Chatbot System)

Python 2,038 496 Updated Sep 23, 2020

thu-coai / CrossWOZ

A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset

Python 664 114 Updated Jun 17, 2024

qiuhuachuan / smile

[EMNLP 2024] MeChat: a large language model for Chinese mental health dialogue

Python 402 44 Updated Nov 17, 2024

phasetr / math-textbook

Math textbook. Trial for social textbook writing.

TeX 21 4 Updated Feb 8, 2015

ProFatXuanAll / terence-tao-analysis

My notes on Analysis I and Analysis II, 3rd edition, written by Terence Tao.

TeX 48 15 Updated Jul 26, 2023

latexstudio / MathNoteBook

A growing collection of my middle school and high school math notes, typeset in LaTeX using The Legrand Orange Book template

TeX 5 7 Updated Oct 21, 2015

jamesfang8499 / math1

TeX 281 75 Updated Apr 18, 2024

josephuses / UPLIFT

LaTeX book project for the Philippine Science High School-UPLIFT Project

TeX 5 1 Updated Sep 11, 2017

langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications

Jupyter Notebook 99,079 16,114 Updated Jan 28, 2025

nickrosh / evol-teacher

Open Source WizardCoder Dataset

Python 155 12 Updated Jul 12, 2023

project-baize / baize-chatbot

Let ChatGPT teach your own chatbot in hours with a single GPU!

Python 3,167 287 Updated Mar 17, 2024

WadeYin9712 / Dynosaur

Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023)

Python 63 6 Updated Nov 30, 2023

Zjh-819 / LLMDataHub

A quick guide (especially) for trending instruction finetuning datasets

2,798 180 Updated Nov 28, 2023

silverriver / MMChat

[LREC] MMChat: Multi-Modal Chat Dataset on Social Media

Python 99 6 Updated Sep 25, 2022

lemon234071 / clean-dialog

A framework for cleaning Chinese dialog data

Python 265 28 Updated May 14, 2021

hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 38,861 4,769 Updated Jan 28, 2025

arielnlee / Platypus

Code for fine-tuning Platypus fam LLMs using LoRA

Python 626 60 Updated Feb 4, 2024

ExpressAI / AI-Gaokao

Gaokao Benchmark for AI

105 6 Updated Jul 8, 2022

ModelTC / lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,803 223 Updated Jan 26, 2025


sheng-han-zhang


Stars

DACUS1995 / pytorch-mmap-dataset

malaysia-ai / pretrain-text-dataset

malaysia-ai / clean-text-my

ChenghaoMou / text-dedup

SciPhi-AI / synthesizer

google-deepmind / mathematics_dataset

lss233 / chatgpt-mirai-qq-bot

kangvcar / InfoSpider