
A custom PyTorch Dataset extension that provides faster iteration and better RAM usage

Python 42 7 Updated Mar 14, 2024

Prepare pretrain dataset for Malaysian context.

Jupyter Notebook 12 3 Updated Sep 1, 2024

Clean Text Malaysian

Python 6 1 Updated Aug 26, 2023

All-in-one text de-duplication

Python 651 72 Updated May 21, 2024

A multi-purpose LLM framework for RAG and data creation.

Python 618 52 Updated Jan 13, 2024

This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty.

Python 1,832 252 Updated Dec 23, 2024

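🚀 One-click deployment! A real AI chatbot! Supports ChatGPT, DeepSeek, Claude, Gemini, ChatGLM, ERNIE Bot, iFlytek Spark, multiple accounts, persona tuning, virtual maid, image rendering, and voice messages | Supports QQ, Telegram, Discord, WeChat, and other platforms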

Python 13,673 1,586 Updated Jan 5, 2025

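INFO-SPIDER is a crawler toolbox 🧰 that brings together many data sources, designed to help users retrieve their own data safely and quickly. The tool's code is open source and its workflow is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs…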

Python 7,912 1,493 Updated Aug 20, 2024

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Python 7,763 770 Updated Feb 11, 2024

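A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and are cheaper to train, including base models, vertical-domain fine-tunes and applications, datasets, and tutorials.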

17,893 1,717 Updated Sep 19, 2024

[EMNLP 2023] This is the repository of the Harry Potter Dialogue Dataset.

123 4 Updated Oct 19, 2024

Corpora for training Chinese and English dialogue systems (Datasets for Training Chatbot System)

Python 2,038 496 Updated Sep 23, 2020

A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset

Python 664 114 Updated Jun 17, 2024

[EMNLP 2024] MeChat, a large language model for Chinese-language mental health dialogue

Python 402 44 Updated Nov 17, 2024

Math textbook. Trial for social textbook writing.

TeX 21 4 Updated Feb 8, 2015

My notes on Analysis I and Analysis II, 3rd edition, written by Terence Tao.

TeX 48 15 Updated Jul 26, 2023

A growing collection of my middle school and high school math, typeset in LaTeX using The Legrand Orange Book template

TeX 5 7 Updated Oct 21, 2015
TeX 281 75 Updated Apr 18, 2024

LaTeX book project for the Philippine Science High School-UPLIFT Project

TeX 5 1 Updated Sep 11, 2017

🦜🔗 Build context-aware reasoning applications

Jupyter Notebook 99,079 16,114 Updated Jan 28, 2025

Open Source WizardCoder Dataset

Python 155 12 Updated Jul 12, 2023

Let ChatGPT teach your own chatbot in hours with a single GPU!

Python 3,167 287 Updated Mar 17, 2024

Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023)

Python 63 6 Updated Nov 30, 2023

A quick guide (especially) for trending instruction finetuning datasets

2,798 180 Updated Nov 28, 2023

[LREC] MMChat: Multi-Modal Chat Dataset on Social Media

Python 99 6 Updated Sep 25, 2022

A framework for cleaning Chinese dialog data

Python 265 28 Updated May 14, 2021

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 38,861 4,769 Updated Jan 28, 2025

Code for fine-tuning Platypus fam LLMs using LoRA

Python 626 60 Updated Feb 4, 2024

Gaokao Benchmark for AI

105 6 Updated Jul 8, 2022

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,803 223 Updated Jan 26, 2025