Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting…

Jupyter Notebook 15,580 2,251 Updated Dec 12, 2024

PyTorch native finetuning library

Python 4,468 457 Updated Dec 13, 2024

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Python 1,698 382 Updated Dec 11, 2024

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Python 25,324 2,440 Updated Dec 13, 2024
Python 1,225 175 Updated Nov 20, 2024

Curate better data for LLMs

Python 984 93 Updated Mar 19, 2024

An Open-Ended Embodied Agent with Large Language Models

JavaScript 5,728 547 Updated Apr 3, 2024

leaked prompts of GPTs

28,970 3,929 Updated Sep 27, 2024

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

Python 763 46 Updated Dec 12, 2024

Code base for internal reward models and PPO training

Python 23 10 Updated Oct 1, 2023

Tools for merging pretrained large language models.

Python 4,942 458 Updated Dec 10, 2024

LlamaIndex is a data framework for your LLM applications

Python 37,331 5,356 Updated Dec 13, 2024

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Python 1,023 53 Updated Jan 16, 2024

Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications based on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…

TypeScript 32,448 5,626 Updated Nov 29, 2024

FaceChain is a deep-learning toolchain for generating your Digital-Twin.

Jupyter Notebook 9,182 859 Updated Dec 10, 2024

Unofficial implementation of AlpaGasus

Python 86 6 Updated Sep 23, 2023

A central, open resource for data and tools related to chain-of-thought reasoning in large language models. Developed @ Samwald research group: https://samwald.info/

Jupyter Notebook 905 72 Updated Jun 11, 2024

Official code for the ACL 2023 paper "WebCPM: Interactive Web Search for Chinese Long-form Question Answering"

HTML 987 80 Updated Nov 25, 2023

Python ProxyPool for web spider

Python 21,739 5,210 Updated Sep 10, 2024

LLM(😽)

Python 1,638 91 Updated Nov 25, 2024

Search all Chinese NLP datasets, with commonly used English NLP datasets included

Python 4,192 614 Updated Nov 21, 2022

Chat凉宫春日 (Chat Haruhi Suzumiya), an open-source role-playing chatbot by Cheng Li, Ziang Leng, and others.

Jupyter Notebook 1,857 167 Updated Aug 13, 2024

Giving the power of LLMs to a MUD lib.

Python 135 4 Updated Nov 23, 2024

Customizable implementation of the self-instruct paper.

Python 1,027 71 Updated Mar 7, 2024

Dromedary: towards helpful, ethical and reliable LLMs.

Python 1,127 87 Updated Oct 26, 2023

pCLUE: a 1,000,000+ multi-task prompt-learning dataset

Jupyter Notebook 473 56 Updated Oct 4, 2022

Chinese Couplets Dataset without vulgar words. A couplet dataset that contains no sensitive content.

69 16 Updated Dec 19, 2019

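MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, benchmarked against the 40T of data used to train chatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) internet writing. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripted dialogue, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.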

3,581 249 Updated Dec 1, 2024

Open Academic Research on Improving LLaMA to SOTA LLM

Python 1,613 104 Updated Aug 30, 2023