Stars
A Comprehensive Benchmark for Document Parsing and Evaluation
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
Train transformer language models with reinforcement learning.
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
A collection of large question answering datasets
Chinese safety prompts for evaluating and improving the safety of LLMs.
GPT4V-level open-source multi-modal model based on Llama3-8B
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
DeepSeek Coder: Let the Code Write Itself
A relatively powerful web mind map.
kaldi-asr/kaldi is the official location of the Kaldi project.
Collection of datasets used for Optical Music Recognition
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models.
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
A collection of awesome-prompt-datasets and awesome-instruction-dataset resources: a wide variety of instruction datasets for training ChatLLM models such as ChatGPT.
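Use ChatGPT to summarize arXiv papers. Speeds up the entire research workflow with ChatGPT: full-text paper summarization, professional translation, polishing, reviewing, and responses to reviews.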
An Efficient Lexical Analyzer for Chinese