Starred repositories
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
Tools to download and clean up Common Crawl data
GLM-4 series: Open Multilingual Multimodal Chat LMs | Open-source multilingual multimodal chat models
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Convert PDF to markdown + JSON quickly with high accuracy
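Provides a practical interactive interface for LLMs such as GPT and GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other projects; PDF/LaTeX paper translation and summarization; parallel queries to multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…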
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowe…
This is a Phi-3 book for getting started with Phi-3. Phi-3 is a family of open-source AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SL…
llama3 implementation one matrix multiplication at a time
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model with performance approaching GPT-4o.
llama3.np is a pure NumPy implementation of the Llama 3 model.
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
[ICML 2024] Selecting High-Quality Data for Training Language Models
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
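A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are cheaper to train, including base models, domain-specific fine-tunes and applications, datasets, and tutorials.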
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2…
Open-Sora: Democratizing Efficient Video Production for All