Stars
Make websites accessible for AI agents
🐫 CAMEL: Finding the Scaling Law of Agents. The first and the best multi-agent framework. https://www.camel-ai.org
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
No fortress, purely open ground. OpenManus is Coming.
Medical o1, Towards medical complex reasoning with LLMs
HuatuoGPT, Towards Taming Language Models To Be a Doctor. (An Open Medical GPT)
Medical NLP competitions, datasets, large models, and papers
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning'
[IEEE TIP] TOPIC: A Parallel Association Paradigm for Multi-Object Tracking under Complex Motions and Diverse Scenes
Official Repository of paper OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
✨ Light and Fast AI Assistant. Support: Web | iOS | MacOS | Android | Linux | Windows
A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease.
🚀 [Large Models] Train a 26M-parameter visual multimodal VLM from scratch in just 1 hour!
🚀🚀 [Large Models] Train a small 26M-parameter GPT completely from scratch in just 2 hours!
Build multimodal language agents for fast prototyping and production
Solve Visual Understanding with Reinforced VLMs
[KDD2025] Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective
Build Multimodal AI Agents with memory, knowledge and tools. Simple, fast and model-agnostic.
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
Ola: Pushing the Frontiers of Omni-Modal Language Model
Fully open reproduction of DeepSeek-R1
InspireMusic: A Unified Framework for Music, Song, Audio Generation.
Witness the aha moment of VLM with less than $3.
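OCR software, free, open-source, and offline. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and QR code scanning and generation. Includes built-in libraries for multiple languages.
😎 Rich ecosystem, 🧩 extensible, 🦄 multimodal - an LLM-native instant messaging bot platform | Works with messaging platforms such as QQ / WeChat (WeCom, personal WeChat) / Feishu / DingTalk / Discord / Telegram | Supports ChatGPT, DeepSeek, Dify, Claude, Gemini, xAI Grok, Ollama, LM Studio, Alibaba Cloud Bailian, Volcano Ark, SiliconFlow, Qwen, Moonshot…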
Frontier Multimodal Foundation Models for Image and Video Understanding