Stars
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
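Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments.
Streamer-Sales (销冠, "sales champion"): a sales-streamer LLM 🛒🎁 that, given a product's features, presents the product in a way designed to spark viewers' desire to buy. 🚀⭐ Includes a detailed data generation pipeline ❗ 📦 Also integrates LMDeploy accelerated inference 🚀, RAG (retrieval-augmented generation) 📚, TTS (text-to-speech) 🔊, digital human generation 🦸, an Agent that queries the web for real-time information 🌐, ASR (speech-to-text) 🎙️, a frontend built on the Vue ecosystem 🍍, FastAPI …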
Unified Controllable Visual Generation Model
Diffusers Implementation of Controlling Text-to-Image Diffusion by Orthogonal Finetuning
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
This is the official reproduction of FancyVideo.
AllenMao / MindSearch
Forked from InternLM/MindSearch. 🔍 An LLM-based multi-agent framework for a web search engine, similar to Perplexity.ai Pro and SearchGPT
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A simple local web interface that uses ChatTTS to synthesize text into speech, and also supports exposing an external API.
MedViLL official code. (Published IEEE JBHI 2021)
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Learning to Cluster Faces (CVPR 2019, CVPR 2020)
PyTorch implementation of a collection of scalable Video Transformer benchmarks.
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Interactive Map Correction for 3D Graph SLAM
✨✨Latest Advances on Multimodal Large Language Models
Universal LLM Deployment Engine with ML Compilation
Espressif deep-learning library for AIoT applications
Code for the paper "Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking"
Production-grade 3D gaussian splatting with CPU/GPU support for Windows, Mac and Linux 🚀
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
Strong and Open Vision Language Assistant for Mobile Devices
A python toolkit for parsing captions (in natural language) into scene graphs (as symbolic representations).