Stars
✨ A curated list of papers on uncertainty in multi-modal large language models (MLLMs).
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
Code release for "Learning Video Representations from Large Language Models"
One-click deploy of a Knowledge Graph powered RAG (GraphRAG) in Azure
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, with good general video understanding capability.
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
Transform your point cloud data into beautifully rendered 3D images.
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
Code and models for "Pano3D: A Holistic Benchmark and a Solid Baseline for 360 Depth Estimation", OmniCV Workshop @ CVPR21.
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world
The official implementation of "CityDreamer: Compositional Generative Model of Unbounded 3D Cities". (Xie et al., CVPR 2024)
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
An official repo for the ECCV 2024 paper "Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching"
Official repo for our ECCV'24 paper: Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene.
The official GitHub page for the survey paper "A Survey of Large Language Models".
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.