TruemanV5

✨A curated list of papers on uncertainty in multi-modal large language models (MLLMs).

24 stars · Updated Dec 25, 2024

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python · 3,122 stars · 253 forks · Updated Nov 26, 2024

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python · 14,987 stars · 1,209 forks · Updated Dec 12, 2024

Code release for "Learning Video Representations from Large Language Models"

Python · 498 stars · 46 forks · Updated Oct 1, 2023

One-click deploy of a Knowledge Graph powered RAG (GraphRAG) in Azure

Python · 2,010 stars · 332 forks · Updated Dec 19, 2024

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Python · 135 stars · 13 forks · Updated Jul 25, 2024

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1,726 stars · 87 forks · Updated Dec 12, 2024

Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"

Python · 88 stars · 3 forks · Updated Aug 6, 2024

This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)

Python · 151 stars · 7 forks · Updated Dec 5, 2024

Python · 60 stars · 9 forks · Updated Dec 16, 2024

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python · 1,109 stars · 46 forks · Updated Dec 26, 2024

Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, with good general video understanding capability.

Python · 167 stars · 10 forks · Updated Dec 25, 2024

Video datasets

1,259 stars · 96 forks · Updated Mar 8, 2023

[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model

Python · 143 stars · 4 forks · Updated Aug 5, 2024

Transform your point cloud data into beautifully rendered 3D images.

Python · 12 stars · Updated Aug 21, 2023

The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".

Python · 227 stars · 9 forks · Updated Feb 5, 2024

Code and models for "Pano3D: A Holistic Benchmark and a Solid Baseline for 360 Depth Estimation", OmniCV Workshop @ CVPR21.

Python · 82 stars · 7 forks · Updated Nov 12, 2022

ICRA2024 Paper List

471 stars · 30 forks · Updated Sep 17, 2024

Python · 20 stars · 1 fork · Updated Aug 8, 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Python · 10,070 stars · 868 forks · Updated Jul 6, 2024

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.

Python · 12,906 stars · 957 forks · Updated Dec 15, 2024

Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world

1,346 stars · 82 forks · Updated Dec 16, 2024

The official implementation of "CityDreamer: Compositional Generative Model of Unbounded 3D Cities". (Xie et al., CVPR 2024)

Python · 622 stars · 42 forks · Updated Aug 31, 2024

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python · 36,804 stars · 4,530 forks · Updated Dec 25, 2024

South-East Asia Large Language Models

Shell · 282 stars · 21 forks · Updated Dec 18, 2024

The official repo for the ECCV 2024 paper "Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching"

Python · 56 stars · 1 fork · Updated Oct 17, 2024

Official repo for our ECCV'24 paper: Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene.

Jupyter Notebook · 31 stars · 2 forks · Updated Sep 3, 2024

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python · 10,658 stars · 831 forks · Updated Aug 20, 2024

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python · 1,484 stars · 91 forks · Updated Dec 11, 2024

MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.

Jupyter Notebook · 7,239 stars · 461 forks · Updated Nov 6, 2024