CSE @ HKUST
- Hong Kong, China
- https://seanzhuh.github.io/
Stars
[CVPR2024] OneFormer3D: One Transformer for Unified Point Cloud Segmentation
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
DetToolChain: A new prompting paradigm to unleash detection ability of MLLM
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
[ECCV 2024] The official repo for "Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing"
Personal Implementation of the paper: Nuvo: Neural UV Mapping for Unruly 3D Representations
Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation", accepted by CVPR 2024.
[ECCV 2024] Tokenize Anything via Prompting
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model
The repository for Hyperbolic Representation Learning for Computer Vision, ECCV 2022
Curated list of awesome works on unsupervised object localization in 2D images.
PyTorch implementation of Diffusion Models (https://arxiv.org/pdf/2006.11239.pdf)
[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI
[CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
Official repo for our ICML 23 paper: "Multi-Modal Classifiers for Open-Vocabulary Object Detection"