Stars
AI-driven Yu-Gi-Oh! bot using deep reinforcement learning and LLMs
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Enable macOS HiDPI and make it available as a native display setting.
Official implementation of the ECCV 2024 paper "Prioritized Semantic Learning for Zero-shot Instance Navigation"
MambaOut: Do We Really Need Mamba for Vision?
Source code for "Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy" (ICLR 2024).
Awesome-LLM-3D: a curated list of multi-modal large language model resources for the 3D world
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Official implementation of the ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment"
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Advanced AI explainability for computer vision. Supports CNNs, Vision Transformers, classification, object detection, segmentation, image similarity, and more.
RelTR: Relation Transformer for Scene Graph Generation (https://arxiv.org/abs/2201.11460v2)
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" (Oral at ICLR 2023)
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest