- Huazhong University of Science and Technology
- Wuhan, China
Stars
A generative world for general-purpose robotics & embodied AI learning.
Liquid: Language Models are Scalable Multi-modal Generators
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator
Official repo for "VisionZip: Longer is Better but Not Necessary in Vision Language Models"
A taxonomy of industrial anomaly detection methods and datasets (updating).
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Train transformer language models with reinforcement learning.
LLaMA 2 implemented from scratch in PyTorch
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
[Arxiv] Towards Generic Anomaly Detection and Understanding: Large-scale Visual-linguistic Model (GPT-4V) Takes the Lead.
[CVPR 2023] Unofficial re-implementation of "WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation".
[IEEE TII 2023] Collaborative Discrepancy Optimization for Reliable Image Anomaly Localization
Official implementation of "Segment Any Anomaly without Training via Hybrid Prompt Regularization (SAA+)".
[ECCV 2024] The Official Implementation for "AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection"
Refine high-quality datasets and visual AI models
[CVPR'24 Oral] Official repository of Point Transformer V3 (PTv3)
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
Using SparseInst as a Detector for Video Instance Segmentation
ICCV'2023 | CTVIS: Consistent Training for Online Video Instance Segmentation
DROID Policy Learning and Evaluation
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks