Tsinghua University
- Beijing
- https://menghaoguo.github.io/
- @MenghaoGuo1
Stars
Jittor implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs
PyTorch implementation of the paper "Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation"
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
[SIGGRAPH'24] CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
JDiffusion is a diffusion model library for generating images or videos based on Diffusers and Jittor.
The official implementation of Self-Play Fine-Tuning (SPIN)
A collection of papers on diffusion models for 3D generation.
This repository contains a collection of papers and resources on Reasoning in Large Language Models.
Official implementation of ICCV2023 VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
My continuously updated Machine Learning, Probabilistic Models and Deep Learning notes and demos (2000+ slides), with video links
OpenLLaMA-Chinese: permissively licensed, open-source instruction-following models based on OpenLLaMA
ImageBind: One Embedding Space to Bind Them All
A unified framework for 3D content generation.
Official implementations for "Long Range Pooling for 3D Large-Scale Scene Understanding" (CVPR 2023)
[T-PAMI-2024] Transformer-Based Visual Segmentation: A Survey
Experiment on combining CLIP with SAM to do open-vocabulary image segmentation.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.