Mohammad Asim1, Christopher Wewer1, Thomas Wimmer1, 2, Bernt Schiele1, Jan Eric Lenssen1
1Max Planck Institute for Informatics, Saarland Informatics Campus, 2ETH Zurich
MEt3R evaluates the multi-view consistency between generated images.
We introduce MEt3R, a metric for multi-view consistency in generated images. Large-scale generative models for multi-view image generation are rapidly advancing the field of 3D inference from sparse observations. However, due to the nature of generative modeling, traditional reconstruction metrics are not suitable for measuring the quality of generated outputs, and metrics that are independent of the sampling procedure are urgently needed. In this work, we specifically address the aspect of consistency between generated multi-view images, which can be evaluated independently of the specific scene. Our approach uses DUSt3R to obtain dense 3D reconstructions from image pairs in a feed-forward manner, which are used to warp image contents from one view into the other. Feature maps of these images are then compared to obtain a similarity score that is invariant to view-dependent effects. Using MEt3R, we evaluate the consistency of a large set of previous methods for novel view and video generation, including our open, multi-view latent diffusion model.
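Internally, DUSt3R and PyTorch3D handle the 3D reconstruction and warping, and the comparison reduces to a masked feature similarity. The snippet below is a minimal, hedged sketch of that final comparison step only; the "1 minus mean similarity, lower is more consistent" convention follows the paper, but the exact implementation may differ in detail.

import torch
import torch.nn.functional as F

def consistency_score(feat_a, feat_b_warped, overlap):
    """Compare reference features with features warped from the other view.

    feat_a, feat_b_warped: (B, C, H, W) feature maps in view A's frame.
    overlap: (B, H, W) boolean mask of pixels visible in both views.
    """
    sim = F.cosine_similarity(feat_a, feat_b_warped, dim=1)  # (B, H, W)
    return 1.0 - sim[overlap].mean()  # lower score = more consistent

# Dummy example: identical features yield a score of ~0.
feats = torch.randn(1, 64, 32, 32)
mask = torch.ones(1, 32, 32, dtype=torch.bool)
print(consistency_score(feats, feats, mask).item())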
- Python >= 3.6
- PyTorch >= 2.1.0
- CUDA >= 11.3
- PyTorch3D >= 0.7.5
- FeatUp >= 0.1.1
NOTE: PyTorch3D and FeatUp are installed automatically alongside MEt3R.
Tested with CUDA 11.8, PyTorch 2.4.1, Python 3.10
Assuming the prerequisites above are already installed and working, install MEt3R with the following command in a bash terminal.
pip install git+https://github.com/mohammadasim98/met3r
Simply import and use MEt3R in your codebase as follows.
import torch
from met3r import MEt3R
IMG_SIZE = 256
# Initialize MEt3R
metric = MEt3R(
    img_size=IMG_SIZE,  # Default. Set to `None` to use the input resolution on the fly!
    use_norm=True,  # Default
    feat_backbone="dino16",  # Default
    featup_weights="mhamilton723/FeatUp",  # Default
    dust3r_weights="naver/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric",  # Default
    use_mast3r_dust3r=True,  # Default. Set to `False` to use the original DUSt3R; make sure to also set the matching weights from Hugging Face.
).cuda()
# Prepare inputs of shape (batch, views, channels, height, width): views must be 2
# RGB range must be in [-1, 1]
# Reduce the batch size in case of CUDA OOM
inputs = torch.randn((10, 2, 3, IMG_SIZE, IMG_SIZE)).cuda()
inputs = inputs.clip(-1, 1)
# Evaluate MEt3R
score, *_ = metric(
    images=inputs,
    return_overlap_mask=False,  # Default
    return_score_map=False,  # Default
    return_projections=False,  # Default
)
# For random inputs, the score should land between 0.25 and 0.35
print(score.mean().item())
# Clear up GPU memory
torch.cuda.empty_cache()
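The auxiliary outputs can be requested by switching on the corresponding flags. The unpacking below assumes the extra tensors are returned after the score in flag order (overlap mask, score map, projections), consistent with the `score, *_ = metric(...)` pattern above; verify the exact return signature against the repository.

# Hedged example: request the auxiliary outputs as well.
# ASSUMPTION: outputs are ordered (score, overlap_mask, score_map, projections).
score, overlap_mask, score_map, projections = metric(
    images=inputs,
    return_overlap_mask=True,  # per-pixel mask of regions visible in both views
    return_score_map=True,  # dense per-pixel similarity map
    return_projections=True,  # warped/rendered projections used for comparison
)
print(score.mean().item())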
Check out example.ipynb for more examples!
Alternatively, MEt3R can be installed manually in a local development environment:
git clone https://github.com/mohammadasim98/met3r
cd met3r
pip install -r requirements.txt
MEt3R relies on FeatUp to generate high-resolution feature maps for the input images. Install FeatUp using the following command.
pip install git+https://github.com/mhamilton723/FeatUp
Refer to FeatUp for more details.
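As a quick sanity check that FeatUp installed correctly, an upsampler can be loaded through torch.hub, mirroring the dino16 backbone and use_norm default used by MEt3R above. This is a sketch based on FeatUp's documented torch.hub interface; the input here is a random stand-in for a properly normalized image.

import torch

# Load the DINO-16 FeatUp upsampler via torch.hub (downloads weights on first use).
upsampler = torch.hub.load("mhamilton723/FeatUp", "dino16", use_norm=True)
image = torch.randn(1, 3, 224, 224)  # stand-in for a normalized RGB image
with torch.no_grad():
    hr_feats = upsampler(image)  # high-resolution feature map
print(hr_feats.shape)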
MEt3R requires PyTorch3D to perform point projection and rasterization. Install it via the following command.
pip install git+https://github.com/facebookresearch/pytorch3d.git
In case of issues with installing or building PyTorch3D, refer to the PyTorch3D documentation for more details.
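To verify that PyTorch3D built correctly, a minimal point-rasterization example like the one below can help. It exercises the same projection and rasterization machinery that MEt3R relies on, but it is a generic sketch, not MEt3R's actual rendering code.

import torch
from pytorch3d.structures import Pointclouds
from pytorch3d.renderer import (
    PerspectiveCameras,
    PointsRasterizationSettings,
    PointsRasterizer,
)

# Random point cloud placed in front of a default camera.
points = torch.rand(1, 1000, 3) + torch.tensor([0.0, 0.0, 1.0])
cloud = Pointclouds(points=points)
cameras = PerspectiveCameras()
settings = PointsRasterizationSettings(image_size=128, radius=0.01, points_per_pixel=1)
rasterizer = PointsRasterizer(cameras=cameras, raster_settings=settings)
fragments = rasterizer(cloud)
print(fragments.idx.shape)  # (1, 128, 128, 1): per-pixel point indices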
At the core of MEt3R lies DUSt3R, which is used to generate the 3D point maps for feature unprojection and rasterization. We adopt DUSt3R as a submodule, which can be downloaded as follows:
git submodule update --init --recursive
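As a quick check that the submodule is usable, the upstream DUSt3R model class can be loaded from Hugging Face via its from_pretrained helper. The import path and checkpoint name below follow the upstream DUSt3R repository and should be treated as assumptions to verify against your checkout; MEt3R itself defaults to the MASt3R-trained DUSt3R weights shown in the usage example above.

# ASSUMPTION: import path as in the upstream DUSt3R repository.
from dust3r.model import AsymmetricCroCo3DStereo

# Original DUSt3R checkpoint from Hugging Face (for use_mast3r_dust3r=False).
model = AsymmetricCroCo3DStereo.from_pretrained(
    "naver/DUSt3R_ViTLarge_BaseDecoder_512_dpt"
)
model = model.cuda().eval()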
When using MEt3R in your project, consider citing our work as follows.
@misc{asim24met3r,
    title = {MEt3R: Measuring Multi-View Consistency in Generated Images},
    author = {Asim, Mohammad and Wewer, Christopher and Wimmer, Thomas and Schiele, Bernt and Lenssen, Jan Eric},
    booktitle = {arXiv},
    year = {2024},
}