Stars
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation.
Updated Dec 10, 2024
MuCR is a benchmark designed to evaluate the ability of Vision Large Language Models (VLLMs) to infer causal relationships using only visual cues.
[ECCV2022] D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
An easy-to-use debug print tool for deep learning projects in Python. PyPI: https://pypi.org/project/pydprint/