Stars
PyTorch implementation of "Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration" (NeurIPS 2024 Spotlight).
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal AI, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
Machine Learning and Computer Vision Engineer - Technical Interview Questions
A repository to prepare you for your machine learning interview, covering most of the questions asked by tech giants and local companies. Use it to ace your Machine Learning Engineer interview.
A programming framework for agentic AI 🤖 · PyPI: autogen-agentchat · Discord: https://aka.ms/autogen-discord · Office Hour: https://aka.ms/autogen-officehour
Codebase for CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts
PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡
Data and Code for Paper "From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models"
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
Famous Vision Language Models and Their Architectures
Curated list of data science interview questions and answers
The ORBIT dataset is a collection of videos of objects in clean and cluttered scenes recorded by people who are blind/low-vision on a mobile phone. The dataset is presented with a teachable object recognition benchmark task.
Multilingual Image Captioning Evaluation
(WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali and Urdu.
inference code for SOTA closed and open vision-language models
JohannesBuchner / imagehash
Forked from bunchesofdonald/photohash. A Python Perceptual Image Hashing Module.
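As a quick illustration of what imagehash provides, here is a minimal sketch using its documented average-hash interface; the image file names are hypothetical.

```python
from PIL import Image
import imagehash

# Compute perceptual hashes for two (hypothetical) images.
h1 = imagehash.average_hash(Image.open("photo_a.jpg"))
h2 = imagehash.average_hash(Image.open("photo_b.jpg"))

# Subtracting hashes gives a Hamming distance; small values
# indicate visually similar images.
print(h1 - h2)
```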
A curated list of papers & resources linked to concept learning
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
A paper list of some recent Mamba-based CV works.
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of the multilingual text perception and comprehension capabilities of multimodal large models across nine languages.
Track emissions from compute and recommend ways to reduce their impact on the environment.
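For context, a minimal sketch of the CodeCarbon tracker interface this repo exposes; the workload here is a placeholder.

```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()  # estimates energy use of CPU/GPU/RAM
tracker.start()
_ = sum(i * i for i in range(10_000_000))  # placeholder workload
emissions = tracker.stop()  # estimated kg CO2-eq for the tracked block
print(f"Estimated emissions: {emissions:.6f} kg CO2-eq")
```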
😸 💬 A module to compute textual lexical richness (aka lexical diversity).
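A minimal sketch of how such a lexical-richness computation is typically used, assuming the lexicalrichness package's LexicalRichness class; the sample text is made up.

```python
from lexicalrichness import LexicalRichness

text = "The quick brown fox jumps over the lazy dog and the quick cat."
lex = LexicalRichness(text)

print(lex.words)  # total token count
print(lex.terms)  # unique term count
print(lex.ttr)    # type-token ratio: terms / words
```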
A playbook for systematically maximizing the performance of deep learning models.
Code for paper: VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning
Code repository supporting the paper "Atlas: Few-shot Learning with Retrieval Augmented Language Models" (https://arxiv.org/abs/2208.03299).