- Abu Dhabi, UAE
- https://www.muhammadmaaz.com
Stars
A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Official implementation of the paper "GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model"
Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
[WACV 2025] Efficient Video Object Segmentation via Modulated Cross-Attention Memory
MobiLlama: Small Language Model tailored for edge devices
[WACV 2025] Vision-language conversation in 10 languages, including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali, and Urdu.
VLM Evaluation: benchmark for VLMs, spanning text-generation tasks from VQA to captioning
Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
[NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
[ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the recently accepted survey: https://…
[MICCAI 2023] Official code repository of the paper "Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation"
[ICCV'23 Main Track, WECIA'23 Oral] Official repository of the paper "Self-regulating Prompts: Foundational Model Adaptation without Forgetting"
[EMNLP'23] ClimateGPT: a specialized LLM for conversations related to Climate Change and Sustainability topics, in both English and Arabic.
[BIONLP@ACL 2024] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation.
Open-sourced code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
Official implementation of the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"