- Mohamed bin Zayed University of Artificial Intelligence
- Abu Dhabi
- https://scholar.google.com/citations?user=LJWxVpUAAAAJ&hl=en
- in/sahalshajim
Stars
Extracts essential MediaPipe face landmarks and arranges them in a sequenced order.
Methods I used to clean the Micro_Expressions dataset
Realtime micro-expression recognition using OpenCV and PyTorch
Template files for workshop "NLP FOR VISION AND SPEECH IMPAIRED" by IIT Madras Research Park | Empower Conference
k-m-irfan / Fastspeech2_HS
Forked from smtiitm/Fastspeech2_MFA
Indic TTS for Indian Languages: This is a project on developing text-to-speech (TTS) synthesis systems for Indian languages, improving quality of synthesis, as well as small footprint TTS integrat…
Flask API implementation of the Text to Speech Model developed by Speech Lab, IIT Madras
An Intuitive Humanoid Robot Arm Controller for Teleoperation.
Official implementation of paper titled "GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model"
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
Code for "Enhancing In-context Learning via Linear Probe Calibration"
[WACV 2025] Efficient Video Object Segmentation via Modulated Cross-Attention Memory
MobiLlama: Small Language Model tailored for edge devices
(WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali and Urdu.
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
[BIONLP@ACL 2024] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.
[EMNLP'23] ClimateGPT: a specialized LLM for conversations related to Climate Change and Sustainability topics in both English and Arabic languages.
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
[CVPR 2023] Official repository of paper titled "Fine-tuned CLIP models are efficient video learners".
[NeurIPS 2022] Official repository of paper titled "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection".
[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
An Empirical Study Of Self-supervised Learning Approaches For Object Detection With Transformers - CV703 Course Project
Hotel Booking Concept is a promo sample application inspired by