Stars
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
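A minimal sketch of loading and computing a metric with 🤗 Evaluate; the metric name and toy labels here are illustrative:

```python
import evaluate  # pip install evaluate

# Load a metric by name from the Hugging Face hub.
accuracy = evaluate.load("accuracy")

# Compare model predictions against reference labels (toy values).
result = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # {'accuracy': 0.75}
```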
📖 A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs).
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.
Accelerating the development of large multimodal models (LMMs) with lmms-eval, a one-click evaluation module.
A Framework of Small-scale Large Multimodal Models
Benchmarking Generative Models with Artworks
Official implementation for 'Class-Balancing Diffusion Models'
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Code release for the paper Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
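A minimal sketch of img2dataset's Python entry point, assuming a local urls.txt with one image URL per line (the file path, output directory, and sizes here are illustrative):

```python
from img2dataset import download  # pip install img2dataset

# Download, resize, and shard images from a plain-text list of URLs.
download(
    url_list="urls.txt",         # hypothetical input file, one URL per line
    output_folder="images",      # hypothetical output directory
    image_size=256,              # resize images to 256 px
    output_format="webdataset",  # shard results into .tar files
    processes_count=4,
    thread_count=16,
)
```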
Collection of AWESOME vision-language models for vision tasks
ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering.
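MetaCLIP checkpoints can reportedly be loaded through open_clip; a minimal sketch, assuming the metaclip_400m pretrained tag is available in your open_clip version:

```python
import torch
import open_clip  # pip install open_clip_torch

# Load a MetaCLIP ViT-B/32 checkpoint via open_clip (tag availability may vary).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32-quickgelu", pretrained="metaclip_400m"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32-quickgelu")

text = tokenizer(["a painting", "a photo"])
with torch.no_grad():
    text_features = model.encode_text(text)
print(text_features.shape)  # (2, 512)
```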
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
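A minimal sketch of Qwen-VL-Chat inference following the pattern in its model card; the image URL and prompt are placeholders, and trust_remote_code pulls the repo's custom modeling code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True
).eval()

# Build an interleaved image+text query with the repo's helper.
query = tokenizer.from_list_format([
    {"image": "https://example.com/demo.jpeg"},  # placeholder URL
    {"text": "Describe this image."},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```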
The official repo of Qwen (通义千问), the chat & pretrained large language model proposed by Alibaba Cloud.
An Open-source Toolkit for LLM Development
A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca).
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
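LLaVA-1.5 weights are also mirrored in a transformers-native format under llava-hf; a minimal sketch assuming that mirror (the image URL is a placeholder):

```python
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # community mirror of the official weights
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image = Image.open(requests.get("https://example.com/demo.jpg", stream=True).raw)  # placeholder URL
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(out[0], skip_special_tokens=True))
```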
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
Reading list for research topics in multimodal machine learning
This repo contains the implementation of "Labelling Unlabelled Videos from Scratch with Multi-modal Self-supervision", which learns clusters from multi-modal data in a self-supervised way.
[T-PAMI] A curated list of self-supervised multimodal learning resources.
✨✨Latest advances in multimodal large language models.