  • Shanghai Jiao Tong University

Python 64 9 Updated Mar 29, 2019

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

Python 2,120 269 Updated Jan 10, 2025

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

592 20 Updated Dec 23, 2024

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

Python 250 26 Updated Feb 25, 2025
Python 14 1 Updated Jul 19, 2024

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.

Python 2,128 211 Updated Feb 24, 2025

A Framework of Small-scale Large Multimodal Models

Python 751 81 Updated Jan 28, 2025
Python 3,443 316 Updated Feb 24, 2025

Benchmarking Generative Models with Artworks

Python 226 9 Updated Oct 29, 2022
Python 80 14 Updated Aug 14, 2024

Official implementation for 'Class-Balancing Diffusion Models'

Python 50 5 Updated May 17, 2024

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 6,862 615 Updated May 31, 2024

Code release for the paper "Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning"

Python 29 3 Updated Jul 13, 2022

CLIP-like model evaluation

Jupyter Notebook 664 85 Updated Feb 18, 2025

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Python 3,910 347 Updated Aug 7, 2024

Collection of AWESOME vision-language models for vision tasks

2,527 199 Updated Dec 3, 2024

ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

Python 1,353 61 Updated Dec 10, 2024
Python 368 58 Updated Dec 12, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 5,510 419 Updated Aug 7, 2024

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 17,027 1,410 Updated Feb 1, 2025

An Open-source Toolkit for LLM Development

Python 2,759 175 Updated Jan 13, 2025

A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)

1,109 58 Updated Jan 4, 2024

A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"

1,254 57 Updated Mar 14, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 21,585 2,371 Updated Aug 12, 2024
Python 53 3 Updated Aug 9, 2023

TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.

Python 1,544 150 Updated Feb 24, 2025

Reading list for research topics in multimodal machine learning

6,274 868 Updated Aug 20, 2024

This repo covers the implementation of "Labelling unlabelled videos from scratch with multi-modal self-supervision", which learns clusters from multi-modal data in a self-supervised way.

Python 115 15 Updated Apr 26, 2021

[T-PAMI] A curated list of self-supervised multimodal learning resources.

244 7 Updated Aug 16, 2024

✨✨Latest Advances on Multimodal Large Language Models

13,993 895 Updated Feb 25, 2025