View xingyizhou's full-sized avatar

COYO-700M: Large-scale Image-Text Pair Dataset

Python 1,185 38 Updated Nov 30, 2022

Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Python 661 18 Updated Dec 30, 2024

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,360 226 Updated Dec 31, 2024

🔥 Aurora Series: A more efficient multimodal large language model series for video.

Python 61 4 Updated Nov 16, 2024

Next-Token Prediction is All You Need

Python 1,949 77 Updated Oct 24, 2024

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

Python 26,952 5,534 Updated Jan 3, 2025

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 10,104 947 Updated Jan 3, 2025

MINT-1T: A one trillion token multimodal interleaved dataset.

786 20 Updated Jul 31, 2024

Fréchet Inception Distance (FID) evaluation in JAX

Python 12 1 Updated May 28, 2024

[ECCV 2024] Beyond MOT: Semantic Multi-Object Tracking

Python 42 Updated Nov 19, 2024

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Python 2,256 189 Updated Dec 30, 2024

[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"

Python 118 6 Updated Aug 21, 2024

[ECCV 2024] Beyond MOT: Semantic Multi-Object Tracking

Python 29 Updated Sep 12, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 13,428 1,297 Updated Dec 25, 2024

Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Python 379 32 Updated Aug 11, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,816 122 Updated Oct 30, 2024

Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion

Jupyter Notebook 7,643 795 Updated Dec 8, 2022
Python 3,193 281 Updated Oct 16, 2024

[CVPR 2024 Oral] MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation.

Python 138 12 Updated Aug 1, 2024
Python 1,783 54 Updated Jun 28, 2024

Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.

Python 2,239 178 Updated Dec 31, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 20,966 2,305 Updated Aug 12, 2024

[arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization

Python 108 Updated Jun 12, 2024

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,432 57 Updated Aug 15, 2024

(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis

Python 601 31 Updated Sep 27, 2024

[CVPR 2024] Official implementation of "VRP-SAM: SAM with Visual Reference Prompt"

Python 112 13 Updated Sep 27, 2024

[ECCV 2024] Tokenize Anything via Prompting

Jupyter Notebook 553 23 Updated Dec 11, 2024

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Python 3,826 347 Updated Aug 7, 2024

The official Meta Llama 3 GitHub site

Python 27,777 3,179 Updated Aug 12, 2024