Stars
Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance to start translating, H…
Official PyTorch implementation for "Large Language Diffusion Models"
Video Generation Foundation Models: https://saiyan-world.github.io/goku/
A tool for visualizing and communicating the errors in rendered images.
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Vim plugin for LLM-assisted code/text completion
GPT4V-level open-source multi-modal model based on Llama3-8B
Cosmos is a world model development platform that consists of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
Minimalistic 4D-parallelism distributed training framework for educational purposes
Enchanted is an iOS and macOS app for chatting with private, self-hosted language models such as Llama2, Mistral or Vicuna using Ollama.
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
Vim plugin for integrating Ollama-based LLMs (large language models)
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
This repository contains the official implementation of the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" (CVPR 2024).
This repository provides the code and model checkpoints for the AIMv1 and AIMv2 research projects.
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
First base model for full-duplex conversational audio
An MLX port of FLUX based on the Hugging Face Diffusers implementation.
On-device Speech Recognition for Android
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
A Fast Deep Learning Model to Upsample Low Resolution Videos to High Resolution at 30 fps
Open Source Image and Video Restoration Toolbox for Super-resolution, Denoising, Deblurring, etc. Currently, it includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR, ECBSR, etc. Also …
Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think (ICLR 2025)