Stars
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Welcome to the Llama Cookbook! This is your go-to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We also show you how to solve end-to-end problems using Llama mode…
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
LAVIS - A One-stop Library for Language-Vision Intelligence
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Codebase for Aria - an Open Multimodal Native MoE
[ECCV 2024] HiDiffusion: Increases the resolution and speed of your diffusion model by only adding a single line of code!
Scalable data preprocessing and curation toolkit for LLMs
A prompting enhancement library for transformers-type text embedding systems
Huggingface-compatible SDXL Unet implementation that is readily hackable
Official implementation of "Perturbed-Attention Guidance"
"GraphAgent: Agentic Graph Language Assistant"
An open-source toolbox for fast sampling of diffusion models. Official implementations of our works published in ICML, NeurIPS, CVPR.
This project shows how to serve an ONNX-optimized image classification model as a web service with FastAPI, Docker, and Kubernetes.
EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language Models (LLMs).
A PyTorch implementation of Latent Factor Analysis via Dynamical Systems (LFADS) and AutoLFADS.
Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024)
Benchmark for generative image models
Makes your prompts better, both locally and online, with or without a UI