starsky0426

Starsky0426 starsky0426

4 followers · 19 following

Stars

Genesis-Embodied-AI / Genesis

A generative world for general-purpose robotics & embodied AI learning.

Python 17,387 1,219 Updated Dec 22, 2024

lsl001006 / ZONE

CVPR-24 | Official codebase for ZONE: Zero-shot InstructiON-guided Local Editing

Python 68 1 Updated Nov 21, 2024

Nerogar / OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.

Python 1,867 158 Updated Dec 22, 2024

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 8,569 832 Updated Dec 18, 2024

deepseek-ai / DeepSeek-VL2

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 478 20 Updated Dec 18, 2024

Jonseed / ComfyUI-Detail-Daemon

A port of muerrilla's sd-webui-Detail-Daemon as a node for ComfyUI, to adjust sigmas that control detail.

Python 432 12 Updated Nov 4, 2024

kukaiN / vae_finetune

code for finetuning vae

Python 16 Updated Sep 8, 2024

jjmlovesgit / ChatGPT-Advanced-Voice-Mode

ChatGPT Advanced Voice Mode Gets an Avatar!

JavaScript 8 5 Updated Sep 29, 2024

cccntu / fine-tune-models

Jupyter Notebook 112 12 Updated Sep 11, 2022

Huanshere / VideoLingo

Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team

Python 8,610 832 Updated Dec 20, 2024

open-mmlab / PowerPaint

[ECCV 2024] PowerPaint, a versatile image inpainting model that supports text-guided object inpainting, object removal, image outpainting and shape-guided object inpainting with only a single model…

Python 721 45 Updated Sep 8, 2024

FoundationVision / VAR

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…

Jupyter Notebook 6,370 427 Updated Dec 22, 2024

viiika / HumanEdit

Official Implementation of HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

Python 20 Updated Dec 6, 2024

hzlsaber / SIDA

The official repository of "SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model"

7 Updated Dec 6, 2024

kijai / ComfyUI-HunyuanVideoWrapper

Python 1,192 77 Updated Dec 22, 2024

thu-ml / SageAttention

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 713 35 Updated Dec 21, 2024

weichaozeng / TextCtrl

TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control

Python 36 3 Updated Dec 16, 2024

frankchieng / ComfyUI_MagicClothing

unofficial implementation of Comfyui magic clothing

Python 530 44 Updated Sep 4, 2024

Tencent / HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 6,486 479 Updated Dec 19, 2024

replicate / cog-flux

Cog inference for flux models

Python 308 35 Updated Dec 21, 2024

Kunhao-Liu / ViewExtrapolator

[arXiv 2024] Novel View Extrapolation with Video Diffusion Priors

Python 87 2 Updated Dec 12, 2024

unslothai / unsloth

Finetune Llama 3.3, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory

Python 19,553 1,374 Updated Dec 21, 2024

TencentQQGYLab / ELLA

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Python 1,108 58 Updated Jul 17, 2024

liujunwen23 / MIRE

WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge

Python 110 11 Updated Nov 11, 2024

XiangZ-0 / HiT-SR

[ECCV 2024 - Oral] HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution

Python 93 2 Updated Nov 14, 2024

pot-app / pot-desktop

🌈 A cross-platform application for select-to-translate text translation and OCR.

JavaScript 10,859 486 Updated Nov 16, 2024

megvii-research / NAFNet

The state-of-the-art image restoration model without nonlinear activation functions.

Python 2,298 291 Updated Jul 3, 2024

microsoft / TinyTroupe

LLM-powered multiagent persona simulation for imagination enhancement and business insights.

Python 5,076 390 Updated Dec 17, 2024

real-stanford / diffusion_policy

[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Python 1,788 344 Updated Nov 19, 2024

ParthaEth / GIF

GIF is a photorealistic generative face model with explicit 3D geometric and photometric control.

Python 409 63 Updated Sep 13, 2022