Stars
Includes the code for training and testing the CountGD model from the paper CountGD: Multi-Modal Open-World Counting.
This is the third party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement 💥💥💥
[CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Unofficial libyuv mirror. Please submit any issues or PRs upstream.
A naive example of LivePortrait inference with ncnn.
Bring portraits to life in real time, with ONNX/TensorRT support!
Fast-running Live Portrait with TensorRT and ONNX models
⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation (AAAI 2025)
ByteDance Pangle (穿山甲) ad SDK plugin for Flutter (Bytedance-UnionAD)
Run ComfyUI workflows on multiple local GPUs/networked machines.
[TCSVT 2024] The official repo for "End-to-End Human Instance Matting"
Accepted as a [NeurIPS 2024] Spotlight presentation paper
[ECCV 2024] The official implementation of paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
Official repo of our paper "SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions"
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
👦 Human head semantic segmentation
The collection of pre-trained, state-of-the-art AI models for ailia SDK
A collection of ComfyUI custom nodes: an awesome, smart way to work with nodes!
This node is mainly based on the YOLOv8 model for object detection, and it outputs the related images, masks, and JSON information.
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)