Starred repositories
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
Merging information between multiple cameras to track objects of interest accurately in 3D spaces
Fully automated video maker using motion graphics and text-to-speech synthesis to turn newsletters into daily YouTube videos.
Generate TikTok-style captions with Whisper.cpp
🎥 Make videos programmatically with React
A feature-rich command-line audio/video downloader
Examples and guides for using the Gemini API
Faster Whisper transcription with CTranslate2
InstantID-ROME: Improved Identity-Preserving Generation in Seconds 🔥
Full stack, modern web application template. Using FastAPI, React, SQLModel, PostgreSQL, Docker, GitHub Actions, automatic HTTPS and more.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
🦜🔗 Build context-aware reasoning applications
Interact with your documents using the power of GPT, 100% privately, no data leaks
This is an implementation of zero-shot instance segmentation using Segment Anything.
Experiment on combining CLIP with SAM to do open-vocabulary image segmentation.
Segment Anything combined with CLIP
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking (CVPR 2023)
Segment-Anything + 3D. Let's lift anything to 3D.
We extend Segment Anything to 3D perception by combining it with VoxelNeXt.
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) fo…
Tracking and collecting papers/projects/others related to Segment Anything.
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Running large language models on a single GPU for throughput-oriented scenarios.
Exporting YOLOv5 for CPU inference with ONNX and OpenVINO
Outpainting with Stable Diffusion on an infinite canvas