Stars
Open source conversation framework and visual editor for structured Pipecat dialogues
Unified framework for building enterprise RAG pipelines with small, specialized models
Code for visualizing the loss landscape of neural nets
Open Source framework for voice and multimodal conversational AI
BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks
pipreqs - Generate pip requirements.txt file based on imports of any project. Looking for maintainers to move this project forward.
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Speech recognition module for Python, supporting several engines and APIs, online and offline.
PyTorch code and models for V-JEPA self-supervised learning from video.
Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive arch…
Efficient Multimodal Large Language Models: A Survey
LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation
Library for fast text representation and classification.
Real-time speech-to-text transcription app.
Real-time transcription with OpenAI Whisper.
Official repository of the 1st place solution for the 7th NVIDIA AI City Challenge (2023) Track 1: Multi-Camera People Tracking
Mimic Recording Studio is a Docker-based application you can install to record voice samples, which can then be trained into a TTS voice with Mimic2
Simple GUI application to help record audio dictated from given text prompts, for use with training speech recognition or speech synthesis.
The world's simplest facial recognition API for Python and the command line
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
⚡ Finetune Wav2vec 2.0 For Speech Recognition
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Robust Speech Recognition via Large-Scale Weak Supervision
This repository contains code for speech recognition in Python
Pretrained PyTorch face detection (MTCNN) and facial recognition (InceptionResnet) models