Stars
Got Your Back (GYB) is a command line tool for backing up your Gmail messages to your computer using Gmail's API over HTTPS.
Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.
The simplest & most comprehensible tutorial on speaker identification with NVIDIA's `NeMo`.
Python package for combining diarization system outputs.
Hyperaudio Lite - a Super-lightweight Interactive Transcript Player
ez audio transcription tool with flexible processing and post-processing options
💬 ASR FastAPI server using faster-whisper and Multi-Scale Auto-Tuning Spectral Clustering for diarization.
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Unofficial implementation of NVIDIA P-Flow TTS paper
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with actual speaker names
Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts
adefossez / demucs
Forked from facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
OpenRecall is a fully open-source, privacy-first alternative to proprietary solutions like Microsoft's Windows Recall. With OpenRecall, you can easily access your digital history, enhancing your me…
build ai agents that have the full context, open source, runs locally, developer friendly. 24/7 screen, mic, keyboard recording and control
A python package to analyze and compare voices with deep learning
turnkey self-hosted offline transcription and diarization service with llm summary
ASR + diarization model server with speculative decoding
🔊 Text-Prompted Generative Audio Model
An easy way to extract information from documents
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
MARS5 speech model (TTS) from CAMB.AI