Stars
Robust Speech Recognition via Large-Scale Weak Supervision
All public course material for STAT 88 used in Spring 2021
Python bindings for FFmpeg - with complex filtering support
An implementation of SpecAugment, introduced by Google Brain, with TensorFlow & PyTorch
Collaborative audio module for fast.ai
New egocentric synthetic dataset for egocentric 3D human pose estimation
Efficient 3D human pose estimation in video using 2D keypoint trajectories
DeepFocus: Learned Image Synthesis for Computational Displays
Neural Reconstruction for Foveated Rendering and Video Compression using Learned Statistics of Natural Videos
DeepSpeech is an open-source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
Speaker diarization by UIS-RNN and speaker embedding by vgg-speaker-recognition
End-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.
Python implementation of algorithms from Russell and Norvig's "Artificial Intelligence - A Modern Approach"
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)