
Demo video: `Playground.Demo.mp4`
- Have Conda and Yarn installed on your device
- Clone or fork this repository
- Install the backend and frontend environments: `sh install_playground.sh`
- Review `config.py` to make sure the transcription device and compute type match your setup (see the sketch after this list)
- Run the backend: `cd backend && python server.py`
- In a different terminal, run the React frontend: `cd interface && yarn start`
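If you're unsure what to check in `config.py`, the sketch below shows the kind of settings involved, assuming the backend uses a faster-whisper-style model. The variable names and values here are illustrative, not necessarily the ones in this repository's file:

```python
# Illustrative sketch only -- the variable names are assumptions,
# not necessarily those used in this repository's config.py.
from faster_whisper import WhisperModel

DEVICE = "cpu"          # "cuda" if you have a supported NVIDIA GPU
COMPUTE_TYPE = "int8"   # "float16" is typical on GPU, "int8" on CPU

# An unsupported device/compute type pair can fail at load time or
# silently fall back, so keep these consistent with your hardware.
model = WhisperModel("tiny", device=DEVICE, compute_type=COMPUTE_TYPE)
```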
This repository uses libraries built on pyannote.audio models, which are stored on the Hugging Face Hub. You must accept their user conditions before using them.
Note: If you don't have a Hugging Face account, you need to create one.
- Accept terms for the `pyannote/segmentation` model
- Accept terms for the `pyannote/embedding` model
- Accept terms for the `pyannote/speaker-diarization` model
- Install huggingface-cli and log in with your user access token (found under Settings -> Access Tokens); a verification sketch follows this list
- Model Size: Choose the model size, from tiny to large-v2.
- Language: Select the language you will be speaking in.
- Transcription Timeout: Set the number of seconds the application will wait before transcribing the current audio data.
- Beam Size: Adjust the number of candidate transcriptions generated and considered, which affects accuracy and transcription generation time (see the sketch after this list).
- Transcription Method: Choose "real-time" for real-time diarization and transcriptions, or "sequential" for periodic transcriptions with more context.
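To make these parameters concrete, here is a hedged sketch of how model size, language, and beam size typically map onto a faster-whisper transcription call; the repository's actual wiring may differ, and `audio.wav` is just a placeholder input:

```python
from faster_whisper import WhisperModel

# Model Size selects the checkpoint, from "tiny" up to "large-v2".
model = WhisperModel("tiny", device="cpu", compute_type="int8")

# Language and Beam Size are per-call decoding options: a larger
# beam_size considers more candidate transcriptions, trading
# generation time for accuracy.
segments, info = model.transcribe("audio.wav", language="en", beam_size=5)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```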
- On macOS, if building the wheel for safetensors fails, install Rust (`brew install rust`) and try again.
- In sequential mode, speaker labels may swap unpredictably between transcriptions.
- In real-time mode, audio that doesn't reach the transcription timeout won't be transcribed.
- Batches containing no speech will cause errors.
This repository hasn't been tested for all languages; please create an issue if you encounter any problems.
This repository and the code and model weights of Whisper are released under the MIT License.