laosuan/whisper-playground

Whisper Playground

Instantly build real-time speech2text apps in 99 languages using faster-whisper, Diart, and Pyannote
Live demo out soon!

Setup

  1. Install Conda and Yarn on your device.
  2. Clone or fork this repository.
  3. Install the backend and frontend environments: `sh install_playground.sh`
  4. Review `config.py` to make sure the transcription device and compute type match your setup.
  5. Run the backend: `cd backend && python server.py`
  6. In a different terminal, run the React frontend: `cd interface && yarn start`
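
As a rough illustration of the device and compute type check in step 4, the sketch below picks a pair of values that faster-whisper accepts. The function name and the `has_cuda` flag are hypothetical and are not taken from this repository's config.py; adapt the result to whatever variables that file actually defines.

```python
from typing import Tuple


def pick_device_and_compute_type(has_cuda: bool) -> Tuple[str, str]:
    """Return a (device, compute_type) pair suitable for faster-whisper.

    Hypothetical helper: the real config.py may use different names.
    """
    if has_cuda:
        # float16 is a common choice on NVIDIA GPUs.
        return "cuda", "float16"
    # int8 quantization keeps CPU inference reasonably fast.
    return "cpu", "int8"


device, compute_type = pick_device_and_compute_type(has_cuda=False)
print(device, compute_type)
```

With faster-whisper, values like these are typically passed as `WhisperModel(model_size, device=device, compute_type=compute_type)`.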

Parameters

  • Model Size: Choose the model size, from tiny to large-v2.
  • Language: Select the language you will be speaking in.
  • Transcription Timeout: Set the number of seconds the application will wait before transcribing the current audio data.
  • Beam Size: Adjust the number of candidate transcriptions generated and considered, which affects accuracy and transcription time.
  • Transcription Method: Choose "real-time" for real-time diarization and transcriptions, or "sequential" for periodic transcriptions with more context.
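
The parameters above could be modeled roughly as follows. This is a minimal sketch, not the playground's actual code: the `PlaygroundSettings` class, its defaults, and the `transcribe_file` helper are assumptions for illustration, and the intermediate model sizes are assumed from the usual Whisper size list. Only `language` and `beam_size` map directly onto faster-whisper's `transcribe` arguments here.

```python
from dataclasses import dataclass

# Assumed size list; the README only states "from tiny to large-v2".
MODEL_SIZES = ("tiny", "base", "small", "medium", "large-v1", "large-v2")
METHODS = ("real-time", "sequential")


@dataclass(frozen=True)
class PlaygroundSettings:
    """Hypothetical container for the UI parameters."""
    model_size: str = "small"
    language: str = "en"
    transcription_timeout: float = 5.0  # seconds to wait before transcribing
    beam_size: int = 1
    method: str = "real-time"

    def __post_init__(self) -> None:
        if self.model_size not in MODEL_SIZES:
            raise ValueError(f"unknown model size: {self.model_size}")
        if self.method not in METHODS:
            raise ValueError(f"unknown method: {self.method}")
        if self.beam_size < 1 or self.transcription_timeout <= 0:
            raise ValueError("beam_size and timeout must be positive")

    def transcribe_kwargs(self) -> dict:
        """Keyword arguments forwarded to WhisperModel.transcribe."""
        return {"language": self.language, "beam_size": self.beam_size}


def transcribe_file(path: str, settings: PlaygroundSettings) -> str:
    """Transcribe one audio file; requires faster-whisper to be installed."""
    from faster_whisper import WhisperModel  # imported lazily on purpose

    model = WhisperModel(settings.model_size)
    segments, _info = model.transcribe(path, **settings.transcribe_kwargs())
    return " ".join(segment.text.strip() for segment in segments)
```

Validating the settings up front means a bad model size or method fails before any model weights are loaded.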

Troubleshooting

  • On macOS, if building the wheel for safetensors fails, install Rust (`brew install rust`) and try again.

Known Bugs

  1. In the sequential mode, there may be uncontrolled speaker swapping.
  2. In real-time mode, audio data not meeting the transcription timeout won't be transcribed.
  3. Batches that contain no speech will cause errors.
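
One possible workaround for the third bug is to skip batches that look silent before they reach the transcriber. The sketch below is a crude RMS energy gate, not a voice activity detector and not this repository's fix; the function name and the `rms_threshold` default are assumptions, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math
from typing import Sequence


def is_probably_silent(samples: Sequence[float], rms_threshold: float = 0.01) -> bool:
    """Return True when a batch's RMS energy falls below the threshold.

    Hypothetical helper: background noise above the threshold still passes.
    """
    if not samples:
        # An empty batch has nothing to transcribe.
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < rms_threshold


batch = [0.0] * 1600  # a batch of digital silence
if is_probably_silent(batch):
    print("skipping batch")
```

A caller would check each batch with this gate and only forward the non-silent ones for transcription. The threshold would need tuning per microphone and environment.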

This repository hasn't been tested for all languages; please create an issue if you encounter any problems.

License

This repository and the code and model weights of Whisper are released under the MIT License.

About

Build real-time speech2text web apps using OpenAI's Whisper: https://openai.com/blog/whisper/
