This repository contains a command-line tool and a simple Flask API for transcribing audio files using the Whisper model.
whisper-transcribe-cli.py is a command-line interface (CLI) tool that transcribes audio files using the Whisper model. It can save the transcription in JSON format or print it to the console.
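Requirements:
- Python 3.9+
- whisper library (published on PyPI as openai-whisper)
- argparse library (part of the Python standard library, no installation needed)
Installation:
- Install the required libraries:
    pip install openai-whisper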
- Clone this repository or download the script.
The script can be run from the command line with various options to specify the input file, model size, transcription task, verbosity, and output format. Below are the details of the available arguments.
- file_path (required): Path to the audio file to be transcribed.
- --model: Size of the Whisper model to use (default: tiny).
- --task: Type of task, either translate (default) or transcribe.
- --verbose: Enable verbose mode (default: True).
- --output_format: Format of the output, either json (default) or text, which prints the transcription to the console.
- --output_path: Path to save the output file. If not specified, the file is saved in output_dir.
- --output_dir: Directory to save the output file (default: ./res).
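The argument list above could be wired up with argparse roughly as follows. This is a minimal sketch, not the script's actual source: the helper names str_to_bool and build_parser are hypothetical, and the boolean parsing for --verbose is an assumption chosen so that a value such as False is accepted on the command line.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse common true/false spellings (hypothetical helper)."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(description="Transcribe audio with Whisper.")
    parser.add_argument("file_path", help="Path to the audio file")
    parser.add_argument("--model", default="tiny", help="Whisper model size")
    parser.add_argument("--task", default="translate",
                        choices=["translate", "transcribe"])
    parser.add_argument("--verbose", type=str_to_bool, default=True)
    parser.add_argument("--output_format", default="json",
                        choices=["json", "text"])
    parser.add_argument("--output_path", default=None)
    parser.add_argument("--output_dir", default="./res")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["samples/dragons.wav", "--verbose", "False"])
print(args.file_path, args.model, args.task, args.verbose)
```

Parsing --verbose through a converter avoids a common pitfall: with a plain string argument, the text "False" would be truthy.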
Transcribe an audio file and save the result in JSON format to the default directory.
python whisper-transcribe-cli.py samples/dragons.wav
Transcribe using a different model size.
python whisper-transcribe-cli.py samples/sofiavergaraspanish.clip.wav --model base
Change the task to transcribe.
python whisper-transcribe-cli.py samples/interview_speech-analytics.wav --task transcribe
Run the script without verbose output.
python whisper-transcribe-cli.py samples/dragons.wav --verbose False
Print the transcription text to the console instead of saving it as a JSON file.
python whisper-transcribe-cli.py samples/sofiavergaraspanish.clip.wav --output_format text
Save the transcription to a specific file path.
python whisper-transcribe-cli.py samples/interview_speech-analytics.wav --output_path ./results/interview_transcription.json
Specify a different directory to save the transcription.
python whisper-transcribe-cli.py samples/dragons.wav --output_dir ./output
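The interaction between --output_path and --output_dir can be sketched as a small helper. The function names resolve_output_path and save_json, and the choice to name the default file after the audio file's stem, are assumptions for illustration; the script itself may name its output differently.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_output_path(file_path: str,
                        output_path: Optional[str] = None,
                        output_dir: str = "./res") -> Path:
    """Return where the JSON result should be written.

    An explicit output_path wins; otherwise the file goes into output_dir,
    named after the audio file (assumed naming scheme).
    """
    if output_path:
        return Path(output_path)
    return Path(output_dir) / f"{Path(file_path).stem}.json"


def save_json(result: dict, destination: Path) -> None:
    """Write the result as UTF-8 JSON, creating parent directories first."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    with destination.open("w", encoding="utf-8") as handle:
        json.dump(result, handle, ensure_ascii=False, indent=2)


print(resolve_output_path("samples/dragons.wav"))
print(resolve_output_path("samples/dragons.wav", output_dir="./output"))
```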
Here are some example commands using files from the samples directory:
python whisper-transcribe-cli.py samples/dragons.wav
python whisper-transcribe-cli.py samples/sofiavergaraspanish.clip.wav --model small --task transcribe
python whisper-transcribe-cli.py samples/interview_speech-analytics.wav --verbose False --output_format text
python whisper-transcribe-cli.py samples/dragons.wav --output_path ./results/dragons_transcription.json
python whisper-transcribe-cli.py samples/sofiavergaraspanish.clip.wav --output_dir ./transcriptions
Feel free to modify these commands as per your requirements.
app.py is a Flask-based API that provides an interface to upload audio files and transcribe them using the Whisper model. It lets you use the Whisper transcription functionality via HTTP requests.
Requirements:
- Python 3.9+
- flask library
- whisper library (published on PyPI as openai-whisper) and its dependencies
- werkzeug library (for secure file saving)
Installation:
- Install the required libraries:
    pip install flask openai-whisper werkzeug
- Clone this repository or download the script.
The Flask API can be started from the command line and provides endpoints to interact with the Whisper transcription functionality.
To start the Flask API, run the following command:
python app.py
The API listens on 0.0.0.0:5678, that is, on all network interfaces. From the same machine it is reachable at http://localhost:5678.
The root endpoint (/) returns a simple message to verify that the API is running.
Request:
GET /
Response:
{
  "message": "Hello from API on port 5678!"
}
The /upload endpoint accepts an audio file for transcription. The file is saved to the uploads directory and then transcribed using the Whisper model.
Request:
POST /upload
Content-Type: multipart/form-data
Form Data:
- file: The audio file to be transcribed.
Response:
The transcription result is returned in the response. The format depends on the implementation of whisper_transcribe_fn.
Here is an example using curl to upload a file for transcription:
curl -X POST -F "file=@path_to_your_audio_file.wav" http://0.0.0.0:5678/upload
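The same upload can be made from Python. The sketch below uses only the standard library and builds the multipart/form-data body by hand. The helper names build_multipart and upload_audio and the default URL are illustrative assumptions; the field name file comes from the form data described above.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def build_multipart(field: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_audio(path: str, url: str = "http://localhost:5678/upload") -> str:
    """POST the audio file to the /upload endpoint and return the response text."""
    audio = Path(path)
    body, content_type = build_multipart("file", audio.name, audio.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the API running, upload_audio("samples/dragons.wav") returns whatever the server sends back for that file.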
Files:
- app.py: The main Flask application script.
- whisper_transcribe_fn.py: This script should contain the function whisper_transcribe_fn, which handles the transcription logic using the Whisper model.
- uploads/: Directory where uploaded files are saved.
app.py sets up the Flask application with two endpoints: / for a simple health check and /upload for uploading files and performing transcriptions. It ensures the uploads directory exists and saves uploaded files securely.
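A minimal sketch of an application with this shape is shown here. It is not the repository's actual app.py: the transcribe placeholder stands in for whisper_transcribe_fn so the example runs without a model, and the error responses and the JSON wrapping of the result are assumptions.

```python
import os

from flask import Flask, jsonify, request
from werkzeug.utils import secure_filename

UPLOAD_DIR = "uploads"
PORT = 5678

app = Flask(__name__)
os.makedirs(UPLOAD_DIR, exist_ok=True)  # ensure the uploads directory exists


def transcribe(file_path: str) -> str:
    """Placeholder for whisper_transcribe_fn (assumption: returns text)."""
    return f"transcription of {os.path.basename(file_path)}"


@app.route("/", methods=["GET"])
def health():
    """Health check confirming the API is running."""
    return jsonify({"message": f"Hello from API on port {PORT}!"})


@app.route("/upload", methods=["POST"])
def upload():
    """Save the uploaded file under a sanitized name, then transcribe it."""
    uploaded = request.files.get("file")
    if uploaded is None or uploaded.filename == "":
        return jsonify({"error": "no file provided"}), 400
    # secure_filename strips path separators and other unsafe characters.
    filename = secure_filename(uploaded.filename)
    if not filename:
        return jsonify({"error": "invalid filename"}), 400
    saved_path = os.path.join(UPLOAD_DIR, filename)
    uploaded.save(saved_path)
    return jsonify({"transcription": transcribe(saved_path)})
```

To serve it, call app.run(host="0.0.0.0", port=PORT) at the end of the file. Sanitizing the client-supplied filename before joining it to the uploads directory is what keeps a crafted name such as ../../etc/passwd from escaping that directory.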
Ensure you have a whisper_transcribe_fn.py file with the following function defined:
    import whisper


    def whisper_transcribe_fn(file_path):
        # Load the Whisper model
        model = whisper.load_model("tiny")
        # Perform the transcription
        result = model.transcribe(file_path)
        # Return the transcription result (modify as needed)
        return result["text"]
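The function above reloads the model on every call, which can be slow when it is invoked once per upload. One possible variation caches the loaded model with functools.lru_cache. The names get_model and transcribe_cached are hypothetical, and whisper is imported lazily so that the module can be imported before the model is needed.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_model(name: str = "tiny"):
    """Load a Whisper model once per model name and reuse it afterwards."""
    import whisper  # imported lazily; requires the openai-whisper package

    return whisper.load_model(name)


def transcribe_cached(file_path: str, model_name: str = "tiny") -> str:
    """Transcribe file_path, reusing an already loaded model when available."""
    result = get_model(model_name).transcribe(file_path)
    return result["text"]
```

The trade-off is memory: each distinct model name stays loaded for the lifetime of the process.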
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Please open an issue or submit a pull request for any changes or improvements.