A project for 2023-Fall SNU NLP lecture (Team 5)
This codebase is built on Whisper and WhisperBiasing
Utilize the information from the LM outputs as follows: (AM for Audio Model, LM for Language Model)
In this project, we used Whisper-base.en for AM and GPT-2-small for LM.
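
As a rough illustration of shallow fusion, the sketch below adds the LM log-probability, scaled by a weight, to the AM log-probability of each candidate token and picks the highest fused score. The function name `shallow_fusion_scores`, the dict-based input format, and the toy probabilities are hypothetical; only the weight 0.05 comes from this README (the `--lm_weight` default). It is not the repository's implementation.

```python
import math


def shallow_fusion_scores(am_logprobs, lm_logprobs, lm_weight=0.05):
    """Fuse per-token AM and LM log-probabilities.

    Both inputs map token -> log-probability (a toy format assumed here).
    Returns token -> AM log-prob + lm_weight * LM log-prob.
    """
    return {
        tok: am_lp + lm_weight * lm_logprobs[tok]
        for tok, am_lp in am_logprobs.items()
    }


# Made-up next-token distributions, for illustration only.
am = {"there": math.log(0.48), "their": math.log(0.52)}
lm = {"there": math.log(0.90), "their": math.log(0.10)}

fused = shallow_fusion_scores(am, lm, lm_weight=0.05)
am_only_best = max(am, key=am.get)   # the AM alone slightly prefers "their"
best = max(fused, key=fused.get)     # the LM term tips the choice to "there"
```

In this toy case a small LM weight is enough to flip a near-tie in the AM scores, which is the effect shallow fusion relies on.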
Give the LM few-shot examples in order to 1) provide the LM with the following context and 2) leverage the LM's in-context learning ability.
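
One way to picture few-shot prompting is as plain text concatenation: example transcripts come first, followed by the partial hypothesis that the LM should score. The sketch below assumes a newline separator and a hypothetical helper `build_few_shot_prompt`; the prompt format actually used in this codebase may differ.

```python
def build_few_shot_prompt(examples, prefix, separator="\n"):
    """Join example transcripts and append the partial hypothesis.

    `separator` is an assumption; any delimiter the LM handles well would do.
    """
    return separator.join(list(examples) + [prefix])


# Made-up example transcripts and a partial hypothesis.
examples = [
    "the cat sat on the mat",
    "she opened the window slowly",
]
prompt = build_few_shot_prompt(examples, "he walked")
```

The LM then scores the next token given `prompt` instead of the bare hypothesis `"he walked"`.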
- Generate the first $K$ tokens with Few-shot Prompted Shallow Fusion
- Generate the remaining tokens with naive shallow fusion
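
The two steps above can be sketched as a switch on the decoding step: for the first $K$ tokens the LM sees the few-shot prompt plus the hypothesis so far, and afterwards it sees only the hypothesis. The helper `lm_context_for_step`, the newline separator, and the value `k=3` are illustrative assumptions, not the repository's code.

```python
def lm_context_for_step(step, k, few_shot_prompt, hypothesis):
    """Return the text the LM is conditioned on at a decoding step.

    step: 0-based index of the token about to be generated.
    k: number of initial tokens decoded with the few-shot prompt.
    """
    if step < k:
        # Few-shot Prompted Shallow Fusion: prepend the examples.
        return few_shot_prompt + "\n" + hypothesis
    # Naive shallow fusion: the LM only sees the hypothesis so far.
    return hypothesis


few_shot = "the cat sat on the mat"
early = lm_context_for_step(step=1, k=3, few_shot_prompt=few_shot, hypothesis="he")
late = lm_context_for_step(step=3, k=3, few_shot_prompt=few_shot, hypothesis="he walked home")
```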
Used beam_size=5,
Methods | test-clean (WER) | test-other (WER) | Average |
---|---|---|---|
Whisper | 4.35 | 9.42 | 6.89 |
Shallow Fusion | 3.97 | 9.46 | 6.72 |
Few-shot Prompted Shallow Fusion | 4.26 | 9.42 | 6.84 |
Combined Shallow Fusion | 4.0 | 9.23 | 6.62 |
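
The Average column appears to be the unweighted mean of the two WER columns, rounded half up to two decimals. This is an inference from the numbers rather than something stated by the authors. The sketch below recomputes it with `decimal` to avoid binary floating-point rounding surprises.

```python
from decimal import Decimal, ROUND_HALF_UP

# WER values copied from the table above: (test-clean, test-other).
results = {
    "Whisper": ("4.35", "9.42"),
    "Shallow Fusion": ("3.97", "9.46"),
    "Few-shot Prompted Shallow Fusion": ("4.26", "9.42"),
    "Combined Shallow Fusion": ("4.0", "9.23"),
}


def average_wer(clean, other):
    """Unweighted mean of two WER strings, rounded half up to 2 decimals."""
    mean = (Decimal(clean) + Decimal(other)) / 2
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


averages = {name: average_wer(c, o) for name, (c, o) in results.items()}
```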
pip install -r requirements.txt
Example command:
python3 decode_librispeech.py \
--batch_size 1 \
--beam_size 5 \
--split $YOUR_SPLIT \
--output_path $YOUR_OUTPUT_PATH \
--cache_root $YOUR_CACHE_ROOT
Use filter_csv.sh.
In this file, replace the input CSV path and the output CSV path with your own, and set the filtering parameters min_wer and max_wer.
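
To make the role of min_wer and max_wer concrete, here is a minimal Python sketch of filtering rows by a WER range. It does not reproduce filter_csv.sh: the column names `text` and `wer`, the inclusive bounds, and the helper `filter_by_wer` are all assumptions.

```python
import csv
import io


def filter_by_wer(rows, min_wer, max_wer, wer_column="wer"):
    """Keep rows whose WER lies in [min_wer, max_wer] (inclusive, assumed)."""
    return [r for r in rows if min_wer <= float(r[wer_column]) <= max_wer]


# In-memory stand-in for the input CSV (made-up rows).
sample_csv = io.StringIO(
    "text,wer\n"
    "hello world,0.0\n"
    "good morning,0.12\n"
    "see you soon,0.55\n"
)
rows = list(csv.DictReader(sample_csv))
kept = filter_by_wer(rows, min_wer=0.05, max_wer=0.3)
kept_texts = [r["text"] for r in kept]
```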
If you want to retrieve examples using similarity search, you have to build your faiss index by calling search_sentence.py.
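
For intuition about retrieving examples by similarity, the sketch below runs a brute-force cosine search over toy bag-of-words vectors. It is a deliberately simplified stand-in: it uses neither faiss nor a real sentence encoder, and `embed`, `cosine`, and `top_k` are hypothetical helpers, not functions from search_sentence.py.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy bag-of-words vector (stand-in for a real sentence encoder)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, corpus, k=2):
    """Return the k corpus sentences most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]


corpus = [
    "the ship sailed across the sea",
    "he read the book by the fire",
    "the sea was calm that morning",
]
nearest = top_k("a ship on the sea", corpus, k=1)
```

A vector index such as faiss serves the same purpose but scales to large corpora by avoiding the full scan done here.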
Example command:
python3 decode_librispeech.py \
--batch_size 1 \
--beam_size 5 \
--use_gpt \
--gpt_kind gpt2 \
--shallow_fusion \
--use_icl \
--sample_random \
--split $YOUR_SPLIT \
--output_path $YOUR_OUTPUT_PATH \
--cache_root $YOUR_CACHE_ROOT
- --whisper_model: Kind of Whisper model. Default is "base.en"
- --split: The dataset split. Default is "test-clean"
- --use_gpt2: Select whether to use the LM or not. Default is "False"
- --gpt_kind: The LM model. Default is "gpt2"
- --lm_weight: The LM weight for shallow fusion. Default is "0.05"
- --ilm_weight: The internal LM weight. In our project this argument should be 0. Default is "0"
- --shallow_fusion: Select whether to use shallow fusion. Default is "False"
- --batch_size: Batch size. Default is "1"
- --beam_size: Beam size for beam search. Default is "5"
- --cache_root: Cache root.
- --num_data: Number of samples to decode. When this argument is "-1", all samples are decoded. Default is "-1"
- --dataset_offset: Dataset offset. Default is "0"
- --use_icl: Select whether to use few-shot prompting. Default is "False"
- --index_path: Path of the index file.
- --csv_path: Path of the CSV file.
- --output_path: Path of the output file.
- --num_examples: Number of prompt examples. Default is "10"
- --sample_random: Select whether to sample the prompt examples randomly or not. Default is "False"
- --prefix_lenght: Length of the prefix in tokens. Default is "3"