A project for the Fall 2023 SNU NLP lecture (Team 5).

This codebase is built on Whisper and WhisperBiasing.

Method Overview

Shallow Fusion

Utilize the information from the LM outputs as follows (AM: audio model, LM: language model):

$$\text{score} = \log P_{\text{AM}}(Y|X) + \lambda \cdot \log P_{\text{LM}}(Y)$$

In this project, we use Whisper-base.en as the AM and GPT-2-small as the LM.
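As a minimal sketch of how the score above ranks beam candidates, the snippet below combines two log-probabilities per hypothesis. The function name `shallow_fusion_score` and the candidate probabilities are made up for illustration and are not taken from this codebase.

```python
import math


def shallow_fusion_score(am_logprob: float, lm_logprob: float, lm_weight: float = 0.05) -> float:
    """Return log P_AM(Y|X) + lambda * log P_LM(Y) for one hypothesis."""
    return am_logprob + lm_weight * lm_logprob


# Hypothetical beam candidates: (text, log P_AM(Y|X), log P_LM(Y)).
candidates = [
    ("the whether is nice", math.log(0.40), math.log(1e-9)),
    ("the weather is nice", math.log(0.38), math.log(1e-6)),
]

# The AM alone slightly prefers the first candidate; adding the LM term flips the ranking.
best = max(candidates, key=lambda c: shallow_fusion_score(c[1], c[2]))
print(best[0])
```

With a small $\lambda$ such as 0.05, the LM only changes the ranking when the AM scores of two hypotheses are close.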

Few-shot Prompted Shallow Fusion (Proposed Method 1)

Give the LM few-shot examples to 1) provide it with context and 2) leverage its in-context learning ability.

$$\text{score} = \log P_{\text{AM}}(Y|X) + \lambda \cdot \log P_{\text{LM}}(Y | \text{few-shot prompt})$$
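The conditional term can be read as: the prompt is fed to the LM as context, but only the hypothesis tokens contribute to the score. The sketch below illustrates that under stated assumptions: the prompt format (examples joined by newlines), the helper names, and the `toy_lm` stand-in for GPT-2 are all hypothetical and not taken from this codebase.

```python
import math
from typing import Callable, List, Sequence


def build_few_shot_prompt(examples: Sequence[str], separator: str = "\n") -> str:
    """Join example transcripts into one prompt string (assumed format)."""
    return separator.join(examples) + separator


def conditional_logprob(
    next_token_logprob: Callable[[List[str], str], float],
    prompt_tokens: Sequence[str],
    hyp_tokens: Sequence[str],
) -> float:
    """Sum log P(token | prompt + previous tokens) over hypothesis tokens only."""
    context = list(prompt_tokens)
    total = 0.0
    for tok in hyp_tokens:
        total += next_token_logprob(context, tok)
        context.append(tok)
    return total


def toy_lm(context: List[str], token: str) -> float:
    # Toy stand-in for a real LM: tokens already seen in the context are more likely.
    return math.log(0.5) if token in context else math.log(0.1)


prompt = build_few_shot_prompt(["the weather is nice", "it is raining"])
score_with_prompt = conditional_logprob(toy_lm, prompt.split(), "the weather".split())
score_without = conditional_logprob(toy_lm, [], "the weather".split())
print(score_with_prompt, score_without)
```

In this toy setting the prompted score is higher because the prompt already contains the hypothesis words; a real LM would shift probabilities in a less trivial way.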

Combined Shallow Fusion (Proposed Method 2)

Generate the first $K$ tokens with Few-shot Prompted Shallow Fusion.
Generate the remaining tokens with naive (unprompted) shallow fusion.

$$ \text{score} = \begin{cases} \log P_{\text{AM}}(Y|X) + \lambda \cdot \log P_{\text{LM}}(Y | \text{few-shot prompt}), & \text{if } \text{len}(Y) \leq K, \\ \log P_{\text{AM}}(Y|X) + \lambda \cdot \log P_{\text{LM}}(Y), & \text{otherwise.} \end{cases} $$
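The case split above can be sketched as a single function that picks the LM term by hypothesis length. The function and parameter names are hypothetical; the default `prefix_len=3` assumes that $K$ corresponds to the prefix length argument documented below, which this README does not state explicitly.

```python
def combined_fusion_score(
    am_logprob: float,
    lm_logprob_prompted: float,
    lm_logprob_plain: float,
    hyp_len: int,
    prefix_len: int = 3,
    lm_weight: float = 0.05,
) -> float:
    """Use the prompted LM score while len(Y) <= K, the plain LM score afterwards."""
    if hyp_len <= prefix_len:
        lm_logprob = lm_logprob_prompted
    else:
        lm_logprob = lm_logprob_plain
    return am_logprob + lm_weight * lm_logprob


# Same AM and LM scores, different hypothesis lengths.
short_score = combined_fusion_score(-1.0, -10.0, -20.0, hyp_len=3)
long_score = combined_fusion_score(-1.0, -10.0, -20.0, hyp_len=4)
print(short_score, long_score)
```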

Experimental Results on LibriSpeech

All experiments use beam_size=5 and $\lambda = 0.05$. Few-shot examples are retrieved randomly.

| Method | test-clean (WER) | test-other (WER) | Average |
| --- | --- | --- | --- |
| Whisper | 4.35 | 9.42 | 6.89 |
| Shallow Fusion | 3.97 | 9.46 | 6.72 |
| Few-shot Prompted Shallow Fusion | 4.26 | 9.42 | 6.84 |
| Combined Shallow Fusion | 4.0 | 9.23 | 6.62 |
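For readers unfamiliar with the metric, WER is the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below is a generic implementation, not this project's evaluation code: it assumes whitespace tokenization, reports a percentage, and skips any text normalization that the actual evaluation may apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


print(word_error_rate("the weather is nice", "the whether is nice"))
```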

How to run our code

Install Dependencies

pip install -r requirements.txt

Example

1. Generate data pool

Example command:

python3 decode_librispeech.py \
    --batch_size 1 \
    --beam_size 5 \
    --split $YOUR_SPLIT \
    --output_path $YOUR_OUTPUT_PATH \
    --cache_root $YOUR_CACHE_ROOT

2. Filter the data pool (Optional, Recommended)

Use filter_csv.sh. In this file, replace the input CSV path and output CSV path with your own, and set the filtering parameters min_wer and max_wer.
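The filtering step can be pictured as keeping only rows whose WER lies between the two bounds. The sketch below is not the contents of filter_csv.py: the column name `wer`, the inclusive bounds, and the function names are assumptions made for illustration.

```python
import csv


def filter_rows(rows, min_wer: float, max_wer: float, wer_column: str = "wer"):
    """Keep rows whose WER lies in [min_wer, max_wer] (inclusive bounds assumed)."""
    return [row for row in rows if min_wer <= float(row[wer_column]) <= max_wer]


def filter_csv(input_path: str, output_path: str, min_wer: float, max_wer: float,
               wer_column: str = "wer") -> int:
    """Read a CSV, write the filtered rows, and return how many were kept."""
    with open(input_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        kept = filter_rows(reader, min_wer, max_wer, wer_column)
    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)


# Hypothetical pool entries; the "text" and "wer" columns are assumed names.
sample = [
    {"text": "perfect transcript", "wer": "0.0"},
    {"text": "slightly wrong transcript", "wer": "0.2"},
    {"text": "mostly wrong transcript", "wer": "0.9"},
]
print(filter_rows(sample, min_wer=0.05, max_wer=0.5))
```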

3. Generate vector DB (Optional)

To retrieve examples by similarity search, first build a FAISS index by running search_sentence.py.
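To illustrate what similarity-based retrieval does, the sketch below ranks pool sentences by cosine similarity to a query vector. It is a brute-force, pure-Python stand-in and does not use FAISS or reproduce search_sentence.py; the three-dimensional embeddings are invented, and real sentence embeddings would come from an encoder model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], pool: List[Tuple[str, Sequence[float]]], k: int) -> List[str]:
    """Return the k pool sentences most similar to the query embedding."""
    ranked = sorted(pool, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical (sentence, embedding) pairs.
pool = [
    ("the weather is nice", [0.9, 0.1, 0.0]),
    ("stock prices fell", [0.0, 0.2, 0.9]),
    ("it is raining today", [0.8, 0.3, 0.1]),
]
nearest = top_k([1.0, 0.2, 0.0], pool, k=2)
print(nearest)
```

A vector index such as FAISS serves the same purpose but avoids comparing the query against every pool entry when the pool is large.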

4. Inference

Example command:

python3 decode_librispeech.py \
    --batch_size 1 \
    --beam_size 5 \
    --use_gpt \
    --gpt_kind gpt2 \
    --shallow_fusion \
    --use_icl \
    --sample_random \
    --split $YOUR_SPLIT \
    --output_path $YOUR_OUTPUT_PATH \
    --cache_root $YOUR_CACHE_ROOT

Arguments

--whisper_model : Whisper model to use. Default is "base.en"
--split : Dataset split. Default is "test-clean"
--use_gpt2 : Whether to use the LM. Default is "False"
--gpt_kind : LM to use. Default is "gpt2"
--lm_weight : Weight of the LM score in shallow fusion. Default is "0.05"
--ilm_weight : Weight of the internal LM. In our project this argument should be 0. Default is "0"
--shallow_fusion : Whether to use shallow fusion. Default is "False"
--batch_size : Batch size. Default is "1"
--beam_size : Beam width for beam search. Default is "5"
--cache_root : Cache root path.
--num_data : Number of samples to decode. When this argument is "-1", all samples are decoded. Default is "-1"
--dataset_offset : Dataset offset. Default is "0"
--use_icl : Whether to use few-shot prompting. Default is "False"
--index_path : Path to the index file.
--csv_path : Path to the CSV file.
--output_path : Path to the output file.
--num_examples : Number of prompt examples. Default is "10"
--sample_random : Whether to build the prompt from randomly sampled examples. Default is "False"
--prefix_lenght : Length of prefix tokens. Default is "3"
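As a rough illustration of how such arguments behave, the sketch below declares a subset of them with `argparse`, using the defaults listed above. It is not the parser from decode_librispeech.py; treating the boolean options as `store_true` switches is an assumption based on how they appear in the example commands.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a subset of the documented arguments with their documented defaults."""
    parser = argparse.ArgumentParser(description="Sketch of a subset of the documented arguments")
    parser.add_argument("--whisper_model", default="base.en")
    parser.add_argument("--split", default="test-clean")
    parser.add_argument("--lm_weight", type=float, default=0.05)
    parser.add_argument("--beam_size", type=int, default=5)
    parser.add_argument("--num_examples", type=int, default=10)
    # Assumed to be on/off switches that default to False when omitted.
    parser.add_argument("--shallow_fusion", action="store_true")
    parser.add_argument("--use_icl", action="store_true")
    return parser


args = build_parser().parse_args(["--shallow_fusion", "--lm_weight", "0.1"])
print(args.shallow_fusion, args.use_icl, args.lm_weight, args.beam_size)
```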

