This is my implementation of a speech fluency assessment model. The idea for this model comes from the paper *An ASR-Free Fluency Scoring Approach with Self-Supervised Learning* (Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee), presented at ICASSP 2023.
This implementation is unofficial, and there may be some bugs I have missed.
The repo will be completed as soon as possible.
The main structure of this repo is as follows:
The SpeechOcean762 dataset used in this work is an open dataset licensed under CC BY 4.0.
If you have already downloaded speechocean762 yourself, fill in its directory path in `prep_data/run.sh`.
The input generation programs are in `prep_data`. Just run the shell script there:

```bash
cd prep_data
./run.sh
```
- The labels are the utterance-level fluency scores in speechocean762.
- The acoustic features are extracted with Wav2vec_large and have a dimension of 1024.
- The feature and label files are collected in `data`.
- The cluster model is trained by `train_kmeans.py` and saved in `exp/kmeans`; it is used later in fluency_scoring training. `kmeans_metric.py` can be used to check the performance of the k-means clustering.
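The clustering step gives every 1024-dim frame a discrete cluster ID. Below is a minimal NumPy-only sketch of Lloyd's k-means, assuming all training frames are stacked into one `(num_frames, 1024)` array. The function names and the number of clusters are hypothetical; `train_kmeans.py` may instead rely on a library implementation (for example scikit-learn) with different settings.

```python
import numpy as np


def kmeans_predict(feats: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each frame to its nearest centroid (squared Euclidean distance)."""
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, computed without a (N, K, D) tensor
    dists = (
        (feats ** 2).sum(axis=1, keepdims=True)
        - 2.0 * feats @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    return dists.argmin(axis=1)


def kmeans_fit(feats: np.ndarray, n_clusters: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means on frame-level features of shape (num_frames, dim).

    Returns the centroids with shape (n_clusters, dim).
    """
    rng = np.random.default_rng(seed)
    # Initialise the centroids with randomly chosen distinct frames
    centroids = feats[rng.choice(len(feats), size=n_clusters, replace=False)].astype(np.float64)
    for _ in range(n_iter):
        labels = kmeans_predict(feats, centroids)
        for k in range(n_clusters):
            members = feats[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster becomes empty
                centroids[k] = members.mean(axis=0)
    return centroids
```

For a new utterance, `kmeans_predict(utt_feats, centroids)` returns its per-frame cluster ID sequence, which is the kind of input the cluster_id feature refers to.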
【Note】: Using forced-alignment results in place of the k-means predictions

If you want to try forced-alignment results as a replacement for the cluster IDs, run:

```bash
python3 gen_ctc_force_align.py
```

If you choose this as the source of the cluster IDs, you need to update `run.sh` by setting **cluster_pred=False**.
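The script name suggests CTC-based forced alignment. As an illustration only (not necessarily what `gen_ctc_force_align.py` does), here is a generic Viterbi CTC alignment in NumPy: given frame-level log-posteriors from some CTC acoustic model and the transcript's token IDs, it returns one label per frame, which could then stand in for the k-means cluster IDs. The function name, the blank index of 0, and the token inventory are assumptions.

```python
import numpy as np


def ctc_forced_align(log_probs: np.ndarray, targets: list[int], blank: int = 0) -> list[int]:
    """Viterbi CTC alignment.

    log_probs: (T, V) frame-level log-posteriors from a CTC model.
    targets:   transcript token IDs (without blanks).
    Returns one label per frame: a target token ID or `blank`.
    """
    T = log_probs.shape[0]
    # Extended label sequence: blank, y1, blank, y2, ..., yL, blank
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    S = len(ext)

    dp = np.full((T, S), -np.inf)              # best log-score ending in state s at frame t
    back = np.zeros((T, S), dtype=np.int64)    # predecessor state on that best path
    dp[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        dp[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed moves: stay in s, advance from s-1, or skip a blank from s-2
            cands = [(dp[t - 1, s], s)]
            if s >= 1:
                cands.append((dp[t - 1, s - 1], s - 1))
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((dp[t - 1, s - 2], s - 2))
            best_score, best_prev = max(cands)
            dp[t, s] = best_score + log_probs[t, ext[s]]
            back[t, s] = best_prev

    # A valid path ends in the last token or the trailing blank
    finals = [S - 1] if S == 1 else [S - 1, S - 2]
    s = max(finals, key=lambda i: dp[T - 1, i])
    if not np.isfinite(dp[T - 1, s]):
        raise ValueError("transcript is too long for the number of frames")

    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t, s]
    return path
```

Collapsing the returned path (merging repeats, then dropping blanks) recovers the transcript, so every frame carries a linguistically meaningful label instead of an unsupervised cluster index.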
- Version without the cluster_id feature: `./noclu_run.sh`
- Version with the cluster_id feature: `./run.sh`
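The two scripts differ in whether cluster IDs are fed to the model alongside the SSL features. The toy NumPy forward pass below shows one common way to combine them: embed each frame's cluster ID and concatenate it with that frame's feature vector. The embedding table, its size, the mean-pooling head, and all function names are hypothetical; the FluScorer and Flu_TFR models in the table below are trained networks whose architectures may differ.

```python
import numpy as np


def build_frame_inputs(feats, cluster_ids=None, emb_table=None):
    """Frame-level model input.

    feats:       (T, D) SSL features, e.g. D = 1024 for Wav2vec_large.
    cluster_ids: (T,) integer IDs from k-means or forced alignment (optional).
    emb_table:   (K, E) hypothetical embedding table, one row per cluster ID.
    Returns (T, D) without cluster IDs, or (T, D + E) with them.
    """
    if cluster_ids is None:
        return feats  # the "no cluster_id" variant uses the features alone
    if emb_table is None:
        raise ValueError("emb_table is required when cluster_ids are given")
    # Look up one embedding per frame and append it to the frame's features
    return np.concatenate([feats, emb_table[cluster_ids]], axis=1)


def toy_utterance_score(frame_inputs, w, b):
    """Hypothetical head: mean-pool over time, then a linear regressor."""
    return float(frame_inputs.mean(axis=0) @ w + b)
```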
Model | Utterance-level fluency PCC |
---|---|
GOPT (Librispeech) | 0.756 |
Original paper (reported) | 0.795 |
FluScorer+cluster_idx | 0.753 |
Flu_TFR+cluster_idx | 0.790 |
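The table reports the Pearson correlation coefficient (PCC) between predicted and human fluency scores at the utterance level. A minimal sketch of the metric, assuming the predictions and reference scores are aligned 1-D arrays (the function name is hypothetical):

```python
import numpy as np


def pearson_cc(pred, target) -> float:
    """Pearson correlation between predicted and reference utterance scores."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Center both series, then divide the covariance by the product of the norms
    pc = pred - pred.mean()
    tc = target - target.mean()
    return float((pc * tc).sum() / np.sqrt((pc ** 2).sum() * (tc ** 2).sum()))


print(pearson_cc([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

PCC is insensitive to a constant offset or positive scaling of the predictions, so it measures how well a model ranks and tracks the human scores rather than its absolute calibration.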