Discrepancy in SIGMOS results #3

Open
mgrzywa-logi opened this issue Jun 26, 2024 · 0 comments

I would like to ask about a discrepancy in SIGMOS results that I observed in a simple experiment I ran recently.

I took clean speech data from the ETSI TS 103 106 standard (Annex C): 4 talkers (2 male, 2 female), 20 sentences, each sample 4 s long.
Document: https://www.etsi.org/deliver/etsi_ts/103100_103199/103106/01.06.01_60/ts_103106v010601p.pdf
Audio file:
https://docbox.etsi.org/stq/Open/TS%20103%20106%20Wave%20files/Annex_C_Dynastat%20Speech%20Data/Dyna-Src_P835_16_sentences_4convergence.wav

I cut this audio file into 20 separate 4-second files, one per utterance, and calculated SIGMOS for each file and for the original 80-second file.
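
For reference, a minimal sketch of this kind of fixed-length splitting, using only Python's standard `wave` module. It assumes an uncompressed PCM WAV file in which each utterance occupies exactly one 4-second slot. The function name `split_wav` and the `segment_NN.wav` output names are illustrative only, not the names used in the experiment.

```python
import wave
from pathlib import Path


def split_wav(src_path, out_dir, segment_seconds=4.0):
    """Cut a PCM WAV file into consecutive fixed-length segments.

    Returns the list of written paths. The last segment is shorter
    if the file length is not a multiple of segment_seconds.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        frames_per_segment = int(round(segment_seconds * params.framerate))
        index = 1
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break  # end of file reached
            out_path = out_dir / f"segment_{index:02d}.wav"
            with wave.open(str(out_path), "wb") as dst:
                # Copy channels, sample width and rate; the frame count
                # in the header is corrected when the file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
            index += 1
    return written
```

Cutting at exact sample offsets like this does not add padding or fades, so each segment contains exactly the samples of its 4-second slot in the original file.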
I expected the average of the per-utterance results to closely match the result for the concatenated file, but they actually differ, as shown below:

| FILENAME | COL | DISC | LOUD | NOISE | REVERB | SIG | OVRL | COMMENTS |
|---|---|---|---|---|---|---|---|---|
| convergance_utt_1 | 3,93 | 4,41 | 4,36 | 4,61 | 4,66 | 4,22 | 3,87 | male |
| convergance_utt_2 | 3,78 | 4,71 | 3,58 | 4,45 | 4,71 | 3,94 | 3,75 | female |
| convergance_utt_3 | 4,08 | 4,60 | 4,27 | 4,54 | 4,66 | 4,31 | 4,05 | male |
| convergance_utt_4 | 3,75 | 4,32 | 3,85 | 4,44 | 4,75 | 3,91 | 3,46 | female |
| utt_1 | 4,04 | 4,46 | 4,06 | 4,61 | 4,56 | 3,81 | 3,55 | male |
| utt_10 | 3,68 | 4,87 | 4,24 | 4,70 | 4,93 | 4,40 | 3,99 | female |
| utt_11 | 3,64 | 4,44 | 3,59 | 4,32 | 4,59 | 4,04 | 3,61 | male |
| utt_12 | 4,30 | 4,42 | 3,96 | 4,53 | 4,79 | 4,18 | 3,70 | female |
| utt_13 | 3,65 | 4,31 | 4,08 | 4,79 | 4,63 | 3,43 | 3,17 | male |
| utt_14 | 3,69 | 4,62 | 3,68 | 4,00 | 4,77 | 3,83 | 3,49 | female |
| utt_15 | 3,65 | 3,52 | 3,77 | 4,37 | 4,52 | 3,36 | 3,05 | male |
| utt_16 | 3,59 | 4,33 | 4,24 | 4,17 | 4,55 | 3,69 | 3,36 | female |
| utt_2 | 3,72 | 4,21 | 3,59 | 4,54 | 4,53 | 3,49 | 3,22 | female |
| utt_3 | 3,91 | 4,91 | 4,21 | 4,61 | 4,88 | 4,06 | 3,78 | male |
| utt_4 | 3,88 | 4,46 | 3,52 | 4,48 | 4,82 | 4,17 | 3,74 | female |
| utt_5 | 3,67 | 4,43 | 4,33 | 4,67 | 4,68 | 3,77 | 3,36 | male |
| utt_6 | 3,10 | 4,32 | 4,22 | 4,46 | 4,82 | 3,41 | 2,99 | female |
| utt_7 | 4,02 | 4,71 | 3,30 | 4,71 | 4,59 | 4,54 | 3,97 | male |
| utt_8 | 3,83 | 4,85 | 4,11 | 4,57 | 4,76 | 3,71 | 3,57 | female |
| utt_9 | 4,03 | 4,88 | 4,31 | 4,70 | 4,44 | 4,10 | 3,90 | male |
| Dyna-Src_P835_16_sentences_4convergence | 4,22 | 4,47 | 4,50 | 4,78 | 4,95 | 4,25 | 4,05 | |

| | COL | DISC | LOUD | NOISE | REVERB | SIG | OVRL |
|---|---|---|---|---|---|---|---|
| average | 3,797 | 4,489 | 3,963 | 4,515 | 4,682 | 3,918 | 3,578 |
| original | 4,223 | 4,474 | 4,498 | 4,780 | 4,952 | 4,248 | 4,046 |
| diff (original - average) | 0,426 | -0,015 | 0,536 | 0,265 | 0,269 | 0,330 | 0,468 |
| std | 0,249 | 0,315 | 0,326 | 0,191 | 0,132 | 0,339 | 0,315 |
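
The summary rows can be re-derived from the per-file scores. Below is a minimal sketch for the OVRL column only, with the values transcribed from the table above (decimal commas written as points). The helper name `summarize` is illustrative, and because the inputs are the two-decimal rounded values, the results can differ from the summary rows in the third decimal.

```python
from statistics import mean, stdev

# OVRL scores of the 20 four-second files, in table order.
ovrl_per_utterance = [
    3.87, 3.75, 4.05, 3.46,                          # convergance_utt_1..4
    3.55, 3.99, 3.61, 3.70, 3.17, 3.49, 3.05, 3.36,  # utt_1, utt_10..utt_16
    3.22, 3.78, 3.74, 3.36, 2.99, 3.97, 3.57, 3.90,  # utt_2..utt_9
]
# OVRL score of the original 80-second file.
ovrl_full_file = 4.05


def summarize(per_file, full_file):
    """Return the average, the full-file minus average difference,
    and the sample standard deviation (n - 1) of the per-file scores."""
    avg = mean(per_file)
    return {
        "average": avg,
        "diff": full_file - avg,
        "std": stdev(per_file),
    }


summary = summarize(ovrl_per_utterance, ovrl_full_file)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the sample standard deviation (n - 1) comes out at about 0,315, matching the `std` row for OVRL, so that is presumably the variant used in the table. The full-file OVRL score then sits roughly 1,5 standard deviations above the per-utterance average.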

As you can see, the difference reaches about 0,5 MOS for some dimensions (0,536 for LOUD, 0,468 for OVRL).

What could be the reason for this discrepancy?
