I cut this audio file into 20 separate 4-second files, one per utterance, and calculated SIGMOS for each file and for the original 80-second file.
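For reference, the cutting step can be sketched with Python's standard `wave` module. This is a minimal sketch, not the exact procedure used in the experiment. It assumes the file is uncompressed PCM WAV and that every utterance occupies an exact 4 s slot. The function name `split_wav` and the output names `utt_<n>.wav` are hypothetical.

```python
import os
import wave


def split_wav(path, segment_seconds, out_dir):
    """Split a PCM WAV file into consecutive fixed-length segments.

    Hypothetical helper: assumes utterances sit on exact segment_seconds
    boundaries. A trailing remainder shorter than segment_seconds is
    written as a final, shorter segment. Returns the written file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_segment = int(round(segment_seconds * src.getframerate()))
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:  # end of file reached
                break
            index += 1
            out_path = os.path.join(out_dir, f"utt_{index}.wav")
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is fixed up on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

With an 80 s input and `segment_seconds=4.0`, this writes 20 files.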
I expected the average of the per-utterance scores to be close to the score of the full concatenated file, but they differ, as shown below:
| File | COL | DISC | LOUD | NOISE | REVERB | SIG | OVRL | Comments |
|---|---|---|---|---|---|---|---|---|
| convergance_utt_1 | 3.93 | 4.41 | 4.36 | 4.61 | 4.66 | 4.22 | 3.87 | male |
| convergance_utt_2 | 3.78 | 4.71 | 3.58 | 4.45 | 4.71 | 3.94 | 3.75 | female |
| convergance_utt_3 | 4.08 | 4.60 | 4.27 | 4.54 | 4.66 | 4.31 | 4.05 | male |
| convergance_utt_4 | 3.75 | 4.32 | 3.85 | 4.44 | 4.75 | 3.91 | 3.46 | female |
| utt_1 | 4.04 | 4.46 | 4.06 | 4.61 | 4.56 | 3.81 | 3.55 | male |
| utt_10 | 3.68 | 4.87 | 4.24 | 4.70 | 4.93 | 4.40 | 3.99 | female |
| utt_11 | 3.64 | 4.44 | 3.59 | 4.32 | 4.59 | 4.04 | 3.61 | male |
| utt_12 | 4.30 | 4.42 | 3.96 | 4.53 | 4.79 | 4.18 | 3.70 | female |
| utt_13 | 3.65 | 4.31 | 4.08 | 4.79 | 4.63 | 3.43 | 3.17 | male |
| utt_14 | 3.69 | 4.62 | 3.68 | 4.00 | 4.77 | 3.83 | 3.49 | female |
| utt_15 | 3.65 | 3.52 | 3.77 | 4.37 | 4.52 | 3.36 | 3.05 | male |
| utt_16 | 3.59 | 4.33 | 4.24 | 4.17 | 4.55 | 3.69 | 3.36 | female |
| utt_2 | 3.72 | 4.21 | 3.59 | 4.54 | 4.53 | 3.49 | 3.22 | female |
| utt_3 | 3.91 | 4.91 | 4.21 | 4.61 | 4.88 | 4.06 | 3.78 | male |
| utt_4 | 3.88 | 4.46 | 3.52 | 4.48 | 4.82 | 4.17 | 3.74 | female |
| utt_5 | 3.67 | 4.43 | 4.33 | 4.67 | 4.68 | 3.77 | 3.36 | male |
| utt_6 | 3.10 | 4.32 | 4.22 | 4.46 | 4.82 | 3.41 | 2.99 | female |
| utt_7 | 4.02 | 4.71 | 3.30 | 4.71 | 4.59 | 4.54 | 3.97 | male |
| utt_8 | 3.83 | 4.85 | 4.11 | 4.57 | 4.76 | 3.71 | 3.57 | female |
| utt_9 | 4.03 | 4.88 | 4.31 | 4.70 | 4.44 | 4.10 | 3.90 | male |
| Dyna-Src_P835_16_sentences_4convergence | 4.22 | 4.47 | 4.50 | 4.78 | 4.95 | 4.25 | 4.05 | original (full 80 s file) |
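| | COL | DISC | LOUD | NOISE | REVERB | SIG | OVRL |
|---|---|---|---|---|---|---|---|
| average (20 segments) | 3.797 | 4.489 | 3.963 | 4.515 | 4.682 | 3.918 | 3.578 |
| original | 4.223 | 4.474 | 4.498 | 4.780 | 4.952 | 4.248 | 4.046 |
| diff (original - average) | 0.426 | -0.015 | 0.536 | 0.265 | 0.269 | 0.330 | 0.468 |
| std | 0.249 | 0.315 | 0.326 | 0.191 | 0.132 | 0.339 | 0.315 |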
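The average, diff and std rows can be recomputed from the per-segment table with a short script. This is a minimal sketch under two assumptions: diff is the original score minus the average, and std is the sample standard deviation (n - 1 denominator), which gives 0.249 for COL, whereas the population formula would give about 0.243. The per-segment values are the rounded two-decimal scores from the table, so recomputed means can differ from the summary row in the third decimal. The names `SEGMENT_SCORES`, `FULL_FILE` and `summarize` are hypothetical.

```python
import csv
import io
import statistics

# Per-segment SIGMOS scores transcribed from the table (two-decimal values).
SEGMENT_SCORES = """\
file,COL,DISC,LOUD,NOISE,REVERB,SIG,OVRL
convergance_utt_1,3.93,4.41,4.36,4.61,4.66,4.22,3.87
convergance_utt_2,3.78,4.71,3.58,4.45,4.71,3.94,3.75
convergance_utt_3,4.08,4.60,4.27,4.54,4.66,4.31,4.05
convergance_utt_4,3.75,4.32,3.85,4.44,4.75,3.91,3.46
utt_1,4.04,4.46,4.06,4.61,4.56,3.81,3.55
utt_10,3.68,4.87,4.24,4.70,4.93,4.40,3.99
utt_11,3.64,4.44,3.59,4.32,4.59,4.04,3.61
utt_12,4.30,4.42,3.96,4.53,4.79,4.18,3.70
utt_13,3.65,4.31,4.08,4.79,4.63,3.43,3.17
utt_14,3.69,4.62,3.68,4.00,4.77,3.83,3.49
utt_15,3.65,3.52,3.77,4.37,4.52,3.36,3.05
utt_16,3.59,4.33,4.24,4.17,4.55,3.69,3.36
utt_2,3.72,4.21,3.59,4.54,4.53,3.49,3.22
utt_3,3.91,4.91,4.21,4.61,4.88,4.06,3.78
utt_4,3.88,4.46,3.52,4.48,4.82,4.17,3.74
utt_5,3.67,4.43,4.33,4.67,4.68,3.77,3.36
utt_6,3.10,4.32,4.22,4.46,4.82,3.41,2.99
utt_7,4.02,4.71,3.30,4.71,4.59,4.54,3.97
utt_8,3.83,4.85,4.11,4.57,4.76,3.71,3.57
utt_9,4.03,4.88,4.31,4.70,4.44,4.10,3.90
"""

# Scores reported for the original 80 s file ("original" row).
FULL_FILE = {"COL": 4.223, "DISC": 4.474, "LOUD": 4.498, "NOISE": 4.780,
             "REVERB": 4.952, "SIG": 4.248, "OVRL": 4.046}

DIMENSIONS = ["COL", "DISC", "LOUD", "NOISE", "REVERB", "SIG", "OVRL"]


def summarize(csv_text, full_file):
    """Return {dimension: (mean, full_file - mean, sample std)}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for dim in DIMENSIONS:
        values = [float(row[dim]) for row in rows]
        mean = statistics.mean(values)
        # statistics.stdev uses the n - 1 (sample) denominator.
        summary[dim] = (mean, full_file[dim] - mean, statistics.stdev(values))
    return summary


if __name__ == "__main__":
    for dim, (mean, diff, std) in summarize(SEGMENT_SCORES, FULL_FILE).items():
        print(f"{dim:>6}  mean={mean:.3f}  diff={diff:+.3f}  std={std:.3f}")
```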
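As you can see, the differences reach about 0.5 MOS for some dimensions (0.536 for LOUD, 0.468 for OVRL), and the full-file score is higher than the per-utterance average in every dimension except DISC.
What could be the reason for this discrepancy?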
I wanted to ask about a discrepancy in SIGMOS results that I observed in a simple experiment I ran recently.
I took clean speech data from ETSI TS 103 106 (Annex C): 4 talkers (2 male, 2 female), 20 sentences, each sample 4 s long.
Document: https://www.etsi.org/deliver/etsi_ts/103100_103199/103106/01.06.01_60/ts_103106v010601p.pdf
Audio file:
https://docbox.etsi.org/stq/Open/TS%20103%20106%20Wave%20files/Annex_C_Dynastat%20Speech%20Data/Dyna-Src_P835_16_sentences_4convergence.wav