Commit
0 parents, commit 6f7ee67
Showing 1,093 changed files with 3,089 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
AN4 License Terms | ||
|
||
This audio database is free for use for any purpose (commercial or otherwise) subject to the restrictions detailed below. | ||
|
||
/* ==================================================================== | ||
* Copyright (c) 1991-2005 Carnegie Mellon University. All rights | ||
* reserved. | ||
* | ||
* Redistribution and use in source and binary forms, with or without | ||
* modification, are permitted provided that the following conditions | ||
* are met: | ||
* | ||
* 1. Redistributions of source code must retain the above copyright | ||
* notice, this list of conditions and the following disclaimer. | ||
* | ||
* 2. Redistributions in binary form must reproduce the above copyright | ||
* notice, this list of conditions and the following disclaimer in | ||
* the documentation and/or other materials provided with the | ||
* distribution. | ||
* | ||
* This work was supported in part by funding from the Defense Advanced | ||
* Research Projects Agency and the National Science Foundation of the | ||
* United States of America, and the CMU Sphinx Speech Consortium. | ||
* | ||
* THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND | ||
* ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, | ||
* THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR | ||
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY | ||
* NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, | ||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT | ||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, | ||
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY | ||
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | ||
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | ||
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
* | ||
* ==================================================================== | ||
*/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
## AN4 example database for SphinxTrain | ||
|
||
This directory contains the Census (AN4) database audio files. Some | ||
files from the original database were excluded, namely those | ||
with filenames starting with "cen9". | ||
|
||
The AN4 database was recorded at Carnegie Mellon University circa | ||
1991. For more details, please see "Acoustical and environmental | ||
robustness in automatic speech recognition", by Alex Acero, published | ||
by Kluwer Academic Publishers, 1993. | ||
|
||
The files have been converted to RIFF (a.k.a. Microsoft WAV) format | ||
for ease of use. | ||
|
||
The directories contain: | ||
|
||
- wav/an4_clstk: training data set recorded with a close-talking microphone. | ||
|
||
- wav/an4test_clstk: test data set recorded with a close-talking microphone. | ||
|
||
- etc: directory containing the transcriptions, control files, | ||
dictionaries, and a basic unigram language model for evaluation. | ||
|
||
This database is mostly interesting for quickly testing the training | ||
scripts, as it is quite small and simplistic. | ||
|
||
See [LICENSE](./LICENSE) for terms of use. |
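
Since the README states that the audio was converted to RIFF (WAV) format, a quick way to sanity-check a file is to read its header with Python's standard `wave` module. The following is a minimal sketch, not part of the commit: the function name `describe_wav` is hypothetical, and the demo writes a short silent file to a temporary directory instead of assuming any particular AN4 path or sample rate.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic header fields of a RIFF/WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "frames": frames,
            "duration_s": frames / rate,
        }


# Demo on a synthetic file (assumed parameters, chosen only for illustration).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)       # mono
    out.setsampwidth(2)       # 16-bit samples
    out.setframerate(16000)   # illustrative rate, not read from the database
    out.writeframes(b"\x00\x00" * 8000)  # half a second of silence

print(describe_wav(demo_path))
```

Pointing `describe_wav` at a file under `wav/an4_clstk` would report that file's actual channel count, sample width, and sample rate.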
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
A AH | ||
A(2) EY | ||
AND AE N D | ||
AND(2) AH N D | ||
APOSTROPHE AH P AA S T R AH F IY | ||
APRIL EY P R AH L | ||
AREA EH R IY AH | ||
AUGUST AA G AH S T | ||
AUGUST(2) AO G AH S T | ||
B B IY | ||
C S IY | ||
CODE K OW D | ||
D D IY | ||
DECEMBER D IH S EH M B ER | ||
E IY | ||
EIGHT EY T | ||
EIGHTEEN EY T IY N | ||
EIGHTEENTH EY T IY N TH | ||
EIGHTH EY T TH | ||
EIGHTH(2) EY TH | ||
EIGHTY EY T IY | ||
ELEVEN IH L EH V AH N | ||
ELEVEN(2) IY L EH V AH N | ||
ELEVENTH IH L EH V AH N TH | ||
ELEVENTH(2) IY L EH V AH N TH | ||
ENTER EH N ER | ||
ENTER(2) EH N T ER | ||
ERASE IH R EY S | ||
ERASE(2) IY R EY S | ||
F EH F | ||
FEBRUARY F EH B AH W EH R IY | ||
FEBRUARY(2) F EH B R UW W EH R IY | ||
FEBRUARY(3) F EH B UW W EH R IY | ||
FEBRUARY(4) F EH B Y AH W EH R IY | ||
FEBRUARY(5) F EH B Y UW W EH R IY | ||
FIFTEEN F IH F T IY N | ||
FIFTEENTH F IH F T IY N TH | ||
FIFTH F IH F TH | ||
FIFTH(2) F IH TH | ||
FIFTY F IH F T IY | ||
FIRST F ER S T | ||
FIVE F AY V | ||
FORTY F AO R T IY | ||
FOUR F AO R | ||
FOURTEEN F AO R T IY N | ||
FOURTH F AO R TH | ||
G JH IY | ||
GO G OW | ||
H EY CH | ||
HALF HH AE F | ||
HELP HH EH L P | ||
HUNDRED HH AH N D ER D | ||
HUNDRED(2) HH AH N D R AH D | ||
HUNDRED(3) HH AH N D R IH D | ||
HUNDRED(4) HH AH N ER D | ||
I AY | ||
J JH EY | ||
JANUARY JH AE N Y UW EH R IY | ||
JULY JH AH L AY | ||
JULY(2) JH UW L AY | ||
JUNE JH UW N | ||
K K EY | ||
L EH L | ||
M EH M | ||
MARCH M AA R CH | ||
MAY M EY | ||
N EH N | ||
NINE N AY N | ||
NINETEEN N AY N T IY N | ||
NINETY N AY N T IY | ||
NINTH N AY N TH | ||
NO N OW | ||
NOVEMBER N OW V EH M B ER | ||
O OW | ||
OCTOBER AA K T OW B ER | ||
OF AH V | ||
OH OW | ||
ONE HH W AH N | ||
ONE(2) W AH N | ||
P P IY | ||
Q K Y UW | ||
R AA R | ||
REPEAT R IH P IY T | ||
REPEAT(2) R IY P IY T | ||
RUBOUT R AH B AW T | ||
S EH S | ||
SECOND S EH K AH N | ||
SECOND(2) S EH K AH N D | ||
SEPTEMBER S EH P T EH M B ER | ||
SEVEN S EH V AH N | ||
SEVENTEEN S EH V AH N T IY N | ||
SEVENTH S EH V AH N TH | ||
SEVENTY S EH V AH N IY | ||
SEVENTY(2) S EH V AH N T IY | ||
SIX S IH K S | ||
SIXTEEN S IH K S T IY N | ||
SIXTEENTH S IH K S T IY N TH | ||
SIXTH S IH K S TH | ||
SIXTY S IH K S T IY | ||
START S T AA R T | ||
STOP S T AA P | ||
T T IY | ||
TEN T EH N | ||
THIRD TH ER D | ||
THIRTEEN TH ER T IY N | ||
THIRTIETH TH ER T IY AH TH | ||
THIRTIETH(2) TH ER T IY IH TH | ||
THIRTY TH ER D IY | ||
THIRTY(2) TH ER T IY | ||
THOUSAND TH AW Z AH N | ||
THOUSAND(2) TH AW Z AH N D | ||
THREE TH R IY | ||
TWELFTH T W EH L F TH | ||
TWELVE T W EH L V | ||
TWENTIETH T W EH N IY AH TH | ||
TWENTIETH(2) T W EH N IY IH TH | ||
TWENTIETH(3) T W EH N T IY AH TH | ||
TWENTIETH(4) T W EH N T IY IH TH | ||
TWENTY T W EH N IY | ||
TWENTY(2) T W EH N T IY | ||
TWO T UW | ||
U Y UW | ||
V V IY | ||
W D AH B AH L Y UW | ||
X EH K S | ||
Y W AY | ||
YES Y EH S | ||
Z Z IY | ||
ZERO Z IH R OW | ||
ZERO(2) Z IY R OW |
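
The pronunciation dictionary above lists one entry per line, a word followed by its phones, with alternate pronunciations marked by a numeric suffix such as `EIGHTH(2)`. A minimal sketch of how such a file could be loaded is shown below. The function name `parse_dictionary` is hypothetical, and the sample lines are copied from the entries above.

```python
import re
from collections import defaultdict

# Matches an alternate-pronunciation suffix such as "(2)" at the end of a word.
_ALT_SUFFIX = re.compile(r"\(\d+\)$")


def parse_dictionary(lines):
    """Map each base word to a list of pronunciations (each a list of phones)."""
    prons = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, phones = fields[0], fields[1:]
        base = _ALT_SUFFIX.sub("", word)  # "EIGHTH(2)" -> "EIGHTH"
        prons[base].append(phones)
    return dict(prons)


sample = [
    "EIGHTH EY T TH",
    "EIGHTH(2) EY TH",
    "ZERO Z IH R OW",
    "ZERO(2) Z IY R OW",
    "TWO T UW",
]
lexicon = parse_dictionary(sample)
print(lexicon["EIGHTH"])  # [['EY', 'T', 'TH'], ['EY', 'TH']]
```

Grouping variants under the base word keeps the lookup key identical to the token that appears in transcripts, which do not carry the `(2)` suffix.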
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
<s> SIL | ||
</s> SIL | ||
<sil> SIL |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,145 @@ | ||
############################################################################# | ||
This is a 2-gram language model, based on a vocabulary of 13 words, | ||
which begins "<s>", "</s>", "oh"... | ||
This is an OPEN-vocabulary model (type 1) | ||
(OOVs were mapped to UNK, which is treated as any other vocabulary word) | ||
This file is in the ARPA-standard format introduced by Doug Paul. | ||
|
||
p(wd3|wd1,wd2)= if(trigram exists) p_3(wd1,wd2,wd3) | ||
else if(bigram wd1,wd2 exists) bo_wt_2(wd1,wd2)*p(wd3|wd2) | ||
else p(wd3|wd2) | ||
|
||
p(wd2|wd1)= if(bigram exists) p_2(wd1,wd2) | ||
else bo_wt_1(wd1)*p_1(wd2) | ||
|
||
All probs and back-off weights (bo_wt) are given in log10 form. | ||
|
||
Data formats: | ||
|
||
Beginning of data mark: \data\ | ||
ngram 1=nr # number of 1-grams | ||
ngram 2=nr # number of 2-grams | ||
|
||
\1-grams: | ||
p_1 wd_1 bo_wt_1 | ||
\2-grams: | ||
p_2 wd_1 wd_2 | ||
|
||
end of data mark: \end\ | ||
|
||
\data\ | ||
ngram 1=107 | ||
ngram 2=1 | ||
|
||
\1-grams: | ||
-2.0253 <UNK> 0.0000 | ||
-2.0253 </s> -99.0000 | ||
-99.0000 <s> 0.0000 | ||
-2.0253 A 0.0000 | ||
-2.0253 AND 0.0000 | ||
-2.0253 APOSTROPHE 0.0000 | ||
-2.0253 APRIL 0.0000 | ||
-2.0253 AREA 0.0000 | ||
-2.0253 AUGUST 0.0000 | ||
-2.0253 B 0.0000 | ||
-2.0253 C 0.0000 | ||
-2.0253 CODE 0.0000 | ||
-2.0253 D 0.0000 | ||
-2.0253 DECEMBER 0.0000 | ||
-2.0253 E 0.0000 | ||
-2.0253 EIGHT 0.0000 | ||
-2.0253 EIGHTEEN 0.0000 | ||
-2.0253 EIGHTEENTH 0.0000 | ||
-2.0253 EIGHTH 0.0000 | ||
-2.0253 EIGHTY 0.0000 | ||
-2.0253 ELEVEN 0.0000 | ||
-2.0253 ELEVENTH 0.0000 | ||
-2.0253 ENTER 0.0000 | ||
-2.0253 ERASE 0.0000 | ||
-2.0253 F 0.0000 | ||
-2.0253 FEBRUARY 0.0000 | ||
-2.0253 FIFTEEN 0.0000 | ||
-2.0253 FIFTEENTH 0.0000 | ||
-2.0253 FIFTH 0.0000 | ||
-2.0253 FIFTY 0.0000 | ||
-2.0253 FIRST 0.0000 | ||
-2.0253 FIVE 0.0000 | ||
-2.0253 FORTY 0.0000 | ||
-2.0253 FOUR 0.0000 | ||
-2.0253 FOURTEEN 0.0000 | ||
-2.0253 FOURTH 0.0000 | ||
-2.0253 G 0.0000 | ||
-2.0253 GO 0.0000 | ||
-2.0253 H 0.0000 | ||
-2.0253 HALF 0.0000 | ||
-2.0253 HALL 0.0000 | ||
-2.0253 HELP 0.0000 | ||
-2.0253 HUNDRED 0.0000 | ||
-2.0253 I 0.0000 | ||
-2.0253 J 0.0000 | ||
-2.0253 JANUARY 0.0000 | ||
-2.0253 JULY 0.0000 | ||
-2.0253 JUNE 0.0000 | ||
-2.0253 K 0.0000 | ||
-2.0253 L 0.0000 | ||
-2.0253 LANE 0.0000 | ||
-2.0253 M 0.0000 | ||
-2.0253 MARCH 0.0000 | ||
-2.0253 MAY 0.0000 | ||
-2.0253 MEMORY 0.0000 | ||
-2.0253 N 0.0000 | ||
-2.0253 NINE 0.0000 | ||
-2.0253 NINETEEN 0.0000 | ||
-2.0253 NINETY 0.0000 | ||
-2.0253 NINTH 0.0000 | ||
-2.0253 NO 0.0000 | ||
-2.0253 O 0.0000 | ||
-2.0253 OCTOBER 0.0000 | ||
-2.0253 OF 0.0000 | ||
-2.0253 OH 0.0000 | ||
-2.0253 ONE 0.0000 | ||
-2.0253 P 0.0000 | ||
-2.0253 Q 0.0000 | ||
-2.0253 R 0.0000 | ||
-2.0253 REPEAT 0.0000 | ||
-2.0253 RUBOUT 0.0000 | ||
-2.0253 S 0.0000 | ||
-2.0253 SECOND 0.0000 | ||
-2.0253 SEPTEMBER 0.0000 | ||
-2.0253 SEVEN 0.0000 | ||
-2.0253 SEVENTEEN 0.0000 | ||
-2.0253 SEVENTH 0.0000 | ||
-2.0253 SEVENTY 0.0000 | ||
-2.0253 SIX 0.0000 | ||
-2.0253 SIXTEEN 0.0000 | ||
-2.0253 SIXTEENTH 0.0000 | ||
-2.0253 SIXTH 0.0000 | ||
-2.0253 SIXTY 0.0000 | ||
-2.0253 START 0.0000 | ||
-2.0253 STOP 0.0000 | ||
-2.0253 T 0.0000 | ||
-2.0253 TEN 0.0000 | ||
-2.0253 THIRD 0.0000 | ||
-2.0253 THIRTIETH 0.0000 | ||
-2.0253 THIRTY 0.0000 | ||
-2.0253 THOUSAND 0.0000 | ||
-2.0253 THREE 0.0000 | ||
-2.0253 TWELFTH 0.0000 | ||
-2.0253 TWELVE 0.0000 | ||
-2.0253 TWELVTH 0.0000 | ||
-2.0253 TWENTIETH 0.0000 | ||
-2.0253 TWENTY 0.0000 | ||
-2.0253 TWO 0.0000 | ||
-2.0253 U 0.0000 | ||
-2.0253 V 0.0000 | ||
-2.0253 W 0.0000 | ||
-2.0253 WEAN 0.0000 | ||
-2.0253 X 0.0000 | ||
-2.0253 Y 0.0000 | ||
-2.0253 YES 0.0000 | ||
-2.0253 Z 0.0000 | ||
-2.0253 ZERO 0.0000 | ||
|
||
\2-grams: | ||
0.0000 <s> </s> | ||
\end\ |
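
The header of the language model above gives the bigram back-off rule `p(wd2|wd1) = p_2(wd1,wd2)` if the bigram exists, else `bo_wt_1(wd1)*p_1(wd2)`. Because all values are stored as log10, the product becomes a sum. The sketch below illustrates that rule under stated assumptions: `parse_arpa` and `bigram_logprob` are hypothetical names, only the 1-gram and 2-gram sections are handled, and the sample is a small excerpt of the entries above.

```python
def parse_arpa(lines):
    """Parse the 1-gram and 2-gram sections of an ARPA file (log10 values)."""
    unigrams, bigrams = {}, {}
    in_data, section = False, None
    for raw in lines:
        line = raw.strip()
        if not in_data:
            # Ignore the free-text header until the exact \data\ marker.
            in_data = line == "\\data\\"
        elif line == "\\1-grams:":
            section = 1
        elif line == "\\2-grams:":
            section = 2
        elif line == "\\end\\":
            break
        elif line and section == 1:
            fields = line.split()
            backoff = float(fields[2]) if len(fields) > 2 else 0.0
            unigrams[fields[1]] = (float(fields[0]), backoff)
        elif line and section == 2:
            fields = line.split()
            bigrams[(fields[1], fields[2])] = float(fields[0])
    return unigrams, bigrams


def bigram_logprob(unigrams, bigrams, wd1, wd2):
    """log10 p(wd2|wd1), backing off to the unigram when the bigram is absent."""
    if (wd1, wd2) in bigrams:
        return bigrams[(wd1, wd2)]
    return unigrams[wd1][1] + unigrams[wd2][0]  # bo_wt_1(wd1) + p_1(wd2)


sample = """\\data\\
ngram 1=4
ngram 2=1

\\1-grams:
-2.0253 </s> -99.0000
-99.0000 <s> 0.0000
-2.0253 ONE 0.0000
-2.0253 TWO 0.0000

\\2-grams:
0.0000 <s> </s>
\\end\\
""".splitlines()

uni, bi = parse_arpa(sample)
print(bigram_logprob(uni, bi, "ONE", "TWO"))    # backs off: 0.0 + (-2.0253)
print(bigram_logprob(uni, bi, "<s>", "</s>"))   # explicit bigram: 0.0
```

With a back-off weight of 0.0000 on every ordinary word, any unseen word pair in this model scores exactly the unigram value of the second word.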
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
AA | ||
AE | ||
AH | ||
AO | ||
AW | ||
AY | ||
B | ||
CH | ||
D | ||
EH | ||
ER | ||
EY | ||
F | ||
G | ||
HH | ||
IH | ||
IY | ||
JH | ||
K | ||
L | ||
M | ||
N | ||
OW | ||
P | ||
R | ||
S | ||
SIL | ||
T | ||
TH | ||
UW | ||
V | ||
W | ||
Y | ||
Z |
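
A common consistency check before training is to confirm that every phone used in the pronunciation dictionary also appears in the phone list above. The following is a minimal sketch of such a check. The function name `undefined_phones` is hypothetical, and the sample deliberately uses a truncated phone list so that one entry is flagged.

```python
def undefined_phones(dictionary_lines, phone_lines):
    """Return {dictionary word: [phones not present in the phone list]}."""
    phoneset = {p.strip() for p in phone_lines if p.strip()}
    missing = {}
    for line in dictionary_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        unknown = [p for p in fields[1:] if p not in phoneset]
        if unknown:
            missing[fields[0]] = unknown
    return missing


# Truncated phone list for illustration only: "UW" is left out on purpose.
phones = ["AH", "EY", "T", "TH", "SIL"]
entries = ["A AH", "A(2) EY", "EIGHTH EY T TH", "TWO T UW"]
print(undefined_phones(entries, phones))  # {'TWO': ['UW']}
```

An empty result means the dictionary and the phone list agree. Any reported word points at either a missing phone or a typo in the pronunciation.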