== Authors of the Linux Building README ==
Kaisheng Yao
Microsoft Research
email: [email protected]
Wengong Jin
Shanghai Jiao Tong University
email: [email protected]
Yu Zhang, Leo Liu, Scott Cyphers
CSAIL, Massachusetts Institute of Technology
email: [email protected]
email: [email protected]
email: [email protected]
Guoguo Chen
CLSP, Johns Hopkins University
email: [email protected]
== Preliminaries ==
To build the CPU version, you must first install either the Intel MKL
BLAS library or the ACML library. Note that ACML is free, whereas MKL
may not be.
For MKL:
1. Download from https://software.intel.com/en-us/intel-mkl
For ACML:
1. Download from
http://developer.amd.com/tools-and-sdks/archive/amd-core-math-library-acml/acml-downloads-resources/
We have seen problems with some versions of the library on Intel
processors, but have had success with acml-5-3-1-ifort-64bit.tgz.
For Kaldi:
1. In kaldi-trunk/tools/Makefile, uncomment the line
# OPENFST_VERSION = 1.4.1, and re-install OpenFst using the Makefile.
2. In kaldi-trunk/src/, run ./configure --shared; make depend -j 8; make -j 8
to re-compile Kaldi (the -j option sets the number of parallel jobs).
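Step 1 above can be scripted. The following is only a sketch: the function
name enable_openfst_141 is made up for illustration, and it assumes the
commented-out line in your copy of tools/Makefile has the form
"# OPENFST_VERSION = 1.4.1". It edits nothing else in the file.

```shell
# Uncomment the "# OPENFST_VERSION = 1.4.1" line in a Kaldi tools Makefile.
# Hypothetical helper, not part of Kaldi or CNTK.
# Usage: enable_openfst_141 kaldi-trunk/tools/Makefile
enable_openfst_141() (
    makefile=$1
    tmp=$(mktemp) || exit 1
    # Drop the leading "#" (and any spaces after it) on the matching line only;
    # every other line is copied through unchanged.
    sed 's/^#[[:space:]]*\(OPENFST_VERSION[[:space:]]*=[[:space:]]*1\.4\.1\)/\1/' \
        "$makefile" > "$tmp" && cat "$tmp" > "$makefile"
    status=$?
    rm -f "$tmp"
    exit $status
)
```

After running it on kaldi-trunk/tools/Makefile, re-install OpenFst and
re-compile Kaldi as described in the two steps above.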
To build the GPU version, you must first install NVIDIA CUDA.
== Build Preparation ==
You can do either an out-of-source build in any directory or an
in-source build. Let $CNTK be the CNTK directory. For an out-of-source
build in the directory "build", type:
>mkdir build
>cd build
>$CNTK/configure -h
(For an in-source build, just run configure in the $CNTK directory.)
You will see various options for configure, as well as their default
values. CNTK needs a CPU math directory, either acml or mkl. If you
do not specify one and both are available, acml will be used. For GPU
use, a cuda and a gdk directory are also required. Similarly, to build
the kaldi plugin, a kaldi directory is required. You may also specify
whether you want a debug or release build, and add additional
path roots to use when searching for libraries.
Rerun configure with the desired options:
>$CNTK/configure ...
This will create a Config.make and a Makefile (for an in-source
build, a Makefile will not be created). The Config.make file
records the configuration parameters, and the Makefile reinvokes
$CNTK/Makefile, passing it the build directory where it can find
Config.make.
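Before running make, it can help to confirm that configure left the build
directory in the state described above. A minimal sketch follows; the
function name check_cntk_build_dir is hypothetical, and the check assumes
that for an in-source build the existing $CNTK/Makefile sits next to the
new Config.make.

```shell
# Check that a directory contains both Config.make and a Makefile.
# Hypothetical helper, not part of CNTK.
# Usage: check_cntk_build_dir <dir>    (defaults to the current directory)
check_cntk_build_dir() (
    dir=${1:-.}
    missing=0
    for f in Config.make Makefile; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 means both files were found.
    exit $missing
)
```

For example, "check_cntk_build_dir build && (cd build && make)" only starts
the build when both files are present.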
After you run make and it completes, you will have the following
directories:
  .build  contains object files, and can be deleted
  bin     contains the cntk program
  lib     contains libraries and plugins
The bin and lib directories can safely be moved as long as they
remain siblings.
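As an illustration of moving the two directories together, here is a small
sketch. The function name relocate_cntk and both of its arguments are
hypothetical; the only rule taken from this README is that bin and lib
must end up under the same parent directory.

```shell
# Move bin and lib from a finished build to a new parent directory,
# keeping them siblings. Hypothetical helper, not part of CNTK.
# Usage: relocate_cntk <build_dir> <dest_dir>
relocate_cntk() (
    build_dir=$1
    dest_dir=$2
    if [ ! -d "$build_dir/bin" ] || [ ! -d "$build_dir/lib" ]; then
        echo "expected bin and lib under: $build_dir" >&2
        exit 1
    fi
    if [ -e "$dest_dir/bin" ] || [ -e "$dest_dir/lib" ]; then
        echo "refusing to overwrite bin or lib in: $dest_dir" >&2
        exit 1
    fi
    mkdir -p "$dest_dir" || exit 1
    # Move both in one step so they stay siblings under the new parent.
    mv "$build_dir/bin" "$build_dir/lib" "$dest_dir/"
)
```

The .build directory is left where it is, since it only holds object files.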
To clean the build:
>make clean
== Run ==
All executables are in the bin directory:
cntk: the main executable for CNTK
*.so: shared libraries for the corresponding readers; they are linked and loaded dynamically at runtime.
./cntk configFile=${your cntk config file}
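The invocation above can be wrapped so that a missing config file or a
missing executable is reported before cntk starts. This is a sketch only:
run_cntk is a hypothetical name, and the wrapper passes nothing to cntk
beyond the configFile= argument shown above.

```shell
# Run cntk from a given bin directory with a config file.
# Hypothetical wrapper, not part of CNTK.
# Usage: run_cntk <bin_dir> <config_file>
run_cntk() (
    bin_dir=$1
    config=$2
    if [ ! -r "$config" ]; then
        echo "config file not readable: $config" >&2
        exit 1
    fi
    if [ ! -x "$bin_dir/cntk" ]; then
        echo "cntk executable not found in: $bin_dir" >&2
        exit 1
    fi
    # Same invocation as in this README: ./cntk configFile=<file>
    "$bin_dir/cntk" "configFile=$config"
)
```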
== Kaldi Reader ==
This is an HTKMLF reader and Kaldi writer (for decoding).
To build, pass --with-kaldi when you run configure.
The writer section looks like:
writer=[
    writerType=KaldiReader
    readMethod=blockRandomize
    frameMode=false
    miniBatchMode=Partial
    randomize=Auto
    verbosity=1
    ScaledLogLikelihood=[
        dim=$labelDim$
        Kaldicmd="ark:-" # will pipe to the Kaldi decoder latgen-faster-mapped
        scpFile=$outputSCP$ # the file key of the features
    ]
]
== Kaldi2 Reader ==
This is a Kaldi reader and Kaldi writer (for decoding).
To build, set --with-kaldi in your Config.make.
The features section is different:
features=[
    dim=
    rx=
    scpFile=
    featureTransform=
]
rx is a text file which contains:
one Kaldi feature rxspecifier readable by RandomAccessBaseFloatMatrixReader.
'ark:' specifiers don't work; only 'scp:' specifiers work.
scpFile is a text file generated by running:
feat-to-len FEATURE_RXSPECIFIER_FROM_ABOVE ark,t:- > TEXT_FILE_NAME
scpFile should contain one line per utterance.
If you want to run with fewer utterances, just shorten this file.
(The reader will still load the feature rxspecifier, but it ignores utterances not present in scpFile.)
featureTransform is the name of a Kaldi feature transform file:
Kaldi feature transform files are used for stacking / applying transforms to features.
To ignore this option, use the special string NO_FEATURE_TRANSFORM, or an
empty string (if the config file reader permits one).
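The rx and scpFile inputs described above are ordinary text files, so they
can be prepared with a few lines of shell. The sketch below uses
hypothetical function names (write_rx_file, shorten_scp_file). The prefix
check only accepts specifiers that start with the literal 'scp:', which
mirrors the restriction stated above and may be stricter than necessary.

```shell
# Write the one-line rx file. Only 'scp:' rxspecifiers are accepted,
# because 'ark:' specifiers do not work with this reader.
# Hypothetical helper. Usage: write_rx_file <rxspecifier> <out_file>
write_rx_file() (
    spec=$1
    out=$2
    case $spec in
        scp:*) printf '%s\n' "$spec" > "$out" ;;
        *)
            echo "rx must be an 'scp:' rxspecifier, got: $spec" >&2
            exit 1
            ;;
    esac
)

# Keep only the first N utterances of an scpFile produced by feat-to-len
# (one line per utterance), to run with fewer utterances.
# Hypothetical helper. Usage: shorten_scp_file <src_file> <N> <out_file>
shorten_scp_file() (
    src=$1
    n=$2
    out=$3
    head -n "$n" "$src" > "$out"
)
```

A typical sequence would be to write the rx file, generate the full scpFile
with the feat-to-len command shown above, and then call shorten_scp_file
for a quick test run on a subset.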
********** Labels **********
The labels section is also different.
labels=[
    mlfFile=
    labelDim=
    labelMappingFile=
]
The only difference is mlfFile, which now uses a different format. It is a text file which contains:
one Kaldi label rxspecifier readable by Kaldi's copy-post binary.