Postprocessing skips words in final result #74
Comments
Hello, ldeseynes,
use-nnet2: True
use-vad: False
# Just a sample post-processor that appends "." to the hypothesis
post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'
I would be very grateful if you could tell me what is wrong with my configuration. |
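For readers less familiar with perl, the sample post-processor above can be sketched in Python. This is only an illustrative equivalent, assuming the server feeds one hypothesis per line on stdin and reads one line back on stdout; the names `append_period` and `run` are hypothetical and not part of kaldi-gstreamer-server.

```python
import sys


def append_period(line: str) -> str:
    """Return the hypothesis with a trailing "." appended."""
    return line.rstrip("\n") + "."


def run(stdin=sys.stdin, stdout=sys.stdout) -> None:
    # Read one hypothesis per line. readline() is used instead of
    # iterating over the stream so each line is handled as soon as
    # it arrives.
    for line in iter(stdin.readline, ""):
        stdout.write(append_period(line) + "\n")
        # Flush after every line, like STDOUT->autoflush(1) in the
        # perl one-liner, so the reply is not held in a buffer.
        stdout.flush()
```

To try it as a post-processor, one would call `run()` at the bottom of the script and point the `post-processor:` entry at it. Flushing after every line matters because the caller waits for each reply before continuing.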
Hi, |
Hello, |
I just found the following command: It decodes audio files like this: LOG (online2-wav-nnet2-latgen-faster[5.5.463 I would be grateful if you could check it carefully. |
Hi, Here are the parameters I set, but I have not used the system for a while. In your YAML file, you should add nnet-mode: 3. Also, check that you're decoding your audio file with the correct sample rate and number of channels. use-threaded-decoder=true |
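The sample rate and channel check suggested above can be done with Python's standard `wave` module. This is a minimal sketch; the helper names are hypothetical, and the default expected values (16 kHz, mono, 16-bit) are simply the format reported for test.wav in this thread, not a confirmed requirement of any particular model.

```python
import wave


def wav_format(path: str) -> dict:
    """Read the header of a PCM WAV file and return its basic format."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hz": wav.getframerate(),
            "channels": wav.getnchannels(),
            # getsampwidth() returns bytes per sample
            "bits_per_sample": wav.getsampwidth() * 8,
        }


def matches_expected(path: str, sample_rate_hz: int = 16000,
                     channels: int = 1, bits_per_sample: int = 16) -> bool:
    """Compare the file's format with the values the model was trained on.

    The defaults are assumptions; pass the values used for your own model.
    """
    expected = {
        "sample_rate_hz": sample_rate_hz,
        "channels": channels,
        "bits_per_sample": bits_per_sample,
    }
    return wav_format(path) == expected
```

Calling `wav_format("test.wav")` prints nothing by itself; it returns a dictionary that can be compared against the training configuration.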
Thanks for your kind reply. |
What command do you use to start the scripts? |
I am using the GStreamer server and client from https://github.com/alumae/kaldi-gstreamer-server like this:
french_stt.yaml is as follows:
use-vad: False
post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'
logging:
I've confirmed that test.wav is 16 kHz, 16-bit, mono. Let me know what you think. |
This looks fine to me. Just check the parameters you used for your training (acoustic scale and frame subsampling factor), because I'm not sure about their values in the nnet2 setup. In any case, you would be better off using a more recent model if you want decent results. |
Thanks for your reply. |
Just use a chain model; you'll get better results and far more details about the recipe. |
I built the French STT model using wsj/s5/local/online/run_nnet2.sh. |
One thing, |
Sure, you can retrain a model using tedlium/s5_r3/run.sh. You don't need the rnnlm stuff after stage 18 for your GStreamer application. |
Thank you, will try. |
Hi Tanel!
First of all, thanks for your great work.
I'm using gst-kaldi-nnet2-online to decode French speech. When running the client.py script, I get a fairly accurate transcription using my model, but at some point the decoding stops and starts again a few seconds later. This results in missing words in the output.
Here is an example of the result with two dots corresponding to the missing words at the end of each sentence:
bonjour , je m' appelle Jean-Christophe je suis agriculteur dans le Loiret sur une exploitation céréalières . je me suis installé il y a une dizaine d' années .. <unk> trente-cinq ans aujourd' hui . je suis papa de trois enfants .. j' ai repris l' exploitation qui était consacré à la culture de la betterave sucrière de céréales depuis que je suis installé diversifiés .. j' y cultive aujourd' hui des oléagineux , comme le colza du maïs .. plusieurs types de céréales ..
Every time this occurs, I get a warning from worker.py:
WARNING ([5.5.76~1-535b]:LatticeWordAligner():word-align-lattice.cc:263) [Lattice has input epsilons and/or is not input-deterministic (in Mohri sense)]-- i.e. lattice is not deterministic. Word-alignment may be slow and-or blow up in memory.
Any idea what causes this issue?
Maybe there is a way to control the length of the final result. For example, I get the following output from worker.py:
2018-10-16 16:59:55 - INFO: decoder2: 2faa241b-89ed-497a-9c2a-3b974bf1f8da: Got final result: bonjour , je m' appelle Jean-Christophe je suis agriculteur dans le Loiret sur une exploitation céréalières . je me suis installé il y a une dizaine d' années .
How can I change the code to split the result into two shorter ones?
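One way to get shorter results without touching the decoder is to split the final hypothesis text after the fact. The sketch below is only that: it assumes the model emits sentence-ending punctuation as separate tokens, as in the example output above, and the function name `split_on_sentence_tokens` is hypothetical. It does not change where the decoder itself ends an utterance.

```python
def split_on_sentence_tokens(hypothesis: str,
                             end_tokens=(".", "?", "!")) -> list:
    """Split a final result into segments that each end at a punctuation token."""
    segments = []
    current = []
    for token in hypothesis.split():
        current.append(token)
        # Close the current segment when a standalone end token is seen.
        if token in end_tokens:
            segments.append(" ".join(current))
            current = []
    # Keep any trailing words that were not followed by an end token.
    if current:
        segments.append(" ".join(current))
    return segments
```

Applied to the final result quoted above, this yields two segments, one ending at "céréalières ." and one ending at "années .".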
Thanks in advance!