Commit

[egs] fixed bug in egs/gale_arabic/s5c/local/prepare_dict_subword.sh that it may delete words matching '<*>' (kaldi-asr#3465)
DongjiGao authored and danpovey committed Jul 17, 2019
1 parent b5385b4 commit 98aa1d8
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion egs/gale_arabic/s5c/local/prepare_dict_subword.sh
@@ -48,7 +48,7 @@ glossaries="<UNK> <sil>"
 if [ $stage -le 0 ]; then
   echo "$0: making subword lexicon... $(date)."
   # get pair_code file
-  cut -d ' ' -f2- data/train/text | sed 's/<[^>]*>//g' | utils/lang/bpe/learn_bpe.py -s $num_merges > data/local/pair_code.txt
+  cut -d ' ' -f2- data/train/text | sed 's/<sil>//g;s/<UNK>//g' | utils/lang/bpe/learn_bpe.py -s $num_merges > data/local/pair_code.txt
   mv $dir/lexicon.txt $dir/lexicon_word.txt
   # get words
   cut -d ' ' -f1 $dir/lexicon_word.txt > $dir/words.txt
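
The effect of the change can be seen by running both `sed` expressions on one line of text. The following is a minimal sketch, not part of the commit: the sample line and the `<foo>` token are hypothetical, chosen only to show that the old pattern removes any angle-bracketed token while the new one removes just the two glossary entries, `<sil>` and `<UNK>`.

```shell
#!/usr/bin/env bash
# Hypothetical transcript line (not taken from the GALE data): it contains the
# two glossary tokens plus one other angle-bracketed token that should survive.
line='hello <UNK> world <foo> bar <sil>'

# Old expression: deletes every '<...>' token, so '<foo>' is lost as well.
old=$(printf '%s\n' "$line" | sed 's/<[^>]*>//g')

# New expression: deletes only the literal tokens '<sil>' and '<UNK>'.
new=$(printf '%s\n' "$line" | sed 's/<sil>//g;s/<UNK>//g')

printf 'old: %s\n' "$old"
printf 'new: %s\n' "$new"
```

With the new expression, `<foo>` is still present in the text passed to `learn_bpe.py`, whereas the old expression strips it along with the glossary tokens.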
