- July 18, 2020: Change the operation of
--ngram_num_threshold
and--av_threshold
to filter out n-grams. Past: n-grams whose frequency or AV score is lower than or equal to the threshold will be excluded from the lexicon N; now: n-grams whose frequency or AV score is lower than the threshold will be excluded from the lexicon N. You can check readme for detailed instruction to use the two hyper-parameters. The hyper-parameter settings are also changed in run.sh to reflect this update. This update only affects the training process; it does not affect the testing and predicting process. - July 11, 2020: Implement the
predict
function inwmseg_main.py
. You can use that function to segment the sentences in an input file with a pre-trained WMSeg model. See run_sample.sh for the usage, and ./sample_data/sentences.txt for the input format. - July 7, 2020: The release of pre-trained WMSeg models.