Thoroughly fix the numeral recognition problem: hankcs#150
hankcs committed Apr 22, 2016
1 parent 2486b24 commit a8affe4
Showing 1 changed file with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions src/main/java/com/hankcs/hanlp/seg/Segment.java
@@ -310,15 +310,19 @@ protected void mergeNumberQuantifier(List<Vertex> termList, WordNet wordNetAll,
}
sbQuantifier.append(cur.realWord);
pre.attribute = new CoreDictionary.Attribute(Nature.mq);
pre.wordID = -1; // -1 denotes the "universal word" in the NGram model, guaranteeing a higher score in the second Viterbi pass
pre.wordID = CoreDictionary.M_WORD_ID;
iterator.remove();
// remove it from wordNet
for (Vertex vertex : wordNetAll.getVertexes()[line + sbQuantifier.length()])
{
if (vertex.from == cur) vertex.from = null;
}
}
if (sbQuantifier.length() != pre.realWord.length())
{
pre.realWord = sbQuantifier.toString();
pre.word = Predefine.TAG_NUMBER;
pre.wordID = CoreDictionary.M_WORD_ID;
cur.from = null; // after modifying the node, clear the backward node
sbQuantifier.setLength(0);
}
}
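
For readers unfamiliar with what this hunk is part of, the following is a minimal standalone sketch of the general idea behind merging a numeral with the quantifier that follows it. It is not HanLP's implementation: `MergeSketch`, `Token`, and the string tags `"m"`, `"q"`, and `"mq"` are hypothetical stand-ins for `Vertex` and `Nature`, and the word-net bookkeeping and `wordID` handling touched by this commit are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

class MergeSketch {
    // Hypothetical token type for illustration; HanLP's Vertex carries far more state.
    static final class Token {
        String word;
        String nature; // "m" = numeral, "q" = quantifier, "mq" = numeral plus quantifier

        Token(String word, String nature) {
            this.word = word;
            this.nature = nature;
        }
    }

    // Fold consecutive numerals together, then absorb at most one following quantifier.
    static List<Token> merge(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        for (Token cur : tokens) {
            Token pre = out.isEmpty() ? null : out.get(out.size() - 1);
            boolean preIsNumeral = pre != null && pre.nature.equals("m");
            if (preIsNumeral && cur.nature.equals("m")) {
                pre.word += cur.word;      // numeral followed by numeral stays a numeral
            } else if (preIsNumeral && cur.nature.equals("q")) {
                pre.word += cur.word;      // numeral followed by quantifier
                pre.nature = "mq";         // retag; nothing further merges into this token
            } else {
                out.add(new Token(cur.word, cur.nature)); // copy so the input list is untouched
            }
        }
        return out;
    }
}
```

Under these assumptions, calling `MergeSketch.merge` on the tokens `12/m`, `kg/q`, `rice/n` would yield two tokens, `12kg/mq` and `rice/n`. The real method additionally has to keep the word net and the back-pointers (`from`) consistent after a merge, which is what the lines in this commit address.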