Adds the SmartChineseAnalyzer (http://code.google.com/p/imdict-chinese-analyzer/) as an easy-to-install plugin.
1) From a clean install, install the plugin as follows:
./plugin -url https://github.com/downloads/thmttch/elasticsearch/elasticsearch-analysis-smartchinese-0.18.0-SNAPSHOT.zip -install analysis-smartchinese
2) Create a new index, and set the default analyzer:
curl -XPUT localhost:9200/test1 -d '
{
  "analysis": {
    "analyzer": {
      "default": {
        "type": "SmartChinese"
      }
    }
  }
}'
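The same request can also be built from a script. The following is a minimal Python sketch, assuming the node listens on localhost:9200 as in the curl command above. The helper name `build_create_index_request` and its `host` parameter are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Index settings from step 2: make SmartChinese the default analyzer.
SETTINGS = {
    "analysis": {
        "analyzer": {
            "default": {"type": "SmartChinese"}
        }
    }
}


def build_create_index_request(index, host="http://localhost:9200"):
    """Build (but do not send) the PUT request that creates `index`.

    The function name and the `host` default are illustrative assumptions,
    not part of the plugin.
    """
    body = json.dumps(SETTINGS).encode("utf-8")
    return urllib.request.Request(url=f"{host}/{index}", data=body, method="PUT")


if __name__ == "__main__":
    req = build_create_index_request("test1")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it against a running node:
    # urllib.request.urlopen(req)
```

Keeping the settings in one dictionary makes it easy to reuse the same body for other index names.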
3) Analyze some text. Notice that the analyzer emits both unigrams (我, 说, 好) and bigrams (世界):
curl -XGET localhost:9200/test1/_analyze -d '{ "body" : "我说世界好!" }'
{
  "tokens": [
    {
      "end_offset": 7,
      "position": 3,
      "start_offset": 3,
      "token": "text",
      "type": "word"
    },
    {
      "end_offset": 12,
      "position": 7,
      "start_offset": 11,
      "token": "我",
      "type": "word"
    },
    {
      "end_offset": 13,
      "position": 8,
      "start_offset": 12,
      "token": "说",
      "type": "word"
    },
    {
      "end_offset": 15,
      "position": 9,
      "start_offset": 13,
      "token": "世界",
      "type": "word"
    },
    {
      "end_offset": 16,
      "position": 10,
      "start_offset": 15,
      "token": "好",
      "type": "word"
    }
  ]
}
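To check which tokens are single characters and which are two-character words, the response can be grouped by token length. This is a minimal Python sketch that works on a condensed copy of the response above. The function name `tokens_by_length` is hypothetical, and only the `token` field is read.

```python
import json

# Condensed copy of the analyze response shown above (values unchanged).
RESPONSE = """
{"tokens": [
  {"token": "text", "start_offset": 3,  "end_offset": 7,  "position": 3,  "type": "word"},
  {"token": "我",   "start_offset": 11, "end_offset": 12, "position": 7,  "type": "word"},
  {"token": "说",   "start_offset": 12, "end_offset": 13, "position": 8,  "type": "word"},
  {"token": "世界", "start_offset": 13, "end_offset": 15, "position": 9,  "type": "word"},
  {"token": "好",   "start_offset": 15, "end_offset": 16, "position": 10, "type": "word"}
]}
"""


def tokens_by_length(response_text):
    """Group token strings by their length in characters, keeping input order."""
    groups = {}
    for tok in json.loads(response_text)["tokens"]:
        groups.setdefault(len(tok["token"]), []).append(tok["token"])
    return groups


if __name__ == "__main__":
    for length, tokens in sorted(tokens_by_length(RESPONSE).items()):
        print(length, " ".join(tokens))
```

For the response above, this groups 我, 说 and 好 under length 1 and 世界 under length 2. The token "text" appears under length 4.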