Chinese character decomposer based on the GlyphWiki dump
Note: I am not the author of GlyphWiki; what follows is just my understanding of the dump format.
| Name  | Related | Shape Information |
|-------|---------|--------------------|
| u5475 | u5475   | 99:0:0:4:-1:187:195:u53e3-01$99:0:0:-27:0:202:200:u53ef-02 |
In this example, u5475 is the Unicode code point of '呵' and also the name of the corresponding page on GlyphWiki.
When you open the GlyphWiki dump, you will see many entries named aji-xxxxx, c-beta-xxxxx, and so on. These are codes in other encodings; the Related column indicates the corresponding Unicode code point (I believe). Don't worry, the Unicode-named entries are in the list too, they just appear later in the file.
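To make the layout concrete, here is a minimal sketch, not the project's actual parser, that splits one dump line into its three columns; the '|'-separated layout is an assumption based on the table above.

```python
# Minimal sketch (not the project's parser): split one GlyphWiki dump line
# into its three columns, assuming '|'-separated fields as in the table above.
def parse_dump_line(line):
    name, related, data = (field.strip() for field in line.split("|", 2))
    return name, related, data

example = "u5475 | u5475 | 99:0:0:4:-1:187:195:u53e3-01$99:0:0:-27:0:202:200:u53ef-02"
print(parse_dump_line(example))
# -> ('u5475', 'u5475', '99:0:0:4:-1:187:195:u53e3-01$99:0:0:-27:0:202:200:u53ef-02')
```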
Now let us look at the sequence of numbers in the third column (Shape Information).
99:0:0:4:-1:187:195:u53e3-01$99:0:0:-27:0:202:200:u53ef-02
This example has two glyph components:
99:0:0:4:-1:187:195:u53e3-01
and
99:0:0:-27:0:202:200:u53ef-02
They are separated by '$'.
The Unicode-like code at the end of each component (e.g. 'u53e3-01') identifies the referenced radical glyph, and the numbers before it describe how that component is positioned within the character. If you can read Japanese and want to learn more, see:
Koichi Kamichi (上地宏一), "The KAGE System: Toward Glyph Distribution and Improving the Quality of Generated Characters" (KAGEシステム: グリフ配信と生成文字品質の向上にむけて), CHISE Symposium 2003, Tokyo, March 15, 2003.
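As a small illustration of the split just described (a sketch under my own reading of the format, not code from this repository), the shape data can be cut into components on '$', and for each component of type 99 the referenced glyph name sits in the eighth colon-separated field:

```python
# Illustrative sketch: split KAGE shape data on '$' and, for components of
# type 99 (a reference to another glyph), read the referenced glyph name.
def referenced_glyphs(kage_data):
    names = []
    for component in kage_data.split("$"):
        fields = component.split(":")
        if fields[0] == "99":          # 99 = "place another glyph here"
            names.append(fields[7])    # e.g. 'u53e3-01'
    return names

data = "99:0:0:4:-1:187:195:u53e3-01$99:0:0:-27:0:202:200:u53ef-02"
print(referenced_glyphs(data))  # ['u53e3-01', 'u53ef-02']
```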
These codes are used to decompose every Chinese character in the input file into radicals, i.e. each Chinese character is replaced by the Unicode codes of its radicals, each prefixed with a segmentation symbol, e.g.:
'呵' ----> $$u53e3-01$$u53ef-02
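For illustration only, here is a sketch of this replacement step, assuming a character-to-components dictionary has already been built; the dictionary entry and the '$$' separator below are taken from the example above, not from the project's code.

```python
# Sketch: rewrite each character found in the mapping as its segmented
# radical codes; characters not in the mapping are left unchanged.
mapping = {"呵": ["u53e3-01", "u53ef-02"]}  # hand-made example entry

def decompose(text, mapping, sep="$$"):
    pieces = []
    for ch in text:
        if ch in mapping:
            pieces.append(sep + sep.join(mapping[ch]))
        else:
            pieces.append(ch)
    return "".join(pieces)

print(decompose("呵", mapping))  # $$u53e3-01$$u53ef-02
```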
To run the decomposer:
python main.py -i [input_file] -o [output_file]
The GlyphWiki dump will be downloaded and parsed automatically. A .pkl file containing the character-to-radicals mapping dictionary will also be generated.
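If you want to reuse the generated mapping in your own scripts, it can be loaded back with pickle. The file name below is a placeholder, and the exact key and value format depends on what main.py actually writes.

```python
# Hedged sketch: load the generated mapping dictionary back with pickle.
import pickle

with open("char_to_radicals.pkl", "rb") as f:   # placeholder file name
    char_to_radicals = pickle.load(f)

# Key/value format depends on the tool's output (character vs. code point).
print(char_to_radicals.get("呵"))
```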