Takes Chinese characters and converts them to pinyin, zhuyin, and Cyrillic.
Based on hotoo/pinyin
- Documentation: http://pypinyin.rtfd.io/
- GitHub: https://github.com/mozillazg/python-pinyin
- License: MIT license
- PyPI: https://pypi.org/project/pypinyin
- Python version: 2.7, pypy, pypy3, 3.4, 3.5, 3.6, 3.7, 3.8
Features
- Picks the most fitting pinyin reading based on the phrase a character occurs in.
- Supports characters with two or more readings (heteronyms).
- Supports simplified and traditional characters, as well as zhuyin (also known as bopomofo).
- Supports multiple styles of pinyin and zhuyin (e.g. tone conventions).
$ pip install pypinyin
Python 3 (for Python 2, change '中心' to u'中心'):
>>> from pypinyin import pinyin, lazy_pinyin, Style
>>> pinyin('中心')
[['zhōng'], ['xīn']]
>>> pinyin('中心', heteronym=True) # make use of heteronym mode
[['zhōng', 'zhòng'], ['xīn']]
>>> pinyin('中心', style=Style.FIRST_LETTER) # set the pinyin style
[['z'], ['x']]
>>> pinyin('中心', style=Style.TONE2, heteronym=True)
[['zho1ng', 'zho4ng'], ['xi1n']]
>>> pinyin('中心', style=Style.TONE3, heteronym=True)
[['zhong1', 'zhong4'], ['xin1']]
>>> pinyin('中心', style=Style.BOPOMOFO) # zhuyin mode
[['ㄓㄨㄥ'], ['ㄒㄧㄣ']]
>>> lazy_pinyin('中心') # don't include tone information or heteronyms
['zhong', 'xin']
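The TONE2 and TONE3 results above differ only in where the tone digit sits: after the vowel that would carry the diacritic, or at the end of the syllable. The following is a minimal sketch of that relationship using only the standard library. `tone2_to_tone3` is a hypothetical helper for illustration and is not part of pypinyin.

```python
import re

def tone2_to_tone3(syllable):
    """Move the tone digit of a TONE2-style syllable to the end (TONE3 style)."""
    match = re.search(r'[1-5]', syllable)
    if match is None:
        # No tone digit present, e.g. a neutral-tone syllable: leave unchanged.
        return syllable
    digit = match.group()
    return syllable[:match.start()] + syllable[match.end():] + digit

print(tone2_to_tone3('zho1ng'))  # zhong1
print(tone2_to_tone3('xi1n'))    # xin1
```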
Please note:
- Pinyin results carry no indicator for syllables with a neutral tone, neither diacritics nor numbers. (To use '5' for neutral tones, see the documentation.)
- Lazy pinyin results use 'v' for 'ü'. (To use 'ü' instead, see the documentation.)
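To make the two notes concrete, here is a small post-processing sketch that works on plain lists of syllables. It is only an illustration: `use_u_umlaut` and `mark_neutral_tone` are hypothetical helpers, the sample lists are hand-written values shaped like the library's results, and the library's own way of doing this is described in its documentation.

```python
import re

def use_u_umlaut(syllables):
    """Replace the 'v' placeholder with 'ü' in toneless pinyin syllables.

    Caveat: this also rewrites any non-pinyin Latin text in the list.
    """
    return [s.replace('v', 'ü') for s in syllables]

def mark_neutral_tone(syllables):
    """Append '5' to numbered-style syllables that carry no tone digit.

    Caveat: any purely lowercase Latin item is treated as a syllable.
    """
    return [s + '5' if re.fullmatch(r'[a-zü]+', s) else s for s in syllables]

print(use_u_umlaut(['zhan', 'lve']))      # ['zhan', 'lüe']
print(mark_neutral_tone(['hao3', 'de']))  # ['hao3', 'de5']
```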
Command line tools:
$ pypinyin 音乐
yīn yuè
$ pypinyin -h
For more details, see the documentation.
For questions about developing the project, please refer to the development documentation.
A database of pinyin phrases is used to solve the heteronym problem. If an entry turns out to be wrong, you can load custom pinyin phrases to correct the database:
>>> from pypinyin import Style, pinyin, load_phrases_dict
>>> pinyin('步履蹒跚')
[['bù'], ['lǚ'], ['mán'], ['shān']]
>>> load_phrases_dict({'步履蹒跚': [['bù'], ['lǚ'], ['pán'], ['shān']]})
>>> pinyin('步履蹒跚')
[['bù'], ['lǚ'], ['pán'], ['shān']]
For more details, see the documentation.
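The idea of resolving heteronyms through a phrase table, and of correcting an entry in it, can be sketched without the library. This is a minimal illustration on toy data: `TOY_SINGLE`, `TOY_PHRASES` and `convert` are hypothetical names, and greedy longest-match lookup is an assumption made for the sketch, not a description of pypinyin's internals.

```python
# Toy data for illustration only; these are not pypinyin's real dictionaries.
TOY_SINGLE = {'音': 'yīn', '乐': 'lè', '快': 'kuài'}
TOY_PHRASES = {
    '音乐': ['yīn', 'yuè'],
    '步履蹒跚': ['bù', 'lǚ', 'mán', 'shān'],  # deliberately "wrong" entry
}

def convert(text, phrases=TOY_PHRASES, single=TOY_SINGLE):
    """Convert text by trying the longest known phrase first, then single characters."""
    result = []
    max_len = max((len(p) for p in phrases), default=1)
    i = 0
    while i < len(text):
        # Try phrase lengths from longest down to 2 characters.
        for size in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in phrases:
                result.extend(phrases[chunk])
                i += size
                break
        else:
            # No phrase matched: fall back to the per-character reading,
            # passing unknown characters through unchanged.
            result.append(single.get(text[i], text[i]))
            i += 1
    return result

print(convert('快乐'))  # ['kuài', 'lè']   (per-character reading of '乐')
print(convert('音乐'))  # ['yīn', 'yuè']   (phrase entry picks the other reading)

# Correcting an entry, in the spirit of load_phrases_dict above.
TOY_PHRASES.update({'步履蹒跚': ['bù', 'lǚ', 'pán', 'shān']})
print(convert('步履蹒跚'))  # ['bù', 'lǚ', 'pán', 'shān']
```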
>>> from pypinyin import Style, pinyin
>>> pinyin('下雨天', style=Style.INITIALS)
[['x'], [''], ['t']]
The initial for '雨' is empty because, according to the standard pinyin rules (《汉语拼音方案》), 'y', 'w', and 'ü' ('yu') are not counted as syllable initials.
If this causes you inconvenience, please also be aware of characters without an initial, such as '啊' ('a'), '饿' ('e'), '按' ('an'), and '昂' ('ang'). In these cases you might need the 'FIRST_LETTER' style instead.
-- @hotoo
Reference: hotoo/pinyin#57, #22, #27, #44
If this is not the desired behaviour, that is, if you want 'y' to be counted as an initial, pass 'strict=False'.
>>> from pypinyin import Style, pinyin
>>> pinyin('下雨天', style=Style.INITIALS)
[['x'], [''], ['t']]
>>> pinyin('下雨天', style=Style.INITIALS, strict=False)
[['x'], ['y'], ['t']]
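The difference between the two modes can be pictured as two tables of initials, one without and one with 'y' and 'w'. The sketch below is a toy illustration on toneless syllables; `split_initial` and both tables are hypothetical and do not reflect how pypinyin implements `strict`.

```python
# Toy initial tables; an illustration, not pypinyin's internal data.
# Two-letter initials come first so that 'zh' wins over 'z'.
STRICT_INITIALS = ['zh', 'ch', 'sh', 'b', 'p', 'm', 'f', 'd', 't', 'n', 'l',
                   'g', 'k', 'h', 'j', 'q', 'x', 'r', 'z', 'c', 's']
LOOSE_INITIALS = STRICT_INITIALS + ['y', 'w']

def split_initial(syllable, strict=True):
    """Return the initial of a toneless pinyin syllable, or '' if it has none."""
    initials = STRICT_INITIALS if strict else LOOSE_INITIALS
    for initial in initials:
        if syllable.startswith(initial):
            return initial
    return ''

syllables = ['xia', 'yu', 'tian']
print([split_initial(s) for s in syllables])                # ['x', '', 't']
print([split_initial(s, strict=False) for s in syllables])  # ['x', 'y', 't']
print(repr(split_initial('an', strict=False)))              # '' (no initial in either mode)
```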
If you do not need highly accurate pinyin, you can set the environment variables 'PYPINYIN_NO_PHRASES' and 'PYPINYIN_NO_DICT_COPY' to reduce memory usage. For more details, see the documentation.
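One way to set these variables from Python itself is sketched below. It rests on two assumptions that should be checked against the documentation: that any non-empty value enables each option, and that the variables are read when the package is first imported, so they must be set beforehand. The import is guarded so the snippet also runs where pypinyin is not installed.

```python
import os

# Assumption: any non-empty value enables the option, and the values are
# read at import time, so they must be set before importing pypinyin.
os.environ['PYPINYIN_NO_PHRASES'] = 'true'
os.environ['PYPINYIN_NO_DICT_COPY'] = 'true'

try:
    from pypinyin import lazy_pinyin  # imported only after the variables are set
except ImportError:
    lazy_pinyin = None  # pypinyin is not installed in this environment

if lazy_pinyin is not None:
    print(lazy_pinyin('中心'))
```

Setting the variables in the shell before starting Python has the same effect and avoids any dependence on import order.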
For more questions and answers, see the FAQ.
- Pinyin for single characters uses data from pinyin-data.
- Pinyin for phrases uses data from phrase-pinyin-data.
- hotoo/pinyin: A tool for converting Chinese characters to pinyin, Node.js/JavaScript version.
- mozillazg/go-pinyin: A tool for converting Chinese characters to pinyin, Go version.
- mozillazg/rust-pinyin: A tool for converting Chinese characters to pinyin, Rust version.