uchardet is a C language binding of the original C++ implementation of Mozilla's universal charset detection library.
It is an encoding detector: given a sequence of bytes in an unknown character encoding and no additional information, it attempts to determine the encoding of the text.
The original code of universalchardet is available at http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/
Techniques used by universalchardet are described at http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
- Unicode
  - UTF-8
  - UTF-16BE / UTF-16LE
  - UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-3412 / X-ISO-10646-UCS-4-2143
- Chinese
  - ISO-2022-CN
  - BIG5
  - EUC-TW
  - GB18030
  - HZ-GB-2312
- Japanese
  - ISO-2022-JP
  - SHIFT_JIS
  - EUC-JP
- Korean
  - ISO-2022-KR
  - EUC-KR
- Cyrillic
  - ISO-8859-5
  - KOI8-R
  - WINDOWS-1251
  - MACCYRILLIC
  - IBM866
  - IBM855
- Greek
  - ISO-8859-7
  - WINDOWS-1253
- Hebrew
  - ISO-8859-8
  - WINDOWS-1255
- Others
  - WINDOWS-1252
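
The two less familiar names in the Unicode group refer to unusual byte orderings of 32-bit code units. The following is a minimal sketch of how a leading byte-order mark (BOM) maps to those names. `sniff_bom` is a hypothetical helper written for illustration, not part of uchardet, and it covers only input that starts with a BOM. It also assumes that `FF FE 00 00` should be read as UTF-32LE rather than as a UTF-16LE mark followed by a NUL character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of uchardet): returns the charset name
 * implied by a leading byte-order mark, or NULL if no BOM is present. */
const char *sniff_bom(const unsigned char *buf, size_t len)
{
    static const unsigned char utf32be[]  = {0x00, 0x00, 0xFE, 0xFF};
    static const unsigned char utf32le[]  = {0xFF, 0xFE, 0x00, 0x00};
    static const unsigned char ucs4_3412[] = {0xFE, 0xFF, 0x00, 0x00};
    static const unsigned char ucs4_2143[] = {0x00, 0x00, 0xFF, 0xFE};
    static const unsigned char utf8[]     = {0xEF, 0xBB, 0xBF};
    static const unsigned char utf16be[]  = {0xFE, 0xFF};
    static const unsigned char utf16le[]  = {0xFF, 0xFE};

    assert(buf != NULL || len == 0);

    /* Four-byte marks are tested first, because two of them begin with
     * the same bytes as the two-byte UTF-16 marks. */
    if (len >= 4) {
        if (memcmp(buf, utf32be, 4) == 0)   return "UTF-32BE";
        if (memcmp(buf, utf32le, 4) == 0)   return "UTF-32LE";
        if (memcmp(buf, ucs4_3412, 4) == 0) return "X-ISO-10646-UCS-4-3412";
        if (memcmp(buf, ucs4_2143, 4) == 0) return "X-ISO-10646-UCS-4-2143";
    }
    if (len >= 3 && memcmp(buf, utf8, 3) == 0)    return "UTF-8";
    if (len >= 2 && memcmp(buf, utf16be, 2) == 0) return "UTF-16BE";
    if (len >= 2 && memcmp(buf, utf16le, 2) == 0) return "UTF-16LE";

    return NULL; /* no BOM: other detection techniques would be needed */
}
```

Text without a BOM, which includes every legacy encoding in the list above, cannot be identified this way and needs the statistical techniques described in the Mozilla document.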
On Debian or Ubuntu:
apt-get install uchardet libuchardet-dev
On macOS with Homebrew:
brew install uchardet
To build from source:
cmake .
make
make install
uchardet Command Line Tool
Version 0.0.1
Author: BYVoid
Bug Report: http://code.google.com/p/uchardet/issues/entry
Usage:
uchardet [Options] [File]
Options:
-v, --version Print version and build information.
-h, --help Print this help.
For the library API, see uchardet.h.
- python-chardet: Python port
- ruby-rchardet: Ruby port
- juniversalchardet: Java port of universalchardet
- jchardet: Java port of chardet
- nuniversalchardet: C# port of universalchardet
- nchardet: C# port of chardet
- uchardet-enhanced: a fork of Mozilla universalchardet
- rust-uchardet: Rust language binding of uchardet