Add text encoding detection to the Locale Kit.
As usual we ask ICU to do the actual work. The TextEncoding constructor
is fed with a sample of the text to identify (ICU docs recommend a few
hundred bytes). The text is analyzed in various ways (byte patterns
such as UTF-8 escaping schemes, common letter sequences from known
languages, byte order marks) and an encoding is determined.
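
To make two of the checks named above concrete, here is a minimal sketch of a byte order mark test and a UTF-8 byte pattern test. It is not the code from this commit and not how ICU implements its detector; `DetectBom` and `LooksLikeUtf8` are hypothetical helpers written only for illustration, and the UTF-8 check is deliberately simplified (it does not reject overlong forms or surrogates).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the commit): returns the encoding implied
// by a leading byte order mark, or an empty string if there is none.
std::string
DetectBom(const char* data, size_t length)
{
	const unsigned char* bytes
		= reinterpret_cast<const unsigned char*>(data);

	// UTF-32 is tested before UTF-16 because the UTF-32LE mark
	// (FF FE 00 00) begins with the UTF-16LE mark (FF FE).
	if (length >= 4 && bytes[0] == 0xFF && bytes[1] == 0xFE
		&& bytes[2] == 0x00 && bytes[3] == 0x00)
		return "UTF-32LE";
	if (length >= 4 && bytes[0] == 0x00 && bytes[1] == 0x00
		&& bytes[2] == 0xFE && bytes[3] == 0xFF)
		return "UTF-32BE";
	if (length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB
		&& bytes[2] == 0xBF)
		return "UTF-8";
	if (length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
		return "UTF-16LE";
	if (length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
		return "UTF-16BE";
	return "";
}


// Hypothetical helper (not part of the commit): checks that every lead byte
// is followed by the number of continuation bytes (10xxxxxx) it announces.
bool
LooksLikeUtf8(const char* data, size_t length)
{
	const unsigned char* bytes
		= reinterpret_cast<const unsigned char*>(data);

	size_t i = 0;
	while (i < length) {
		unsigned char lead = bytes[i];
		size_t continuation;
		if (lead < 0x80)
			continuation = 0;
		else if ((lead & 0xE0) == 0xC0)
			continuation = 1;
		else if ((lead & 0xF0) == 0xE0)
			continuation = 2;
		else if ((lead & 0xF8) == 0xF0)
			continuation = 3;
		else
			return false;

		// The sample may end in the middle of a sequence; tolerate that.
		if (i + continuation >= length)
			break;

		for (size_t k = 1; k <= continuation; k++) {
			if ((bytes[i + k] & 0xC0) != 0x80)
				return false;
		}
		i += continuation + 1;
	}
	return true;
}
```

A caller would typically try `DetectBom` first and fall back to pattern checks such as `LooksLikeUtf8` only when no mark is found. A real detector like ICU's goes further and weighs language statistics across many candidate encodings.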

Replace the code in StyledEdit with this new implementation.

Note that ICU seems to always return some valid encoding, even when fed
with obviously non-text data. This makes StyledEdit open the files no
matter what, whereas it would error out before.

Fixes #9395.
pulkomandy committed Sep 25, 2016
1 parent aec3e63 commit fbb725b
Showing 4 changed files with 67 additions and 670 deletions.
28 changes: 28 additions & 0 deletions headers/os/locale/TextEncoding.h
@@ -0,0 +1,28 @@
/*
 * Copyright 2016, Haiku, inc.
 * Distributed under terms of the MIT license.
 */


#ifndef TEXTENCODING_H
#define TEXTENCODING_H


#include <String.h>

#include <stddef.h>


class TextEncoding
{
public:
							TextEncoding(const char* data, size_t length);

			BString			GetName();

private:
			BString			fName;
};


#endif /* !TEXTENCODING_H */