forked from php/php-src
-
Notifications
You must be signed in to change notification settings - Fork 0
/
unicode-todo.txt
42 lines (28 loc) · 1.77 KB
/
unicode-todo.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
* unicode in `..`?
* EBCDIC support?
* Discuss putting ZEND_ATTRIBUTE_FORMAT back on zend_error() or create a new
zend_error_ex() function that supports new specifiers
* Find all instances where unicode strings are compared with memcmp() and
replace either with u_memcmpCodePointOrder() or ucol_strcoll()
* Opening a collator may return U_USING_DEFAULT_WARNING, U_USING_FALLBACK_WARNING
* Need to finish making HTTP input work as described in the design doc. It
is almost there, but needs to handle conversion errors and provide a way to
explicitly re-decode raw data with specified encoding (input filter,
perhaps?). Also check for _charset_ request field which might be present.
* Optimize T_INLINE_HTML blocks conversion by either creating a converter
cache or remembering the last used converter in the executor globals.
* What to do with binary string literals and runtime casting? Literals are in
script_encoding, casting uses runtime_encoding. If they are different, bad
stuff happens. Maybe those who do that stuff should suffer anyway.
* Control of fallback mappings in conversions.
* Figure out generic approach to locale validation/fallback.
* Constant registration/fetching should do identifier normalization.
* Make zend_u_str_case_fold() do only case-folding and nothing else. The
normalization should be done by zend_normalize_identifier().
* Look at performance implications of identifier normalization. Measure
performance difference when doing quickCheck + normalize versus simple
normalize.
* USTR_MAKE("") should be estrndup(EMPTY_STR)
* See if ext/pcre can ba adjusted to allow operations on pure binary
strings. Ideal mode would be: convert all IS_UNICODE to UTF-8, assume that
binary strings with /u modifier are UTF-8, otherwise it's pure binary.