Commit 65bdf46

Use the correct length in text for parsing
UnicodeCharsXXTextRecord is encoded as UTF-16, and its length field counts encoded bytes, not Unicode characters. Because the record type was chosen from the character count, text of 128 or more characters could fail to convert to binary elsewhere: the byte length couldn't fit in the record's length field (whether it failed depended on the length, since only some character counts fit the field while their byte counts did not). Changing this code to use the alternate CharsXXTextRecord with UTF-8 might also be useful for saving space.
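The mismatch described above can be reproduced with plain Python strings. This is only an illustrative sketch: the variable names are made up for the example, and it does not use the wcf library itself.

```python
# Illustrative only: compare the character count with the encoded byte counts.
text = "a" * 128

char_count = len(text)                      # Unicode code points
utf16_bytes = len(text.encode("utf-16le"))  # bytes a UTF-16 payload would occupy
utf8_bytes = len(text.encode("utf-8"))      # bytes a UTF-8 payload would occupy

# The old check used the character count, so it picked the 8-bit length record.
fits_by_chars = char_count < 2**8
# The byte count is what must fit in the length field, and here it does not.
fits_by_bytes = utf16_bytes < 2**8

# Characters outside the Basic Multilingual Plane take 4 bytes in UTF-16.
emoji = "\U0001F600"
emoji_chars = len(emoji)
emoji_utf16_bytes = len(emoji.encode("utf-16le"))

print(char_count, utf16_bytes, utf8_bytes, fits_by_chars, fits_by_bytes)
```

For ASCII-heavy text the UTF-8 byte count is half the UTF-16 one, which is the space saving the commit message hints at.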
1 parent 2e4122f commit 65bdf46

1 file changed: +1 -1 lines changed
wcf/xml2records.py

@@ -166,7 +166,7 @@ def _parse_data(self, data, is_cdata=False):
             return DateTimeTextRecord(dt, tz)
 
         # text as fallback
-        val = len(data)
+        val = len(data.encode('utf-16le'))
         if val < 2**8:
             return UnicodeChars8TextRecord(data)
         elif val < 2**16:
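A minimal standalone sketch of the selection logic in this hunk follows. The helper name `pick_length_width` is hypothetical, and the 32-bit branch is an assumption, since the diff only shows the first two comparisons. It returns the width of the length prefix rather than constructing the real record classes.

```python
import struct


def pick_length_width(data: str) -> int:
    """Return the length-prefix width in bytes (1, 2 or 4) for a UTF-16 payload.

    Hypothetical helper: it mirrors the comparisons in the diff but is not
    part of wcf/xml2records.py.
    """
    val = len(data.encode("utf-16le"))  # byte length, as in the fixed code
    if val < 2**8:
        return 1
    elif val < 2**16:
        return 2
    elif val < 2**32:  # assumed third branch, not visible in the hunk
        return 4
    raise ValueError("text too long for a length-prefixed record")


# Why the old character-based check failed: 128 characters are 256 bytes,
# and 256 cannot be packed into a single unsigned byte.
try:
    struct.pack("<B", len(("a" * 128).encode("utf-16le")))
    packed_ok = True
except struct.error:
    packed_ok = False

print(pick_length_width("a" * 127), pick_length_width("a" * 128), packed_ok)
```

With the byte length driving the choice, 127 characters still use the 1-byte prefix, while 128 characters move to the 2-byte prefix instead of overflowing.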
