String length is incorrect when converting from UTF-16 to UTF-32 wxString
Reported by: plorkyeran | Owned by: VZ
    auto str = wxString::FromUTF8("\xf0\x9f\x98\x84");
    printf("%s\n", str.utf8_str().data());
    assert(str.size() == 1);

    wxMBConvUTF16 conv;
    auto str2 = wxString(conv.cMB2WX((const char *)u"\xd83d\xde04"));
    printf("%s\n", str2.utf8_str().data());
    assert(str2.size() == 1);
This correctly prints the same character both times, but the second assert fails because str2.size() is 2. With longer strings, anything that relies on the string length ends up reading uninitialized memory.
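For reference, the two UTF-16 code units in the test string form a single surrogate pair. Decoding it with the standard UTF-16 surrogate formula gives U+1F604, the same code point as the UTF-8 bytes F0 9F 98 84, so a single UTF-32 code unit (size() == 1) is the expected result:

$$
\begin{aligned}
\mathrm{cp} &= \mathtt{0x10000} + \big((\mathtt{0xD83D} - \mathtt{0xD800}) \ll 10\big) + (\mathtt{0xDE04} - \mathtt{0xDC00}) \\
&= \mathtt{0x10000} + (\mathtt{0x3D} \ll 10) + \mathtt{0x204} \\
&= \mathtt{0x10000} + \mathtt{0xF400} + \mathtt{0x204} = \mathtt{0x1F604}
\end{aligned}
$$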
The cause is that wxMBConvUTF16straight::ToWChar skips calculating the required buffer size. It relies on the fact that UTF-32 never needs more code units than UTF-16, so it allocates a buffer that is larger than needed whenever surrogate pairs are involved. wxMBConv::cMB2WC, however, assumes that the buffer size is the actual string length and discards the length returned by the conversion itself. The attached patch fixes this by making wxMBConv::cMB2WC shrink the buffer to the length actually used by the conversion.
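The failure mode and the fix can be sketched outside wxWidgets. This is a simplified model, not the real wxMBConv API: `utf16_to_utf32`, `convert_buggy` and `convert_fixed` are hypothetical names. The converter writes into a buffer sized by the UTF-16 code unit count (an upper bound) and returns how many UTF-32 code units it actually wrote. The buggy caller keeps the whole buffer, as cMB2WC did before the patch; the fixed caller shrinks it to the returned length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for wxMBConvUTF16straight::ToWChar (not the wx API).
// `out` must hold at least in.size() elements: UTF-32 never needs more code
// units than UTF-16. Returns the number of code units actually written.
// Unpaired surrogates are copied through unchanged to keep the sketch short.
std::size_t utf16_to_utf32(const std::u16string& in, char32_t* out)
{
    std::size_t written = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char16_t hi = in[i];
        if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < in.size()) {
            char16_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                // A surrogate pair becomes a single UTF-32 code unit.
                out[written++] = 0x10000 + ((char32_t(hi) - 0xD800) << 10)
                                         + (char32_t(lo) - 0xDC00);
                ++i; // skip the low surrogate we just consumed
                continue;
            }
        }
        out[written++] = hi;
    }
    return written;
}

// Buggy caller: treats the upper-bound buffer size as the string length and
// ignores the count returned by the conversion. With a surrogate pair the
// result has a trailing element the conversion never wrote (zero here
// because the buffer is value-initialized; uninitialized in the report).
std::u32string convert_buggy(const std::u16string& in)
{
    std::u32string buf(in.size(), U'\0');
    utf16_to_utf32(in, &buf[0]);
    return buf;
}

// Fixed caller: shrinks the buffer to the length actually used.
std::u32string convert_fixed(const std::u16string& in)
{
    std::u32string buf(in.size(), U'\0');
    std::size_t len = utf16_to_utf32(in, &buf[0]);
    assert(len <= buf.size()); // the upper-bound assumption must hold
    buf.resize(len);
    return buf;
}
```

For the test string from the report, `convert_buggy` returns two elements while `convert_fixed` returns one, mirroring the str2.size() values before and after shrinking the buffer.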
Change History (6)
Changed 6 months ago by plorkyeran
comment:3 Changed 6 months ago by VZ
- Owner set to VZ
- Resolution set to fixed
- Status changed from new to closed