Ticket #10064 (closed defect: fixed)

Opened 8 weeks ago

Last modified 8 weeks ago

OutputString() in xml module crashes if mb_str() fails

Reported by: jgeorgal Owned by:
Priority: normal Milestone:
Component: base Version: 2.8.8
Keywords: OutputString, xml, encoding, conversion, crash Cc:
Blocked By: Patch: no
Blocking:

Description

The Write command in the inline function "OutputString" at xml/xml.cpp:762 crashes if wxString::mb_str() function (at the previous line) fails.

Specifically, the affected code is the following:

    const wxWX2MBbuf buf(str.mb_str(*(convFile ? convFile : &wxConvUTF8)));
    stream.Write((const char*)buf, strlen((const char*)buf));

The problem is that if mb_str() fails to convert the string to/from the appropriate encoding, it returns NULL. This means that strlen() causes the application to crash as the result of NULL pointer dereference.

The solution to the above is obvious. However, I think, it would be better for encoding conversions to "never" fail. In my case wxMBConv failed to convert just a single character ("RIGHT SINGLE QUOTATION MARK" (0x2019)) from UNICODE (UTF-16) to iso-8859-7 ... and because of this single failure, the whole string conversion failed - returning NULL. I'd rather have gotten back a string with the non-convertible characters replaced with e.g. '?' than NULL.

So, I'd propose to have wxMBConv classes take a:

int (*failed_char_conversion_func_ptr)(int character, const wxString& sourceEncoding, const wxString& targetEncoding)

callback (or something similar) to be called whenever a character conversion fails, returning the character(s) that should be used instead of the nonconvertible one. This can also be a global callback - I guess - since character conversion fall-back logic is application specific.

What do you guys think?

Thanks,
Giannis

PS. I forgot to mention that I observed this behavior on a Windows XP SP3 machine. I don't know if the above is reproducible on GNU/Linux or Macs.

Change History

Changed 8 weeks ago by vadz

  • status changed from new to closed
  • resolution set to fixed

It definitely shouldn't crash, thanks for reporting this, fixed in r56215 for the trunk and (in a simpler and less satisfying way) in r56216 in 2.8 branch.

However the failed conversion fall back is a really ugly idea in the context of wx API and we have no plans to implement this. If you want the conversion to be lossless, just use UTF-8 (which is the default BTW so you simply have to stop changing it). If you need something else you can always define your own "lossful" converter, i.e. an object of class deriving from wxMBConv. Currently there is no way to associate it with wxXmlDocument (it only takes encoding names, not conversion objects) but this is something that could be added if anybody is sufficiently motivated.

Note: See TracTickets for help on using tickets.