Opened 2 years ago

Last modified 11 hours ago

#15621 new defect

Loading/displaying an invalid multi-byte UTF8 character crashes application (wxSTC/OSX)

Reported by: paulclinger Owned by:
Priority: normal Milestone:
Component: wxOSX Version:
Keywords: wxSTC wxOSX Cc:
Blocked By: Blocking:
Patch: no


I'm using wxSTC and when I open the attached file (or any other file with character in the attached file), the app crashes with the following stacktrace; this is the stacktrace from the sample This only happens in wxSTC on OSX (I tests in Windows and it works; I'm quite certain Linux works as well).

This seems to be caused by different multi-byte characters. The one that is in the file is: \240\157\128\128.

In some cases the crash is immediate (before file even displayed). In some cases parts are displayed (when this code is in a larger file), but crashes when you try to scroll to the position with that character.

Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000000

VM Regions Near 0:
--> PAGEZERO 0000000000000000-0000000000001000 [ 4K] ---/--- SM=NUL /Users/USER/*/

TEXT 0000000000001000-00000000004ff000 [ 5112K] r-x/rwx SM=COW /Users/USER/*/]

Application Specific Information:
objc[80558]: garbage collection is OFF

Thread 0 Crashed:: Dispatch queue:
0 0x977c8c42 CFStringCreateCopy + 18
1 0x97854538 CFAttributedStringCreate + 104
2 org.wxwindows.. 0x000793ab wxMacCoreGraphicsContext::GetPartialTextExtents(wxString const&, wxArrayDouble&) const + 323
3 org.wxwindows.. 0x00111a2e wxGCDCImpl::DoGetPartialTextExtents(wxString const&, wxArrayInt&) const + 222
4 org.wxwindows.. 0x00028fad SurfaceImpl::MeasureWidths(Font&, char const*, int, float*) + 99
5 org.wxwindows.. 0x0031155f PositionCache::MeasureWidths(Surface*, ViewStyle&, unsigned int, char const*, unsigned int, float*, Document*) + 507
6 org.wxwindows.. 0x002f2189 Editor::LayoutLine(int, Surface*, ViewStyle&, LineLayout*, int) + 2643
7 org.wxwindows.. 0x002fb368 Editor::Paint(Surface*, PRectangle) + 1764
8 org.wxwindows.. 0x0002db1c ScintillaWX::DoPaint(wxDC*, wxRect) + 314
9 org.wxwindows.. 0x0002089a wxStyledTextCtrl::OnPaint(wxPaintEvent&) + 108
10 org.wxwindows.. 0x001ba81a wxAppConsoleBase::HandleEvent(wxEvtHandler*, void (wxEvtHandler::*)(wxEvent&), wxEvent&) const + 42
11 org.wxwindows.. 0x001ba85d wxAppConsoleBase::CallEventHandler(wxEvtHandler*, wxEventFunctor&, wxEvent&) const + 59
12 org.wxwindows.. 0x00262750 wxEvtHandler::ProcessEventIfMatchesId(wxEventTableEntryBase const&, wxEvtHandler*, wxEvent&) + 106
13 org.wxwindows.. 0x0026425d wxEventHashTable::HandleEvent(wxEvent&, wxEvtHandler*) + 125
14 org.wxwindows.. 0x002642ba wxEvtHandler::TryHereOnly(wxEvent&) + 84
15 org.wxwindows.. 0x0026509a wxEvtHandler::TryBeforeAndHere(wxEvent&) + 42
16 org.wxwindows.. 0x00264302 wxEvtHandler::ProcessEventLocally(wxEvent&) + 26
17 org.wxwindows.. 0x00263ae9 wxEvtHandler::ProcessEvent(wxEvent&) + 203
18 org.wxwindows.. 0x00262a6c wxEvtHandler::SafelyProcessEvent(wxEvent&) + 34
19 org.wxwindows.. 0x001a17a3 wxWindowBase::HandleWindowEvent(wxEvent&) const + 27
20 org.wxwindows.. 0x00044d4f wxWindow::MacDoRedraw(long) + 807
21 org.wxwindows.. 0x000eb020 wxWidgetCocoaImpl::drawRect(void*, NSView*, void*) + 1384
22 org.wxwindows.. 0x000ec7a1 wxOSX_drawRect(NSView*, objc_selector*, _NSRect) + 673
23 0x900618b5 -[NSView _drawRect:clip:] + 3717
24 0x90090ce2 -[NSView _recursiveDisplayAllDirtyWithLockFocus:visRect:] + 1958
25 0x90091083 -[NSView _recursiveDisplayAllDirtyWithLockFocus:visRect:] + 2887
26 0x90091083 -[NSView _recursiveDisplayAllDirtyWithLockFocus:visRect:] + 2887
27 0x9005f212 -[NSView _recursiveDisplayRectIfNeededIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:topView:] + 708
28 0x9005e817 -[NSThemeFrame _recursiveDisplayRectIfNeededIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:topView:] + 259
29 0x90059e9e -[NSView _displayRectIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:] + 4817
30 0x90052dc9 -[NSView displayIfNeeded] + 1365
31 0x9004fb8d -[NSWindow displayIfNeeded] + 316
32 0x90052698 _handleWindowNeedsDisplayOrLayoutOrUpdateConstraints + 804
33 0x9785cebd _runLoopObserverWithBlockContext + 29
35 0x9782900d CFRunLoopDoObservers + 413
36 0x977fb984
CFRunLoopRun + 1044
37 0x977fb1dc CFRunLoopRunSpecific + 332
38 0x977fb088 CFRunLoopRunInMode + 120
39 0x983bc723 RunCurrentEventLoopInMode + 318
40 0x983c39b6 ReceiveNextEventCommon + 168
41 0x983c38fa BlockUntilNextEventMatchingListInMode + 88
42 0x900120d8 _DPSNextEvent + 678
43 0x90011942 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 113
44 0x9000dcb1 -[NSApplication run] + 911
45 org.wxwindows.. 0x000aba08 wxGUIEventLoop::OSXDoRun() + 166
46 org.wxwindows.. 0x00249e9e wxCFEventLoop::DoRun() + 40
47 org.wxwindows.. 0x001e270b wxEventLoopBase::Run() + 153
48 org.wxwindows.. 0x001bbb6a wxAppConsoleBase::MainLoop() + 102
49 org.wxwindows.. 0x00070a06 wxApp::OnRun() + 30
50 org.wxwindows.. 0x00205aa8 wxEntry(int&, wchar_t) + 110
51 org.wxwindows.. 0x00205b5d wxEntry(int&, char
) + 50
52 org.wxwindows.. 0x0000ca21 main + 30 (stctest.cpp:193)
53 org.wxwindows.. 0x00002b45 start + 53

Attachments (2)

utf8-multi-byte-crash download (7 bytes) - added by paulclinger 2 years ago.
multi-byte UTF character that crashes wxSTC
0001-Handle-wxString-encodings-other-than-UCS-2-in-Surfac.patch download (2.5 KB) - added by plorkyeran 21 months ago.

Download all attachments as: .zip

Change History (16)

Changed 2 years ago by paulclinger

multi-byte UTF character that crashes wxSTC

comment:1 Changed 2 years ago by paulclinger

Just want to add: this is on wxwidgets trunk from Oct 17th.

comment:2 Changed 2 years ago by paulclinger

I suspect the problem is with the code in SurfaceImpl::MeasureWidth; in fact, there is this comment and this is exactly when the problem happens:


Map the widths for UCS-2 characters back to the UTF-8 input string
NOTE: I don't think this is right for when sizeof(wxChar) > 2, ie wxGTK2
so figure it out and fix it!

Figuring it out...

comment:3 Changed 2 years ago by vadz

I don't think the problem is here. I don't quite understand what's going on here and it does look like the current code returns wrong results, but the crash clearly happens before it is even reached, as it's inside GetPartialTextExtents() call.

Now there is possibly a bug in our code, which should be reproducible without involving wxSTC at all, but just by calling this method on wxDC with the given string. But I also remember that there was a recent bug in OS X which didn't handle some Unicode characters correctly so maybe it's not our bug at all... In any case, it would be useful to provide a really minimal test case for this, not involving wxSTC, and debug inside GetPartialTextExtents() to see what happens there for U+1D000 (BYZANTINE MUSICAL SYMBOL PSILI, really? I wonder where does this come from...).

comment:4 Changed 2 years ago by paulclinger

Vadim: you are right; it fails in GetPartialTextExtents. I'm not sure how to debug what happens, but I noticed that wxString length returned is incorrect for some values (although it is correct for that specific one I have in the file). For example, wx.wxString("\194\123"):Len() returns 0, even though it's a valid UTF-8 string and should return 1.

My current code is compiled with only --enable-unicode. I tried to recompile with added --enable-utf8, but it didn't make any difference.

comment:5 Changed 2 years ago by paulclinger

  • Summary changed from Loading/displaying a valid multi-byte UTF8 character crashes application (wxSTC/OSX) to Loading/displaying an invalid multi-byte UTF8 character crashes application (wxSTC/OSX)

One more update; the decimal representation of the text I included in the original post was incorrect. The actual text includes two characters "\xED\xB3\xB6" and "\xED\xB2\xA2" (plus a newline), which are not valid UTF-8 characters. I updated the subject to reflect that.

Neil H. also confirmed that it's not a Scintilla issue:

I tested on some other invalid UTF-8 characters and they are handled correctly; for example, having "\xED" or "\xED\xB3" doesn't crash, but having "\xED\xB3\xB6" does.

comment:6 Changed 2 years ago by vadz

Could you please make the smallest possible example using just wxDC and these invalid characters reproducing the crash?

BTW, "\194\123" is not a valid UTF-8 sequence, if the numbers are supposed to be in decimal.

comment:7 Changed 2 years ago by paulclinger

Could you please make the smallest possible example using just wxDC and these invalid characters reproducing the crash?

I'll try shortly.

BTW, "\194\123" is not a valid UTF-8 sequence, if the numbers are supposed to be in decimal.

Bad example; it was supposed to be \128.

I think it may indeed be related to the (incorrect) length of that UTF-8 sequence. This is what I'm getting:

wx.wxString("\xED\xB2"):Len() == 0
wx.wxString("\xED\xB2\xA2"):Len() == 1 -- why is it 1 here if it's invalid?
wx.wxString("\xED\xB2\xA2\xED"):Len() == 0 -- why is it 0 here if it's 1 above?

comment:8 Changed 2 years ago by vadz

The problem here is that "\xED\xB2\xA2" is valid from UTF-8 point of view. However it decodes to U+DCA2 which doesn't exist (no Unicode character is assigned to this number). But I don't think wxString should be checking for this.

The other two are clearly invalid, they can't be decoded as UTF-8 at all.

All this still doesn't explain the crash however.

comment:9 Changed 21 months ago by plorkyeran

The immediate cause of the crash is that the conversion from wxString to CFString is failing due to it containing an invalid character, and wxMacCoreGraphicsContext::GetPartialTextExtents never checks that the CFString is non-NULL.

However, the same crash also occurs on all valid characters >= U+10000 due to that wx2stc and stc2wx assume wxString is UTF-16, and so you end up with a UTF-32 string containing UTF-16 surrogate pair characters. The simplest fix for this is to just use wxString's conversions to/from UTF-8 rather than Scintilla's, but there's a comment that suggests that the behavior of returning garbage rather than an empty string on invalid UTF-8 is desirable...

It's not responsible for the crash, but the attached patch makes MeasureWidths support characters >= U+10000 rather than only BMP characters.

comment:10 Changed 21 months ago by vadz

Thanks for the patch!

I wonder if the comment about surrogate pairs is true for the other systems or only OS X? It doesn't seem obvious to me that all of them behave like this...

Although, is the test for empty string just an optimization or really necessary? I'm not against adding it but I'd like to understand why.


comment:11 Changed 21 months ago by plorkyeran

I wonder if the comment about surrogate pairs is true for the other systems or only OS X?

Only Windows. The width calculated for the first wchar_t is that of the placeholder box for invalid characters, so I suppose it's a valid-but-not-useful value as opposed to garbage. If there's other platforms where wxString is UTF-16 (and the interface exposes that) they'll need to be tested, but I was under the impression that's only Windows.

The check for empty string is just cruft from trying multiple approaches and can be removed. I don't think Scintilla even calls that function for empty strings.

comment:12 Changed 21 months ago by vadz

Sorry, I've completely failed to take the #if SIZEOF_WCHAR_T == 2 into account somehow, you're right, currently this is the case for Windows only.

I'm going to apply this but the ticket is still valid, IIUC, as we still need to check for the validity of the string somewhere, probably at wxMacCGContext level (Stefan?).

Thanks again!

comment:13 Changed 21 months ago by VZ

In 76623:

Correct handling of the characters outside of the BMP in wxSTC.

The code mapping positions for the units of the UTF-13/32 string used by
wxWidgets to positions for the units of the UTF-8 string used by Scintilla
didn't work correctly for the characters outside of the BMP, i.e. Unicode code
points >= 0x10000.

See #15621.

comment:14 Changed 11 hours ago by vadz

  • Component changed from wxOSX (any toolkit) to wxOSX-Cocoa

Preparing to remove "wxOSX (any toolkit)" component: change all the bugs using to use "wxOSX-Cocoa" component instead.

Note: See TracTickets for help on using tickets.