Opened 10 months ago

Last modified 3 months ago

#15621 new defect

Loading/displaying an invalid multi-byte UTF8 character crashes application (wxSTC/OSX)

Reported by: paulclinger Owned by:
Priority: normal Milestone:
Component: wxOSX (any toolkit) Version:
Keywords: wxSTC wxOSX Cc:
Blocked By: Blocking:
Patch: no

Description

I'm using wxSTC and when I open the attached file (or any other file with character in the attached file), the app crashes with the following stacktrace; this is the stacktrace from the sample stctest.app. This only happens in wxSTC on OSX (I tests in Windows and it works; I'm quite certain Linux works as well).

This seems to be caused by different multi-byte characters. The one that is in the file is: \240\157\128\128.

In some cases the crash is immediate (before file even displayed). In some cases parts are displayed (when this code is in a larger file), but crashes when you try to scroll to the position with that character.

Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000000

VM Regions Near 0:
--> PAGEZERO 0000000000000000-0000000000001000 [ 4K] ---/--- SM=NUL /Users/USER/*/stctest.app/Contents/MacOS/stctest

TEXT 0000000000001000-00000000004ff000 [ 5112K] r-x/rwx SM=COW /Users/USER/*/stctest.app/Contents/MacOS/stctest]

Application Specific Information:
objc[80558]: garbage collection is OFF

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 com.apple.CoreFoundation 0x977c8c42 CFStringCreateCopy + 18
1 com.apple.CoreFoundation 0x97854538 CFAttributedStringCreate + 104
2 org.wxwindows.. 0x000793ab wxMacCoreGraphicsContext::GetPartialTextExtents(wxString const&, wxArrayDouble&) const + 323
3 org.wxwindows.. 0x00111a2e wxGCDCImpl::DoGetPartialTextExtents(wxString const&, wxArrayInt&) const + 222
4 org.wxwindows.. 0x00028fad SurfaceImpl::MeasureWidths(Font&, char const*, int, float*) + 99
5 org.wxwindows.. 0x0031155f PositionCache::MeasureWidths(Surface*, ViewStyle&, unsigned int, char const*, unsigned int, float*, Document*) + 507
6 org.wxwindows.. 0x002f2189 Editor::LayoutLine(int, Surface*, ViewStyle&, LineLayout*, int) + 2643
7 org.wxwindows.. 0x002fb368 Editor::Paint(Surface*, PRectangle) + 1764
8 org.wxwindows.. 0x0002db1c ScintillaWX::DoPaint(wxDC*, wxRect) + 314
9 org.wxwindows.. 0x0002089a wxStyledTextCtrl::OnPaint(wxPaintEvent&) + 108
10 org.wxwindows.. 0x001ba81a wxAppConsoleBase::HandleEvent(wxEvtHandler*, void (wxEvtHandler::*)(wxEvent&), wxEvent&) const + 42
11 org.wxwindows.. 0x001ba85d wxAppConsoleBase::CallEventHandler(wxEvtHandler*, wxEventFunctor&, wxEvent&) const + 59
12 org.wxwindows.. 0x00262750 wxEvtHandler::ProcessEventIfMatchesId(wxEventTableEntryBase const&, wxEvtHandler*, wxEvent&) + 106
13 org.wxwindows.. 0x0026425d wxEventHashTable::HandleEvent(wxEvent&, wxEvtHandler*) + 125
14 org.wxwindows.. 0x002642ba wxEvtHandler::TryHereOnly(wxEvent&) + 84
15 org.wxwindows.. 0x0026509a wxEvtHandler::TryBeforeAndHere(wxEvent&) + 42
16 org.wxwindows.. 0x00264302 wxEvtHandler::ProcessEventLocally(wxEvent&) + 26
17 org.wxwindows.. 0x00263ae9 wxEvtHandler::ProcessEvent(wxEvent&) + 203
18 org.wxwindows.. 0x00262a6c wxEvtHandler::SafelyProcessEvent(wxEvent&) + 34
19 org.wxwindows.. 0x001a17a3 wxWindowBase::HandleWindowEvent(wxEvent&) const + 27
20 org.wxwindows.. 0x00044d4f wxWindow::MacDoRedraw(long) + 807
21 org.wxwindows.. 0x000eb020 wxWidgetCocoaImpl::drawRect(void*, NSView*, void*) + 1384
22 org.wxwindows.. 0x000ec7a1 wxOSX_drawRect(NSView*, objc_selector*, _NSRect) + 673
23 com.apple.AppKit 0x900618b5 -[NSView _drawRect:clip:] + 3717
24 com.apple.AppKit 0x90090ce2 -[NSView _recursiveDisplayAllDirtyWithLockFocus:visRect:] + 1958
25 com.apple.AppKit 0x90091083 -[NSView _recursiveDisplayAllDirtyWithLockFocus:visRect:] + 2887
26 com.apple.AppKit 0x90091083 -[NSView _recursiveDisplayAllDirtyWithLockFocus:visRect:] + 2887
27 com.apple.AppKit 0x9005f212 -[NSView _recursiveDisplayRectIfNeededIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:topView:] + 708
28 com.apple.AppKit 0x9005e817 -[NSThemeFrame _recursiveDisplayRectIfNeededIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:topView:] + 259
29 com.apple.AppKit 0x90059e9e -[NSView _displayRectIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:] + 4817
30 com.apple.AppKit 0x90052dc9 -[NSView displayIfNeeded] + 1365
31 com.apple.AppKit 0x9004fb8d -[NSWindow displayIfNeeded] + 316
32 com.apple.AppKit 0x90052698 _handleWindowNeedsDisplayOrLayoutOrUpdateConstraints + 804
33 com.apple.CoreFoundation 0x9785cebd _runLoopObserverWithBlockContext + 29
34 com.apple.CoreFoundation 0x978290ce CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION + 30
35 com.apple.CoreFoundation 0x9782900d CFRunLoopDoObservers + 413
36 com.apple.CoreFoundation 0x977fb984
CFRunLoopRun + 1044
37 com.apple.CoreFoundation 0x977fb1dc CFRunLoopRunSpecific + 332
38 com.apple.CoreFoundation 0x977fb088 CFRunLoopRunInMode + 120
39 com.apple.HIToolbox 0x983bc723 RunCurrentEventLoopInMode + 318
40 com.apple.HIToolbox 0x983c39b6 ReceiveNextEventCommon + 168
41 com.apple.HIToolbox 0x983c38fa BlockUntilNextEventMatchingListInMode + 88
42 com.apple.AppKit 0x900120d8 _DPSNextEvent + 678
43 com.apple.AppKit 0x90011942 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 113
44 com.apple.AppKit 0x9000dcb1 -[NSApplication run] + 911
45 org.wxwindows.. 0x000aba08 wxGUIEventLoop::OSXDoRun() + 166
46 org.wxwindows.. 0x00249e9e wxCFEventLoop::DoRun() + 40
47 org.wxwindows.. 0x001e270b wxEventLoopBase::Run() + 153
48 org.wxwindows.. 0x001bbb6a wxAppConsoleBase::MainLoop() + 102
49 org.wxwindows.. 0x00070a06 wxApp::OnRun() + 30
50 org.wxwindows.. 0x00205aa8 wxEntry(int&, wchar_t) + 110
51 org.wxwindows.. 0x00205b5d wxEntry(int&, char
) + 50
52 org.wxwindows.. 0x0000ca21 main + 30 (stctest.cpp:193)
53 org.wxwindows.. 0x00002b45 start + 53

Attachments (2)

utf8-multi-byte-crash download (7 bytes) - added by paulclinger 10 months ago.
multi-byte UTF character that crashes wxSTC
0001-Handle-wxString-encodings-other-than-UCS-2-in-Surfac.patch download (2.5 KB) - added by plorkyeran 3 months ago.

Download all attachments as: .zip

Change History (15)

Changed 10 months ago by paulclinger

multi-byte UTF character that crashes wxSTC

comment:1 Changed 10 months ago by paulclinger

Just want to add: this is on wxwidgets trunk from Oct 17th.

comment:2 Changed 10 months ago by paulclinger

I suspect the problem is with the code in SurfaceImpl::MeasureWidth; in fact, there is this comment and this is exactly when the problem happens:

#if wxUSE_UNICODE

Map the widths for UCS-2 characters back to the UTF-8 input string
NOTE: I don't think this is right for when sizeof(wxChar) > 2, ie wxGTK2
so figure it out and fix it!

Figuring it out...

comment:3 Changed 10 months ago by vadz

I don't think the problem is here. I don't quite understand what's going on here and it does look like the current code returns wrong results, but the crash clearly happens before it is even reached, as it's inside GetPartialTextExtents() call.

Now there is possibly a bug in our code, which should be reproducible without involving wxSTC at all, but just by calling this method on wxDC with the given string. But I also remember that there was a recent bug in OS X which didn't handle some Unicode characters correctly so maybe it's not our bug at all... In any case, it would be useful to provide a really minimal test case for this, not involving wxSTC, and debug inside GetPartialTextExtents() to see what happens there for U+1D000 (BYZANTINE MUSICAL SYMBOL PSILI, really? I wonder where does this come from...).

comment:4 Changed 10 months ago by paulclinger

Vadim: you are right; it fails in GetPartialTextExtents. I'm not sure how to debug what happens, but I noticed that wxString length returned is incorrect for some values (although it is correct for that specific one I have in the file). For example, wx.wxString("\194\123"):Len() returns 0, even though it's a valid UTF-8 string and should return 1.

My current code is compiled with only --enable-unicode. I tried to recompile with added --enable-utf8, but it didn't make any difference.

comment:5 Changed 10 months ago by paulclinger

  • Summary changed from Loading/displaying a valid multi-byte UTF8 character crashes application (wxSTC/OSX) to Loading/displaying an invalid multi-byte UTF8 character crashes application (wxSTC/OSX)

One more update; the decimal representation of the text I included in the original post was incorrect. The actual text includes two characters "\xED\xB3\xB6" and "\xED\xB2\xA2" (plus a newline), which are not valid UTF-8 characters. I updated the subject to reflect that.

Neil H. also confirmed that it's not a Scintilla issue: https://groups.google.com/d/msg/scintilla-interest/8aZN3pSTcXg/pNlAfEUNWq8J.

I tested on some other invalid UTF-8 characters and they are handled correctly; for example, having "\xED" or "\xED\xB3" doesn't crash, but having "\xED\xB3\xB6" does.

comment:6 Changed 10 months ago by vadz

Could you please make the smallest possible example using just wxDC and these invalid characters reproducing the crash?

BTW, "\194\123" is not a valid UTF-8 sequence, if the numbers are supposed to be in decimal.

comment:7 Changed 10 months ago by paulclinger

Could you please make the smallest possible example using just wxDC and these invalid characters reproducing the crash?

I'll try shortly.

BTW, "\194\123" is not a valid UTF-8 sequence, if the numbers are supposed to be in decimal.

Bad example; it was supposed to be \128.

I think it may indeed be related to the (incorrect) length of that UTF-8 sequence. This is what I'm getting:

wx.wxString("\xED\xB2"):Len() == 0
wx.wxString("\xED\xB2\xA2"):Len() == 1 -- why is it 1 here if it's invalid?
wx.wxString("\xED\xB2\xA2\xED"):Len() == 0 -- why is it 0 here if it's 1 above?

comment:8 Changed 10 months ago by vadz

The problem here is that "\xED\xB2\xA2" is valid from UTF-8 point of view. However it decodes to U+DCA2 which doesn't exist (no Unicode character is assigned to this number). But I don't think wxString should be checking for this.

The other two are clearly invalid, they can't be decoded as UTF-8 at all.

All this still doesn't explain the crash however.

comment:9 Changed 3 months ago by plorkyeran

The immediate cause of the crash is that the conversion from wxString to CFString is failing due to it containing an invalid character, and wxMacCoreGraphicsContext::GetPartialTextExtents never checks that the CFString is non-NULL.

However, the same crash also occurs on all valid characters >= U+10000 due to that wx2stc and stc2wx assume wxString is UTF-16, and so you end up with a UTF-32 string containing UTF-16 surrogate pair characters. The simplest fix for this is to just use wxString's conversions to/from UTF-8 rather than Scintilla's, but there's a comment that suggests that the behavior of returning garbage rather than an empty string on invalid UTF-8 is desirable...

It's not responsible for the crash, but the attached patch makes MeasureWidths support characters >= U+10000 rather than only BMP characters.

comment:10 Changed 3 months ago by vadz

Thanks for the patch!

I wonder if the comment about surrogate pairs is true for the other systems or only OS X? It doesn't seem obvious to me that all of them behave like this...

Although, is the test for empty string just an optimization or really necessary? I'm not against adding it but I'd like to understand why.

TIA!

comment:11 Changed 3 months ago by plorkyeran

I wonder if the comment about surrogate pairs is true for the other systems or only OS X?

Only Windows. The width calculated for the first wchar_t is that of the placeholder box for invalid characters, so I suppose it's a valid-but-not-useful value as opposed to garbage. If there's other platforms where wxString is UTF-16 (and the interface exposes that) they'll need to be tested, but I was under the impression that's only Windows.

The check for empty string is just cruft from trying multiple approaches and can be removed. I don't think Scintilla even calls that function for empty strings.

comment:12 Changed 3 months ago by vadz

Sorry, I've completely failed to take the #if SIZEOF_WCHAR_T == 2 into account somehow, you're right, currently this is the case for Windows only.

I'm going to apply this but the ticket is still valid, IIUC, as we still need to check for the validity of the string somewhere, probably at wxMacCGContext level (Stefan?).

Thanks again!

comment:13 Changed 3 months ago by VZ

In 76623:

Correct handling of the characters outside of the BMP in wxSTC.

The code mapping positions for the units of the UTF-13/32 string used by
wxWidgets to positions for the units of the UTF-8 string used by Scintilla
didn't work correctly for the characters outside of the BMP, i.e. Unicode code
points >= 0x10000.

See #15621.

Note: See TracTickets for help on using tickets.