Opened 3 years ago

Closed 4 months ago

#15621 closed defect (fixed)

Loading/displaying an invalid or >=U+10000 multi-byte UTF8 character crashes application (wxSTC/OSX)

Reported by: paulclinger Owned by: Vadim Zeitlin <vadim@…>
Priority: normal Milestone:
Component: wxOSX Version:
Keywords: wxSTC wxOSX Cc:
Blocked By: Blocking:
Patch: yes

Description

I'm using wxSTC and when I open the attached file (or any other file with character in the attached file), the app crashes with the following stacktrace; this is the stacktrace from the sample stctest.app. This only happens in wxSTC on OSX (I tests in Windows and it works; I'm quite certain Linux works as well).

This seems to be caused by different multi-byte characters. The one that is in the file is: \240\157\128\128.

In some cases the crash is immediate (before file even displayed). In some cases parts are displayed (when this code is in a larger file), but crashes when you try to scroll to the position with that character.

Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000000

VM Regions Near 0:
--> PAGEZERO 0000000000000000-0000000000001000 [ 4K] ---/--- SM=NUL /Users/USER/*/stctest.app/Contents/MacOS/stctest

TEXT 0000000000001000-00000000004ff000 [ 5112K] r-x/rwx SM=COW /Users/USER/*/stctest.app/Contents/MacOS/stctest]

Application Specific Information:
objc[80558]: garbage collection is OFF

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 com.apple.CoreFoundation 0x977c8c42 CFStringCreateCopy + 18
1 com.apple.CoreFoundation 0x97854538 CFAttributedStringCreate + 104
2 org.wxwindows.. 0x000793ab wxMacCoreGraphicsContext::GetPartialTextExtents(wxString const&, wxArrayDouble&) const + 323
3 org.wxwindows.. 0x00111a2e wxGCDCImpl::DoGetPartialTextExtents(wxString const&, wxArrayInt&) const + 222
4 org.wxwindows.. 0x00028fad SurfaceImpl::MeasureWidths(Font&, char const*, int, float*) + 99
5 org.wxwindows.. 0x0031155f PositionCache::MeasureWidths(Surface*, ViewStyle&, unsigned int, char const*, unsigned int, float*, Document*) + 507
6 org.wxwindows.. 0x002f2189 Editor::LayoutLine(int, Surface*, ViewStyle&, LineLayout*, int) + 2643
7 org.wxwindows.. 0x002fb368 Editor::Paint(Surface*, PRectangle) + 1764
8 org.wxwindows.. 0x0002db1c ScintillaWX::DoPaint(wxDC*, wxRect) + 314
9 org.wxwindows.. 0x0002089a wxStyledTextCtrl::OnPaint(wxPaintEvent&) + 108
10 org.wxwindows.. 0x001ba81a wxAppConsoleBase::HandleEvent(wxEvtHandler*, void (wxEvtHandler::*)(wxEvent&), wxEvent&) const + 42
11 org.wxwindows.. 0x001ba85d wxAppConsoleBase::CallEventHandler(wxEvtHandler*, wxEventFunctor&, wxEvent&) const + 59
12 org.wxwindows.. 0x00262750 wxEvtHandler::ProcessEventIfMatchesId(wxEventTableEntryBase const&, wxEvtHandler*, wxEvent&) + 106
13 org.wxwindows.. 0x0026425d wxEventHashTable::HandleEvent(wxEvent&, wxEvtHandler*) + 125
14 org.wxwindows.. 0x002642ba wxEvtHandler::TryHereOnly(wxEvent&) + 84
15 org.wxwindows.. 0x0026509a wxEvtHandler::TryBeforeAndHere(wxEvent&) + 42
16 org.wxwindows.. 0x00264302 wxEvtHandler::ProcessEventLocally(wxEvent&) + 26
17 org.wxwindows.. 0x00263ae9 wxEvtHandler::ProcessEvent(wxEvent&) + 203
18 org.wxwindows.. 0x00262a6c wxEvtHandler::SafelyProcessEvent(wxEvent&) + 34
19 org.wxwindows.. 0x001a17a3 wxWindowBase::HandleWindowEvent(wxEvent&) const + 27
20 org.wxwindows.. 0x00044d4f wxWindow::MacDoRedraw(long) + 807
21 org.wxwindows.. 0x000eb020 wxWidgetCocoaImpl::drawRect(void*, NSView*, void*) + 1384
22 org.wxwindows.. 0x000ec7a1 wxOSX_drawRect(NSView*, objc_selector*, _NSRect) + 673
23 com.apple.AppKit 0x900618b5 -[NSView _drawRect:clip:] + 3717
24 com.apple.AppKit 0x90090ce2 -[NSView _recursiveDisplayAllDirtyWithLockFocus:visRect:] + 1958
25 com.apple.AppKit 0x90091083 -[NSView _recursiveDisplayAllDirtyWithLockFocus:visRect:] + 2887
26 com.apple.AppKit 0x90091083 -[NSView _recursiveDisplayAllDirtyWithLockFocus:visRect:] + 2887
27 com.apple.AppKit 0x9005f212 -[NSView _recursiveDisplayRectIfNeededIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:topView:] + 708
28 com.apple.AppKit 0x9005e817 -[NSThemeFrame _recursiveDisplayRectIfNeededIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:topView:] + 259
29 com.apple.AppKit 0x90059e9e -[NSView _displayRectIgnoringOpacity:isVisibleRect:rectIsVisibleRectForView:] + 4817
30 com.apple.AppKit 0x90052dc9 -[NSView displayIfNeeded] + 1365
31 com.apple.AppKit 0x9004fb8d -[NSWindow displayIfNeeded] + 316
32 com.apple.AppKit 0x90052698 _handleWindowNeedsDisplayOrLayoutOrUpdateConstraints + 804
33 com.apple.CoreFoundation 0x9785cebd _runLoopObserverWithBlockContext + 29
34 com.apple.CoreFoundation 0x978290ce CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION + 30
35 com.apple.CoreFoundation 0x9782900d CFRunLoopDoObservers + 413
36 com.apple.CoreFoundation 0x977fb984
CFRunLoopRun + 1044
37 com.apple.CoreFoundation 0x977fb1dc CFRunLoopRunSpecific + 332
38 com.apple.CoreFoundation 0x977fb088 CFRunLoopRunInMode + 120
39 com.apple.HIToolbox 0x983bc723 RunCurrentEventLoopInMode + 318
40 com.apple.HIToolbox 0x983c39b6 ReceiveNextEventCommon + 168
41 com.apple.HIToolbox 0x983c38fa BlockUntilNextEventMatchingListInMode + 88
42 com.apple.AppKit 0x900120d8 _DPSNextEvent + 678
43 com.apple.AppKit 0x90011942 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 113
44 com.apple.AppKit 0x9000dcb1 -[NSApplication run] + 911
45 org.wxwindows.. 0x000aba08 wxGUIEventLoop::OSXDoRun() + 166
46 org.wxwindows.. 0x00249e9e wxCFEventLoop::DoRun() + 40
47 org.wxwindows.. 0x001e270b wxEventLoopBase::Run() + 153
48 org.wxwindows.. 0x001bbb6a wxAppConsoleBase::MainLoop() + 102
49 org.wxwindows.. 0x00070a06 wxApp::OnRun() + 30
50 org.wxwindows.. 0x00205aa8 wxEntry(int&, wchar_t) + 110
51 org.wxwindows.. 0x00205b5d wxEntry(int&, char
) + 50
52 org.wxwindows.. 0x0000ca21 main + 30 (stctest.cpp:193)
53 org.wxwindows.. 0x00002b45 start + 53

Attachments (2)

utf8-multi-byte-crash download (7 bytes) - added by paulclinger 3 years ago.
multi-byte UTF character that crashes wxSTC
0001-Handle-wxString-encodings-other-than-UCS-2-in-Surfac.patch download (2.5 KB) - added by plorkyeran 2 years ago.

Download all attachments as: .zip

Change History (22)

Changed 3 years ago by paulclinger

multi-byte UTF character that crashes wxSTC

comment:1 Changed 3 years ago by paulclinger

Just want to add: this is on wxwidgets trunk from Oct 17th.

comment:2 Changed 3 years ago by paulclinger

I suspect the problem is with the code in SurfaceImpl::MeasureWidth; in fact, there is this comment and this is exactly when the problem happens:

#if wxUSE_UNICODE

Map the widths for UCS-2 characters back to the UTF-8 input string
NOTE: I don't think this is right for when sizeof(wxChar) > 2, ie wxGTK2
so figure it out and fix it!

Figuring it out...

comment:3 Changed 3 years ago by vadz

I don't think the problem is here. I don't quite understand what's going on here and it does look like the current code returns wrong results, but the crash clearly happens before it is even reached, as it's inside GetPartialTextExtents() call.

Now there is possibly a bug in our code, which should be reproducible without involving wxSTC at all, but just by calling this method on wxDC with the given string. But I also remember that there was a recent bug in OS X which didn't handle some Unicode characters correctly so maybe it's not our bug at all... In any case, it would be useful to provide a really minimal test case for this, not involving wxSTC, and debug inside GetPartialTextExtents() to see what happens there for U+1D000 (BYZANTINE MUSICAL SYMBOL PSILI, really? I wonder where does this come from...).

comment:4 Changed 3 years ago by paulclinger

Vadim: you are right; it fails in GetPartialTextExtents. I'm not sure how to debug what happens, but I noticed that wxString length returned is incorrect for some values (although it is correct for that specific one I have in the file). For example, wx.wxString("\194\123"):Len() returns 0, even though it's a valid UTF-8 string and should return 1.

My current code is compiled with only --enable-unicode. I tried to recompile with added --enable-utf8, but it didn't make any difference.

comment:5 Changed 3 years ago by paulclinger

  • Summary changed from Loading/displaying a valid multi-byte UTF8 character crashes application (wxSTC/OSX) to Loading/displaying an invalid multi-byte UTF8 character crashes application (wxSTC/OSX)

One more update; the decimal representation of the text I included in the original post was incorrect. The actual text includes two characters "\xED\xB3\xB6" and "\xED\xB2\xA2" (plus a newline), which are not valid UTF-8 characters. I updated the subject to reflect that.

Neil H. also confirmed that it's not a Scintilla issue: https://groups.google.com/d/msg/scintilla-interest/8aZN3pSTcXg/pNlAfEUNWq8J.

I tested on some other invalid UTF-8 characters and they are handled correctly; for example, having "\xED" or "\xED\xB3" doesn't crash, but having "\xED\xB3\xB6" does.

comment:6 Changed 3 years ago by vadz

Could you please make the smallest possible example using just wxDC and these invalid characters reproducing the crash?

BTW, "\194\123" is not a valid UTF-8 sequence, if the numbers are supposed to be in decimal.

comment:7 Changed 3 years ago by paulclinger

Could you please make the smallest possible example using just wxDC and these invalid characters reproducing the crash?

I'll try shortly.

BTW, "\194\123" is not a valid UTF-8 sequence, if the numbers are supposed to be in decimal.

Bad example; it was supposed to be \128.

I think it may indeed be related to the (incorrect) length of that UTF-8 sequence. This is what I'm getting:

wx.wxString("\xED\xB2"):Len() == 0
wx.wxString("\xED\xB2\xA2"):Len() == 1 -- why is it 1 here if it's invalid?
wx.wxString("\xED\xB2\xA2\xED"):Len() == 0 -- why is it 0 here if it's 1 above?

comment:8 Changed 3 years ago by vadz

The problem here is that "\xED\xB2\xA2" is valid from UTF-8 point of view. However it decodes to U+DCA2 which doesn't exist (no Unicode character is assigned to this number). But I don't think wxString should be checking for this.

The other two are clearly invalid, they can't be decoded as UTF-8 at all.

All this still doesn't explain the crash however.

comment:9 Changed 2 years ago by plorkyeran

The immediate cause of the crash is that the conversion from wxString to CFString is failing due to it containing an invalid character, and wxMacCoreGraphicsContext::GetPartialTextExtents never checks that the CFString is non-NULL.

However, the same crash also occurs on all valid characters >= U+10000 due to that wx2stc and stc2wx assume wxString is UTF-16, and so you end up with a UTF-32 string containing UTF-16 surrogate pair characters. The simplest fix for this is to just use wxString's conversions to/from UTF-8 rather than Scintilla's, but there's a comment that suggests that the behavior of returning garbage rather than an empty string on invalid UTF-8 is desirable...

It's not responsible for the crash, but the attached patch makes MeasureWidths support characters >= U+10000 rather than only BMP characters.

comment:10 Changed 2 years ago by vadz

Thanks for the patch!

I wonder if the comment about surrogate pairs is true for the other systems or only OS X? It doesn't seem obvious to me that all of them behave like this...

Although, is the test for empty string just an optimization or really necessary? I'm not against adding it but I'd like to understand why.

TIA!

comment:11 Changed 2 years ago by plorkyeran

I wonder if the comment about surrogate pairs is true for the other systems or only OS X?

Only Windows. The width calculated for the first wchar_t is that of the placeholder box for invalid characters, so I suppose it's a valid-but-not-useful value as opposed to garbage. If there's other platforms where wxString is UTF-16 (and the interface exposes that) they'll need to be tested, but I was under the impression that's only Windows.

The check for empty string is just cruft from trying multiple approaches and can be removed. I don't think Scintilla even calls that function for empty strings.

comment:12 Changed 2 years ago by vadz

Sorry, I've completely failed to take the #if SIZEOF_WCHAR_T == 2 into account somehow, you're right, currently this is the case for Windows only.

I'm going to apply this but the ticket is still valid, IIUC, as we still need to check for the validity of the string somewhere, probably at wxMacCGContext level (Stefan?).

Thanks again!

comment:13 Changed 2 years ago by VZ

In 76623:

Correct handling of the characters outside of the BMP in wxSTC.

The code mapping positions for the units of the UTF-13/32 string used by
wxWidgets to positions for the units of the UTF-8 string used by Scintilla
didn't work correctly for the characters outside of the BMP, i.e. Unicode code
points >= 0x10000.

See #15621.

comment:14 Changed 6 months ago by vadz

  • Component changed from wxOSX (any toolkit) to wxOSX-Cocoa

Preparing to remove "wxOSX (any toolkit)" component: change all the bugs using to use "wxOSX-Cocoa" component instead.

comment:15 Changed 4 months ago by paulclinger

  • Patch set
  • Summary changed from Loading/displaying an invalid multi-byte UTF8 character crashes application (wxSTC/OSX) to Loading/displaying an invalid or >=U+10000 multi-byte UTF8 character crashes application (wxSTC/OSX)

I posted a message to the Scintilla maillist about a crash on OSX not realizing that it's the same issue as described in this ticket (https://groups.google.com/forum/#!topic/scintilla-interest/FMqLpfZCOF4).

The crash is caused by what plorkyeran described in comment 9:

However, the same crash also occurs on all valid characters >= U+10000 due to that wx2stc and stc2wx assume wxString is UTF-16, and so you end up with a UTF-32 string containing UTF-16 surrogate pair characters. The simplest fix for this is to just use wxString's conversions to/from UTF-8 rather than Scintilla's, but there's a comment that suggests that the behavior of returning garbage rather than an empty string on invalid UTF-8 is desirable...

The issue can be easily seen on trying to show or get text extent on a text that includes the following symbol: http://apps.timwhitlock.info/unicode/inspect?s=%F0%9F%98%88 (code: 1F608, UTF-8 F09F9888). The application crashes with a stack trace like the following one (this is running recent wxwidgets master after 3.6.3 merge):

0   com.apple.CoreFoundation      	0x98487562 CFStringCreateCopy + 18
1   com.apple.CoreFoundation      	0x984eb7df CFAttributedStringCreate + 95
2   libwx.dylib                   	0x01c288fc wxMacCoreGraphicsContext::GetPartialTextExtents(wxString const&, wxArrayDouble&) const + 306 (graphics.cpp:2380)
3   libwx.dylib                   	0x01cd94b0 wxGCDCImpl::DoGetPartialTextExtents(wxString const&, wxArrayInt&) const + 128 (dcgraph.cpp:1108)
4   libwx.dylib                   	0x01ab613b SurfaceImpl::MeasureWidths(Font&, char const*, int, float*) + 123 (basic_string.h:280)
5   libwx.dylib                   	0x0200f087 PositionCache::MeasureWidths(Surface*, ViewStyle const&, unsigned int, char const*, unsigned int, float*, Document*) + 547 (PositionCache.cxx:690)
6   libwx.dylib                   	0x01fe4249 EditView::LayoutLine(EditModel const&, int, Surface*, ViewStyle const&, LineLayout*, int) + 2099 (PositionCache.h:144)
7   libwx.dylib                   	0x01fff592 Editor::WrapCount(int) + 140 (Editor.cxx:5513)
8   libwx.dylib                   	0x020017de Editor::WndProc(unsigned int, unsigned long, long) + 6936 (Editor.cxx:6946)
9   libwx.dylib                   	0x01aa80ed wxStyledTextCtrl::WrapCount(int) + 49 (stc.cpp:2027)

The functions that lead to the crash may differ a bit, but it's always one of GetPartialTextExtents, GetTextExtent, and DoDrawText. The failure is caused by wxCFStringRef generating a null pointer, which is then sent to CFAttributedStringCreate, leading to the crash.

I fixed the issue with the following patch:

diff --git a/src/osx/carbon/graphics.cpp b/src/osx/carbon/graphics.cpp
index 2d9cb96..d622c36 100644
--- a/src/osx/carbon/graphics.cpp
+++ b/src/osx/carbon/graphics.cpp
@@ -2233,6 +2233,8 @@ void wxMacCoreGraphicsContext::DoDrawText( const wxString &str, wxDouble x, wxDo
 
     wxMacCoreGraphicsFontData* fref = (wxMacCoreGraphicsFontData*)m_font.GetRefData();
     wxCFStringRef text(str, wxLocale::GetSystemEncoding() );
+    if (NULL == text)
+        return;
     CTFontRef font = fref->OSXGetCTFont();
     CGColorRef col = wxMacCreateCGColor( fref->GetColour() );
 #if 0
@@ -2332,6 +2334,8 @@ void wxMacCoreGraphicsContext::GetTextExtent( const wxString &str, wxDouble *wid
     CTFontRef font = fref->OSXGetCTFont();
 
     wxCFStringRef text(strToMeasure, wxLocale::GetSystemEncoding() );
+    if (NULL == text)
+        return;
     CFStringRef keys[] = { kCTFontAttributeName  };
     CFTypeRef values[] = { font };
     wxCFRef<CFDictionaryRef> attributes( CFDictionaryCreate(kCFAllocatorDefault, (const void**) &keys, (const void**) &values,
@@ -2372,6 +2376,8 @@ void wxMacCoreGraphicsContext::GetPartialTextExtents(const wxString& text, wxArr
     CTFontRef font = fref->OSXGetCTFont();
 
     wxCFStringRef t(text, wxLocale::GetSystemEncoding() );
+    if (NULL == t)
+        return;
     CFStringRef keys[] = { kCTFontAttributeName  };
     CFTypeRef values[] = { font };
     wxCFRef<CFDictionaryRef> attributes( CFDictionaryCreate(kCFAllocatorDefault, (const void**) &keys, (const void**) &values,

However, this only fixes the symptoms, as the crash is gone, but the unicode character is still not displayed (although it's loaded and saved).

This patch can be applied as it at least solves the crash issue (which includes the original issue this ticket was created for); unfortunately I don't know much about Unicode handling in wx to assist with fixing it in a more complete way.

comment:16 Changed 4 months ago by minoki

Rewrite of stc2wx/wx2stc functions is needed to properly handle Unicode characters outside BMP.

The simplest way is to just use wxMBConvUTF8 converter, as suggested in comment:9, but there is a behavior change regarding how to handle an invalid UTF-8 sequence:

  • Current code doesn't check the correctness of input. The result won't be empty even if the input is invalid as UTF-8.
  • Using wxConvUTF8 would return empty string on invalid input.
  • If you pass wxMBConvUTF8::MAP_INVALID_UTF8_TO_PUA or wxMBConvUTF8::MAP_INVALID_UTF8_TO_OCTAL option to wxMBConvUTF8, some valid Unicode sequences are escaped or unescaped, so this is not an option here.

I'm not sure if this difference really matters (i.e. will stc2wx ever receive an invalid UTF-8 sequence? ). Maybe I'm worrying too much and nobody cares what he get for passing an invalid UTF-8 sequence, empty or garbage.

I'm posting a patch to make stc2wx/wx2stc use the standard UTF-8 conversion functions of wxString (which is same as using wxConvUTF8):

diff --git a/src/stc/PlatWX.cpp b/src/stc/PlatWX.cpp
index 8c25769..14e36a0 100644
--- a/src/stc/PlatWX.cpp
+++ b/src/stc/PlatWX.cpp
@@ -1618,44 +1618,19 @@ double ElapsedTime::Duration(bool reset) {
 
 #if wxUSE_UNICODE
 
-#include "UniConversion.h"
-
-// Convert using Scintilla's functions instead of wx's, Scintilla's are more
-// forgiving and won't assert...
-
 wxString stc2wx(const char* str, size_t len)
 {
-    if (!len)
-        return wxEmptyString;
-
-    size_t wclen = UTF16Length(str, len);
-    wxWCharBuffer buffer(wclen+1);
-
-    size_t actualLen = UTF16FromUTF8(str, len, buffer.data(), wclen+1);
-    return wxString(buffer.data(), actualLen);
+    return wxString::FromUTF8(str, len);
 }
 
-
-
 wxString stc2wx(const char* str)
 {
-    return stc2wx(str, strlen(str));
+    return wxString::FromUTF8(str);
 }
 
-
-wxWX2MBbuf wx2stc(const wxString& str)
+wxCharBuffer wx2stc(const wxString& str)
 {
-    const wchar_t* wcstr = str.c_str();
-    size_t wclen         = str.length();
-    size_t len           = UTF8Length(wcstr, wclen);
-
-    // The buffer object adds extra byte for the terminating NUL and we must
-    // pass the total length, including this NUL, to UTF8FromUTF16() to ensure
-    // that it NULL-terminates the string.
-    wxCharBuffer buffer(len);
-    UTF8FromUTF16(wcstr, wclen, buffer.data(), len + 1);
-
-    return buffer;
+    return str.utf8_str();
 }
 
 #endif

comment:17 follow-up: Changed 4 months ago by paulclinger

@minoki, this is great! Your patch works for me and the character is displayed as expected in wxSTC.

I skipped the second change ("- return stc2wx(str, strlen(str)); + return wxString::FromUTF8(str);") as it makes no difference with the current one, but so far it's working without crashes.

The one thing I noticed is that it looks like the character I use has zero width (the cursor moves through it), but it's still much better than what we currently have.

I'm not sure if this difference really matters (i.e. will stc2wx ever receive an invalid UTF-8 sequence? ).

I think it's possible for stc2wx to receive an invalid UTF-8 sequence, as wxSTC provides *Raw methods that simply pass whatever (binary) content you give them to Scintilla (like AddTextRaw). I tested on those characters and they are displayed as special symbols with binary code (for example, [xED]) without issues.

Vadim: any reasons to not apply this patch?

comment:18 in reply to: ↑ 17 ; follow-up: Changed 4 months ago by minoki

Replying to paulclinger:

The one thing I noticed is that it looks like the character I use has zero width (the cursor moves through it), but it's still much better than what we currently have.

That's probably caused by a bug in wxMacCoreGraphicsContext::GetPartialTextExtents, which is not relevant to this issue (i.e. no application crash).

See PR #261 for details.

comment:19 in reply to: ↑ 18 Changed 4 months ago by paulclinger

Replying to minoki:

That's probably caused by a bug in wxMacCoreGraphicsContext::GetPartialTextExtents, which is not relevant to this issue (i.e. no application crash).
See PR #261 for details.

Indeed; applying the patch from the pull request solves the "width" issue of those symbols as well and the cursor moves correctly around them.

comment:20 Changed 4 months ago by Vadim Zeitlin <vadim@…>

  • Owner set to Vadim Zeitlin <vadim@…>
  • Resolution set to fixed
  • Status changed from new to closed

In 01046f4790c15f0a7ef1d352eddfa95c5ccf4451/git-wxWidgets:

Use wxWidgets conversions to/from UTF-8 in Scintilla code

Scintilla conversions use UTF-16, as indicated by their names, while wxString
uses UTF-32 under Unix, so they don't work correctly for the characters
outside of BMP.

Simply use our own UTF-8 conversions, it doesn't seem possible to get invalid
UTF-8 sequences here anyhow, so the comment saying that Scintilla functions
are used to avoid asserts doesn't seem to be relevant.

Closes #15621.

Note: See TracTickets for help on using tickets.