Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#17461 closed enhancement (fixed)

UTF-8 std::string interoperability

Reported by: minoki
Owned by: Vadim Zeitlin <vadim@…>
Priority: low
Milestone:
Component: base
Version: dev-latest
Keywords: UTF-8, wxString, std::string
Cc:
Blocked By:
Blocking:
Patch: no

Description

Code that uses std::string in its non-GUI parts has to convert between wxString and std::string.
wxString already has member functions and converting constructors for this.
However, there is no canonical way to convert to or from a UTF-8 std::string, that is, one that does not depend on the locale's encoding.
The existing methods for handling UTF-8 strings are inconvenient because one has to choose between

  • going through .c_str(), which makes the library recalculate the string length every time and gives up correct handling of embedded NULs, or
  • passing the length of the string around explicitly.
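The trade-off in the two bullets above can be reproduced with the standard library alone. The following is a minimal sketch that does not use wx at all; the helper names (`copy_via_c_str`, `copy_via_data_and_size`, `self_check_embedded_nul`) are made up for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Rebuild a string from a bare pointer, as any API that only accepts
// `const char*` has to do: the length is rediscovered with strlen(),
// an O(n) scan that stops at the first NUL byte.
std::string copy_via_c_str(const std::string& s)
{
    const char* p = s.c_str();
    return std::string(p, std::strlen(p));
}

// Rebuild a string from pointer plus explicit length: nothing is rescanned
// and embedded NULs survive, but the caller has to carry the length around.
std::string copy_via_data_and_size(const std::string& s)
{
    return std::string(s.data(), s.size());
}

// Self-check documenting the behaviour described in the comments above.
void self_check_embedded_nul()
{
    const std::string s("a\0b", 3);          // three bytes: 'a', NUL, 'b'
    assert(copy_via_c_str(s) == "a");        // everything after the NUL is lost
    assert(copy_via_data_and_size(s) == s);  // all three bytes are preserved
}
```

An overload that takes the std::string itself has both the pointer and the length available, which is why it avoids both drawbacks.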

So I propose adding canonical methods for handling UTF-8 std::string.

UTF-8 std::string to wxString

Add overloads for wxString::FromUTF8/FromUTF8Unchecked:

/* In class wxString: */
#if wxUSE_STD_STRING
    static wxString FromUTF8(const std::string &s);
    static wxString FromUTF8Unchecked(const std::string &s);
#endif

wxString to UTF-8 std::string

Add a member function to wxString:

/* In class wxString: */
#if wxUSE_STD_STRING
    std::string utf8_std_str() const; // the name is hypothetical
#endif

If adding yet another member function is undesirable when we already have utf8_str() and ToStdString(), I can think of two alternatives:

  1. Add a wxMBConv parameter to wxString::ToStdString() and write str.ToStdString(wxConvUTF8).
  2. Add a conversion operator for wxScopedCharBuffer returning a std::string.
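To make the second alternative more concrete, here is a rough sketch of a length-aware buffer with a conversion operator to std::string. `Utf8Buffer` is a hypothetical stand-in written only for this illustration; it is not wxScopedCharBuffer and does not reflect how that class is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, non-owning view of UTF-8 bytes that remembers its length.
class Utf8Buffer
{
public:
    Utf8Buffer(const char* data, std::size_t len) : m_data(data), m_len(len)
    {
        assert(data != nullptr);
    }

    const char* data() const { return m_data; }
    std::size_t length() const { return m_len; }

    // The conversion under discussion: it uses the stored length, so no
    // strlen() call is needed and embedded NULs are kept.
    operator std::string() const { return std::string(m_data, m_len); }

private:
    const char* m_data;  // not owned; must outlive this object
    std::size_t m_len;
};

// Self-check: the implicit conversion keeps every byte, including the NUL.
void self_check_buffer_conversion()
{
    const char raw[] = {'x', '\0', 'y'};
    const Utf8Buffer buf(raw, sizeof raw);
    const std::string s = buf;  // implicit conversion via operator std::string
    assert(s.size() == 3);
    assert(s[1] == '\0' && s[2] == 'y');
}
```

With such an operator, a caller could write `std::string s = str.utf8_str();` without a new member function, at the cost of making the conversion implicit.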

Notes

If you use the UTF-8 build of wx and want to avoid unnecessary copies, adding rvalue-reference overloads might be an option:

/* In class wxString: */
#if wxUSE_STD_STRING && ... /* ... we have rvalue references ... */
    std::string utf8_std_str() &&;
    static wxString FromUTF8(std::string&& s);
    static wxString FromUTF8Unchecked(std::string&& s);
#endif
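As an illustration of how such overloads avoid copies, here is a sketch using a hypothetical class, `Utf8Text`, that simply stores UTF-8 in a std::string. It is not wxString, performs no validation, and requires C++11; only the member names are borrowed from the proposal above.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class that keeps its text as UTF-8 internally, loosely
// modelling a UTF-8 build. Not wxString; no validation is performed.
class Utf8Text
{
public:
    // lvalue argument: the bytes have to be copied
    static Utf8Text FromUTF8Unchecked(const std::string& s) { return Utf8Text(s); }
    // rvalue argument: the caller's buffer can be taken over
    static Utf8Text FromUTF8Unchecked(std::string&& s) { return Utf8Text(std::move(s)); }

    // called on an lvalue: return a copy and leave *this intact
    std::string utf8_std_str() const & { return m_utf8; }
    // called on an rvalue: move the storage out instead of copying it
    std::string utf8_std_str() && { return std::move(m_utf8); }

private:
    explicit Utf8Text(std::string s) : m_utf8(std::move(s)) {}
    std::string m_utf8;
};

// Self-check: both overload sets yield the same bytes.
void self_check_rvalue_overloads()
{
    const std::string utf8(64, 'z');  // longer than typical small-string buffers
    Utf8Text text = Utf8Text::FromUTF8Unchecked(std::string(utf8));  // binds to std::string&&
    assert(text.utf8_std_str() == utf8);             // const& overload, copies
    assert(std::move(text).utf8_std_str() == utf8);  // && overload, moves out
}
```

The ref-qualifiers let the compiler pick the moving overload only when the object is a temporary or has been explicitly passed through std::move, so existing callers keep the copying behaviour.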

We could also add std::string overloads for the following functions, but that would be of lower priority.

  • wxString::FromAscii
  • wxString::From8BitData
  • wxUString::FromAscii
  • wxUString::FromUTF8

Change History (8)

comment:1 follow-up: ↓ 2 Changed 5 years ago by minoki

I prepared a commit resolving the 'UTF-8 std::string to wxString' part (i.e. adding the wxString::FromUTF8(std::string) overload):
https://github.com/minoki/wxWidgets/commit/31e92d92e1e2325a02a71cd6a63a4afa13a5a5dd

You can just merge the branch, or cherry-pick the commit. Or, should I make a pull request?

comment:2 in reply to: ↑ 1 Changed 5 years ago by minoki

I slightly updated the commit to eliminate #if wxUSE_STL_BASED_WXSTRING:
https://github.com/minoki/wxWidgets/commit/2ffd04643ea63cddc64752c7f91d24119733b6a2
I force-pushed it to the same branch.

comment:3 follow-up: ↓ 4 Changed 5 years ago by vadz

  • Priority changed from normal to low
  • Status changed from new to confirmed

Thanks, I agree it would be useful to have this, and adding FromUTF8() overloads is absolutely uncontroversial.

I'm less sure about utf8_std_str() as this is yet another function to tell people about and explain how it is different from the rest. Adding wxMBConv parameter to ToStdString() seems like a better idea to me.

As for using rvalue references, let's leave this for a separate patch/PR/ticket as it's unrelated, even if this is indeed something we should look into. But it should be done not only here, so we need to come up with some macros to make it easy to conditionally enable such ctors for both C++1x and legacy compilers...

Finally, please do make PRs; if nothing else, this triggers CI builds, which gives us confidence that the build passes. TIA!

comment:4 in reply to: ↑ 3 Changed 5 years ago by minoki

I updated the commit according to your comments on GitHub, and also added a commit making ToStdString accept a wxMBConv.

I made a pull request: #259

comment:5 Changed 5 years ago by ARATA Mizuki <minorinoki@…>

In 81e6638585ac8490be004fc7c55a09e96fca94bd/git-wxWidgets:

Add overloads of wxString::FromUTF8/FromUTF8Unchecked taking a std::string

See #17461.

comment:6 Changed 5 years ago by ARATA Mizuki <minorinoki@…>

In 70ddab243e220bdd630af9aaae9d10efc57b1eb7/git-wxWidgets:

Add wxMBConv parameter to wxString::ToStdString

See #17461.

comment:7 Changed 5 years ago by Vadim Zeitlin <vadim@…>

  • Owner set to Vadim Zeitlin <vadim@…>
  • Resolution set to fixed
  • Status changed from confirmed to closed

In 4e4286f0e2470737a8ca715bb96795987b661224/git-wxWidgets:

Merge branch 'utf8-stdstring-interop' of https://github.com/minoki/wxWidgets

Make it easier to interoperate with the code using UTF-8-encoded std::strings.

Closes #17461.

comment:8 Changed 5 years ago by Vadim Zeitlin <vadim@…>

In 875a40be138fa450ac09535de032d41024c56b51/git-wxWidgets:

Fix wxString::ToStdString(wxMBConv) to compile in ANSI build

70ddab243e220bdd630af9aaae9d10efc57b1eb7 broke compilation without Unicode as
mb_str() doesn't return a buffer in this case.

See #17461.
