Opened 5 years ago

Closed 5 years ago

#11754 closed defect (fixed)

wxCmdLine asserts when parsing a UTF-8 encoded command line

Reported by: alarsen
Owned by:
Priority: normal
Milestone: 2.9.1
Component: base
Version: stable-latest
Keywords: wxCmdLine UTF-8
Cc:
Blocked By:
Blocking:
Patch: no

Description

wxCmdLineParserData::SetArguments() assumes that argv[] contains only 7-bit ASCII text, causing an assert in wxString::FromAscii() if a command-line argument contains non-ASCII characters encoded as UTF-8 (as may well be the case on Linux).
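To illustrate why such arguments trip the assert, here is a minimal sketch that does not use wxWidgets at all. `IsSevenBitAscii` is a hypothetical helper introduced only for this example; it checks the same property that a 7-bit ASCII conversion relies on, namely that no byte has its high bit set. Every non-ASCII character in UTF-8 is encoded as two or more bytes in the range 0x80 to 0xFF, so any such argument fails the check.

```cpp
#include <string>

// Hypothetical helper (not part of wxWidgets): returns true only if every
// byte of s lies in the 7-bit ASCII range 0x00..0x7F.
bool IsSevenBitAscii(const std::string& s)
{
    for (unsigned char c : s)
    {
        if (c > 0x7F)
            return false; // high bit set: not plain ASCII
    }
    return true;
}
```

An argument such as `--name=caf\xC3\xA9` ("café" encoded as UTF-8) contains the bytes 0xC3 and 0xA9, so the helper returns false for it, while a plain `--verbose` passes.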

This patch should do the trick:

  • src/common/cmdline.cpp

    void wxCmdLineParserData::SetArguments(int argc, char **arg

         for ( int n = 0; n < argc; n++ )
         {
    +#if wxUSE_UNICODE_UTF8
    +        m_arguments.push_back(wxString::FromUTF8(argv[n]));
    +#else
             m_arguments.push_back(wxString::FromAscii(argv[n]));
    +#endif // wxUSE_UNICODE_UTF8
         }
     }


Cheers
Anders

Change History (4)

comment:1 Changed 5 years ago by vadz

  • Component changed from wxGTK to base
  • Status changed from new to confirmed

Indeed, it's wrong to assume that the arguments are in ASCII, thanks for noticing this. But I don't think we can assume they're always in UTF-8 either. They are probably in the current locale encoding, i.e. we should simply use the wxString ctor to convert argv[n] to wxString, shouldn't we?

comment:2 in reply to: ↑ 1 Changed 5 years ago by alarsen

Replying to vadz:

we should simply use wxString ctor to convert argv[n] to wxString, shouldn't we?

As always, you're right.

Cheers
Anders

comment:3 Changed 5 years ago by vadz

Actually I'm not entirely sure about this... The program might not be using the same locale as the shell (or whatever) that launched it. But OTOH I don't see what else we can do, as there doesn't seem to be any way to get the encoding information. So I'll still change it like this just because it will at least work in the common case of using UTF-8 everywhere.

comment:4 Changed 5 years ago by VZ

  • Resolution set to fixed
  • Status changed from confirmed to closed

(In [63605]) Use user locale with Latin-1 as fallback for command line arguments.

Command line arguments can contain characters outside of 7 bit ASCII range.
Assume that they use the default user encoding but fall back to Latin-1 if
conversion failed.

Closes #11754.
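
The strategy described in the commit message (try the user's encoding, fall back to Latin-1 if that fails) can be sketched with the standard library alone. This is not the code from [63605]: `DecodeArgument` and `Latin1ToWide` are hypothetical names, and `std::mbstowcs` stands in here for whatever conversion wxString actually performs. The sketch also assumes the program has already called `setlocale(LC_ALL, "")`, since otherwise the "C" locale is in effect rather than the user's.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: Latin-1 (ISO 8859-1) maps each byte to the code point
// with the same numeric value, so this conversion can never fail.
std::wstring Latin1ToWide(const std::string& bytes)
{
    std::wstring result;
    result.reserve(bytes.size());
    for (unsigned char c : bytes)
        result.push_back(static_cast<wchar_t>(c));
    return result;
}

// Hypothetical helper: decode an argument using the current C locale and
// fall back to Latin-1 if the bytes are not valid in that encoding.
std::wstring DecodeArgument(const std::string& bytes)
{
    // A multibyte string never yields more wide characters than it has bytes.
    std::wstring wide(bytes.size(), L'\0');
    const std::size_t n = std::mbstowcs(&wide[0], bytes.c_str(), wide.size());
    if (n == static_cast<std::size_t>(-1))
        return Latin1ToWide(bytes); // invalid sequence for this locale
    wide.resize(n);
    return wide;
}
```

The Latin-1 fallback is a reasonable last resort precisely because it accepts every byte sequence: the result may not be the text the user intended, but no argument is rejected and no assert is triggered.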
