Opened 14 years ago

Closed 5 months ago

#2793 closed defect (fixed)

wxProtocol::GetLine() doesn't support non-ASCII characters

Reported by: wxmjohn Owned by: Vadim Zeitlin <vadim@…>
Priority: low Milestone:
Component: network Version:
Keywords: wxProtocol wxFTP Cc: wxmjohn
Blocked By: Blocking:
Patch: no

Description

wxProtocolError GetLine(wxSocketBase *sock,
wxString& result)
{
...

result = wxString::FromAscii( tmp_str );
result = result.Left(result.Length()-1);

size = ret-tmp_str+1;
sock->Unread(&tmp_buf[size], avail-size);

...
}
the bug line is
result = wxString::FromAscii( tmp_str );

it shoud chaned to

result.sprintf(wxT("%s"),wxConvertMB2WX(tmp_str));

because int Protocol header may be has chinese word
or other asia words. for example. reference to
RFC2616, in http protocol header, Response Header
Fields "Location" maybe has the new
url and the url may has chinese words or escaped
chinese words.

FromAscii( tmp_str ) can't deal with chinese words,
because whether in ansi code or in unicode, chinese
words use two byte(char), deal it one byte by one byte
is not right.

Change History (10)

comment:1 Changed 14 years ago by vadz

First, please choose the type of your submissions crrectly.
From 3 "patches" you submitted, neither is a patch.

Second, FromAscii() here is wrong, of course, but so is
using wxConvertMB2WX() because it uses the current locale
encoding while we need to get the encoding from the peer. So
much more needs to be done to make this work than just the
above.

comment:2 Changed 11 years ago by wojdyr

  • Component set to network
  • Keywords wxProtocol added

comment:3 Changed 8 years ago by oneeyeman

Can someone who can read Chinese verify that?

comment:4 Changed 8 years ago by vadz

  • Cc vadz removed
  • Resolution set to invalid
  • Status changed from new to closed

Looking at this again, I think the report is completely wrong. HTTP headers can only contain 7 bit (US ASCII) characters, the "absoluteURI", as defined in RFC 2396 definitely is 7 bit only. So using FromAscii() here is just fine. The problem -- whatever it is -- probably happens when the returned value is not decoded correctly later but I have no idea where does this happen and the bug doesn't provide any way to reproduce the problem.

comment:5 Changed 6 years ago by hartwigw

  • Keywords wxFTP added
  • Resolution invalid deleted
  • Status changed from closed to reopened

I would like to re-open the ticket because the reason for closing the ticket is wrong. It is correct that HTTP headers should only contain US ASCII headers but wxProtocol::GetLine() is not only used for HTTP headers. It is used for FTP replies as well. And the FTP standard does only specify that after the 3-digit reply code text follows terminated by the telnet end-of-line character sequence.
In the standard RFC 959 I could not find a definition how text is encoded but it is only mentioned that it is intended for human users (see definition of REPLY in section 2.2) and therefore user readable. And also the usage of an end-of-line character sequence determining the end of the text indicates that the encoding of the text is not of importance for the standard.

Unfortunately, it seems to be also the reality that some servers reply with a non-ASCII encoding. I became aware of a French server that replies in French. In this case wxProtocol::GetLine() crashes because of non-ASCII characters (e.g. for é).

I have no idea how to solve this issue because the FTP protocol does not provide information about the encoding. One idea - though it does not solve the problem in general - is to assume that the encoding is UTF-8. It covers the ASCII case and it at least raises the chances to get the correct text than by assuming ASCII only.

comment:6 Changed 6 years ago by hartwigw

Sorry, I just saw that this ticket is related to wxProtocol::GetLine but GetLine has been replaced by ReadLine. Though wxProtocol::ReadLine has the same shortcoming (and I was looking actually at wxProtocol::ReadLine).

comment:7 Changed 6 years ago by vadz

  • Priority changed from normal to low

FTP servers using non-ASCII characters in their response are probably broken but we could indeed use wxConvAuto here to at least try to deal with it. If you can make a (tested) patch implementing this, I'd be glad to apply it.

comment:8 Changed 5 years ago by oneeyeman

Hartwig,
Could you place the site address here for testing?

comment:9 Changed 6 months ago by oneeyeman

I just tried to fix this ticket.
The fix is trivial, but unfortunately I will have to test the fix before submitting it as an official patch/PR.

And I don't know any sites - HTTP, FTP or otherwise that will be using non English characters.

Can someone please give me an address?

Or maybe Hartwig (if he is still around) can reply here and say that he will be able to test - than I will submit the PR and let him perform the testing.

I will try to ask on the forum as well.

TIA!

comment:10 Changed 5 months ago by Vadim Zeitlin <vadim@…>

  • Owner set to Vadim Zeitlin <vadim@…>
  • Resolution set to fixed
  • Status changed from reopened to closed

In b58c5aea8/git-wxWidgets:

Try to accept 8-bit data in wxProtocol::ReadLine() too

Even though the data is supposed to be 7-bit here, we can make an effort
to decode 8-bit data if we get any as it can't be worse than just
throwing it away.

Closes https://github.com/wxWidgets/wxWidgets/pull/1336

Closes #2793.

Note: See TracTickets for help on using tickets.