Opened 14 years ago

Closed 12 years ago

#3874 closed defect (fixed)

wxURI doesn't support Unicode

Reported by: hzd_byte Owned by:
Priority: high Milestone: 2.9.0
Component: wxHtml Version: stable-latest
Keywords: Cc: hzd_byte, vaclavslavik, ryannpcs
Blocked By: Blocking:
Patch: no

Description

With a single-byte charset everything works, but when I use Unicode there are problems.
For example, with files shown via the IMG tag in wxHtmlWindow... :(

Change History (12)

comment:1 Changed 14 years ago by ryannpcs

What does this have to do with wxURI? It isn't in the HTML lib either...

comment:2 Changed 13 years ago by hzd_byte

Anyway, it must be fixed :/
I have a problem using wxHtmlWindow and wxLaunchDefaultBrowser, which use wxURI internally.

comment:3 Changed 13 years ago by ryannpcs

I'm sorry, but you're really going to have to be more specific and/or provide a test case.

Anyway, it must be fixed :/

WHAT must be fixed? You've said it "doesn't support Unicode", but it has for quite some time, so I'm assuming you'll need to be able to describe your problem.

I have a problem using wxHtmlWindow & wxLaunchDefaultBrowser those use wxURI internally.

What's the problem?!??

Anyway, if you don't provide more info, this will likely be closed or forgotten.

comment:4 Changed 13 years ago by hzd_byte

I pass Unicode text (with various Unicode characters) to wxHtmlWindow (via the SetPage method, for example).
This text has Unicode characters (for example, Russian letters) in the IMG SRC attribute...
wxHtmlWindow uses wxURI internally, and wxURI tries to escape the Unicode string with "%xx" as if it were single-byte encoded text.
After this, the Unicode string is corrupted and the path to the file is broken :(
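
One way this kind of corruption can arise is sketched below. This is only an illustration under an assumption, not the actual wxURI code: it supposes that the escaping step keeps just the low byte of each wide character. The function name `EscapeAsSingleByte` is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Hypothetical illustration, NOT the real wxURI::Escape: treat every wide
// character as if it fit in one byte and emit a single "%XX" for it.
std::string EscapeAsSingleByte(const std::u32string& text)
{
    std::string out;
    char buf[4]; // '%' + two hex digits + terminating NUL
    for (char32_t ch : text) {
        // Only the low 8 bits survive; the high bits are silently dropped.
        std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(ch & 0xFF));
        out += buf;
    }
    return out;
}
```

Under that assumption, the Russian letter U+0426 would be escaped as "%26", which decodes to "&", so the original file name cannot be recovered.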

comment:5 Changed 13 years ago by ryannpcs

Yep, you're right. It turns out the wxWidgets version of wxURI::Escape is probably the culprit, in that it assumes single-byte characters. Maybe use the 4-hex percent encoding for Unicode, or count the number of bytes in a wxChar...

comment:6 Changed 13 years ago by hzd_byte

I think using sizeof(wxChar) is a better solution.

comment:7 Changed 13 years ago by vadz

Could we have an example of what exactly doesn't work? I'm still rather confused. E.g. how can this bug be reproduced in any of the html/xxx samples?

Thanks!

comment:8 Changed 13 years ago by hzd_byte

Try to use international letters in an "a href" attribute's value (for example, Russian "вот эта хуйня не пашет") or initialize wxURI with such a string. wxURI::Escape breaks the original value.

comment:9 Changed 12 years ago by wxsite

  • Status changed from assigned to confirmed

transitioning old 'assigned' status to new 'confirmed' status

comment:10 Changed 12 years ago by vadz

  • Cc vadz removed
  • Status changed from confirmed to infoneeded_new

How do you (hzd_byte) expect this to work? From my reading of RFC 2396, there is no standard way to handle Unicode characters in URIs, and I can see 2 alternatives:

  1. Encode URI in UTF-8
  2. Use %uxxxx JavaScript-like encoding

Neither of them is really perfect, however, so it'd be really useful to know how people expect to use this.
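
For comparison, the two alternatives can be sketched for a single code point as follows. This is a minimal sketch, not taken from wxWidgets: the names `Utf8PercentForm` and `PercentUForm` are hypothetical, and only code points below U+10000 are handled.

```cpp
#include <cstdio>
#include <string>

// Alternative 1 (hypothetical helper): percent-encode each UTF-8 byte of
// one code point. Handles only code points below U+10000.
std::string Utf8PercentForm(char32_t cp)
{
    unsigned char bytes[3];
    int count;
    if (cp < 0x80) {
        bytes[0] = static_cast<unsigned char>(cp);
        count = 1;
    } else if (cp < 0x800) {
        bytes[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));
        bytes[1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        count = 2;
    } else {
        bytes[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));
        bytes[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        bytes[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        count = 3;
    }
    std::string out;
    char buf[4]; // '%' + two hex digits + NUL
    for (int i = 0; i < count; ++i) {
        std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(bytes[i]));
        out += buf;
    }
    return out;
}

// Alternative 2 (hypothetical helper): JavaScript-like "%uXXXX" form,
// again limited to code points below U+10000.
std::string PercentUForm(char32_t cp)
{
    char buf[8]; // "%u" + four hex digits + NUL
    std::snprintf(buf, sizeof buf, "%%u%04X", static_cast<unsigned>(cp & 0xFFFF));
    return buf;
}
```

For U+0426 the first form gives "%D0%A6" and the second gives "%u0426".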

comment:11 Changed 12 years ago by byte

  • Milestone set to 2.9.0
  • Owner vaclavslavik deleted
  • Status changed from infoneeded_new to new
  • Version set to 2.9-svn

vadz, IE and Firefox represent Unicode characters as UTF-8:
http://ru.wikipedia.org/wiki/%D0%A6%D0%B5%D0%BB%D0%BE%D0%B5_%D1%87%D0%B8%D1%81%D0%BB%D0%BE
This is the Russian "http://ru.wikipedia.org/wiki/Целое_число". I think we must use UTF-8 too.
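
The escaped form in that URL can be reproduced with a short sketch like the one below. It is not the wxURI implementation: the function name `PercentEncodeUtf8` is hypothetical, the input is assumed to be already UTF-8 encoded, and only the unreserved characters of RFC 3986 (letters, digits, "-", ".", "_", "~") are left unescaped, so it is meant for a single path segment rather than a whole URL.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: percent-encode a UTF-8 string byte by byte, leaving
// only ASCII letters, digits and "-", ".", "_", "~" untouched.
std::string PercentEncodeUtf8(const std::string& utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : utf8) {
        const bool unreserved =
            (byte < 0x80 && std::isalnum(byte)) ||
            byte == '-' || byte == '.' || byte == '_' || byte == '~';
        if (unreserved) {
            out += static_cast<char>(byte);
        } else {
            // Every other byte, including each byte of a multi-byte UTF-8
            // sequence, becomes "%XX".
            out += '%';
            out += hex[byte >> 4];
            out += hex[byte & 0x0F];
        }
    }
    return out;
}
```

Passing the UTF-8 bytes of "Целое_число" yields the last path segment of the URL quoted above.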

comment:12 Changed 12 years ago by vadz

  • Resolution set to fixed
  • Status changed from new to closed

I made significant changes to wxURI in r54723 and I believe they should fix your problem. Please let me know if you still have any.

Unfortunately it's probably impossible to fix this in any reasonable way in 2.8, so this is for trunk only.
