(keitai-l) PHP and Japanese characters: Looking for wisdom

From: Erick Papadakis <erick.papa_at_gmail.com>
Date: 06/01/07
Message-ID: <e9e8f77d0706010614v72e1a4ax1a8a5d0fe6b879f8@mail.gmail.com>

Hi,

Seeing as how this list is aflutter with tech-savvy folk, I hope
someone can shed some light on this problem.

We're developing something in Japanese that needs input from a
JavaScript "escaped" string. JavaScript is unfortunately a must
because the text comes from the client side via a bookmarklet. (If it
could be a regular POST or GET, there'd be no issue.)

My problem is that Japan seems to have had a devil of a time
standardizing its character sets! Some big sites like isize.com use
Shift_JIS, others such as Goo or Mixi use EUC-JP, and several of the
more modern ones (such as blogs) use UTF-8.

When we capture the TITLE (document.title) from these websites and
then "rawurldecode" the received text in PHP, the string comes out
garbled. If we knew the character set beforehand, we could apply the
right mb_convert_encoding call, but we have no reliable way to find it
out. We tried JavaScript's "document.defaultCharset" property, but
that doesn't work either -- I wonder if it's a deprecated property of
the document object?
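
To make the setup concrete, here is a minimal sketch of the client side, under assumptions: the helper name buildQuery and the "title" and "charset" parameter names are hypothetical, and it assumes the bookmarklet may build its own query string. It contrasts escape(), which emits %uXXXX sequences for characters above U+00FF (something PHP's rawurldecode leaves undecoded), with encodeURIComponent(), which always emits percent-encoded UTF-8 bytes regardless of the page's own encoding. In a real bookmarklet the arguments would presumably be document.title and document.characterSet || document.charset; which of those properties exists depends on the browser.

```javascript
// Hypothetical helper: builds the query string a bookmarklet could send.
// `title` would be document.title in the browser; `pageCharset` would be
// document.characterSet || document.charset (availability varies by browser).
function buildQuery(title, pageCharset) {
  // encodeURIComponent always produces percent-encoded UTF-8 bytes,
  // whatever encoding the page itself was served in.
  return "title=" + encodeURIComponent(title) +
         "&charset=" + encodeURIComponent(pageCharset || "");
}

// For comparison: escape() produces %uXXXX for characters above U+00FF,
// which PHP's rawurldecode() does not decode.
var sample = "日本語";
console.log(escape(sample));                  // %u65E5%u672C%u8A9E
console.log(buildQuery(sample, "Shift_JIS")); // title=%E6%97%A5%E6%9C%AC%E8%AA%9E&charset=Shift_JIS
```

If the title arrives this way, rawurldecode on the PHP side should yield UTF-8 bytes whatever the source site uses, and the charset value is only an extra hint.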

Would appreciate any insight into how you have handled incoming text
in different encodings in your programs. The PHP function
"mb_detect_encoding" has been totally useless for us: given a string,
it always seems to return UTF-8.

Many thanks in advance!

.ep