(keitai-l) Re: PHP and Japanese characters: Looking for wisdom

From: Christopher Kobayashi <chriskk_at_gmail.com>
Date: 06/02/07
Message-ID: <cd896f680706011929g79eff33i5e60e2ad65d31090@mail.gmail.com>
Interesting ...
Not sure if this is possible, just throwing out ideas.

How about grabbing the meta element's encoding info using JavaScript?
Maybe something like:
document.getElementsByTagName('meta')
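
To flesh that out a bit, here is a rough sketch of reading the declared
charset from the meta elements. The helper name charsetFromMeta is made up
for illustration, and the sketch assumes the page declares its encoding
either as <meta charset="..."> or in an http-equiv="Content-Type" meta
element. In a bookmarklet you would pass in the browser's document.

```javascript
// Hypothetical helper (not from any library): returns the charset declared
// in a <meta> element, lowercased, or null if none is declared.
// `doc` is anything that offers getElementsByTagName, e.g. `document`.
function charsetFromMeta(doc) {
  var metas = doc.getElementsByTagName('meta');
  for (var i = 0; i < metas.length; i++) {
    var meta = metas[i];

    // Short form: <meta charset="utf-8">
    var direct = meta.getAttribute('charset');
    if (direct) {
      return direct.toLowerCase();
    }

    // Long form:
    // <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
    var httpEquiv = meta.getAttribute('http-equiv');
    var content = meta.getAttribute('content');
    if (httpEquiv && content && httpEquiv.toLowerCase() === 'content-type') {
      var match = /charset\s*=\s*["']?([^\s;"']+)/i.exec(content);
      if (match) {
        return match[1].toLowerCase();
      }
    }
  }
  return null; // no meta declaration found
}
```

Usage in a bookmarklet would be along the lines of
charsetFromMeta(document), falling back to some default when it
returns null.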

Googling around, I bumped into this:
http://www.thescripts.com/forum/thread151865.html

Not all sites declare their encoding using the meta element, but at
least it's a shot. Once you scrape the encoding info, shoot that to
your PHP script before the Japanese text string.
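
As a sketch of the "encoding first, then the text" idea: the helper name
buildCaptureUrl, the parameter names charset and title, and the example
endpoint are all assumptions for illustration, not anything from the
original setup.

```javascript
// Hypothetical helper: builds a request URL that carries the scraped charset
// ahead of the title, so the receiving script can read the charset first.
// `endpoint`, `charset` and `title` are placeholder names.
function buildCaptureUrl(endpoint, charset, title) {
  // encodeURIComponent percent-encodes its argument as UTF-8 bytes.
  return endpoint +
    '?charset=' + encodeURIComponent(charset || 'unknown') +
    '&title=' + encodeURIComponent(title);
}
```

For example, buildCaptureUrl('http://example.com/capture.php', 'euc-jp',
document.title). One thing to keep in mind: encodeURIComponent always emits
UTF-8 percent-escapes, whereas escape() produces %uXXXX sequences for
characters above U+00FF, so which of the two the bookmarklet uses changes
what the PHP side has to decode.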

chriskk


On 6/1/07, Erick Papadakis <erick.papa@gmail.com> wrote:
> Hi,
>
> Seeing as how this list is aflutter with tech savvy folk, I hope
> someone can shed some light on this problem.
>
> We're developing something in Japanese that needs input from a
> Javascript "escaped" string. Javascript is unfortunately a must
> because the text comes from client side using a bookmarklet. (If it
> could be a regular POST or GET, then there'd be no issues).
>
> My problem is that Japan seems to have had a devil of a time
> standardizing its character sets! Some big sites like isize.com use
> Shift_JIS, while others such as Goo or Mixi use EUC-JP, while several
> of the more modern ones (such as blogs) use UTF-8.
>
> When we capture the TITLE (document.title) from these websites and
> then "rawurldecode" the received text in PHP, the string comes out
> garbled. If we knew the character set beforehand, we could call
> mb_convert_encoding with the right source encoding, but we don't,
> so this is now an issue. We tried JavaScript's
> "document.defaultCharset" property, but that doesn't work either --
> I wonder if that's a deprecated property of the document object?
>
> Would appreciate any insight into how you have handled incoming
> text that arrives in different encodings. The PHP function
> "mb_detect_encoding" is totally useless. Given a string, it always
> seems to return UTF-8.
>
> Many thanks in advance!
>
> .ep
Received on Sat Jun 2 05:29:49 2007