(keitai-l) Re: Supported Character Sets for I-mode

From: Paul Lester <paul_at_thetamusic.com> Date: 01/10/06 Message-ID: <43C32E02.5EE578AA@thetamusic.com>

    Wait, when I look at most SJIS content in a hex editor, each Japanese character is 2 bytes long
when it's a 2-byte character.  Isn't that the same for UTF-8?  Sometimes UTF-8 adds an identifier
at the beginning of the file to indicate it's UTF-8 (depending on a lot of stuff).
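
    The identifier in question is the byte order mark (BOM), the three bytes EF BB BF. A minimal
sketch of how to see it, assuming Python 3 and its "utf-8-sig" codec (the sample string is just
an illustration):

```python
import codecs

text = "日本語"

plain = text.encode("utf-8")         # no identifier at the start
with_bom = text.encode("utf-8-sig")  # same bytes, prefixed with the BOM

print(plain.hex())
print(with_bom.hex())

# The only difference is the 3-byte prefix EF BB BF.
print(with_bom[:3] == codecs.BOM_UTF8)
print(len(with_bom) - len(plain))
```

    So the BOM costs 3 bytes once per file and does not change the per-character size.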

    So where is the file size difference?  What are these 3-byte cases?  Just curious.
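
    As far as I can tell, the 3-byte cases come straight from the UTF-8 code point ranges: anything
from U+0800 to U+FFFF takes 3 bytes, and kana and kanji all sit in that range. A rough sketch in
Python 3 (utf8_len is a made-up helper name, not a library function):

```python
def utf8_len(ch: str) -> int:
    """Bytes UTF-8 needs for one character, derived from its code point."""
    cp = ord(ch)
    if cp < 0x80:
        return 1   # ASCII
    if cp < 0x800:
        return 2   # e.g. accented Latin, Greek, Cyrillic
    if cp < 0x10000:
        return 3   # rest of the BMP, including kana and kanji
    return 4       # supplementary planes


for ch in ("A", "é", "あ", "漢"):
    # Cross-check the rule against Python's own encoder.
    assert utf8_len(ch) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")
```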

    Strange thing is, I always thought UTF-8 Japanese characters were stored as 2 bytes,
but in my hex editor, when I just made a file on Windows, the UTF-8 version had 3-byte Japanese
characters.

    According to this, UTF-8 is 3 bytes and SJIS is 2 bytes... am I right?  Why have I always thought UTF-8 was
2 bytes?  I think I'm going crazy.  When I encode UTF-8, I could have sworn that when I looked at the file in a hex editor,
each character was always 2 bytes!
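
    One possible explanation for the "2 bytes" memory is UTF-16, which some Windows editors label
simply as "Unicode"; there these characters really are 2 bytes each. A quick comparison, assuming
Python 3 (sizes is a hypothetical helper and the sample strings are made up):

```python
def sizes(text: str) -> dict:
    """Encoded length of text in bytes, per encoding."""
    encodings = ("shift_jis", "utf-8", "utf-16-le")
    return {enc: len(text.encode(enc)) for enc in encodings}


kana = "こんにちは"        # 5 Japanese characters
page = "<p>こんにちは</p>"  # the same text inside ASCII markup

print(sizes(kana))
print(sizes(page))

# Size of UTF-8 relative to SJIS for each sample.
for sample in (kana, page):
    s = sizes(sample)
    print(round(s["utf-8"] / s["shift_jis"], 2))
```

    Pure Japanese text comes out 50% larger in UTF-8 than in SJIS, but the ratio drops once ASCII
markup is mixed in, since ASCII is 1 byte in both.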

    I recall reading somewhere about multiple ways of encoding characters in Unicode... I think in the Unicode
documentation... like there are 2 ways to do ga... one is GA itself and one is KA plus the voiced-sound mark (").
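
    That sounds like precomposed versus combining sequences, which Unicode normalization converts
between. A small sketch, assuming Python 3 and the standard unicodedata module (variable names are
made up):

```python
import unicodedata

ga_single = "\u304C"          # HIRAGANA LETTER GA, one code point
ga_combined = "\u304B\u3099"  # HIRAGANA LETTER KA + combining voiced sound mark

# The two look the same on screen but are different strings.
print(ga_single == ga_combined)

# NFC composes KA + mark into GA; NFD decomposes GA into KA + mark.
print(unicodedata.normalize("NFC", ga_combined) == ga_single)
print(unicodedata.normalize("NFD", ga_single) == ga_combined)

# The combined form is twice as large in UTF-8.
print(len(ga_single.encode("utf-8")), len(ga_combined.encode("utf-8")))
```

    So if size matters, it is probably worth making sure content is in the composed (NFC) form
before it is sent.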

Nick May wrote:

> On 10 Jan 2006,  keitai-l-bounce@appelsiini.net wrote:
>
> >> would prefer to use UTF-8 for
> >> encoding the Japanese content
>
> Note that this encoding of JP is - literally - 50% more expensive in
> terms of packets your customers must download.
>
> So it is slower and costs more, and less fits in the cache of older
> phones (10k of sjis content != 10k of UTF8 content.)
>
> "2 byte good, 3 byte bad. Baaaa!"
>
> Nick

--
*+*=*+*=*+*=*+*=*+*=*+*=*+*=*+*=*+*=*+*F=m(dv/dt)
Paul B. Lester
thetamusic.com（有）
Chief Engineer

EMAIL: paul@thetamusic.com
--
http://www.thetamusic.com/

personal homepage: http://www.purplepaul.com/
personal EMAIL: pbl1@cornell.edu
*+*=*+*=*+*=*+*=*+*=*+*=*+*=*+*=*+*=*+*F=m(dv/dt)