Wait, when I look at most SJIS content in a hex editor, each Japanese character is 2 bytes long
when it's a 2-byte character. Isn't that the same for UTF-8? Sometimes a UTF-8 file starts with a marker
(the byte order mark, EF BB BF) to indicate it's UTF-8 (depending on the editor and a lot of other things).
So where does the file size difference come from? What are these 3-byte cases? Just curious.
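One way to see where the size difference comes from is to encode the same text both ways and count the bytes. Here is a minimal sketch, assuming Python 3.8+ and its standard `shift_jis` codec. The sample string is made up for illustration:

```python
import codecs

text = "日本語のテキスト"  # hypothetical sample: 8 kanji/kana, no ASCII

sjis = text.encode("shift_jis")
utf8 = text.encode("utf-8")

print(len(text), "characters")          # 8
print(len(sjis), "bytes in Shift_JIS")  # 16: 2 bytes per character
print(len(utf8), "bytes in UTF-8")      # 24: 3 bytes per character

# The optional UTF-8 signature (BOM) adds 3 more bytes at the start of a file.
print(codecs.BOM_UTF8.hex(" "))                                 # ef bb bf
print(len(codecs.BOM_UTF8 + utf8), "bytes in UTF-8 with BOM")   # 27
```

The BOM is a fixed 3-byte cost per file. The real difference is per character: kana and kanji take 3 bytes each in UTF-8 versus 2 in Shift_JIS. Half-width katakana (e.g. ｱ) are worse still, at 1 byte in Shift_JIS but 3 in UTF-8.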
The strange thing is, I always thought Japanese characters in UTF-8 were stored as 2 bytes,
but when I just made a file on Windows and looked at it in my hex editor, the UTF-8 version had
3-byte Japanese characters.
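The 3 bytes come from how UTF-8 splits a code point into bits. Any character from U+0800 to U+FFFF (excluding the surrogate range U+D800 to U+DFFF) uses the pattern 1110xxxx 10xxxxxx 10xxxxxx. That range covers hiragana and katakana (U+3040 to U+30FF) and the common CJK ideographs (U+4E00 to U+9FFF). Below is a hand-rolled sketch of just that 3-byte case, checked against Python's built-in encoder. The function name `utf8_three_bytes` is hypothetical:

```python
def utf8_three_bytes(ch: str) -> bytes:
    """Encode one character in U+0800..U+FFFF as UTF-8 by hand."""
    cp = ord(ch)
    if not (0x0800 <= cp <= 0xFFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a 3-byte UTF-8 character")
    return bytes([
        0xE0 | (cp >> 12),           # 1110xxxx: top 4 bits of the code point
        0x80 | ((cp >> 6) & 0x3F),   # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),          # 10xxxxxx: low 6 bits
    ])

for ch in "あ日ア":
    mine = utf8_three_bytes(ch)
    assert mine == ch.encode("utf-8")  # agrees with the standard codec
    print(f"{ch} U+{ord(ch):04X} -> {mine.hex(' ')}")
# あ U+3042 -> e3 81 82
# 日 U+65E5 -> e6 97 a5
# ア U+30A2 -> e3 82 a2
```

Only 4 + 6 + 6 = 16 of the 24 bits carry the code point. The rest are the marker bits that let a decoder resynchronize anywhere in the stream, and that overhead is what pushes Japanese text to 3 bytes.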
According to this, UTF-8 is 3 bytes and SJIS is 2 bytes... am I right? Why have I always thought UTF-8 was
2 bytes? I think I'm going crazy. I could have sworn that whenever I encoded something as UTF-8 and looked
at the file in a hex editor, each character was always 2 bytes!
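Here is one possible explanation for the 2-byte memory, offered as a guess rather than a diagnosis. On Windows, saving a file as "Unicode" (for example in Notepad) produces UTF-16 little-endian, not UTF-8. UTF-16 stores every character up to U+FFFF, including all kana and common kanji, in exactly 2 bytes. The sketch below shows such a file next to its UTF-8 equivalent. The sample text is hypothetical:

```python
import codecs

text = "日本語"  # hypothetical sample text

# UTF-16LE with its BOM (FF FE), then 2 bytes per character
utf16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
# UTF-8 without a BOM: 3 bytes per character
utf8 = text.encode("utf-8")

print(len(utf16), utf16.hex(" "))  # 8 ff fe e5 65 2c 67 9e 8a
print(len(utf8), utf8.hex(" "))    # 9 e6 97 a5 e6 9c ac e8 aa 9e
```

In a hex editor the UTF-16 file shows neat 2-byte pairs per character, which is easy to remember later as "Unicode is 2 bytes".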
I recall reading somewhere, I think in the Unicode documentation, about multiple ways of encoding
some characters... like there are 2 ways to write ga (が): one is GA itself and the other is KA (か) followed by a dakuten (゛).
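The two forms of "ga" are the precomposed が (U+304C) and か (U+304B) followed by the combining voiced sound mark U+3099. Unicode treats these as canonically equivalent, and the normalization forms NFC (composed) and NFD (decomposed) convert between them. Below is a sketch using Python's standard `unicodedata` module. It also shows that the decomposed form doubles the UTF-8 size:

```python
import unicodedata

composed = "\u304c"  # が as a single code point
decomposed = unicodedata.normalize("NFD", composed)

print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+304B', 'U+3099']
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 3 6
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# In Shift_JIS the precomposed character is a single 2-byte code.
print(composed.encode("shift_jis").hex())  # 82aa
```

If content for phones is generated from Unicode sources, normalizing to NFC before converting keeps each ga at one character.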
Nick May wrote:
> On 10 Jan 2006, keitai-l-bounce@appelsiini.net wrote:
>
> >> would prefer to use UTF-8 for
> >> encoding the Japanese content
>
> Note that this encoding of JP is - literally - 50% more expensive in
> terms of packets your customers must download.
>
> So it is slower and costs more, and less fits in the cache of older
> phones (10k of sjis content != 10k of UTF8 content.)
>
> "2 byte good, 3 byte bad. Baaaa!"
>
> Nick
--
*+*=*+*=*+*=*+*=*+*=*+*=*+*=*+*=*+*=*+*F=m(dv/dt)
Paul B. Lester
thetamusic.com(有)
Chief Engineer
EMAIL: paul@thetamusic.com
--
http://www.thetamusic.com/
personal homepage: http://www.purplepaul.com/
personal EMAIL: pbl1@cornell.edu
*+*=*+*=*+*=*+*=*+*=*+*=*+*=*+*=*+*=*+*F=m(dv/dt)
Received on Tue Jan 10 05:34:43 2006