(keitai-l) Re: Supported Character Sets for I-mode

From: Nick May <nick_at_kyushu.com>
Date: 01/10/06
Message-Id: <31C51E36-7278-4799-B2C8-8AD69F471DFC@kyushu.com>
On 10 Jan 2006, at 12:46, Paul Lester wrote:

>     According to this UTF-8 is 3 bytes and SJIS is 2 bytes... am I  
> right?

As I understand it, UTF-8 is variable width, 1 to 4 bytes per
character. The 128 US-ASCII characters take 1 byte each, but kanji
and similar scripts take 3.
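
To make those widths concrete, here is a minimal Python sketch. The function name utf8_len is made up for illustration and is not part of any library; it derives the byte count from the code point ranges and checks the result against Python's own UTF-8 encoder.

```python
def utf8_len(ch):
    """Return how many bytes UTF-8 needs for one character (hypothetical helper)."""
    cp = ord(ch)
    if cp < 0x80:       # US-ASCII
        return 1
    if cp < 0x800:      # e.g. accented Latin, Greek, Cyrillic
        return 2
    if cp < 0x10000:    # rest of the Basic Multilingual Plane: kana, most kanji
        return 3
    return 4            # supplementary planes

examples = [
    "A",            # Latin capital A
    "\u00e9",       # e with acute accent
    "\u3042",       # hiragana A
    "\u6f22",       # the kanji "kan"
    "\U00020BB7",   # a kanji outside the BMP
]

for ch in examples:
    # Cross-check the range table against the real encoder.
    assert utf8_len(ch) == len(ch.encode("utf-8"))
    print("U+%04X" % ord(ch), utf8_len(ch), "byte(s)")
```

So the "3 bytes" figure holds for the kana and the common kanji, while the rarer kanji outside the Basic Multilingual Plane need 4.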


From Wikipedia (http://en.wikipedia.org/wiki/UTF-8):

UTF-8 is generally larger than the appropriate legacy encoding for  
everything except diacritic-free, Latin-alphabet text. Most  
alphabetic scripts had only a single byte per character in legacy  
encodings but their letters take at least two bytes in UTF-8.  
Ideographic scripts generally had two bytes per character in their  
legacy encodings yet take three bytes per character in UTF-8.
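
The comparison in that passage can be checked with a few lines of Python. This is only a sketch: the sample strings and the choice of legacy encoding for each are my own assumptions for illustration, and sizes is a made-up helper name.

```python
def sizes(text, legacy):
    """Return (bytes in the legacy encoding, bytes in UTF-8) for text."""
    return len(text.encode(legacy)), len(text.encode("utf-8"))

samples = [
    # (label, text, legacy encoding assumed for the comparison)
    ("English",  "keitai", "latin-1"),
    ("Russian",  "\u043f\u0440\u0438\u0432\u0435\u0442", "koi8-r"),  # "privet"
    ("Japanese", "\u643a\u5e2f\u96fb\u8a71", "shift_jis"),           # "keitai denwa"
]

for label, text, legacy in samples:
    old, new = sizes(text, legacy)
    print(label, legacy, old, "bytes vs UTF-8", new, "bytes")
```

For the four kanji this gives 8 bytes in Shift_JIS against 12 in UTF-8, which matters when a page has a tight size limit. Half-width katakana fare worse still: 1 byte each in Shift_JIS, 3 in UTF-8.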

Nick
Received on Tue Jan 10 06:30:29 2006