On Thursday, June 20, 2002, at 02:01, Curt Sampson wrote:
> On Tue, 18 Jun 2002, john yee wrote:
>
>> I read a bit about Chinese in the full unicode spec (I don't remember
>> the term...). Apparently it doesn't support entire language, just
>> something like 10k+ (characters? brush strokes?).
>
> Last I checked, Unicode supported over 20,000 kanji (though a few
> of those are non-Chinese kanji), and another 6,500 were slated for
> inclusion in the next revision of the standard.
The Chinese call them Hanzi (and in Mandarin, it sounds a bit more like
hanzu or hanze), not that this matters for the number of Chinese
characters, though.
> This is far from the full number of characters that have ever been
> used (50-70,000 comes to mind as an estimate I've heard),
I was taught a number of about 50000 Hanzi, stemming from a dictionary
in which Chinese scholars had aimed to document the evolution of
characters and list any character ever in use. From that I have always
assumed that this meant that a large number of those characters would
have been "previous versions" of characters still in use today, rather
than abandoned "stand-alone" characters. This would mean that the actual
number of characters (as opposed to derivatives) is far lower. In any
event it means that there is no need to have all of them encoded in the
Unicode standard, unless of course some Chinese scholars want to
recreate an electronic version of the dictionary which aims to list
every form of Hanzi there has ever been. In that case it would probably
make more sense to simply include the characters that are not in Unicode
as graphics rather than as fonts.
> and new ones are being invented all the time.
I don't think there are that many new characters being invented. In fact,
writing reforms in China and Japan have aimed to cut down on the number
of characters. That's why you have simplified Chinese (used in Mainland
China) and traditional Chinese (used in Hong Kong and Taiwan) character
sets, which, ironically, from the point of view of the Unicode standard
have increased the number of characters that need to be encoded.
For example, the character for spirit, "Ki", has three different
representations: one simplified, one traditional and one Japanese.
From a Unicode point of view those are three different characters, but
they are actually one:
Traditional Chinese "Qi" : 氣
Japanese "Ki" : 気
Simplified Chinese "qi" : 气
(PS: You need all three encoding systems and fonts installed to see the
characters)
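As a rough illustration of the point that Unicode treats these as three
separate characters, here is a minimal Python sketch that looks up the code
point of each form listed above. The FORMS table and the describe() helper
are names made up for this example; only ord(), str.encode() and
unicodedata.name() from the standard library are used.

```python
import unicodedata

# The three forms of "qi"/"ki" listed in the message above.
# FORMS and describe() are illustrative names, not part of any library.
FORMS = {
    "Traditional Chinese": "氣",
    "Japanese": "気",
    "Simplified Chinese": "气",
}


def describe(ch):
    """Return (code point label, Unicode name, UTF-8 byte length) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), len(ch.encode("utf-8")))


for label, ch in FORMS.items():
    code_point, name, utf8_len = describe(ch)
    print(f"{label:20} {ch} {code_point} {name} ({utf8_len} bytes in UTF-8)")
```

Each form should come back with its own code point, which is what "three
different characters" means in practice: a plain string comparison or a
search for one form will not match the other two unless the application
maps between them itself.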
> But many of these are used rarely or not at all. For most circumstances,
> 10,000 kanji is adequate.
Absolutely, for most circumstances 4000-5000 Kanji is adequate.
Simplified Chinese will require fewer, Traditional Chinese will require
more, but fewer than 10000 will be adequate. I guess that the figure of
20000, plus the additional 6500 you quoted, stems from replication
because there are three writing systems (or actually four, as the Koreans
also use them alongside Hangul).
In any event, Unicode is definitely good news for everyone who has to
deal with non-Roman writing systems. Well, that is if vendors actually
make use of it, some seem fearful ...
http://www.theregister.co.uk/content/39/25742.html
regards
benjamin
Received on Thu Jun 20 10:42:43 2002