At Mon, 17 Jan 2005 14:57:59 +0900 (JST), Curt Sampson wrote:
>
> This is not true, because sorts based on the numerical representation of
> a kana can't give tokuon a lower precedence than kana following the kana
> with tokuon. For example,「じゃきょう」 sorts before 「しゃく」in my
> dictionary, but with a sort based on character codes, じ (0x3058) comes
> after し (0x3057), and so じゃきょう would sort after even 「しんぬ」.

Oops, sorry, don't mind me; I was asleep when I replied :(
I think your algorithm works for hiragana alone. Once kanji, katakana
and romaji are included, the JIS standard defines 5 collation levels;
you can see an open source implementation of the full collation in
Perl's Lingua::JA::Sort::JIS:
http://search.cpan.org/~sadahiro/Lingua-JA-Sort-JIS-0.04/JIS.pm
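
For the hiragana-only case, here is a minimal Python sketch of the
two-level idea discussed above: compare the kana with voicing marks
stripped first, and use voicing only to break ties. This is not the JIS
algorithm and not the Lingua::JA::Sort::JIS API; kana_sort_key is a
hypothetical name, and small kana, the long-vowel mark, katakana and
kanji are deliberately left out.

```python
import unicodedata

# Combining voiced (dakuten) and semi-voiced (handakuten) sound marks.
DAKUTEN = "\u3099"
HANDAKUTEN = "\u309a"
_MARK_WEIGHT = {DAKUTEN: 1, HANDAKUTEN: 2}


def kana_sort_key(word):
    """Return a (base, voicing) sort key for a hiragana string.

    Hypothetical helper for illustration only.
    """
    base = []     # level 1: kana with voicing marks stripped
    voicing = []  # level 2: 0 = plain, 1 = voiced, 2 = semi-voiced
    # NFD splits e.g. じ (U+3058) into し (U+3057) + U+3099.
    for ch in unicodedata.normalize("NFD", word):
        if ch in _MARK_WEIGHT:
            if voicing:
                voicing[-1] = _MARK_WEIGHT[ch]
        else:
            base.append(ch)
            voicing.append(0)
    # Tuples compare element by element, so voicing matters only
    # when the stripped strings are identical.
    return ("".join(base), tuple(voicing))


words = ["しんぬ", "しゃく", "じゃきょう"]
by_code_point = sorted(words)
by_two_levels = sorted(words, key=kana_sort_key)
```

With the three words from the quoted message, by_code_point puts
じゃきょう last (after しんぬ), while by_two_levels puts it before しゃく,
which matches the dictionary order described above.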
--
Alex
Received on Mon Jan 17 09:40:51 2005