On Fri, 13 Jan 2006, Nick May wrote:
> But the fact remains they would get the benefits noted in my last post
> if they ran it on eucjp.
Sorry to be rude, but the benefits you mentioned in your last post are
complete rubbish.
Let's have a look at slashdot.co.jp's top page and an article page with
150+ comments, compressed and uncompressed, in various encodings.
compressed  uncompressed  ratio  uncompressed_name
     21808        142018  84.6%  comments.utf-8.html
     20226        130989  84.5%  comments.euc-jp.html
     20359        130989  84.4%  comments.sjis.html
     15648         61434  74.5%  top.utf-8.html
     14616         56637  74.1%  top.euc-jp.html
     14632         56637  74.1%  top.sjis.html
This size is for the HTML file alone, and does not include the style
sheets or images.
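For anyone who wants to repeat this kind of measurement on their own pages, here is a minimal sketch. It assumes gzip compression and Python's built-in "euc_jp" and "shift_jis" codecs; the function name size_report and the sample text are hypothetical, and the figures above were not necessarily produced this way.

```python
import gzip

ENCODINGS = ("utf-8", "euc_jp", "shift_jis")


def size_report(text, encodings=ENCODINGS):
    """Return {encoding: (compressed, uncompressed, ratio)} for text."""
    report = {}
    for enc in encodings:
        raw = text.encode(enc)          # bytes as they would be served
        packed = gzip.compress(raw)     # gzip at the default level
        ratio = 1 - len(packed) / len(raw)  # fraction saved by compression
        report[enc] = (len(packed), len(raw), ratio)
    return report


if __name__ == "__main__":
    # Hypothetical sample standing in for a saved HTML page.
    sample = "<p>\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8</p>\n" * 200
    print("compressed uncompressed ratio encoding")
    for enc, (packed, raw, ratio) in size_report(sample).items():
        print(f"{packed:10d} {raw:12d} {ratio:5.1%} {enc}")
```

A real page mixes ASCII markup with Japanese text, so a repetitive sample like this one will compress far better than the pages in the table.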
Uncompressed, EUC-JP and Shift-JIS are about 8% smaller than UTF-8;
compressed, about 7% smaller. For the compressed pages, UTF-8 means you
have to send 10 packets rather than 9, which in a typical TCP connection
will increase download time by perhaps 3-4% (it's the latency for the
connection setup and request/response turnaround that eats a lot of time
in requests this size).
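The size differences can be checked directly from the table. The sketch below only redoes that arithmetic; the names SIZES and saving are made up for illustration. From these figures the uncompressed saving comes out at about 7.8%, and the compressed saving at between roughly 6.5% and 7.3%, depending on the page and encoding.

```python
# Sizes in bytes copied from the table: (compressed, uncompressed).
SIZES = {
    "comments": {
        "utf-8": (21808, 142018),
        "euc-jp": (20226, 130989),
        "sjis": (20359, 130989),
    },
    "top": {
        "utf-8": (15648, 61434),
        "euc-jp": (14616, 56637),
        "sjis": (14632, 56637),
    },
}

COMPRESSED, UNCOMPRESSED = 0, 1


def saving(page, encoding, column):
    """Fraction by which `encoding` is smaller than UTF-8 for one column."""
    base = SIZES[page]["utf-8"][column]
    return 1 - SIZES[page][encoding][column] / base


if __name__ == "__main__":
    for page in SIZES:
        for enc in ("euc-jp", "sjis"):
            print(
                f"{page:8s} {enc:6s} "
                f"uncompressed {saving(page, enc, UNCOMPRESSED):.1%} "
                f"compressed {saving(page, enc, COMPRESSED):.1%}"
            )
```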
And that small cost buys you not just the avoidance of pain in situations
where you have to interoperate with non-Japanese stuff, but also, in
fact, better Japanese support: UTF-8 lets you encode some Japanese
characters that cannot be encoded in Shift-JIS or EUC-JP, yet there is
not a single character encodable in Shift-JIS or EUC-JP that cannot be
encoded in UTF-8.
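A quick way to see the one-way relationship for yourself is sketched below. It assumes Python's "shift_jis" and "euc_jp" codecs are a fair stand-in for the encodings under discussion (vendor variants such as cp932 differ in places), and the helper names can_encode and to_utf8 are hypothetical. The example character is U+20BB7, a variant form of the kanji "yoshi" that turns up in Japanese names.

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def to_utf8(data, source_encoding):
    """Re-encode bytes from a legacy encoding as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# U+20BB7: outside the character sets covered by these two legacy codecs.
TSUCHIYOSHI = "\U00020BB7"

if __name__ == "__main__":
    for enc in ("utf-8", "shift_jis", "euc_jp"):
        print(enc, can_encode(TSUCHIYOSHI, enc))

    # Going the other way, legacy bytes decode and re-encode without loss.
    legacy = "\u65e5\u672c\u8a9e".encode("shift_jis")
    print(to_utf8(legacy, "shift_jis"))
```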
But if you're really that intent on shrinking your Asian text, just
use UCS-2 or UTF-16 and SCSU (Unicode Technical Standard #6 - A
Standard Compression Scheme for Unicode) and you'll find that both your
straight-Japanese and your straight-ASCII files, as well as almost
all of your files in between, are smaller than their EUC-JP or their
Shift-JIS equivalents.
cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974
*** Contribute to the Keitai Developers' Wiki! ***
*** http://www.keitai-dev.net/ ***
Received on Fri Jan 13 08:21:55 2006