On 13 Jan 2006, at 10:50, Alex Shinn wrote:
> High-bandwidth sites like Slashdot use mod_gzip, compressing on the
> fly.
Yes sure - as do lots of sites. Or something similar. I believe I
mentioned compression.
> Considering the verbosity of HTML this is a win no matter what
> encoding you use
Indeed. And of course markup dilutes content.
> , and basically eliminates any size differences in the
> encodings.
Are you claiming that 3-byte UTF-8 is SO much more compressible than
2-byte eucjp that it is sufficient to make up the difference? That
would indeed be interesting and would remove a major issue with UTF-8.
Or are you referring to non-zero, non-trivial values of "basically
eliminates"? In which case you are playing with words rather than
addressing the point.
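The question is easy enough to put to a measurement. What follows is only a rough sketch, not a benchmark: it encodes the same text as UTF-8 and as EUC-JP, deflates both with zlib (the same DEFLATE algorithm gzip uses), and reports raw and compressed byte counts. The function name `encoded_sizes` and the sample text are made up for illustration, and compression level 6 is an assumption (it is zlib's default, not necessarily what any given mod_gzip install uses). The compressed figures depend entirely on the text fed in, so none are quoted here.

```python
import zlib


def encoded_sizes(text, encodings=("utf-8", "euc_jp")):
    """Return {encoding: (raw_bytes, deflated_bytes)} for the given text.

    Hypothetical helper for illustration only.
    """
    sizes = {}
    for enc in encodings:
        raw = text.encode(enc)
        # Level 6 is zlib's default; a real server may be configured differently.
        packed = zlib.compress(raw, 6)
        sizes[enc] = (len(raw), len(packed))
    return sizes


if __name__ == "__main__":
    # Made-up, highly repetitive sample; a real page would compress differently.
    sample = "<p>日本語のテキスト</p>\n" * 200
    for enc, (raw, packed) in encoded_sizes(sample).items():
        print(f"{enc:8s} raw={raw:7d} deflated={packed:6d}")
```

Run against a real corpus of Japanese pages, the gap between the two deflated figures is the number that matters for the argument; the raw figures only confirm the 3-byte versus 2-byte starting point.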
In fact, the less one uses plain old HTML, and the more one moves to
stylesheets for layout, the greater the percentage of a given page
served tends to be content (for all but the first page, when the
stylesheet is served and cached), thus INCREASING the hit from
using 3-byte UTF-8 over 2-byte EUC-JP.
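The arithmetic behind that can be sketched as follows, for uncompressed sizes only. It assumes markup is pure ASCII (1 byte per character in both encodings) and that every Japanese character costs 3 bytes in UTF-8 and 2 bytes in EUC-JP, which holds for ordinary kana and kanji but not for every character. The function name and the example figures are hypothetical.

```python
def utf8_overhead(markup_bytes, japanese_chars):
    """Ratio of uncompressed UTF-8 page size to EUC-JP page size.

    Assumes ASCII markup (1 byte each in both encodings) and Japanese
    characters at 3 bytes in UTF-8 versus 2 bytes in EUC-JP.
    """
    utf8 = markup_bytes + 3 * japanese_chars
    eucjp = markup_bytes + 2 * japanese_chars
    if eucjp == 0:
        raise ValueError("empty page")
    return utf8 / eucjp


if __name__ == "__main__":
    # Hypothetical pages: same content, progressively less inline markup.
    for markup in (8000, 4000, 1000, 0):
        print(markup, round(utf8_overhead(markup, 1000), 3))
```

As the markup share shrinks, the ratio climbs toward its ceiling of 1.5. How much of that survives compression is a separate question that this sketch does not answer.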
It is one thing to assign different values to the various elements in
a tradeoff (bandwidth, peak capacity, page load times, encoding,
etc.) but quite another to deny that those elements in the tradeoff
exist at all. Which is what you seem to be doing. (But then the
essence of your claim lurks somewhere beneath the murky semantic
surface of the word "basically"!)
>
> In fact, http://slashdot.jp/ uses UTF-8 as its encoding.
Sure. I looked before I posted. More fool them, unless they have a
good reason to**. I am interested in what it is rational to do, not
what this or that site actually does. (In addition, they are not a
terribly high bandwidth site, so bandwidth is far less of an issue
for them. But the fact remains they would get the benefits noted in
my last post if they ran it on eucjp.)
What I WAS referring to in my post (this could have been clearer, I
grant) was a site with the vast bandwidth requirements of OUR
slashdot - slashdot.org, but serving Japanese.
Incidentally - on the subject of "making changes to HTML", there were
some figures worked out for how much Slashdot.org had saved itself in
a year by going from its old format to its new CSS stylesheets. I
can't remember them offhand, but it was quite a lot of money. All by
changing their nice compressible, gzipped text.
Of course one should select encodings on a rational basis and choose
one that is appropriate to the domain. That may well be UTF-8 - even
fat boys get dates. But for certain types of constraints (and
bottlenecks) within certain domains, it is probably rational to select
a 2-byte encoding over a 3-byte one. Ultra-high-volume sites serving
mainly text and using stylesheets are a case in point.
UTF-8 may well have many advantages over euc-jp and sjis. But its
proponents do themselves, and it, a disservice by pretending that moving
to it does not involve trade-offs.
Nick
** I can think of several good reasons why they might want to, but
these are all trade-offs against the saving in bandwidth that eucjp
would buy them.
Received on Fri Jan 13 06:25:49 2006