On 13 Jan 2006, at 16:46, Curt Sampson wrote:
> I should mention, as I forgot to earlier, that these were compressed
> with gzip at the default compression level. Just for a quick
> comparison, if anybody's curious, bzip2 -9 gives:
Ah good!
So, one workaround to reduce the "UTF-8 tax" is to use a slow,
resource-intensive compression scheme such as bzip2 -9. That MAY be
appropriate in some situations, but note that it gets us into a
tradeoff with CPU burden at the SERVER end. Very relevant if one's
server is already stretched.
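For anyone who wants to see that server-side cost on their own data, here is a rough Python 3 sketch using only the standard gzip, bz2 and time modules. It compresses the same payload with gzip at level 6 (the gzip command-line default) and with bzip2 at -9, and reports size and best-of-N wall-clock time for each. The compressor labels and the sample payload are my own placeholders, and timing library calls in isolation says nothing about the overhead of any particular web-server module.

```python
import bz2
import gzip
import time


def time_compress(data: bytes, compress, repeats: int = 5):
    """Return (compressed size in bytes, best wall-clock seconds over `repeats` runs)."""
    best = float("inf")
    out = b""
    for _ in range(repeats):
        start = time.perf_counter()
        out = compress(data)
        best = min(best, time.perf_counter() - start)
    return len(out), best


# Hypothetical labels for the two schemes under discussion:
# gzip at its command-line default level (6) versus bzip2 -9.
COMPRESSORS = {
    "gzip -6": lambda d: gzip.compress(d, compresslevel=6),
    "bzip2 -9": lambda d: bz2.compress(d, compresslevel=9),
}


def compare(data: bytes) -> dict:
    """Map each compressor label to its (size, seconds) result."""
    return {name: time_compress(data, fn) for name, fn in COMPRESSORS.items()}


if __name__ == "__main__":
    # Placeholder payload only; substitute a real page to get meaningful figures.
    sample = ("日本語のテキストの例です。" * 2000).encode("utf-8")
    for name, (size, secs) in compare(sample).items():
        print(f"{name}: {size} bytes in {secs * 1000:.2f} ms")
```

Repeating a short string compresses unrealistically well, so treat any numbers from the placeholder as a smoke test, not a benchmark.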
Is there a mod_bzip yet, or does one have to do it in one's output
layer? I note that bzip2 support has to be compiled into PHP
specially.
> I think that this pretty much explodes any arguments about UTF-8
> versus EUC-JP if your main concern is data size; what you do in terms
> of compression makes much, much more difference.
Actually, what your figures suggest is that if you wish to avoid the
7 to 8% UTF-8 tax and cut it to something smaller, you HAVE TO use a
CPU-intensive compression scheme like bzip2 -9 rather than the
standard gzip. (Which rules out all those older browsers that can't
handle bzip2-compressed files.)
Worth knowing, certainly, but hardly a glowing endorsement of UTF-8.
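To put a number on the "UTF-8 tax" at each stage, here is a minimal Python 3 sketch, assuming the standard library's euc_jp codec is an acceptable stand-in for whatever EUC-JP encoder produced the original files. It encodes the same text both ways and reports the percentage by which UTF-8 exceeds EUC-JP raw, after gzip -6, and after bzip2 -9. The function name and sample text are hypothetical placeholders, not the files Curt compressed, so it will not reproduce his figures.

```python
import bz2
import gzip


def utf8_tax(text: str) -> dict:
    """Return {stage: (utf8_bytes, eucjp_bytes, percent_extra_for_utf8)}."""
    if not text:
        raise ValueError("need non-empty text to compare sizes")
    utf8 = text.encode("utf-8")
    eucjp = text.encode("euc_jp")  # raises UnicodeEncodeError outside JIS X 0208/0212
    stages = {
        "raw": lambda b: b,
        "gzip -6": lambda b: gzip.compress(b, compresslevel=6),
        "bzip2 -9": lambda b: bz2.compress(b, compresslevel=9),
    }
    result = {}
    for name, fn in stages.items():
        u, e = len(fn(utf8)), len(fn(eucjp))
        result[name] = (u, e, 100.0 * (u - e) / e)
    return result


if __name__ == "__main__":
    # Placeholder Japanese text; use real documents for meaningful percentages.
    sample = "これは日本語の文章の例です。文字コードの比較に使います。" * 500
    for stage, (u, e, tax) in utf8_tax(sample).items():
        print(f"{stage:9} UTF-8 {u:7d}  EUC-JP {e:7d}  tax {tax:+.1f}%")
```

Kana and kanji take 3 bytes each in UTF-8 but 2 in EUC-JP, so the raw tax on pure Japanese text approaches 50%; the question in this thread is how much of that survives compression.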
Nick
Received on Fri Jan 13 10:20:46 2006