Hi,
Paul Lester wrote:
> [...]
> 2. The encoding in email is for all intents and purposes SJIS, except it is called
> something else: ISO-2022-JP.
One of us is confused. I went through a conversion jungle for Japanese
about a month ago and I'm pretty sure that ISO-2022-JP is the "real" JIS
while Shift_JIS is something really strange that M$ has invented
(seemingly to confuse everyone else). I certainly located a program to
convert between the two and it seems nontrivial.
Of course, I don't read any of it so I could be wrong. Could someone
clarify?
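As far as I can tell, the difference shows up at the byte level: ISO-2022-JP
stays within 7 bits and switches character sets with escape sequences, while
Shift_JIS uses bytes with the high bit set. A minimal sketch, assuming the
byte values below (taken from the usual code tables for the word 日本語) are
right; the helper name isSevenBit is made up for illustration:

```php
<?php
// The same three characters as raw bytes in each encoding (assumed values).
$sjis = "\x93\xFA\x96\x7B\x8C\xEA";        // Shift_JIS: three 2-byte pairs
$jis  = "\x1B\$B" . "F|K\\8l" . "\x1B(B";  // ISO-2022-JP: ESC $ B ... ESC ( B

// Hypothetical helper: true if no byte in $s has the high bit set.
function isSevenBit($s)
{
    return preg_match('/[\x80-\xFF]/', $s) === 0;
}

var_dump(isSevenBit($sjis));  // the Shift_JIS bytes include values >= 0x80
var_dump(isSevenBit($jis));   // the ISO-2022-JP bytes are all below 0x80
echo bin2hex($sjis), "\n";
echo bin2hex($jis), "\n";
```

So a mail gateway that only passes 7-bit data can carry the second form but
not the first, which is presumably why email uses it.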
/ Jonas
Code snippet (PHP):
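function SJIStoJIS($str_SJIS)
{
    $str_JIS = '';
    if ($str_SJIS === '') return $str_JIS;
    // 0 = ASCII, 1 = JIS X 0208 (double-byte), 2 = JIS X 0201 katakana
    $mode = 0;
    $b = unpack('C*', $str_SJIS);   // byte values, indexed from 1
    $n = count($b);
    // Escape sequences, indexed by $mode
    $ESC = array(chr(0x1B).chr(0x28).chr(0x42),   // ESC ( B
                 chr(0x1B).chr(0x24).chr(0x42),   // ESC $ B
                 chr(0x1B).chr(0x28).chr(0x49));  // ESC ( I, not in strict ISO-2022-JP (RFC 1468)
    for ($i = 1; $i <= $n; ++$i) {
        $b1 = $b[$i];
        if (0xA1 <= $b1 && $b1 <= 0xDF) {
            // Half-width katakana: single byte, shifted down into 7 bits
            if ($mode != 2) {
                $mode = 2;
                $str_JIS .= $ESC[$mode];
            }
            $str_JIS .= chr($b1 - 0x80);
        } elseif ($b1 >= 0x80) {
            // Lead byte of a double-byte character
            if ($i == $n) break;    // truncated input: no trail byte follows
            if ($mode != 1) {
                $mode = 1;
                $str_JIS .= $ESC[$mode];
            }
            $b2 = $b[++$i];
            $b1 <<= 1;
            // Doubled lead bytes below 0x13F come from 0x81-0x9F,
            // the rest from 0xE0 and up, which take the larger offset
            if ($b2 < 0x9F) {
                if ($b1 < 0x13F) $b1 -= 0xE1;
                else $b1 -= 0x161;
                if ($b2 > 0x7E) $b2 -= 0x20;
                else $b2 -= 0x1F;
            } else {
                if ($b1 < 0x13F) $b1 -= 0xE0;
                else $b1 -= 0x160;
                $b2 -= 0x7E;
            }
            $str_JIS .= chr($b1).chr($b2);
        } else {
            // Plain ASCII byte
            if ($mode != 0) {
                $mode = 0;
                $str_JIS .= $ESC[$mode];
            }
            $str_JIS .= chr($b1);
        }
    }
    // Always finish in ASCII mode
    if ($mode != 0) $str_JIS .= $ESC[0];
    return $str_JIS;
}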
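The double-byte arithmetic in the middle of that function is the part that
looked nontrivial to me, so here it is on its own as a rough sketch. The
helper name sjisPairToJis is hypothetical, it assumes it is only ever given a
valid lead byte (0x81-0x9F or 0xE0-0xEF) and a valid trail byte, and the
sample bytes are the ones I believe encode 日本語:

```php
<?php
// Hypothetical helper: map one Shift_JIS double-byte pair to its JIS pair.
// No validation is done; both bytes are assumed to be in range.
function sjisPairToJis($s1, $s2)
{
    // Fold the two lead-byte ranges (0x81-0x9F, 0xE0-0xEF) into one sequence.
    $row = ($s1 < 0xA0) ? $s1 - 0x70 : $s1 - 0xB0;
    if ($s2 < 0x9F) {
        // Low trail byte: first of the two JIS rows this lead byte covers.
        $j1 = $row * 2 - 1;
        $j2 = ($s2 > 0x7E) ? $s2 - 0x20 : $s2 - 0x1F;  // 0x7F is skipped in Shift_JIS
    } else {
        // High trail byte: second of the two JIS rows.
        $j1 = $row * 2;
        $j2 = $s2 - 0x7E;
    }
    return array($j1, $j2);
}

$pairs = array(array(0x93, 0xFA), array(0x96, 0x7B), array(0x8C, 0xEA));
foreach ($pairs as $pair) {
    list($j1, $j2) = sjisPairToJis($pair[0], $pair[1]);
    printf("%02X%02X -> %02X%02X\n", $pair[0], $pair[1], $j1, $j2);
}
// Prints:
// 93FA -> 467C
// 967B -> 4B5C
// 8CEA -> 386C
```

Each lead byte covers two JIS rows, which is why the trail byte decides
whether the doubled lead byte ends up odd or even.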
--
Jonas Petersson | XMS Penvision | mailto:Jonas.Petersson@xms.se
Box 3294, Västgötegatan 13, S-600 03 Norrköping | http://www.xms.se/
Tel: +46 11 400 13 00 | Dir: +46 11 400 13 05 | Fax: +46 11 10 30 50
Received on Wed Sep 22 09:31:42 2004