Hi Ben and Darren, thanks for your notes. But what can I do,
then, to enable the following? I need to be able to take text
from different Japanese data sources (mostly Shift_JIS, but also
ASCII, EUC and ISO-2022-JP) and merge them into one file.
Consider it a data integration module in my app. How can I do
this safely? Would it be safe to assume that Shift_JIS support
is enough for the Japanese market?
Thanks/Erick
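One common way to approach the merge is to decode every source into Unicode using that source's own encoding, combine the text, and then write the result out in a single encoding. The sketch below shows that idea. It assumes the encoding of each source is already known, uses Python's built-in codec names, and uses a hypothetical helper, `merge_sources`, with made-up sample data. The choice of UTF-8 for the output is also an assumption.

```python
def merge_sources(sources: list[tuple[bytes, str]]) -> str:
    """Decode each (raw_bytes, codec_name) pair and concatenate the text.

    The codec for every source must be known in advance (an assumption of
    this sketch). errors="strict" makes a wrong guess raise
    UnicodeDecodeError instead of silently producing garbled text.
    """
    return "".join(raw.decode(codec, errors="strict") for raw, codec in sources)


# Hypothetical sample data: the same line in each encoding, plus an ASCII line.
sources = [
    ("日本語\n".encode("shift_jis"), "shift_jis"),
    ("日本語\n".encode("euc_jp"), "euc_jp"),
    ("日本語\n".encode("iso2022_jp"), "iso2022_jp"),
    (b"plain ASCII line\n", "ascii"),
]

merged = merge_sources(sources)

# Write the merged text in one encoding; UTF-8 here is an assumption.
# Encoding to "shift_jis" instead can raise UnicodeEncodeError for
# characters Shift_JIS cannot represent (e.g. JIS X 0212 kanji, which
# EUC-JP can carry).
output_bytes = merged.encode("utf-8")
```

Because the merged data is handled as decoded characters rather than raw bytes, searches on it are not affected by the 0x5C trail-byte issue Ben describes.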
--- Ben Hutchings <ben.hutchings@roundpoint.com> wrote:
>
> On Fri, 31 May 2002, Darren Cook wrote:
>
> >
> > > I am building a module which will allow users to
> > > merge data from an ASCII source and a Shift-JIS
> > > source. Does it make logical sense to have all of
> > > the data files (such as ASCII and Shift-JIS) in the
> > > same encoding? In other words, is ASCII a subset of
> > > Shift-JIS, or vice versa?
>
> I'll assume that you mean US-ASCII, as there are variants of
> this specified in ISO 646 that are also sometimes called ASCII.
>
> > Yes, 7-bit ASCII is a subset of Shift-JIS, so all your data
> > files can be in Shift-JIS encoding.
>
> No, it isn't! In US-ASCII, backslash has the code 0x5C, but in
> Shift-JIS this code is used for the yen symbol.
>
> What the item on the Python list was saying was that the second
> byte of a two-byte character may take values that can also
> represent a character on their own; for example, 0x5C is also
> valid as a second byte. This means that searching for characters
> in Shift-JIS strings requires awareness of the multi-byte
> encoding; for example, in C, strchr() and strstr() will not work
> correctly on Shift-JIS strings.
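The trail-byte problem Ben describes can be shown concretely: the kanji 表 is encoded in Shift_JIS as 0x95 0x5C, so a plain byte search for 0x5C (the equivalent of `strchr(s, '\\')` in C) finds a false match inside it. The minimal sketch below, with hypothetical helper names, skips over two-byte characters before comparing bytes. The lead-byte ranges cover standard Shift_JIS plus the 0xF0-0xFC user-defined area used by vendor variants such as CP932; other variants may differ.

```python
def is_sjis_lead_byte(b: int) -> bool:
    """True if b starts a two-byte Shift_JIS character.

    0x81-0x9F and 0xE0-0xEF are the standard lead bytes; 0xF0-0xFC is the
    user-defined area used by vendor variants such as CP932.
    Bytes 0xA1-0xDF are single-byte half-width katakana, not lead bytes.
    """
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def sjis_find_byte(raw: bytes, target: int) -> int:
    """Return the offset of the first single-byte character equal to target, or -1.

    Unlike bytes.find() or C's strchr(), this never matches a trail byte.
    """
    i = 0
    while i < len(raw):
        b = raw[i]
        if is_sjis_lead_byte(b):
            i += 2  # skip lead + trail; the trail byte may itself be 0x5C
        elif b == target:
            return i
        else:
            i += 1
    return -1


raw = "表".encode("shift_jis") + b"\\"  # 表 is 0x95 0x5C, then a real 0x5C
print(raw.find(b"\\"))            # 1: false hit on the trail byte of 表
print(sjis_find_byte(raw, 0x5C))  # 2: the actual single-byte 0x5C
```

Decoding to Unicode before searching avoids the need for this kind of byte-level bookkeeping altogether.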
>
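Ben's earlier point about 0x5C can be illustrated without relying on any particular library's codec. JIS X 0201 Roman, the Japanese ISO 646 variant behind Shift_JIS's single-byte range, differs from US-ASCII at exactly two codes: 0x5C is the yen sign and 0x7E is the overline. The sketch below uses a hypothetical `decode_single_byte` helper to read the same 7-bit bytes both ways. Real-world codecs differ on which interpretation they apply, so it is worth checking what your own tools do.

```python
# JIS X 0201 Roman differs from US-ASCII at exactly two code points.
JIS_ROMAN_OVERRIDES = {0x5C: "\u00a5", 0x7E: "\u203e"}  # YEN SIGN, OVERLINE


def decode_single_byte(raw: bytes, jis_roman: bool) -> str:
    """Decode 7-bit bytes either as US-ASCII or as JIS X 0201 Roman."""
    out = []
    for b in raw:
        if b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is outside the 7-bit range")
        out.append(JIS_ROMAN_OVERRIDES.get(b, chr(b)) if jis_roman else chr(b))
    return "".join(out)


path = b"C:\\data"
print(decode_single_byte(path, jis_roman=False))  # C:\data
print(decode_single_byte(path, jis_roman=True))   # C:¥data
```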
Received on Fri May 31 17:37:58 2002