Character encoding

From: Gavin Thomas Nicol <nick_at_nospam.org>
Date: Thu Oct 07 1993 - 00:55:50 PDT

>> Does anyone have any comments on character encoding in VSTa. I think
>> it is easy to say that fixed width characters (like 16bit) are out
>> because they'll cause everything to grow too much. That leaves us with
>
>Sorry, I'm no expert, but the idea of a multibyte representation really
>turns me off. Surely a uniform 16 bit character set is easier to deal with
>than a multibyte one (in terms of programming)? Ok, so it might be less
>efficient in terms of space, but I think you'd pay for that in terms of
>code complexity with the alternative...

No, a straight move to 16 bit characters means that everything that
deals with characters (and that means almost everything), and *all*
files (including executables), will grow *unless* you introduce
attributes, which is a truly horrible idea (how does "cat" work in
such a case?). Also, moving to straight 16bit will break a lot of
programs, whereas many programs will work with multibyte with few
changes because they are 8 bit clean (an example is microemacs: the
version distributed with VSTa works with kanji on my Japanese version
of VSTa).
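
To make "8 bit clean" concrete, here is a minimal sketch in C. Both
function names are hypothetical and neither is taken from VSTa or
microemacs. The first routine passes every byte through untouched, so
multibyte text survives it; the second assumes 7 bit ASCII and masks
off the top bit, which destroys any multibyte sequence.

```c
#include <stddef.h>

/* 8 bit clean: every byte is copied exactly as it came in.
 * dst must have room for the string plus its terminator.
 * (Hypothetical helper, for illustration only.) */
size_t copy_clean(char *dst, const char *src)
{
    size_t i = 0;
    while (src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;               /* number of bytes copied */
}

/* Not 8 bit clean: assumes 7 bit ASCII and clears the top bit of
 * each byte, which corrupts multibyte characters.
 * (Hypothetical helper, for illustration only.) */
size_t copy_7bit(char *dst, const char *src)
{
    size_t i = 0;
    while (src[i] != '\0') {
        dst[i] = (char)(src[i] & 0x7F);
        i++;
    }
    dst[i] = '\0';
    return i;               /* number of bytes processed */
}
```

A program built like copy_clean needs no changes to carry multibyte
text. One built like copy_7bit has to be fixed, but the fix is local
and the file format does not change.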

The way plan9 does it is to *store* and *send* data in multibyte
format, but internally, use 16bit characters. This has the advantage
of not breaking many programs, while at the same time allowing programs
that are modified to handle 16 bit codes to take advantage of them.

They have some library functions for converting a string between runes
and multibyte, and str* like functions for manipulating 16 bit
characters.
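
A rough sketch of what such conversion routines might look like in C
follows. It is not the plan9 code: the type and function names
(rune16, rune_to_mb, mb_to_rune, mbs_to_runes, runes_to_mbs,
rune_strlen) are made up for illustration, and it assumes a UTF-8
style multibyte encoding in which a 16 bit character takes 1 to 3
bytes. It also does not reject overlong encodings.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t rune16;    /* hypothetical 16 bit character type */

/* Encode one 16 bit character as 1 to 3 bytes; returns bytes written. */
size_t rune_to_mb(unsigned char *out, rune16 r)
{
    if (r < 0x80) {
        out[0] = (unsigned char)r;
        return 1;
    }
    if (r < 0x800) {
        out[0] = (unsigned char)(0xC0 | (r >> 6));
        out[1] = (unsigned char)(0x80 | (r & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (r >> 12));
    out[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (r & 0x3F));
    return 3;
}

/* Decode one character from a NUL-terminated byte string; returns
 * bytes consumed. Malformed input yields 0xFFFD and consumes 1 byte. */
size_t mb_to_rune(rune16 *r, const unsigned char *s)
{
    if (s[0] < 0x80) {
        *r = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *r = (rune16)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *r = (rune16)(((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) |
                      (s[2] & 0x3F));
        return 3;
    }
    *r = 0xFFFD;
    return 1;
}

/* Multibyte string to 16 bit string. dst needs room for
 * strlen(src) + 1 elements. Returns the number of characters. */
size_t mbs_to_runes(rune16 *dst, const char *src)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;
    while (*s != 0)
        s += mb_to_rune(&dst[n++], s);
    dst[n] = 0;
    return n;
}

/* 16 bit string to multibyte string. dst needs room for
 * 3 * length + 1 bytes. Returns the number of bytes written. */
size_t runes_to_mbs(char *dst, const rune16 *src)
{
    unsigned char *d = (unsigned char *)dst;
    size_t n = 0;
    while (*src != 0)
        n += rune_to_mb(d + n, *src++);
    d[n] = 0;
    return n;
}

/* str* like helper: length of a 16 bit string in characters. */
size_t rune_strlen(const rune16 *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

A modified program would call something like mbs_to_runes on input,
work on fixed width characters internally, and call runes_to_mbs
before writing, so files and pipes only ever carry the multibyte form.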

>And if we did use a 16 bit character set, Unicode is the obvious choice...

I agree it's the *obvious* choice, but is it the best? I don't know...
Received on Thu Oct 7 01:02:35 1993
