Re: Runes

From: Peter Holzer <hp_at_nospam.org>
Date: Tue Mar 01 1994 - 04:26:21 PST

You (Gavin Thomas Nicol) wrote:
>
> >The only thing I expect from a locale is a way to change collating
> >sequences. This is important since accented characters are not in a
> >useful order for most (all?) languages. Lowercase/uppercase-mappings
> >are probably not locale-dependent. I don't care much for
> >currency-signs, decimal point/comma and similar things.
>
> This is the sticky part. The tables required for handling case
> conversion and the isascii(), isnumber() etc. macros take up 3-400k.
> Ideally, for each language, a collating table is also needed, and this
> will add even more overhead.

4.4 BSD uses files to store information for the is*() macros. Each line
describes a block of characters, so these files are usually small.
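
To illustrate why per-block descriptions stay small, here is a minimal
sketch of a range-based classification table. The table layout and the
names BLOCKS, char_classes and is_class are assumptions made up for this
example; this is not the actual 4.4 BSD file format. The ranges cover
ASCII and ISO 8859-1 only.

```python
import bisect

# Hypothetical block table: (first code point, last code point, classes).
# One entry describes a whole run of characters, so the table stays short.
BLOCKS = [
    (0x30, 0x39, frozenset({"digit"})),
    (0x41, 0x5A, frozenset({"alpha", "upper"})),
    (0x61, 0x7A, frozenset({"alpha", "lower"})),
    (0xC0, 0xD6, frozenset({"alpha", "upper"})),
    (0xD8, 0xDE, frozenset({"alpha", "upper"})),
    (0xDF, 0xF6, frozenset({"alpha", "lower"})),
    (0xF8, 0xFF, frozenset({"alpha", "lower"})),
]
# Sorted list of block starts, used for binary search.
STARTS = [first for first, _, _ in BLOCKS]


def char_classes(ch):
    """Return the set of class names for one character (empty if unlisted)."""
    cp = ord(ch)
    # Find the last block whose start is <= cp.
    i = bisect.bisect_right(STARTS, cp) - 1
    if i >= 0:
        first, last, classes = BLOCKS[i]
        if cp <= last:
            return classes
    return frozenset()


def is_class(ch, name):
    """Rough analogue of an is*() test, driven by the block table."""
    return name in char_classes(ch)
```

Seven entries are enough to classify every letter and digit in that
range; a lookup is a binary search over the block starts.
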
I don't know what they do about collating tables. These are really
tricky, since there is often no one-to-one mapping. For example, in
German Umlaut-A used to be treated just like A, except if two words were
completely identical except for the umlaut, the one with the umlaut came
after the one without umlaut. (Fortunately they have changed the rule:
umlaut-A is now a letter between A and B.)
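
The two rules as described above can be sketched with sort keys. This is
only an illustration under my own assumptions: the names BASE,
old_rule_key and new_rule_key are invented for the example, only the
three lowercase umlauts are handled, and case is ignored. It is not
meant as a statement of any official collation standard.

```python
# Hypothetical mapping from umlaut letters to their base letters.
BASE = {"ä": "a", "ö": "o", "ü": "u"}


def old_rule_key(word):
    """Old rule: umlauts count as their base letter; the umlaut only
    breaks ties between otherwise identical words."""
    w = word.lower()
    primary = "".join(BASE.get(c, c) for c in w)
    secondary = tuple(1 if c in BASE else 0 for c in w)
    return (primary, secondary)


def new_rule_key(word):
    """New rule: an umlaut is a letter of its own, sorting directly
    after its base letter and before the next letter."""
    w = word.lower()
    return tuple((BASE.get(c, c), 1 if c in BASE else 0) for c in w)


words = ["Bären", "Bart", "Bär", "Bar"]
old_order = sorted(words, key=old_rule_key)
new_order = sorted(words, key=new_rule_key)
```

With these keys, old_order is Bar, Bär, Bären, Bart and new_order is
Bar, Bart, Bär, Bären. The old rule needs a whole-word comparison before
the umlaut is even looked at, which is why a simple per-character table
is not enough for it.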

> I am focusing on just the basics for now: I/O, conversion to/from
> multibyte and runes, and will worry about this later. I think the
> hardest part is going to be regexp()... though filename globbing will
> be easy.

This will probably be enough for now. The trickier parts can be added as
needed.

        hp

-- 
   _  | hp@vmars.tuwien.ac.at | Peter Holzer | TU Vienna | CS/Real-Time Systems
|_|_) |------------------------------------------------------------------------
| |   | It's not what we don't know that gets us into trouble, it's
__/   | what we know that ain't so. -- Will Rogers
Received on Tue Mar 1 04:53:52 1994
