Re: ld.so, threads

From: Andrew Valencia <vandys_at_nospam.org>
Date: Sun Sep 25 1994 - 18:26:10 PDT

------- Blind-Carbon-Copy

To: gtn@ebt.com (Gavin Nicol)
Subject: Re: ld.so, threads
In-reply-to: Your message of "Sun, 25 Sep 1994 18:38:26 +0500."
             <9409252238.AA00492@ebt-inc.ebt.com>
Date: Sun, 25 Sep 1994 18:26:10 -0700
From: Andrew Valencia <vandys@cisco.com>

[gtn@ebt.com (Gavin Nicol) writes:]

>Andy, what is the general technique used for the shared libraries? We
>don't need specifics, but I for one would like to know whether it's
>jump tables, PIC or something else.

Ok, here's a brief summary of what I'm coding up. Please send comments and
discussion to vsta-shlib@cisco.com, not the main list!

Each library to be shared has a <lib>.db file which tabulates which globals
should be visible to library clients, and which should be hidden. By
splitting this into its own file, it's easy to spot if an existing entry
point is "lost", or spot something new which you forgot to place in the .db.
This file is a simple text file which you edit, and is maintained under RCS.

All the .o's of a library are put into <lib>_s.a. This is the static
version, and would be used by boot servers and others who would need to run
before filesystems and name servers are available.

All the .o's are also built, using -r, into <lib>.tmp. This is a single
object file, but is still relocatable (because of -r). This is the file
which is processed to create the sharable version.

The utility "mkshlib" is invoked with a list of .db's corresponding to the
libraries which are to be shared. For each library, it reads the .db and
then reads the .tmp, checking for extra or missing symbols. For each entry,
it generates a .s file in /tmp which essentially says:

0:      load index of this entry point
        load base of jump table
        if non-zero
            call via jump table indexed by entry index
        else
            call loader
            goto 0, and try again

These .s's are all assembled and placed into <lib>.a, along with an extra .o
at the end which provides the "loader" function.
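
A minimal sketch in C of the stub logic above, for readers who prefer code
to pseudocode. The real stubs are generated assembly; every name here
(call_entry, jump_base, lib_twice, and so on) is hypothetical, and the
"loader" is a stand-in that only publishes a table rather than mapping
anything.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*entry_fn)(int);

/* Stand-ins for two library routines (hypothetical). */
static int lib_twice(int x)  { return 2 * x; }
static int lib_negate(int x) { return -x; }

/* The jump table that would live inside the mapped library. */
static entry_fn lib_jump_table[] = { lib_twice, lib_negate };
#define N_ENTRIES (sizeof lib_jump_table / sizeof lib_jump_table[0])

/* Client-side state: base of the jump table, NULL until mapped. */
static entry_fn *jump_base = NULL;
static int loader_calls = 0;

/* Stand-in for the "loader": it only records the call and
 * publishes the table; a real one would map the library in. */
static void loader(void)
{
    loader_calls++;
    jump_base = lib_jump_table;
}

/* What each generated stub does, with the entry index as a parameter. */
static int call_entry(size_t index, int arg)
{
    assert(index < N_ENTRIES);
    for (;;) {                       /* the "0:" label */
        entry_fn *base = jump_base;  /* load base of jump table */
        if (base != NULL)
            return base[index](arg); /* call via jump table */
        loader();                    /* map the library, then retry */
    }
}

/* One stub per visible entry point, each with its own index. */
int twice(int x)  { return call_entry(0, x); }
int negate(int x) { return call_entry(1, x); }
```

The first call through any stub pays for the loader; later calls cost one
table lookup and one indirect call.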

The "loader" is done in two stages. The code in each shared library knows
just enough to load the main code for dealing with mapping in a library.
This is so new functionality can be added to the lookup/load phase in the
future (load paths, different library formats, whatever).

The real "loader" initially will just look up the library name in the root
filesystem, read in its header to get the address it was relocated for, and
then mmap() the file at that address. To be more exact, it mmap()'s the
text read only, the data copy-on-write, and mmap()'s zero-fill-on-demand for
the BSS.
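
The three mappings can be sketched with the POSIX mmap() interface. This is
an illustration only: the header layout (struct shlib_hdr), the segment
ordering, and the function names are assumptions, not the actual VSTa
format or API. All sizes and offsets are assumed to be page multiples.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>

/* Hypothetical library header. */
struct shlib_hdr {
    char  *base;      /* address the library was relocated for */
    size_t text_len;
    size_t data_len;
    size_t bss_len;
    off_t  text_off;  /* file offset of the text */
    off_t  data_off;  /* file offset of the initialized data */
};

struct seg {
    void  *addr;
    size_t len;
    int    prot;
    int    flags;
    int    use_fd;    /* 0 for the anonymous (BSS) mapping */
    off_t  off;
};

/* Work out the three mappings: text, then data, then BSS. */
void plan_segments(const struct shlib_hdr *h, struct seg out[3])
{
    char *data = h->base + h->text_len;
    char *bss  = data + h->data_len;

    /* Text: read only. A loader on a modern POSIX system would
     * also need PROT_EXEC here. */
    out[0] = (struct seg){ h->base, h->text_len, PROT_READ,
                           MAP_PRIVATE | MAP_FIXED, 1, h->text_off };
    /* Data: a private, writable file mapping is copy-on-write. */
    out[1] = (struct seg){ data, h->data_len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_FIXED, 1, h->data_off };
    /* BSS: anonymous memory is zero-filled on demand. */
    out[2] = (struct seg){ bss, h->bss_len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, 0, 0 };
}

/* Apply the plan. Note that MAP_FIXED replaces whatever is already
 * mapped at the target address. Returns 0 on success, -1 on failure. */
int map_library(int fd, const struct shlib_hdr *h)
{
    struct seg s[3];

    plan_segments(h, s);
    for (int i = 0; i < 3; i++) {
        if (s[i].len == 0)
            continue;
        if (mmap(s[i].addr, s[i].len, s[i].prot, s[i].flags,
                 s[i].use_fd ? fd : -1, s[i].off) == MAP_FAILED)
            return -1;
    }
    return 0;
}
```

Because the file was relocated for a fixed address ahead of time, nothing
has to be patched after the mappings are in place.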

An extra .o is added to <lib>.tmp which provides a jump table out to each
entry point in the code.

Shortcomings... no data is made available from the C library, nor can the C
library make use of any data from user code. I fiddled things like stdio's
__iob[] array (C startup calls __get_iob() to get the address now) and
__ctab[] for <ctype.h> (same trick). getopt() had similar variables, which
were replaced with macros much like errno's (check the source).
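
The accessor trick looks roughly like this in C. The names below
(lib_optind, get_optind_addr, my_optind) are made up for illustration and
are not the identifiers used in the VSTa source.

```c
/* Library side: the variable stays private to the library's data. */
static int lib_optind = 1;   /* hypothetical stand-in for optind */

/* The only thing exported is a function returning its address. */
int *get_optind_addr(void)
{
    return &lib_optind;
}

/* Client-side header: the old variable name becomes a macro,
 * much as errno is commonly defined. */
#define my_optind (*get_optind_addr())

/* Existing client code keeps reading and assigning as before. */
int skip_one_arg(void)
{
    my_optind += 1;
    return my_optind;
}
```

Since the macro expands to an lvalue, both reads and assignments through
the old name still compile; only taking the variable as an extern symbol
stops working.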

Strengths... except for the initial pause while the library is mapped in,
calling speed is slowed down only by the cost of one extra table lookup and
one extra jump. C library code is fully optimized, so it'll run as fast as
ever. The C library memory is shared among all clients. The library setup
should be pretty quick, since the code is already relocated.

The .db file should prevent unexpected missing entry points from crashing
your application at an inconvenient time.
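
The check that mkshlib makes between the .db and the .tmp amounts to a
two-way comparison of symbol lists. A rough sketch in C follows; how the
names are read from either file is not shown, and check_symbols and its
result struct are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Linear search; fine for a sketch. */
static int contains(const char *const *names, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

struct check_result {
    size_t missing;   /* listed in the .db, absent from the object */
    size_t extra;     /* global in the object, not listed in the .db */
};

struct check_result check_symbols(const char *const *db, size_t n_db,
                                  const char *const *obj, size_t n_obj)
{
    struct check_result r = { 0, 0 };

    for (size_t i = 0; i < n_db; i++)
        if (!contains(obj, n_obj, db[i]))
            r.missing++;
    for (size_t i = 0; i < n_obj; i++)
        if (!contains(db, n_db, obj[i]))
            r.extra++;
    return r;
}
```

A non-zero count in either field would be reported at library build time,
before any client is linked.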

As with any shared library, you can update all clients by just updating the
library. Unlike some, a client can't override a function in the C library.
Well, he can for himself, but not for calls made by the library code.

Support for multiple versions of libraries is possible, since the loader is
also picked up dynamically from the filesystem during shared library map-in.
I won't support this on first release; usually it ends up being a big can of
worms, rather than much help.

Since each entry is burst into its own .o in the <lib>.a, it doesn't matter
that you placed a bunch of functions into a single .o in your C library
source. The granularity on the client side is always per-function.

                                        There you have it!
                                        Andy

------- End of Blind-Carbon-Copy
Received on Sun Sep 25 17:14:16 1994
