While real-time attributes are discussed later, it is worth noting that the job of writing code to operate correctly in the presence of parallel CPUs has some synergy with the goal of allowing kernel preemption. In both cases the programmer must always consider the possibility of the same code paths being entered and reentered at arbitrary points. Kernel preemption imposes additional demands, especially in code paths like exit().
With the smaller kernel achieved by moving so much functionality out to processes, the remaining code is much easier to design, code, and test. Since parallel, preemptive code is harder to write than traditional single-threaded code (4), this reduction in source size is desirable.
VSTa's machine-independent layers were written for a shared memory symmetric multiprocessor. A P/V semaphore interface is used for sleep-oriented interlocks. A P/V spinlock is used for spin-oriented interlocks, and also used to interlock against interrupt service procedures. The machine-dependent code, sadly, is only written for a uniprocessor i386--the only machine available to the author.
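The split between sleep-oriented and spin-oriented interlocks can be illustrated with a minimal C11 sketch of the spin side only. It uses hypothetical names (`struct spinlock`, `p_lock`, `v_lock`, `irq_disable`) and simulates the interrupt mask with a flag. It is not VSTa's actual code, which the text does not show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the CPU interrupt mask. A real i386 kernel would use
 * cli/sti and keep this state per CPU; here it is a single flag. */
static bool interrupts_enabled = true;

static bool irq_disable(void) {
    bool old = interrupts_enabled;
    interrupts_enabled = false;
    return old;
}

static void irq_restore(bool old) { interrupts_enabled = old; }

struct spinlock {
    atomic_flag held;   /* set while some CPU owns the lock */
    bool saved_irq;     /* interrupt state to restore on release */
    bool blocks_irq;    /* true if an interrupt handler also takes this lock */
};

#define SPINLOCK_INIT(irq) { ATOMIC_FLAG_INIT, false, (irq) }

/* P: mask local interrupts if a handler shares the lock, then spin. */
static void p_lock(struct spinlock *l) {
    bool old = l->blocks_irq ? irq_disable() : interrupts_enabled;
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* busy-wait: the holder is running on another CPU */
    l->saved_irq = old;
}

/* V: release the lock, then restore the caller's interrupt state. */
static void v_lock(struct spinlock *l) {
    bool old = l->saved_irq;
    assert(!l->blocks_irq || !interrupts_enabled);
    atomic_flag_clear_explicit(&l->held, memory_order_release);
    if (l->blocks_irq)
        irq_restore(old);
}
```

A lock flagged `blocks_irq` masks interrupts before spinning. As a result, an interrupt service procedure on the same CPU cannot deadlock against the lock's holder.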
VSTa was designed to support memory locking and real-time priorities. Except when a spinlock is held, a thread is preemptable even when running in kernel mode. Most spinlocks do not involve interrupt-driven code. For these, interrupts are still accepted and queued while the spinlock is held, and preemption to a real-time process is delayed until the spinlock is released.
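The deferred-preemption rule can be sketched as a count of held spinlocks plus a pending flag. The names (`locks_held`, `request_preempt`, `spin_enter`, `spin_exit`) are hypothetical. A real kernel would keep this state per CPU and perform an actual context switch, where this sketch only counts one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; a real kernel keeps one copy per CPU. */
static int  locks_held;       /* spinlocks currently held */
static bool preempt_pending;  /* a real-time thread became runnable */
static int  preemptions;      /* counts simulated context switches */

static void preempt_now(void) {
    preempt_pending = false;
    preemptions++;            /* stands in for switching to the RT thread */
}

/* Called, e.g., from an interrupt handler that wakes a real-time thread. */
static void request_preempt(void) {
    if (locks_held > 0)
        preempt_pending = true;   /* defer: a spinlock is held */
    else
        preempt_now();            /* kernel code is preemptable here */
}

static void spin_enter(void) { locks_held++; }

static void spin_exit(void) {
    assert(locks_held > 0);
    if (--locks_held == 0 && preempt_pending)
        preempt_now();            /* deliver the deferred preemption */
}
```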
This organization has two desirable properties. First, it allows users and groups of users to partition the CPU resources fairly among groups based on local policy. With the ratio of CPUs to users approaching 1:1, the classic departmental computing scenario may never arise. However, it can be convenient to guarantee that some particular server will never consume more than half the CPU time (unless the CPU would otherwise be idle).
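One way to realize such a partition is a tree of scheduling nodes. At each level, the scheduler chooses the runnable child that has consumed the least CPU relative to its weight. This is a hedged sketch of a generic hierarchical fair-share picker, not VSTa's scheduler. The names `struct sched_node`, `weight`, `pick_next`, and `charge` are illustrative, and weights are assumed to be positive.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical scheduling node: interior nodes group users, servers, or
 * the threads of one process; leaves are threads. */
struct sched_node {
    const char *name;
    unsigned weight;            /* relative share among siblings, > 0 */
    unsigned long used;         /* CPU ticks charged to this subtree */
    int runnable;               /* leaves only: 1 if ready to run */
    struct sched_node *parent;
    size_t nchild;
    struct sched_node *child[MAX_CHILDREN];
};

static void add_child(struct sched_node *p, struct sched_node *c) {
    assert(p->nchild < MAX_CHILDREN);
    c->parent = p;
    p->child[p->nchild++] = c;
}

/* Does this subtree contain a runnable leaf? */
static int has_work(const struct sched_node *n) {
    if (n->nchild == 0)
        return n->runnable;
    for (size_t i = 0; i < n->nchild; i++)
        if (has_work(n->child[i]))
            return 1;
    return 0;
}

/* Walk down from the root, at each level taking the child with work that
 * has the smallest used/weight ratio. Idle subtrees are skipped, so their
 * share goes to siblings instead of being wasted. */
static struct sched_node *pick_next(struct sched_node *n) {
    while (n->nchild > 0) {
        struct sched_node *best = NULL;
        for (size_t i = 0; i < n->nchild; i++) {
            struct sched_node *c = n->child[i];
            if (!has_work(c))
                continue;
            /* used_c/weight_c < used_best/weight_best, cross-multiplied */
            if (best == NULL || c->used * best->weight < best->used * c->weight)
                best = c;
        }
        if (best == NULL)
            return NULL;        /* nothing runnable below this node */
        n = best;
    }
    return n->runnable ? n : NULL;
}

/* Charge one tick to the leaf and every ancestor. */
static void charge(struct sched_node *leaf) {
    assert(leaf != NULL);
    for (struct sched_node *n = leaf; n != NULL; n = n->parent)
        n->used++;
}
```

Skipping children with no runnable work lets a capped group exceed its share when the CPU would otherwise be idle.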
Such a scheduler also provides many of the properties of a gang scheduler. When the classic UNIX scheduling algorithm is used to run closely cooperating processes, its global nature allows any runnable process to compete directly with the cooperating threads. Under VSTa, related threads exist under a common scheduling node, so a thread can voluntarily relinquish the CPU. The CPU time it relinquishes remains within the "pool" of the node, and only other related threads under the node will compete for it (5).
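The pooling behavior can be sketched in isolation: when a thread yields, the node's remaining slice is offered only to its siblings. The structures and functions here (`struct node`, `pool`, `node_yield`, `node_tick`) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 8

struct thread { int id; int runnable; };

/* Hypothetical scheduling node whose time slice is shared by its threads. */
struct node {
    int pool;                           /* ticks left in the node's slice */
    size_t nthreads;
    size_t current;                     /* index of the running thread */
    struct thread *threads[MAX_THREADS];
};

/* The running thread consumes one tick of the node's slice. */
static void node_tick(struct node *n) {
    assert(n->pool > 0);
    n->pool--;
}

/* The running thread gives up the CPU. The pool is left untouched, so the
 * remaining time goes to a sibling in the same node (the yielding thread
 * itself is tried last). NULL means the node has no time or no runnable
 * thread, and the parent node picks elsewhere. */
static struct thread *node_yield(struct node *n) {
    assert(n->nthreads > 0 && n->current < n->nthreads);
    if (n->pool <= 0)
        return NULL;
    for (size_t step = 1; step <= n->nthreads; step++) {
        size_t i = (n->current + step) % n->nthreads;
        if (n->threads[i]->runnable) {
            n->current = i;
            return n->threads[i];
        }
    }
    return NULL;
}
```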
VSTa structures the exchange of messages between client and server. A would-be client requests a connection, and a connection indication including the would-be client's capabilities is presented to the server. The server accepts or rejects the connection. If it accepts, the client can then send messages. Each element in the scatter/gather list of the message is made available in the server's address space when the server receives the message. The contents are mapped on demand as the server makes reference to the data. Alternatively, the server can merely pass the message on without touching its contents (for instance, a middle module in a protocol might add a new initial buffer without needing to examine the contents being encapsulated). Because the data is never mapped or read, passing it along in such cases is quite efficient.
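The pass-through case can be sketched with a hypothetical message layout that holds a scatter/gather list. Prepending a header moves only segment descriptors, so no payload byte is touched. The names `struct msg`, `struct seg`, and `msg_prepend` are illustrative, and plain pointers stand in for VSTa's demand-mapped segments.

```c
#include <assert.h>
#include <stddef.h>

#define MSG_MAXSEG 4

/* Hypothetical segment descriptor: where the data is and how long it is. */
struct seg { const void *base; size_t len; };

struct msg {
    int op;
    size_t nseg;
    struct seg seg[MSG_MAXSEG];
};

/* A middle protocol module adds its header as a new first segment and
 * forwards the message. Only descriptors move; the payload is neither
 * read nor copied. Returns 0, or -1 if the list is full. */
static int msg_prepend(struct msg *m, const void *hdr, size_t hdrlen) {
    assert(hdr != NULL);
    if (m->nseg >= MSG_MAXSEG)
        return -1;
    for (size_t i = m->nseg; i > 0; i--)
        m->seg[i] = m->seg[i - 1];
    m->seg[0] = (struct seg){ hdr, hdrlen };
    m->nseg++;
    return 0;
}

/* Total bytes described by the scatter/gather list. */
static size_t msg_len(const struct msg *m) {
    size_t total = 0;
    for (size_t i = 0; i < m->nseg; i++)
        total += m->seg[i].len;
    return total;
}
```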
If the server returns data, it is copied into the client's address space before the client's request is completed. It would be desirable to use the same "lazy" mapping semantics for the client as for the server, but this would make it difficult for the server to know when the client is done using the data. Techniques involving the "handing off" of pages of data are possible, but many servers would then have to copy data into new pages, and any performance benefits could easily be lost.
VSTa borrows much from each of these operating systems. Like Plan 9, all servers provide their services in terms of a filesystem-like interface. The format of the messages sent through the microkernel is standard and implements a filesystem protocol similar to Plan 9's 9P protocol (7). VSTa uses a string table approach like QNX's, although the table is private to each process. In fact, the table and its interfaces live entirely within the C library. Neither the VSTa kernel nor its servers have any control over a process's mount table.
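Because the mount table lives in the C library, path resolution can be an ordinary user-level lookup. The sketch below assumes a longest-prefix match on whole path components. This is a common way to implement such a table, but the text does not confirm it. The names `mount_add` and `mount_lookup`, and the integer `port`, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MNT_MAX 16

/* Hypothetical per-process mount table: path prefix -> server connection. */
struct mnt { const char *prefix; int port; };

static struct mnt mnt_tab[MNT_MAX];
static size_t mnt_count;

static void mount_add(const char *prefix, int port) {
    assert(mnt_count < MNT_MAX && prefix[0] != '\0');
    mnt_tab[mnt_count++] = (struct mnt){ prefix, port };
}

/* Return the port of the longest prefix matching whole path components,
 * setting *rest to the unmatched remainder; -1 if nothing matches. */
static int mount_lookup(const char *path, const char **rest) {
    const struct mnt *best = NULL;
    size_t bestlen = 0;
    for (size_t i = 0; i < mnt_count; i++) {
        size_t n = strlen(mnt_tab[i].prefix);
        if (strncmp(path, mnt_tab[i].prefix, n) != 0)
            continue;
        /* "/usr" must not match "/usrlocal": require a component boundary */
        if (path[n] != '\0' && path[n] != '/' && mnt_tab[i].prefix[n - 1] != '/')
            continue;
        if (best == NULL || n > bestlen) {
            best = &mnt_tab[i];
            bestlen = n;
        }
    }
    if (best == NULL)
        return -1;
    if (rest != NULL)
        *rest = path + bestlen;
    return best->port;
}
```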
In VSTa, an ability is represented as a dot-separated sequence of numbers, called an ID. The numbers become more specific reading from left to right. An object in VSTa (8) has a label consisting of such an ID and, for each position, a bitmask indicating which actions are permitted. (It also has a default access, which is simply OR'ed in with any other bits granted.) For instance, assuming no default:
Label:   1      2      1       3
Access:  EXEC   READ   WRITE   DELETE
With this label, someone possessing 5.3 could not access the object, while someone with 1.5 could only execute it (a mismatch ends the accumulation). A hierarchy of "super users" follows from one final rule. When someone possesses an ID that matches a label for the full length of the ID, the ID is said to dominate the label, and the remaining bits are OR'ed in as if the match continued to the end of the label. Thus someone with 1.2 would gain execute, read, write, and delete abilities! The superuser of a VSTa system, therefore, is someone who has an ID of length 0.
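The accumulation and domination rules above can be written as a short function. IDs and labels are modeled as arrays of numbers, and the numeric values of the access bits are assumptions. Only the matching rules come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Access bit values are assumed; the names follow the example label. */
enum { ACC_EXEC = 1, ACC_READ = 2, ACC_WRITE = 4, ACC_DELETE = 8 };

#define ID_MAXLEN 8

struct id    { size_t len; unsigned v[ID_MAXLEN]; };
struct label { struct id id; unsigned bits[ID_MAXLEN]; unsigned dflt; };

/* Access one ID grants on an object with label l. */
static unsigned id_access(const struct id *who, const struct label *l) {
    assert(who->len <= ID_MAXLEN && l->id.len <= ID_MAXLEN);
    unsigned acc = l->dflt;           /* default access is always OR'ed in */
    size_t i = 0;
    /* Accumulate bits position by position; a mismatch stops the walk. */
    while (i < l->id.len && i < who->len && who->v[i] == l->id.v[i])
        acc |= l->bits[i++];
    /* The ID ran out with no mismatch: it dominates the label, so the
     * remaining positions' bits are OR'ed in as well. */
    if (i == who->len)
        while (i < l->id.len)
            acc |= l->bits[i++];
    return acc;
}
```

With the label 1.2.1.3 from the example, the ID 1.5 yields only `ACC_EXEC`, and the empty ID (the superuser) yields every bit.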
One can forge a new ID; it is permitted if at least one of the current IDs dominates the new ID to be forged. Thus, someone who logs in with the ability 5.7 could store all sensitive data with a label of, say, 5.7.1 with all access requiring a full match:
Label:   5      7      1
Access:  (0)    (0)    READ|WRITE|EXECUTE|DELETE
If this same user then wanted to run a somewhat suspect application, he could forge a new ID of 5.7.2, disable his current IDs, and run the application. Since the application does not possess an ID that allows access to 5.7.1, the user's data is protected. Because any user can manipulate IDs in this way, fine-grained protection designs are possible in a way that UNIX permits only through extensive use of super-user powers.
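The forging check reduces to a prefix test over the IDs a process currently holds. This sketch applies the dominance rule from the text to ID-versus-ID comparisons. The names `struct id`, `dominates`, and `may_forge` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ID_MAXLEN 8

struct id { size_t len; unsigned v[ID_MAXLEN]; };

/* a dominates b when a matches b for the full length of a (a prefix). */
static bool dominates(const struct id *a, const struct id *b) {
    assert(a->len <= ID_MAXLEN && b->len <= ID_MAXLEN);
    if (a->len > b->len)
        return false;
    for (size_t i = 0; i < a->len; i++)
        if (a->v[i] != b->v[i])
            return false;
    return true;
}

/* Forging is permitted if at least one currently held ID dominates it. */
static bool may_forge(const struct id *held, size_t n, const struct id *want) {
    for (size_t k = 0; k < n; k++)
        if (dominates(&held[k], want))
            return true;
    return false;
}
```

In the scenario above, 5.7 may forge 5.7.2. However, a process holding only 5.7.2 can forge neither 5.7 nor 5.7.1, which keeps the suspect application away from the user's data.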
The system boots and runs on top of either a flat contiguous filesystem or a DOS filesystem. Further device drivers and filesystems can be started and stopped from the command interpreter. The system does demand-paging of executables and, when memory becomes scarce, steals pages using a two-handed clock algorithm.
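A two-handed clock can be sketched as two indices sweeping the frame table a fixed distance apart. The front hand clears each frame's reference bit, and the back hand reclaims any frame whose bit is still clear when it arrives. This is the textbook form of the algorithm. The frame table, `spread`, and the function names are assumptions, not VSTa's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Hypothetical page frame: hardware sets `referenced` on each access. */
struct frame { bool referenced; bool in_use; };

struct clock {
    struct frame f[NPAGES];
    size_t front;    /* index of the front (clearing) hand */
    size_t spread;   /* how far the back (stealing) hand trails the front */
};

/* Advance both hands by one frame; return the stolen frame, or -1. */
static long clock_step(struct clock *c) {
    size_t back = (c->front + NPAGES - c->spread) % NPAGES;
    long stolen = -1;
    c->f[c->front].referenced = false;        /* front hand: clear bit */
    if (c->f[back].in_use && !c->f[back].referenced) {
        c->f[back].in_use = false;            /* untouched since cleared */
        stolen = (long)back;
    }
    c->front = (c->front + 1) % NPAGES;
    return stolen;
}

/* Sweep until one frame is freed; -1 if every page stays in active use. */
static long clock_steal(struct clock *c) {
    assert(c->spread > 0 && c->spread < NPAGES);
    for (size_t n = 0; n < 2 * NPAGES; n++) {
        long got = clock_step(c);
        if (got >= 0)
            return got;
    }
    return -1;
}
```

The spread sets how long a page has to prove it is still in use. A small spread reclaims aggressively, and a large one approaches the single-handed clock.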
At the application level, a C library and include files have been written to conform to the POSIX specification. GNU C, as, and ld have been ported and run natively under VSTa. Emacs and other amenities are also available. As each port exposes a missing area, that area is implemented according to the POSIX standard. Thus, the system deviates from POSIX more by omission than otherwise. The exception is the area of protection, where this incompatibility was foreseen and accepted at the conception of the project.
Some servers, for the sake of simplicity, do not take full advantage of the scatter/gather functions. At least the disk drivers and filesystems should be carefully optimized to make best use of scatter/gather lists.
theirbox:1.* -> mybox:99.*
trustedbox:* -> mybox:*
*:* -> REJECT
This would allow both access control (you must have 1.* on theirbox to log in; trustedbox has the same accounts as ours) and dynamic translation of IDs between systems.
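A first-match rule table like the one above could be evaluated as follows. The interpretation is an assumption: a pattern such as `1.*` is taken to match IDs whose leading component is 1 (including 1 itself), and the matched prefix is replaced by the target prefix, so theirbox's 1.4.2 becomes 99.4.2. The rule structure and `xlate` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical encoding of the example rules. `from`/`to` are dotted ID
 * prefixes with the trailing ".*" implied; "" is the bare "*" pattern. */
struct xlate_rule { const char *host; const char *from; const char *to; bool reject; };

static const struct xlate_rule rules[] = {
    { "theirbox",   "1", "99", false },   /* theirbox:1.* -> mybox:99.* */
    { "trustedbox", "",  "",   false },   /* trustedbox:* -> mybox:*    */
    { "*",          "",  NULL, true  },   /* *:* -> REJECT              */
};

/* Does dotted `id` begin with the components of `prefix`? */
static bool prefix_match(const char *id, const char *prefix, const char **tail) {
    size_t n = strlen(prefix);
    if (n > 0 && (strncmp(id, prefix, n) != 0 || (id[n] != '\0' && id[n] != '.')))
        return false;
    *tail = id + n;                       /* "", ".rest", or the whole id */
    return true;
}

/* Translate a remote (host, id) into a local ID; false means rejected. */
static bool xlate(const char *host, const char *id, char *out, size_t outlen) {
    assert(out != NULL && outlen > 0);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        const struct xlate_rule *r = &rules[i];
        const char *tail;
        if (strcmp(r->host, "*") != 0 && strcmp(r->host, host) != 0)
            continue;
        if (!prefix_match(id, r->from, &tail))
            continue;
        if (r->reject)
            return false;                 /* first matching rule decides */
        if (r->to[0] == '\0' && tail[0] == '.')
            tail++;                       /* avoid a leading dot */
        int n = snprintf(out, outlen, "%s%s", r->to, tail);
        return n >= 0 && (size_t)n < outlen;
    }
    return false;                         /* no rule matched */
}
```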
The name server is currently a local entity. However, once remote service access is available, it is a simple matter to import other nodes' name servers and use them locally. Because VSTa supports Plan 9-style union directories, you could even mount each name server at the same point in your local filesystem name space, with your local name server coming first. Ultimately, a network-aware database system must be implemented, but it is interesting to ponder how far these simple and powerful Plan 9 techniques can take one.
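Union-directory lookup over several name servers can be sketched as an ordered search in which the first server that knows a name wins. The lookup callback type, `union_lookup`, and the two toy servers are illustrative assumptions. A real implementation would exchange filesystem protocol messages rather than call functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UNION_MAX 4

/* Hypothetical name server: returns a port for a name, or -1 if unknown. */
typedef int (*ns_lookup_fn)(const char *name);

struct union_dir {
    size_t n;
    ns_lookup_fn member[UNION_MAX];   /* searched in mount order */
};

static void union_add(struct union_dir *u, ns_lookup_fn f) {
    assert(u->n < UNION_MAX);
    u->member[u->n++] = f;
}

/* The first member that knows the name wins, so the local name server,
 * mounted first, shadows remote servers offering the same name. */
static int union_lookup(const struct union_dir *u, const char *name) {
    for (size_t i = 0; i < u->n; i++) {
        int port = u->member[i](name);
        if (port >= 0)
            return port;
    }
    return -1;
}

/* Two toy servers for illustration. */
static int local_ns(const char *name) {
    return strcmp(name, "fs/root") == 0 ? 10 : -1;
}

static int remote_ns(const char *name) {
    if (strcmp(name, "fs/root") == 0) return 20;
    if (strcmp(name, "printer") == 0) return 21;
    return -1;
}
```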