

Back to Cryptic Allusion DC Software Projects.

KallistiOS 1.1.x (devel branch of KOS, to be a complete rewrite)

1 Abstract / Notes

The KOS 1.1.x/2.0 branch (also called KOS post-1.0) is to be a complete rewrite of the original KOS, with perhaps some of the same ideas at the userland level. KOS 2 will be a full microkernel design that makes extensive use of MMU support.

THIS IS A TENTATIVE DOCUMENT. It is not so much an exposition of the exact architecture of KOS 1.1.x and beyond as a set of notes to serve as guidelines, and it will be corrected as the kernel design progresses and we find gross mistakes in here or things that need to be implemented differently. There are also a number of things in here that are purposefully vague, since we'll have to define them better when we get there. Well, without further ado...

2 Design Overview

The kernel will be organized into two parts: a completely portable part, and a platform dependent part. The portable part will be contained in the kernel directory, while the non-portable part will be contained in arch/<processor>/<platform>. <processor> is something like "sh4", "ppc" or "ia32", while <platform> is like "dreamcast", "powermac" or "pc".

Portable parts include all scheduling and thread switching; process and thread creation, management and destruction semantics; interrupt routing; page table management and memory allocation; kernel-level synchronization (semaphores and events); a path-based server lookup mechanism (device tree); a message passing system; an asynchronous signal system; and a simple built-in file system for bootstrapping.

Platform-specific parts include functions to perform any low-level task required of the portable part, such as initialization, non-portable MMU table manipulation, context switching, syscalls, interrupt mask handling, SMP operation, and cache coherency. They will also define the limitations of the platform for the portable part, such as detecting available memory for the page pool. The platform-specific code will also include a basic debug I/O stub, for debugging at the kernel level and for obtaining pre-server startup messages if necessary. This basic debug I/O should be switched to a server process as soon as possible, or discarded. Where relevant (e.g., ia32's VM86 mode), the platform-specific part can add new syscalls in a controlled manner.

All other kernel functions will be located in the servers directory, which will contain microkernel (MK) server programs to perform them. These include POSIX-style IPC and pipes, PCI bridge systems, device drivers, file system interpreters, other-OS emulation layers (Linux, BSD, Win32), TCP/IP, etc. Some servers are not platform specific and can be located in servers/generic. Others may be processor or platform dependent and can be organized as necessary (servers/dreamcast, etc.). Where possible, kernel servers should try to adhere to a common interface so that user programs can run unchanged on a different platform. All relevant and portable kernel servers will eventually end up in the /kos/servers directory, and platform-specific binaries will end up in /kos/<platform>/servers.

Note that although kernel servers are normal programs like user servers, they provide functionality that legacy OSs usually reserve for kernel space. In many cases they demand supervisor-level access to some parts of the platform, and in some instances they even provide callbacks for kernel functionality that isn't implemented in the base kernel. User programs, by contrast, will always operate strictly through the interfaces defined by the kernel and the kernel servers. Even so, a kernel server will start like any other user-level program: with access only to its own process space and only the privileges it inherited from the process that loaded it.

Non-kernel programs such as shells will be located in the user directory (analogous to 'userland' in the old KOS). All relevant and portable programs in the user directory will eventually end up in /kos/bin, and all platform-specific binaries will end up in /kos/<platform>/bin.

All userland programs (including MK servers) will be linked with a libc that defines the standard host of ANSI C functions and a number of KOS-specific functions for accessing the kernel functionality. More specifically, the libc will contain ANSI C wrappers for KOS' native functionality. All native functionality that works like POSIX, Unix, etc, will be implemented using the server directory and message passing primitives of the kernel through a platform-specific syscall mechanism. The libc should be derived from the BSD libc where possible.

3 The portable part of the kernel

3.1 Scheduling and thread switching

For the initial rewrite, this module of the kernel will basically be lifted directly from KOS 1.x's thread.c. The platform-specific functionality in thd_create will need to be moved into a platform-specific function. The basic mutex functionality should actually be moved into a platform-specific include file and implemented inline as "spinlocks" instead of mutexes (which is really closer to the truth). The initial thread creation in thd_init should be done using thd_create, getting the initialization functionality either through an "initializing" flag or by just damned well designing it right...
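A minimal sketch of such an inline spinlock, written against C11 `<stdatomic.h>` as a portable stand-in. The names `kos_spinlock_t`, `kos_spinlock_lock()` and friends are hypothetical; a real SH-4 port would more likely use the `tas.b` instruction, or simply mask interrupts on a uniprocessor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical KOS spinlock: a single atomic test-and-set flag. */
typedef struct {
    atomic_flag locked;
} kos_spinlock_t;

static inline void kos_spinlock_init(kos_spinlock_t *lock) {
    atomic_flag_clear_explicit(&lock->locked, memory_order_relaxed);
}

/* Busy-wait until we are the one who set the flag. Acquire ordering keeps
   accesses to the protected data from moving above the lock. */
static inline void kos_spinlock_lock(kos_spinlock_t *lock) {
    while (atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire))
        ; /* spin; a real port might insert a CPU pause hint here */
}

/* One attempt only: returns true if the lock was acquired. */
static inline bool kos_spinlock_trylock(kos_spinlock_t *lock) {
    return !atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire);
}

/* Release ordering publishes all writes made while holding the lock. */
static inline void kos_spinlock_unlock(kos_spinlock_t *lock) {
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

Because the functions are `static inline` in a header, each platform can swap in its own body without touching the portable scheduler code.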

As in KOS 1.0, the timer interrupt will fire 100 times per second to update the internal clocks and timers, and each tick gives the scheduler a chance to perform a context switch. Note, however, that any syscall could also cause a context switch. Incidentally, we are also considering the asynchronous timekeeping mechanism that IBM introduced for the newer Linux kernels.

3.2 Process and thread creation, management and destruction

As with previous KOS versions, the thread pool is global to the whole system, even though every thread belongs to a process (see below). Scheduling happens without regard to which process a thread belongs to, except perhaps to reduce MMU overhead. On some platforms, such as SH-4, the MMU's contribution to context switch overhead is negligible, so this factor can be ignored there. The scheduler will need to take this into consideration eventually.

A process tree needs to be created. Each process except the initial process needs to have a parent process, and contains one or more threads. It is not possible (as most Unix organizations would imply) to have a process with no threads. In Unix the idea is to have a process as a thread of execution, and it can in turn create sub-processes that share its space. In KOS this is not true: a process is merely an organizational shell for keeping track of threads and used resources.

It is possible for a process to be a sub-process of another process, but a thread must always belong to a single process. Processes can be re-parented, but threads can never be re-parented (nor does it make any sense for them to be).

A process is defined, then, as a set of threads, an MMU mapping (including allocated space and what it is used for), and any other (server-specific) resources that may need to be freed if the process exits without doing so itself. Generally all non-kernel resources are tracked via message channels (see below). A thread consists of just a processor-specific context and stack space.

Stack space in Unix is usually handled by placing the stack at a high address and letting the process overrun it. The kernel "knows" that this is stack space, and therefore allocates the process a new stack page, automatically growing it. In this scenario, there is only room for one stack, really, because adding more stacks would clutter up the address space and prevent the almost-infinite stack growth encouraged in Unix. In other words, it's based heavily around the idea of having a single threaded process and added sub-processes. In KOS, a thread will be given a very small default stack upon which to bootstrap itself in libc. Once this is done, a stack should be allocated and the stack registers changed to point there. Any new threads will be created by actually allocating a stack space within the process' memory space and passing the SP to the kernel when making the syscall. Note that a different approach could be used in libc to emulate the Unix style stack by allocating a single page of memory and mmap()'ing it at the top of the user address space, and then hooking the page fault signal, but the initial implementation won't do this.
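The libc side of this could look roughly like the sketch below: carve a stack out of the process heap and compute the initial SP to hand to the kernel. The 8-byte alignment and the `thread_stack_t` helper are assumptions, and the thread-creation syscall itself is left out because its signature is not defined yet.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed ABI stack alignment; the real value is platform specific. */
#define STACK_ALIGN 8u

/* Hypothetical bookkeeping for one thread's stack. */
typedef struct {
    void *base;    /* lowest address of the allocation, kept for free() */
    size_t size;   /* usable size in bytes */
    uintptr_t sp;  /* initial stack pointer to pass to the kernel */
} thread_stack_t;

/* Allocate a stack inside the process' own memory and compute the initial
   SP. Stacks grow downward on SH-4, ia32 and PowerPC, so the SP starts at
   the top of the block, rounded down to STACK_ALIGN. */
static int thread_stack_alloc(thread_stack_t *st, size_t size) {
    assert(size >= STACK_ALIGN);
    st->base = malloc(size);
    if (st->base == NULL)
        return -1;
    st->size = size;
    st->sp = ((uintptr_t)st->base + size) & ~(uintptr_t)(STACK_ALIGN - 1);
    return 0;
}

/* Only valid once the thread using the stack has exited. */
static void thread_stack_free(thread_stack_t *st) {
    free(st->base);
    st->base = NULL;
}
```

The resulting `st.sp` is what libc would pass along with the entry point when it asks the kernel to create the thread.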

3.3 Interrupt routing

The interrupt router will perform basically the same function as the handle_exception function in irq.c in KOS 1.0. All hardware-specific portions of that functionality will be moved into the platform-specific part of the kernel.

Kernel functions may hook the various interrupts (generally this will be done in the platform-specific section, since the portable part knows nothing about interrupt numbers), and user processes with the proper privileges may also hook interrupts. In the former case, the interrupt handler will run in the context of the kernel, in supervisor mode. In the latter case, the interrupt handler will run in the context of the user program, much like a signal handler. In both cases, interrupts will still be disabled when the handler is called and can only be re-enabled by the kernel upon return. Because of this, interrupt handlers must execute as quickly as possible and then return. If a large amount of work must be performed, it is best to set a flag variable so that the user program will handle the request after the handler returns. The basic rule here is to do as little as possible in the interrupt handler itself. A second reason to keep handlers short is that no kernel calls will be available to you while running in an interrupt handler. Attempting to use them could terminate the program or reboot the machine, depending on the implementation of the platform-specific part.
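The "do as little as possible" rule usually comes down to a pattern like this sketch: the handler only sets a flag, and the program does the real work after the handler returns. The handler name `uart_rx_irq` and how it gets hooked are hypothetical; `volatile sig_atomic_t` is simply the portable C type for a flag shared with asynchronous code.

```c
#include <assert.h>
#include <signal.h>   /* sig_atomic_t */

/* Set by the interrupt handler, consumed by normal program code. */
static volatile sig_atomic_t rx_pending = 0;
static volatile sig_atomic_t rx_count = 0;

/* Interrupt handler (hypothetical): runs with interrupts disabled and may
   not make kernel calls, so it only records that work is waiting. */
static void uart_rx_irq(void) {
    rx_pending = 1;
}

/* Normal program context: kernel calls are allowed here, so the slow part
   (draining the device, message passing, etc.) happens in this function.
   The flag is cleared before the work is done, so an interrupt arriving
   during processing simply causes another pass later. */
static int service_pending_work(void) {
    if (!rx_pending)
        return 0;
    rx_pending = 0;
    rx_count++;   /* stand-in for the real processing */
    return 1;
}
```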

All interrupt handlers and the data they access should be marked as unswappable (see below), unless swapping is not possible on the architecture or platform.

Needless to say, interrupt hooking is a privileged function that most user programs cannot employ.

3.4 Page table management and memory allocation

Only the most basic page table management will happen in the kernel itself.

First, the kernel must set up the initial page table mappings for a process when it is first created. This will generally include allocating enough pages for the process image and the tiny initial stack. When the process' first thread begins, its libc will issue an sbrk() syscall to obtain more pages to set up intra-process memory management (for use with malloc()). Subsequent requests for more memory will be serviced using sbrk() as well, adding linearly to the process' address space.

After this point, all sbrk() requests will add a page at a time linearly to the process' address space. The other way to map memory within the process is to use mmap(). This method can be used to remap parts of the user space that are already mapped so that they also show up in a different place. It can also be used to map physical memory if the process has the required privileges.
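A sketch of how libc's malloc() side might sit on top of a page-granular sbrk(), assuming 4K pages. `kos_sbrk_pages()` is a hypothetical stand-in that hands out pages from a static array instead of making a syscall, and the bump allocator never frees; a real malloc() would keep free lists.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u
#define ARENA_PAGES 16u

/* Stand-in for the address space above the process image. */
static _Alignas(16) unsigned char arena[ARENA_PAGES * PAGE_SIZE];
static size_t brk_pages = 0;   /* pages handed out so far */

/* Hypothetical page-granular sbrk: grows the break linearly by whole pages
   and returns the old break, or NULL when the "kernel" refuses. */
static void *kos_sbrk_pages(size_t npages) {
    if (npages > ARENA_PAGES - brk_pages)
        return NULL;
    void *old = arena + brk_pages * PAGE_SIZE;
    brk_pages += npages;
    return old;
}

/* Minimal libc-side bump allocator layered on kos_sbrk_pages(). */
static unsigned char *heap_cur, *heap_end;

static void *bump_alloc(size_t n) {
    n = (n + 7u) & ~(size_t)7u;                 /* keep 8-byte alignment */
    if (heap_cur == NULL || (size_t)(heap_end - heap_cur) < n) {
        size_t pages = (n + PAGE_SIZE - 1) / PAGE_SIZE;
        unsigned char *p = kos_sbrk_pages(pages);
        if (p == NULL)
            return NULL;
        if (p != heap_end)                      /* first call: new region */
            heap_cur = p;
        heap_end = p + pages * PAGE_SIZE;       /* contiguous: just extend */
    }
    void *r = heap_cur;
    heap_cur += n;
    return r;
}
```

Because sbrk() grows the break linearly, each new batch of pages lands right after the previous one, which is why the allocator can simply move `heap_end` forward.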

Any attempt to access a page with the wrong permissions (writing to read-only pages, or reading or writing nonexistent ones) will result in a page fault signal. This is equivalent to a segmentation fault in Unix. The user program may hook this signal and try to work around the fault, or at least dump some debugging information such as a core file, or it can let itself be terminated.

The kernel itself doesn't actually handle things like virtual memory via swapping. That functionality is left to a user-mode kernel server. There are two hooks provided to facilitate this process. The first is when the kernel is about to run out of memory allocating new pages for programs. If this happens and no swapper process is present, the kernel simply denies the request. This may result in a program terminating with a page fault. If a swapper process is present, then a signal is sent to it requesting it to obtain temporary storage for a page (of the kernel's choice), which is then saved using the swapper. Note that during this process the swapper itself may be rescheduled; thus it is asynchronous and only blocks the process requesting more memory (as opposed to most legacy systems, which freeze entirely while swapping). The page is then marked as "swapped". If the page is later accessed by a user program, the swapper will once again be contacted, potentially once to save another less recently used page, and then again to load in the old page before the program can continue.

Any page in the system can be marked as unswappable, and the attribute may be applied to an entire process (which in turn just applies it to each of its pages, including any new pages it allocates). Any page or process can later be marked as swappable once again, and marking a process as swappable marks the entire process.

The kernel itself can never be swapped out, and the swapper program should likewise be marked as unswappable. All other pages can be swapped freely (including kernel servers).
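The kernel's choice of which page to push out is left open above. One plausible policy, sketched here under the assumption of a per-page flags word, is the classic second-chance (clock) scan, which approximates least-recently-used and never selects pages marked unswappable. The flag names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page flags kept by the portable MMU layer. */
enum {
    PG_PRESENT     = 1u << 0,  /* resident in physical memory */
    PG_SWAPPED     = 1u << 1,  /* contents handed to the swapper */
    PG_UNSWAPPABLE = 1u << 2,  /* kernel, swapper, IRQ handlers, etc. */
    PG_REFERENCED  = 1u << 3   /* set on access, cleared by the scan */
};

typedef struct {
    unsigned flags;
} page_t;

/* Second-chance scan starting at *hand. Returns the index of a page to
   offer to the swapper, or -1 if nothing is eligible (the kernel would
   then deny the allocation). Two full sweeps are enough, because the first
   sweep clears every reference bit it passes. */
static long pick_swap_victim(page_t *pages, size_t n, size_t *hand) {
    for (size_t step = 0; step < 2 * n; step++) {
        size_t i = *hand;
        *hand = (*hand + 1) % n;
        page_t *p = &pages[i];
        if (!(p->flags & PG_PRESENT) || (p->flags & PG_UNSWAPPABLE))
            continue;
        if (p->flags & PG_REFERENCED) {
            p->flags &= ~PG_REFERENCED;   /* recently used: second chance */
            continue;
        }
        return (long)i;
    }
    return -1;
}
```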

3.5 Kernel level synchronization

These can be filched almost directly from KOS 1.0. Basically each semaphore contains a queue of waiting threads and a count. When a thread does a wait() and the count is zero, it is placed on the semaphore queue and removed from the main run queue; otherwise it decrements the count and continues. Every signal() places one waiting thread back into the main run queue, or increments the count if no thread is waiting. Events work similarly, but all waiting threads are released simultaneously.
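A single-threaded model of those semaphore semantics. `thread_t`, `semaphore_t`, `ksem_wait()` and `ksem_signal()` are hypothetical names; in the kernel, blocking would also remove the thread from the run queue and call the scheduler, which the comments mark but the sketch does not do. An event would differ only in that its signal releases every queued thread.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { THD_READY, THD_WAIT } thd_state_t;

/* Hypothetical thread record: only the state and queue link matter here. */
typedef struct thread {
    int id;
    thd_state_t state;
    struct thread *next;        /* link in a semaphore wait queue */
} thread_t;

typedef struct {
    int count;                  /* available units */
    thread_t *head, *tail;      /* FIFO of blocked threads */
} semaphore_t;

/* Returns 1 if a unit was taken immediately, 0 if the caller was queued
   (kernel: remove from the main run queue and reschedule). */
static int ksem_wait(semaphore_t *s, thread_t *self) {
    if (s->count > 0) {
        s->count--;
        return 1;
    }
    self->state = THD_WAIT;
    self->next = NULL;
    if (s->tail) s->tail->next = self; else s->head = self;
    s->tail = self;
    return 0;
}

/* Wake the oldest waiter (it consumes the unit) or bank the unit.
   Returns the woken thread, or NULL if nobody was waiting. */
static thread_t *ksem_signal(semaphore_t *s) {
    thread_t *t = s->head;
    if (t == NULL) {
        s->count++;
        return NULL;
    }
    s->head = t->next;
    if (s->head == NULL) s->tail = NULL;
    t->state = THD_READY;       /* kernel: put back on the main run queue */
    return t;
}
```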

3.6 Device tree

Every microkernel server that wishes to "publish" itself for the rest of the world will add an entry to the device tree. The device tree is a hierarchical listing of message channel handles, indexed by symbolic name. This will be refined and described in more detail later (especially permissions for sections of the tree), but for now it will basically be svcmpx from KOS 1.0, with a convention to use path names instead of just service names. Exactly one device node is connected to exactly one listening message channel. See below for more about message passing.
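Path resolution on such a tree might look like the following sketch, which walks the tree one path component at a time. The first-child/next-sibling layout, the `devnode_t` type, and the use of -1 for directory nodes without a channel are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device tree node: a name, an optional channel handle,
   and first-child / next-sibling links. */
typedef struct devnode {
    const char *name;
    int channel;                   /* -1 for pure directory nodes */
    struct devnode *child, *sibling;
} devnode_t;

/* Resolve a path such as "/fs/vfs" component by component.
   Returns the published channel handle, or -1 if there is none. */
static int devtree_lookup(const devnode_t *root, const char *path) {
    const devnode_t *cur = root;
    while (*path) {
        while (*path == '/')
            path++;                            /* skip separators */
        if (!*path)
            break;
        size_t len = strcspn(path, "/");       /* length of this component */
        const devnode_t *c = cur->child;
        while (c && !(strlen(c->name) == len && strncmp(c->name, path, len) == 0))
            c = c->sibling;
        if (!c)
            return -1;
        cur = c;
        path += len;
    }
    return cur->channel;
}
```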

3.7 Message passing system

The message passing system will be one of the most heavily trafficked and most important pieces of the kernel, and thus must be given the greatest scrutiny. Although POSIX message queues will be available through an external server (basically the mailbox system from KOS 1.0), the main message passing system will be a synchronous system, not an asynchronous one. Each application wishing to be able to receive requests will create a message channel. Message channels are created by the process that owns them, and it is expected that only that process will ever do a recv() on that message channel. However, any process may open the message channel and send() messages to it. The format of the messages is entirely up to the processes communicating, and will not be changed or inspected by the kernel. Messages may be passed using an IOVEC-style system, in which the sender provides a list of (address, length) pairs. Instead of making a temporary buffer and copying the data more than once, this allows the kernel to gather the data directly from one address space and deposit it into another. Message channels are opened either by handle ID (an integer value) or by symbolic name (device tree), and should be closed after use to allow an accurate assessment of how many clients a given server has.
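The gather step can be illustrated with POSIX `struct iovec` from `<sys/uio.h>`. This sketch copies the sender's pieces straight into one receiver buffer; in the kernel the source and destination live in different address spaces, so the real copy would go through the MMU mappings rather than plain pointers. `msg_gather()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* POSIX struct iovec: iov_base, iov_len */

/* Copy each piece of the sender's IOVEC list once, directly into the
   receiver's buffer. Returns the number of bytes copied; anything beyond
   dst_len is truncated (a real kernel might report that to the sender). */
static size_t msg_gather(void *dst, size_t dst_len,
                         const struct iovec *iov, int iovcnt) {
    unsigned char *out = dst;
    size_t done = 0;
    for (int i = 0; i < iovcnt && done < dst_len; i++) {
        size_t n = iov[i].iov_len;
        if (n > dst_len - done)
            n = dst_len - done;
        memcpy(out + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}
```

A client would typically put a small request header in the first entry and point the second entry at its own data buffer, so neither has to be copied into a temporary first.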

As mentioned a moment ago, message passing will be synchronous. This guarantees immediate delivery and means the kernel never has to buffer messages on a sender's behalf, which is necessary to avoid bloating the kernel and slowing everything down. The message passing process looks something like this:

* Server process creates a channel using msg_channel_create()

* Server process executes a msg_recv() or a msg_recv_v() to wait for an incoming message, and goes into the RECV_WAIT state

* Client process looks up the channel using msg_channel_by_name() to get the message channel ID

* Client opens the channel using msg_channel_open()

* Client sends a message using msg_send() or msg_send_v(), and goes into SEND_WAIT state

* The kernel matches up the send and recv and transfers the message data from the client process context to the server process context

* The server process is unblocked and the client process is placed into the REPLY_WAIT state

* The server does some processing and eventually calls msg_reply() or msg_reply_v() to send a response back to the client

* The kernel again matches up the requests and transfers the message from the server process to the client process

* The kernel also sets the integer return value for the message pass, for simple operations like read() and write()

* Client process receives return value, potentially with its data buffers filled
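The steps above can be modelled in a few lines of single-threaded C to show how the three wait states hand off. Everything here (`sim_send()`, `sim_recv()`, `sim_reply()`, the 64-byte buffers) is hypothetical and stands in for the real msg_send(), msg_recv() and msg_reply(), whose signatures are not defined in this document; the sketch also allows only one outstanding client per channel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Thread states named in the steps above. */
typedef enum { PS_RUNNING, PS_RECV_WAIT, PS_SEND_WAIT, PS_REPLY_WAIT } pstate_t;

/* Hypothetical stand-in for one thread and the buffer it receives into. */
typedef struct {
    pstate_t state;
    char buf[64];
    size_t len;
    int retval;                 /* integer result of the message pass */
} sim_thread_t;

/* One channel: at most one waiting receiver and one outstanding sender. */
typedef struct {
    sim_thread_t *receiver;
    sim_thread_t *sender;
    const void *sdata;
    size_t slen;
} sim_channel_t;

static size_t copy_clamped(char *dst, size_t cap, const void *src, size_t len) {
    size_t n = len < cap ? len : cap;
    memcpy(dst, src, n);
    return n;
}

/* Send matched with recv: copy client -> server, unblock the server,
   and park the client in REPLY_WAIT. */
static void sim_deliver(sim_channel_t *ch) {
    sim_thread_t *srv = ch->receiver, *cli = ch->sender;
    srv->len = copy_clamped(srv->buf, sizeof srv->buf, ch->sdata, ch->slen);
    srv->state = PS_RUNNING;
    cli->state = PS_REPLY_WAIT;
    ch->receiver = NULL;        /* the sender stays recorded for the reply */
}

static void sim_recv(sim_channel_t *ch, sim_thread_t *srv) {
    srv->state = PS_RECV_WAIT;
    ch->receiver = srv;
    if (ch->sender && ch->sender->state == PS_SEND_WAIT)
        sim_deliver(ch);
}

static void sim_send(sim_channel_t *ch, sim_thread_t *cli, const void *data, size_t len) {
    assert(ch->sender == NULL); /* this sketch supports one client at a time */
    cli->state = PS_SEND_WAIT;
    ch->sender = cli;
    ch->sdata = data;
    ch->slen = len;
    if (ch->receiver)
        sim_deliver(ch);
}

/* Copy the reply server -> client, set the return value, unblock the client. */
static void sim_reply(sim_channel_t *ch, const void *data, size_t len, int retval) {
    sim_thread_t *cli = ch->sender;
    assert(cli != NULL && cli->state == PS_REPLY_WAIT);
    cli->len = copy_clamped(cli->buf, sizeof cli->buf, data, len);
    cli->retval = retval;
    cli->state = PS_RUNNING;
    ch->sender = NULL;
}
```

Either side may arrive first: a send with no receiver waits in SEND_WAIT, and the next recv() on the channel completes the match.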

Multiple threads can execute a recv() on a channel, and they will be queued up to receive a message in LIFO order. This allows, for example, a file system server to be started with one thread per SMP processor, or alternatively provides a cheap way to queue requests internally to the server.

Channels are automatically closed when a process exits, and any attached processes will get an error message if they try to access the channel again. Conversely, if a process has connected to a channel and then exits, the channel will receive an indication of the lost client so that it can clean up any resources.

It should be possible for a server process to specify a thread pool system without having to manage it manually (i.e., a new thread is automatically created to handle an incoming message, up to the maximum number). More likely than not, this will be handled in libc somewhere.

See the section below about message passing (microkernel) versus syscalls (monolithic) for an explanation of why this form of message passing is not an order of magnitude slower (as Mach is in many cases), and why, beyond the obvious advantages, it may actually save memory.

3.8 Asynchronous signal system

This is closely analogous to the Unix signal system. Basically, when some event happens that a process was not specifically expecting, there must be some asynchronous way of notifying the process. This is done by hooking the signal to a signal handler function.

Signals in KOS are, as in Unix, indexed by a number. The first N signal numbers are reserved for system events, and after that they may be used by cooperating user programs as necessary. A program should never blithely assume that another program understands its custom signal, or even interprets it in the same way, unless a protocol is well defined. In many such cases, a system number should be reserved for the protocol.

Each process can designate a signal thread, or the oldest thread will be used automatically. When a signal happens, that thread's context is temporarily switched to the signal handler, much as with setjmp()/longjmp(). When the signal handling is complete, the thread will be restored to its original state. Signals can only happen one at a time - if a signal happens during the processing of another signal, it will be queued up for later processing. If some critical signal like a page fault happens during the execution of a signal handler, the program will be terminated. This behavior may be changed later to allow nested signals.
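A sketch of the "one signal at a time" rule as a small per-process queue. The signal number used for page faults and the queue depth are assumptions, and switching to and from the handler context is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>

#define SIG_QUEUE_MAX 16
#define SIG_PAGEFAULT 1          /* hypothetical number of a critical signal */

typedef struct {
    int queue[SIG_QUEUE_MAX];    /* ring buffer of pending signals */
    int head, count;
    bool in_handler;             /* a handler is running right now */
    bool terminated;
} sigstate_t;

/* Post a signal. While a handler runs, ordinary signals are queued, but a
   page fault terminates the process. Returns false if not accepted. */
static bool sig_post(sigstate_t *s, int sig) {
    if (s->terminated)
        return false;
    if (s->in_handler && sig == SIG_PAGEFAULT) {
        s->terminated = true;
        return false;
    }
    if (s->count == SIG_QUEUE_MAX)
        return false;
    s->queue[(s->head + s->count) % SIG_QUEUE_MAX] = sig;
    s->count++;
    return true;
}

/* Start the next handler: returns the signal to run, or 0 if none is
   pending or a handler is already active. */
static int sig_begin(sigstate_t *s) {
    if (s->in_handler || s->count == 0 || s->terminated)
        return 0;
    int sig = s->queue[s->head];
    s->head = (s->head + 1) % SIG_QUEUE_MAX;
    s->count--;
    s->in_handler = true;        /* kernel: save context, enter handler */
    return sig;
}

/* Handler returned: the saved context would be restored here. */
static void sig_end(sigstate_t *s) {
    assert(s->in_handler);
    s->in_handler = false;
}
```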

3.9 Built-in file system

This is basically a highly simplified version of the romdisk file system in KOS 1.0. Its main purpose is to contain an init program and any hardware-specific servers required to bootstrap the normal file system. Once the normal VFS is loaded (if it is loaded), all pages belonging to the built-in file system are handed over to it to be done with as it sees fit (freeing, mounting as /boot, etc.). This file system does not contain any real VFS abilities, or even directories. It may be used by "opening" a file from its root, which returns a pointer to the file's data, and that's about it. This is to absolutely minimize duplication of VFS code.
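Since the built-in file system has no directories and "opening" just returns a pointer, a lookup can be this small. The table layout and the `bootfs_open()` name are assumptions; the real image format is not specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat image layout: one table of entries, no directories. */
typedef struct {
    const char *name;           /* file name in the image root */
    const unsigned char *data;  /* points into the image itself */
    size_t size;
} bootfs_entry_t;

/* "Open" a file: return a pointer to its bytes (no copy) and its size,
   or NULL if the name is not in the image. */
static const unsigned char *bootfs_open(const bootfs_entry_t *tab, size_t n,
                                        const char *name, size_t *size) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].name, name) == 0) {
            if (size)
                *size = tab[i].size;
            return tab[i].data;
        }
    }
    return NULL;
}
```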

4 Platform-specific parts of the kernel

4.1 Initialization

The kernel should be loaded and initialized into a predictable and standard state before the portable section's main() is called. The portable main() will not take a normal argc and argv parameter set, but will take the actual command line string, if any. The command line string may be embedded in the kernel image itself or passed from a boot loader of some kind. All platform-specific initialization should occur here, the exception being any hardware initialized by a kernel server later.

4.2 MMU manipulation

A portable view of the MMU tables will be created for the portable part of the kernel to manipulate, and it is the job of this bit of code to ensure that the portable kernel's view of the MMU is consistent with the platform's view of the MMU tables and so forth. This includes setting up and tearing down mappings, mapping MMU entries to processes, and switching MMU contexts when necessary. <Study BSD and Linux and fill this in>

4.3 Context Switching

Generally this part will handle swapping contexts in and out. There is a single path into and out of the kernel, and within a single processor the kernel is completely non-reentrant. When a kernel call is executed or an interrupt handler is invoked, the process context is saved. There needs to be some portable way to select another context before the context is switched back in.

4.4 Syscalls

Syscalls are defined as a transition point from user space to kernel space, and thus on most platforms from user/non-privileged mode to supervisor/privileged mode. On most systems this is best (or only) accomplished through a software interrupt or trap request. On some systems (e.g., ia32) it is possible to use a call gate instead, but the utility of that method over an interrupt call is questionable, since the two are basically equivalent and both ensure atomic locking of the kernel. Syscalls are only ever used when calling the kernel, not when "calling" another user process. Message passing is used exclusively for that.
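On the kernel side, a syscall usually ends in a table lookup like the sketch below: the platform-specific trap stub pulls the syscall number and arguments out of the saved registers and calls a portable dispatcher. The numbering, the three-argument convention, and the error value are all hypothetical, and the handlers are stubs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical syscall numbers and error code; nothing here is fixed yet. */
enum { SYS_MSG_SEND, SYS_MSG_RECV, SYS_MSG_REPLY, SYS_COUNT };
#define KOS_ENOSYS (-1L)

typedef long (*syscall_fn)(long a0, long a1, long a2);

static int last_called = -1;     /* lets the stubs show which one ran */

static long sys_msg_send(long a0, long a1, long a2)  { (void)a1; (void)a2; last_called = SYS_MSG_SEND;  return a0; }
static long sys_msg_recv(long a0, long a1, long a2)  { (void)a1; (void)a2; last_called = SYS_MSG_RECV;  return a0; }
static long sys_msg_reply(long a0, long a1, long a2) { (void)a1; (void)a2; last_called = SYS_MSG_REPLY; return a0; }

/* Indexed by syscall number; unset slots stay NULL. */
static const syscall_fn syscall_table[SYS_COUNT] = {
    [SYS_MSG_SEND]  = sys_msg_send,
    [SYS_MSG_RECV]  = sys_msg_recv,
    [SYS_MSG_REPLY] = sys_msg_reply,
};

/* Portable entry point called by the trap stub; the result goes back to
   user space in the platform's return register. */
static long syscall_dispatch(unsigned long num, long a0, long a1, long a2) {
    if (num >= SYS_COUNT || syscall_table[num] == NULL)
        return KOS_ENOSYS;
    return syscall_table[num](a0, a1, a2);
}
```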

4.5 Interrupt mask handling

Interrupts from the low-level system should generally appear homogeneously to the portable part of the kernel. If you have two interrupt controllers that are daisy chained, as on a PC, these should show up as interrupts 0-15 rather than as 0-7 on each of two banks. The low-level handling should understand these mappings and be prepared to mask and unmask any of the interrupts as if they are in a flat space. The first N interrupts are reserved for system usage, and all interrupts after that are platform specific. If complex hardware/bridge functionality is required for device driver support, it should be moved into a kernel server instead of the kernel itself.
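A sketch of the flat numbering on a PC-style pair of cascaded controllers, where interrupts 8-15 come from the second controller wired into input 2 of the first. The masks are modelled as plain variables; a real ia32 port would write them to the controllers with port I/O.

```c
#include <assert.h>
#include <stdint.h>

/* Mask registers of the two 8-input controllers (1 = masked). */
static uint8_t pic_mask[2] = { 0xFF, 0xFF };

#define CASCADE_IRQ 2            /* input on the first controller used by the second */

/* Flat IRQ n maps to controller n / 8, input n % 8. */
static void irq_unmask(unsigned irq) {
    assert(irq < 16);
    unsigned bank = irq / 8, bit = irq % 8;
    pic_mask[bank] &= (uint8_t)~(1u << bit);
    if (bank == 1)               /* second bank needs its cascade line open */
        pic_mask[0] &= (uint8_t)~(1u << CASCADE_IRQ);
}

static void irq_mask(unsigned irq) {
    assert(irq < 16);
    pic_mask[irq / 8] |= (uint8_t)(1u << (irq % 8));
}

static int irq_is_masked(unsigned irq) {
    assert(irq < 16);
    return (pic_mask[irq / 8] >> (irq % 8)) & 1;
}
```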

4.6 SMP operation

This section is almost certainly naive.

SMP operation of KOS will involve booting secondary, tertiary, quaternary, etc., CPUs and starting the kernel on them at a predefined point. This will work almost like a pthread_create() in a legacy OS, but of course the processes will truly be running simultaneously. Rather than using a "big lock" like many legacy systems, KOS should allow as many CPUs as necessary to operate simultaneously, but it should also use spin locks to protect portions of the code that may conflict (accessing process lists, etc.). Each running instance of the KOS kernel will have structures allocated for per-CPU operation (like "currently executing thread", etc.). The scheduler runs independently on each processor so that it effectively distributes the load to all of the processors automagically.

4.7 Cache coherency

Basically, any time an MMU operation would leave stale entries in a TLB, and any time a write to code would leave stale contents in an i-cache, the cache coherency routines will come into play. These just ensure that if the kernel does something "nasty", the caches will be invalidated or updated to reflect that.

4.8 Debug I/O

The platform-specific section should include a basic character-based I/O system which may be redirected later or entirely cut off. This will allow for debug output to reach the programmer before the kernel has reached a normal boot point and can load its own servers for things like normal console or serial I/O.

5 The Great Debate, or microkernel versus monolithic

A lot of hoopla has been generated around whether a monolithic kernel or a microkernel is the best approach to system design. Most of the hoopla has been generated by people who didn't know what they were talking about, or didn't know what they were talking about when they made their comments and now they're stuck with them for pride's sake (Linus Torvalds). So the first part of this must be an explanation of exactly what constitutes a monolithic kernel and a microkernel.

In a monolithic kernel (referred to from here on as a monokernel), all "system" functionality is grouped into one very large library that is system-wide and can be called by any running task to do something for it that it is not privileged enough to do by itself. Generally the library is loaded when the system is first started and is given god status: it has absolute access to all hardware and all processes, cannot be swapped out, and cannot be pre-empted. Some modern monokernels get around this by re-enabling interrupts during the execution of long syscalls, but this is basically a hack. The thread of execution follows the syscall into the monokernel, and it may be blocked there by placing the process on a wait queue or changing its status until some hardware event or other system event causes it to unblock. In this system, there is no really well defined way to determine what goes into the kernel and what goes into user space, and in many cases (e.g., DRI on Linux) the distinction is so confused that, to obtain microkernel-style functionality, the kernel actually makes callbacks into a user-mode process at times. In other cases, a user-mode process acts with kernel-like powers (e.g., XFree86), much like a microkernel server would have done. This is generally a debugging nightmare because you don't have any real memory protection for most of the kernel, and in many cases it can be a security concern. It also limits what individual users can do with their own accounts on such a system, because they can't change the kernel to do basic things like extend the file system.

In a microkernel, the kernel itself (with similar restrictions as the monokernel) contains the absolute minimum necessary to allow processes to be created, scheduled, and communicate with each other. Generally this involves basic MMU functionality, a scheduler, and some sort of IPC (usually message passing). All other functionality is moved outside the kernel into user processes. As before, all user processes use a syscall mechanism to access the kernel functionality, but the kernel functionality is limited to those basic functions, so it can be very well checked, or in many cases, mathematically proven correct. All other functionality previously regarded as kernel functionality can then be moved into cooperating user-mode processes. For standard system services such as VFS, these servers provide a well-defined interface so that calling them for functionality through message passing is basically the same as using the syscall interface on the monokernels (see below for the meat of this). Various servers are granted privileges as necessary to perform their tasks, but only their tasks. In a Unix-style permission system, the 'root' user would have the exclusive right to do things like hook interrupts and mmap() random physical memory locations. Conceivably, a capability security system could be introduced by simply allowing or disallowing such privileges on a process-by-process or user-by-user basis.

Given that, what do you get out of a microkernel over a monokernel? Memory protection for kernel components against each other, the ability for users to run and test and debug their kernel components in the same manner as a normal user program, almost free network transparency (a msg_send is very easy to encapsulate and throw over the network), a much cleaner distinction among the components running the basics of the system, a very clean and easy to prove core to the system, and (perhaps as important as anything) a general demystification of the nature of an operating system. Ever take a novice and put them in front of the Linux or BSD kernel sources? Heh, yeah. What do you get as an advantage to running a monokernel? Well, the tiniest bit more speed for I/O intensive applications and perhaps a bit of saved memory; to be fair, you also have to throw in that there are lots of free monokernels already, so there's more legacy support if you need that. To figure out what is best for a given situation, we have to compare the two methodologies.

The first and most important part of the debate is whether the message passing overhead is worth the added features of a microkernel. To see that this is basically a can't-lose situation, consider the following comparison. I am assuming in this argument that the kernel is designed with strictly synchronous, blocking message passing.

In a monokernel, a user process performs a task like writing to a file by pushing the function parameters on the stack (or, in the case of Linux, putting them in registers) and then invoking a software interrupt or trap which the kernel has hooked for its syscall interface. The kernel looks up the syscall index and calls the appropriate syscall handler. This handler must then invoke a VFS system to further route the request, where it eventually ends up inside a handler for the particular relevant file system, along with some file context information. Let's say that the system uses 4K pages, and the user wishes to write a 16K block of data. Not only is the address the user passed not valid in kernel space, but the pages may be in five totally unrelated places in the system! (A 16K buffer that does not start on a page boundary straddles five 4K pages.) So to work around this issue, one of two things must happen. One, the kernel function must manually seek out the MMU mappings and gather the pieces as it writes them. This massively breaks an important abstraction, forcing individual file system handlers(!) to understand the virtual memory system. Needless to say, this is not desirable. The other possibility is for the kernel to provide a function such as copyin()/copyout() (in BSD) or copy_from_user()/copy_to_user() (in Linux). These functions perform the nasty task of putting together the pages from user space into a kernel space buffer. Assuming an intelligent design, they will allow the kernel function to copy or read only the portion it wants at the moment, alleviating the need to allocate an extra buffer to hold it all. Generally at this point, a physical disk driver is invoked, and the request is queued with the hardware. The user process is then placed on a "waiting for disk to finish" type of queue, and the kernel returns to other tasks instead of the one that invoked the syscall. When the disk finishes writing, the original task will be queued again, and eventually it will return with a success value in the return register (e.g., eax or r0).
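The page count in that example is simple arithmetic; this small helper makes it concrete. `pages_spanned()` is an illustrative name, not part of any kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of distinct pages touched by the byte range [addr, addr + len). */
static size_t pages_spanned(uintptr_t addr, size_t len, size_t page_size) {
    assert(page_size != 0);
    if (len == 0)
        return 0;
    uintptr_t first = addr / page_size;
    uintptr_t last = (addr + len - 1) / page_size;
    return (size_t)(last - first + 1);
}
```

With 4K pages, a page-aligned 16K buffer touches four pages, while the same buffer starting 16 bytes into a page touches five, each of which may be backed by an unrelated physical page.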

In a microkernel, the user program would call the write() function in its libc. This all looks exactly the same as the monokernel at this point, but the libc is where the functionality changes. Instead of making a kernel syscall to do the write itself, the libc passes a WRITE message to the VFS server. The message is constructed using IOVEC (see writev on Unix), so that a header constructed on the fly on the user stack and the buffer are passed as a message without any copying on the part of the user process. To pass the message, a kernel syscall is invoked with a message channel handle and the IOVEC structure. (As an aside, the libc has already opened this channel because it had to do so when the user first used any VFS functionality such as open(). This is necessary overhead but is so minuscule in the grand scheme that it's not worth considering.) The kernel's syscall handler looks at the syscall type (msg_send) and passes it along to the message passing system. The message passing system will look for a matching process that is already blocked waiting for a message. If there is none, the caller is placed on a queue and another task is returned to (as above with the monokernel). If there is a matching process, then the process mentioned above happens: the kernel's equivalent of copyin()/copyout() is invoked to copy directly from the pieces in the calling process to the buffer pieces in the server process, and the server process is queued. When a msg_reply() is invoked, any data will be copied back into the original caller's space, and a return code will be returned.

Note, then, that for simple calls the overhead is almost identical to that of a monokernel syscall, except that the invoked procedure has its own MMU context and thus can't accidentally destroy something else. When passing data around, the data is actually copied from one process to another, which is the only real added overhead: a smart monokernel could read the data in place using the method described above, because all parts of a monokernel have access to the address space of all processes. This shows that with a normal monokernel design it's actually not possible to memory-protect the various components from each other without incurring the overhead of a microkernel. The only way around this is to use mmap()-style functionality to map pages directly from one process to another, but this raises two issues. One, mapping a page exposes everything else that lives on it, so the advantages of memory protection are basically wasted unless buffers of data sent to syscalls are page-aligned and nothing else shares their pages (which wastes massive amounts of memory). Two, if such a system were used, it would be equally applicable to a microkernel, so there is no advantage to using a monokernel (unless you don't like the other consequence of a microkernel, see below). One approach that would solve the issue is variable-length segmentation on processors like ia32, but that is not portable and incurs the extra overhead of a TLB-miss-style cache miss anyway.

The other overhead incurred by using a microkernel is one of resources rather than speed: as the above strategy implies, a thread must be blocked in the server process waiting for each request to come in. Naively, this means that if you want 30 processes accessing the VFS system at once, you'd need 30 blocked threads waiting for their requests. There are two solutions to this issue. The first is that modern microkernels generally provide a way to spawn threads on the fly and pool them for later use as needed. This is sort of like Apache server processes on a normal Unix system: a pool of them is started, and each incoming request is handed off to an idle server, or a new one is created if the pool's maximum hasn't been reached. Since this could basically be handled by the libc during msg_recv(), there is little overhead here except starting a new thread under high load. The other solution is more germane to this comparison and is basically analogous to the monokernel way of doing it: simply start one thread and block incoming messages with a setup similar to the standard single-threaded TCP/IP accept() loop. This sounds a bit non-obvious until you've done some microkernel programming, but incoming messages are only blocked until a recv() is executed again. So in the VFS server situation, the server could block in recv(), waiting for new messages. It gets an open request and processes it, returning an ID. Note that in a lot of modern VFS implementations much of this would be encapsulated in a mutex anyway, so the effect is the same. Now let's say the user process wants to read(), but there is no data available from the real FS handler. What the VFS does is block the caller by not calling reply() immediately, and add it to a queue to be replied to later. Now the VFS server is ready to receive more messages.
The effect is almost identical to the monokernel way except that in this case mutual exclusion during critical setup sections is automatic. If you want to do this on an SMP system, no problem: spawn a thread for each processor and put in your own mutual exclusion.

After looking at all that, you have to ask yourself: what do you really want to spend your time doing? Do you want to spend time tracking down mysterious pointer bugs causing a kernel oops (and bringing down your whole system) while doing something as simple as writing an FTP file system (or as complex as writing a device driver)? Generally, no. Is that small overhead worth the advantages? We think so.

About this document ...


The translation was initiated by Dan Potter on 2001-04-28


Dan Potter 2001-04-28