Asynchronous I/O Model

The current Linux I/O model is synchronous: the driver code for a read, write, ioctl, or similar call is executed immediately, in the context of the calling process. This model is simple, but it does not allow the additions our protection model requires. In an asynchronous model, the driver code is executed in a separate thread, asynchronously with respect to the program. Our model is a little of both: driver code is executed separately in a "driver thread", but it must remain synchronous with respect to the user program.

How is this done? In the Ring Cycle, when a user program requests the services of a driver, it is put to sleep on a wait queue and stays there until the driver thread finishes. This wait queue lives in the task_struct of the driver thread. The kernel grabs a free driver thread, modifies its stack so that it will execute the desired routine with the desired arguments, and finally marks the thread as runnable. When the driver thread runs, it actually executes a small piece of wrapper code that calls the driver-specific routine and then makes a system call telling the kernel it is done. In this call, the kernel gives the user program the return value from the driver thread, wakes up the user thread, and finally returns the driver thread to the thread pool.
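The sequence above can be sketched as a small single-threaded model of the bookkeeping involved. This is only an illustration under stated assumptions, not the Ring Cycle code: the names user_task, driver_thread, request_driver, driver_wrapper, driver_done and demo_double are all hypothetical, the "stack modification" is reduced to storing a function pointer and one argument, and sleeping and waking are represented by state flags rather than by a real scheduler.

```c
#include <assert.h>
#include <stddef.h>

enum task_state { TASK_RUNNABLE, TASK_SLEEPING, TASK_IDLE };

/* Hypothetical signature for a driver-specific routine. */
typedef long (*driver_fn)(long arg);

struct user_task {
    enum task_state state;
    long            ret;      /* return value handed back by the kernel */
};

struct driver_thread {
    enum task_state   state;  /* TASK_IDLE while sitting in the pool */
    driver_fn         fn;     /* stands in for the patched stack: routine ... */
    long              arg;    /* ... and its argument */
    struct user_task *waiter; /* stands in for the wait queue in task_struct */
};

#define POOL_SIZE 2
static struct driver_thread pool[POOL_SIZE] = {
    { TASK_IDLE, NULL, 0, NULL },
    { TASK_IDLE, NULL, 0, NULL },
};

/* Kernel side of a request. Returns the chosen driver thread,
 * or NULL if no thread in the pool is free. */
static struct driver_thread *request_driver(struct user_task *u,
                                            driver_fn fn, long arg)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        struct driver_thread *dt = &pool[i];
        if (dt->state != TASK_IDLE)
            continue;
        dt->fn = fn;               /* set up routine and argument */
        dt->arg = arg;
        dt->waiter = u;
        u->state = TASK_SLEEPING;  /* caller sleeps on the driver's queue */
        dt->state = TASK_RUNNABLE; /* last step: mark the thread runnable */
        return dt;
    }
    return NULL;
}

/* The "done" call: hand back the return value, wake the caller,
 * and return the driver thread to the pool. */
static void driver_done(struct driver_thread *dt, long ret)
{
    assert(dt->waiter != NULL);
    dt->waiter->ret = ret;
    dt->waiter->state = TASK_RUNNABLE;
    dt->waiter = NULL;
    dt->state = TASK_IDLE;
}

/* Wrapper executed by the driver thread once it is scheduled. */
static void driver_wrapper(struct driver_thread *dt)
{
    long ret = dt->fn(dt->arg);    /* driver-specific routine */
    driver_done(dt, ret);
}

/* Example driver routine, purely for demonstration. */
static long demo_double(long arg)
{
    return arg * 2;
}
```

In this model, a caller invokes request_driver(&u, demo_double, 21), the scheduler later runs driver_wrapper on the returned thread, and the caller finds its result in u.ret once its state is runnable again.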

This presents several complications. First, many calls (such as read and write) involve pointers to user buffers. Since driver threads now run independently of the user process, how does the model ensure correctness? The answer is that when a driver thread is activated, its memory map (page tables) is changed to that of the thread requesting its service. This is possible because driver code is mapped into every address space. A side benefit is that, since the page tables are the same, no TLB flush is necessary when context switching from the user thread to the driver thread, which removes a significant cost from every driver call.
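The effect of sharing page tables can be illustrated with a minimal model. The sketch below is an assumption-laden illustration rather than kernel code: struct mm stands in for a set of page tables, and adopt_address_space and switch_needs_tlb_flush are hypothetical helpers that show why a switch between two tasks with the same memory map needs no flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for one set of page tables. */
struct mm {
    int id;
};

struct task {
    struct mm *mm;
};

/* Activation step: the driver thread adopts the caller's page tables,
 * so user buffer pointers mean the same thing in both threads. */
static void adopt_address_space(struct task *driver, const struct task *caller)
{
    assert(caller->mm != NULL);
    driver->mm = caller->mm;
}

/* A context switch only has to reload the page tables (and therefore
 * flush the TLB) when the two tasks use different ones. */
static bool switch_needs_tlb_flush(const struct task *prev,
                                   const struct task *next)
{
    return prev->mm != next->mm;
}
```

Before adoption a switch from the user task to the driver task would require a flush; after adopt_address_space it does not.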

A few other issues present complications:

kernel data structures passed as pointers (struct file*, struct inode*, etc.)

The driver should be able to access these, so the kernel allocates a virtual page in the appropriate ring and maps it to the page in which the kernel structure lives. This lets the driver access those data structures directly, while restricting it to just that page. The mapping is cleared when the thread exits. One possible security enhancement would be to mark this page read-only and verify writes in the page fault handler. This would introduce a performance hit, but would also make drivers even more isolated.
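The address arithmetic behind such a page alias can be sketched as follows. This is a simplified model, not the actual mapping code: it assumes 4 KB pages and a structure that does not cross a page boundary, and the names alias_mapping, map_kernel_struct, translate and unmap_kernel_struct are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u                          /* assumed page size */
#define PAGE_MASK (~(uintptr_t)(PAGE_SIZE - 1))  /* clears the page offset */

struct alias_mapping {
    uintptr_t ring_page;    /* virtual page allocated in the driver's ring */
    uintptr_t kernel_page;  /* page in which the kernel structure lives */
    bool      active;
};

/* Alias the page containing kaddr at ring_page and return the address
 * the driver should use: same offset, different page. */
static uintptr_t map_kernel_struct(struct alias_mapping *m,
                                   uintptr_t ring_page, uintptr_t kaddr)
{
    assert((ring_page & ~PAGE_MASK) == 0);  /* must be page aligned */
    m->ring_page = ring_page;
    m->kernel_page = kaddr & PAGE_MASK;
    m->active = true;
    return ring_page + (kaddr & ~PAGE_MASK);
}

/* Resolve a driver access. Succeeds only inside the one aliased page. */
static bool translate(const struct alias_mapping *m, uintptr_t addr,
                      uintptr_t *kaddr)
{
    if (!m->active || (addr & PAGE_MASK) != m->ring_page)
        return false;
    *kaddr = m->kernel_page + (addr & ~PAGE_MASK);
    return true;
}

/* Cleared when the driver thread exits. */
static void unmap_kernel_struct(struct alias_mapping *m)
{
    m->active = false;
}
```

An access one page past the alias fails to translate, which is the restriction described above; the read-only variant would additionally route writes through a check in the fault handler.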

current

The current macro gives a pointer to the task_struct of the currently running process. Our model does not trust the driver thread with the task_struct of the thread that called it. Instead, we trust the driver thread with its own task_struct only. This is useful because a driver thread can go to sleep on events (for example, to wait for an interrupt) in the same way a user thread would. We also copy the capabilities from the user thread to the driver thread before activating it, so the driver thread can perform the appropriate capability checks. Since current is used relatively frequently, a mapping is set up the first time the driver thread references it and is not destroyed until the thread is destroyed (at module unload).
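A minimal sketch of the capability copy, assuming capabilities are stored as a bitmask. The names here (struct task, current_task, activate_driver_thread, demo_capable, and the DEMO_CAP_* bits) are hypothetical and chosen only for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits for the example. */
#define DEMO_CAP_RAWIO (1u << 0)
#define DEMO_CAP_ADMIN (1u << 1)

struct task {
    uint32_t cap_effective;
};

/* Model of what `current` resolves to for the running thread. */
static struct task *current_task;

/* Before activation, copy the caller's capabilities into the driver
 * thread. From then on the driver sees only its own task structure. */
static void activate_driver_thread(struct task *driver,
                                   const struct task *caller)
{
    driver->cap_effective = caller->cap_effective;
    current_task = driver;
}

/* Capability check performed against the driver thread's own copy. */
static bool demo_capable(uint32_t cap)
{
    assert(current_task != NULL);
    return (current_task->cap_effective & cap) != 0;
}
```

Because the check reads the driver thread's copy, driver code can keep using its usual capability tests without ever touching the caller's task structure.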

scheduling

The Linux 2.4 kernel was non-preemptive, and 2.6 can be configured that way as well. The Ring Cycle has not been tested on a preemptive kernel, but since driver code now runs in threads, it must guarantee that these threads cannot be preempted if correctness is to be preserved. It does this by not rescheduling when returning control to a thread that is not in ring 3.
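The rule can be expressed as a single predicate. The sketch below is an illustration only: the struct task fields and the function may_reschedule_on_return are hypothetical, and the real check would sit in the kernel's return path rather than in a standalone function.

```c
#include <assert.h>
#include <stdbool.h>

struct task {
    int  ring;          /* privilege ring the thread returns to (0 to 3) */
    bool need_resched;  /* scheduler has asked for a reschedule */
};

/* A pending reschedule is honoured only when control is about to
 * return to a ring 3 (user) thread. Driver threads in other rings
 * keep running until they give up the CPU themselves. */
static bool may_reschedule_on_return(const struct task *t)
{
    assert(t->ring >= 0 && t->ring <= 3);
    return t->need_resched && t->ring == 3;
}
```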