Writing a device driver for Unix V6

In this post we will learn the useful skill of writing a device driver for Unix V6 (released in 1975) and run it on an emulator. The implemented device is fairly trivial: it will open a message box on the host OS. The goal is that we can execute the following command:
echo "Hello world" > /dev/mb
And get this:
Result

The message box is opened in the host OS, so effectively we create a special file (/dev/mb) that accepts text input and opens a message box.

Since Unix V6 runs on the PDP-11 and I do not have access to one, everything is run via SimH. SimH is a family of emulators for old computers, ranging from the IBM 1401 to the Altair 8800. We only need the PDP-11 one though.

Background

Unix V6 was developed in 1975 on a PDP-11. By this version, most of the code was written in C and only a couple of hundred lines of assembly code remained. Already, it had many of the utilities that are common on today's Linux systems: cp, rm, bc/dc, cron, dd, chmod, ls. The OS was very powerful: it had memory management, multiple users and pre-emptive multitasking. It is funny to see that many of these features arrived on personal computers only in the 90s, while bigger minicomputers had had them for decades.

The choice of Unix V6 is motivated by two facts. First, it is written mostly in C and is thus easy to understand. Second, the kernel code is fully documented: John Lions' Commentary on UNIX 6th Edition is a thorough description of the inner workings of Unix and the underlying algorithms. It is beginner friendly, guiding you through the kernel step by step.

The PDP-11 was a family of computers produced between 1970 and 1990. Bell Labs used these machines, so both the Unix OS and the C language were developed primarily on them. PDP-11 minicomputers introduced important innovations that let Thompson and Ritchie implement several advanced features for Unix: hardware support for memory paging, user and kernel mode, and different priority levels for interrupts. Again, CPUs of personal computers only gained these in the late 80s and early 90s.

Implementing a new device

To add the new device to Unix, first we have to create a device in the emulator. On the PDP-11 it was common to map certain regions of memory to a device and communicate with the device via this memory region. When a value was written to a particular address, the value was forwarded to the device. Conversely, the status of the device could be read by reading the right locations.

We will follow the same strategy. The device, called mb, is mapped to the two bytes starting at 17777340 (addresses are in octal). The characters of the message are written one by one to 17777340; our virtual device reads them and stores them in a buffer. Upon receiving a newline or zero (\0) character, the buffer is converted to a string and displayed in a message box. The device handles ASCII characters only, but the PDP-11 has a 16-bit CPU, so 2 bytes are written at a time. This is why we have mapped a 2-byte region instead of a single byte.

Our choice of emulator is SimH. Strictly speaking, SimH is not a single emulator but a framework for writing emulators. Roughly speaking, it lets you define devices that can be attached to an emulated machine. These devices include peripherals such as magnetic tape readers, disk drives or teletypes. When attached, a device provides additional registers and can be memory-mapped to a hardcoded region. The implementation interface is quite simple: a pair of read/write functions must be defined for the registers and for the memory mapping. Since we want to use only the latter, and only for output, it is enough to implement the memory write function:

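#define BUF_LEN 250

TCHAR buf[BUF_LEN+1];	/* one extra element for the terminating zero */
int cnt;		/* number of characters currently in buf */

void display_message();

/* Called by SimH whenever the guest writes to the mapped address. */
t_stat mb_wr (int32 data, int32 PA, int32 access)
{
	char c = data & 0xff;	/* the device is ASCII only: keep the low byte */
	if (c == '\n' || c == 0 || cnt >= BUF_LEN) {
	    buf[cnt] = 0;	/* terminate the string before showing it */
	    display_message();
	    mb_reset(NULL);	/* reset routine (not shown), assumed to empty the buffer */
	}
	else {
	    buf[cnt++] = c;
	}

	return SCPE_OK;
}

/* Show the buffered text in a message box on the host. */
void display_message()
{
    MessageBox(
        NULL,
        buf,
        TEXT("SimH - MB device"),
        MB_OK
    );
}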

The function takes an address (PA) and a value (data) that will be written to the address. Since our virtual device supports ASCII only, we take the lower byte. If the buffer is full or we have received a newline or \0 character, the message box is shown. Otherwise, the character is appended to the buffer. We ignore the address, as the device is attached to a single word address anyway.
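To experiment with the buffering logic without SimH or the Windows API, it can be reproduced as a small host-side sketch. This is an assumption-laden stand-in, not the device code itself: mb_write_word, last_msg and shown are hypothetical names, the message box is replaced by printf, and mb_reset is assumed to simply empty the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUF_LEN 250

static char buf[BUF_LEN + 1];      /* pending message plus room for the terminator */
static int cnt;                    /* number of characters currently buffered */
static char last_msg[BUF_LEN + 1]; /* hypothetical: copy of the last message shown */
static int shown;                  /* hypothetical: number of messages shown so far */

/* Stand-in for the message box: print the message and remember it. */
static void display_message(void)
{
    buf[cnt] = '\0';
    strcpy(last_msg, buf);
    shown++;
    printf("[message box] %s\n", buf);
}

/* Assumed behaviour of the reset routine: empty the buffer. */
static void mb_reset(void)
{
    memset(buf, 0, sizeof buf);
    cnt = 0;
}

/* Same decision as the write handler: flush on newline, NUL or a full buffer. */
static void mb_write_word(int data)
{
    char c = (char)(data & 0xff);  /* only the low byte carries the character */

    assert(cnt >= 0 && cnt <= BUF_LEN);
    if (c == '\n' || c == 0 || cnt >= BUF_LEN) {
        display_message();
        mb_reset();
    } else {
        buf[cnt++] = c;
    }
}
```

Calling mb_write_word(0101) followed by mb_write_word(0) mirrors the two deposit commands used in the test session further down: the first call buffers the letter A, the second flushes it.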

All that's left is to connect the device to a certain memory address and tell SimH the name of the write function.

#define IOBA_MB         017777340           /* MB - message box */
#define IOLN_MB         002

DIB mb_dib = { IOBA_MB, IOLN_MB, NULL, &mb_wr, 0 };

In the mb_dib structure, the first value is the start of the memory segment, followed by the length and the memory read-write functions. The device does not support reading so the read function is NULL. The mb_dib structure is then passed to SimH to make it aware of the existence of mb.

Let's test if the new device works. After firing up SimH:

# Activating and initializing the device
sim> set mb enabled  
sim> reset
# write 'A\0' to the device by poking memory
sim> deposit 17777340 101
sim> deposit 17777340 0

First, the device has to be enabled and then initialized with reset. Then, the octal values 101 (the ASCII code of 'A') and 0 are written to the address 17777340. Upon writing the 0 byte, the message box appears:

Result

Implementing the driver

Let's move on to the implementation of the driver itself. In UNIX V6, there are two types of devices: block and character special devices. Block devices use a buffering mechanism implemented in the kernel and read or write whole blocks of data. Typically, floppy and cartridge drives use this method. Character devices can do whatever they want, though most of the time they will print or read one character at a time. Examples are terminals and paper tape punches.

Let's start with the driver code:

#define	MBADDR	0177340

struct {
	int mbchr;
};

mbwrite()
{
	register int c;

	while ((c=cpass())>=0) {
		MBADDR->mbchr = c;
	}
}

UNIX will call the mbwrite function when something is written to the device. mbwrite calls the cpass function in a loop; cpass returns the next character from the buffer passed to the write system call, or a negative value when no characters are left. There is one oddity in this code: the anonymous struct, and the use of the constant MBADDR as if it were a pointer to a struct. This is behaviour of the C language from the early 70s. Back then, struct field names were effectively global: when seeing an expression var->structfield, the compiler ignored the type of var and calculated the offset based on structfield alone. This also meant that field names in structs had to be unique. In this code MBADDR->mbchr accesses the int stored at address MBADDR+0. This saves putting MBADDR into a temporary pointer and dereferencing the pointer.
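For comparison, here is a rough sketch of how the same register access might be spelled in modern C, where member names belong to their struct and the base address needs an explicit type. The names struct mbregs, mb_put, mb_put_by_offset and fake_device are hypothetical, and an ordinary variable stands in for the device register so that the code can run on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout: one 16-bit data register at offset 0. */
struct mbregs {
    volatile uint16_t mbchr;
};

/* Host-side stand-in for the device registers; on the PDP-11 this would be
 * the fixed address 0177340 rather than a normal variable. */
static struct mbregs fake_device;

/* Modern spelling of the V6 expression MBADDR->mbchr = c:
 * the base address gets an explicit struct type before the member access. */
static void mb_put(uintptr_t base, int c)
{
    struct mbregs *regs = (struct mbregs *)base;

    assert(base != 0);
    regs->mbchr = (uint16_t)c;
}

/* The same store written as "base plus member offset", which is roughly how
 * the old compiler resolved the expression from the member name alone. */
static void mb_put_by_offset(uintptr_t base, int c)
{
    volatile uint16_t *reg =
        (volatile uint16_t *)(base + offsetof(struct mbregs, mbchr));

    assert(base != 0);
    *reg = (uint16_t)c;
}
```

Both functions store to the same location because mbchr sits at offset 0. The V6 compiler did the offset lookup implicitly, which is why the driver can get away with an untyped constant.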

The second quirk astute readers might have noticed is that we are writing to the address 177340 while the device was mapped to 17777340. This is because UNIX uses virtual memory: a 16-bit address can reach only 64 KB, so virtual addresses are translated to wider physical addresses, which gives the system access to more RAM. In kernel mode, the last 8 KB of virtual memory (160000 to 177777) is always mapped to the last 8 KB of the physical memory (17760000 to 17777777 in the emulator configuration).
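The arithmetic behind this mapping can be checked with a few lines of C. This is only a sketch of the fixed offset described above, not of the PDP-11 memory management unit in general; the function name io_page_phys and the macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KVA_IO_BASE 0160000u    /* first kernel virtual address of the last 8 KB */
#define KVA_IO_END  0177777u    /* last kernel virtual address */
#define PA_IO_BASE  017760000u  /* physical start of the last 8 KB in this configuration */

/* Translate a kernel virtual address in the top 8 KB to its physical address.
 * Only valid for addresses in the range 0160000..0177777. */
static uint32_t io_page_phys(uint32_t kva)
{
    assert(kva >= KVA_IO_BASE && kva <= KVA_IO_END);
    return PA_IO_BASE + (kva - KVA_IO_BASE);
}
```

For the driver's address, 0177340 - 0160000 = 017340, and 017760000 + 017340 = 017777340, which is exactly where the mb device was mapped in the emulator.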

Having the device driver, we just have to tell the kernel about it by adding an entry to the device configuration table:

int     (*cdevsw[])()
{
        &klopen,   &klclose,  &klread,   &klwrite,  &klsgtty,   /* console */
		...
        &nulldev,   &nulldev,  &nodev,    &mbwrite,  &nodev,    /* mb */
        0
};

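Each row of the table holds one driver's implementations of the open, close, read, write and sgtty (stty/gtty) operations. Operations that mb does not implement are filled with the kernel's stub routines: nulldev does nothing, while nodev reports an error. By the way, this table can be generated automatically by the mkconf utility.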

After rebuilding the kernel, we can create the special file for the device like this:

/etc/mknod /dev/mb c 16 0

The name of the special file is /dev/mb, and c means it is a character special device. 16 is the major device number: the row index of the driver in the cdevsw table. 0 is the minor device number: when multiple devices of the same type are attached, it identifies the specific device.
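The two numbers given to mknod end up in a single 16-bit device number, with the major number in the high byte and the minor number in the low byte. The following sketch shows the packing; the helper names make_dev, dev_major and dev_minor are hypothetical and are not part of the V6 sources.

```c
#include <assert.h>
#include <stdint.h>

/* Pack major and minor numbers into one 16-bit device number:
 * major in the high byte, minor in the low byte. */
static uint16_t make_dev(unsigned major, unsigned minor)
{
    assert(major <= 0377 && minor <= 0377);
    return (uint16_t)((major << 8) | minor);
}

/* Extract the major number (row index into the device switch table). */
static unsigned dev_major(uint16_t dev)
{
    return dev >> 8;
}

/* Extract the minor number (which unit of that device type). */
static unsigned dev_minor(uint16_t dev)
{
    return dev & 0377;
}
```

For /dev/mb, make_dev(16, 0) gives 010000 in octal.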

Now let's see what happens when someone types the following command into the shell:

echo Hello world! > /dev/mb

The shell, after parsing the command, opens /dev/mb as the standard output of echo, and echo invokes the write system call, which just calls rdwr. rdwr is the common implementation of the read and write system calls. After checking permissions, it decides whether the file is a pipe or an ordinary inode, and in the latter case calls the writei function. writei means "write inode", that is, it writes to that inode in the file system. Since /dev/mb is a character special file, writei quickly dispatches the call to the driver:

if((ip->i_mode&IFMT) == IFCHR) {
	(*cdevsw[ip->i_addr[0].d_major].d_write)(ip->i_addr[0]);
	return;
}

The if clause checks whether the inode being written to is a character device. If so, it looks up the write function of the device in the cdevsw table (using the major device number as index) and calls it with the device number stored in ip->i_addr[0], which holds both the major and the minor number. For our device, d_write is mbwrite, which writes the string one character at a time to the address 0177340. This triggers our device's code in the emulator and results in showing the message box.
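The whole path from the dispatch to the device register can be modelled in a few dozen lines of modern C. This is a simplified sketch under several assumptions, not the kernel's code: cpass reads from a plain buffer instead of user space, the table has only two rows, the device register is replaced by the hypothetical device_log array, and write_chardev and MB_MAJOR are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* One row of a character device switch, reduced to the write entry. */
struct cdev_entry {
    void (*d_write)(int dev);
};

static const char *user_buf;  /* hypothetical stand-in for the caller's buffer */
static size_t user_left;      /* bytes still to be handed to the driver */
static char device_log[64];   /* records what reached the "device register" */
static size_t device_len;

/* Simplified cpass: next byte of the pending write, or -1 when exhausted. */
static int cpass(void)
{
    if (user_left == 0)
        return -1;
    user_left--;
    return (unsigned char)*user_buf++;
}

/* Driver write routine: push every byte to the device, as mbwrite does. */
static void mbwrite(int dev)
{
    int c;

    (void)dev;                /* single device, so the minor number is unused */
    while ((c = cpass()) >= 0) {
        if (device_len < sizeof device_log - 1)
            device_log[device_len++] = (char)c;
    }
    device_log[device_len] = '\0';
}

/* Stand-in for an unsupported entry; the real nodev reports an error. */
static void nodev(int dev)
{
    (void)dev;
}

enum { MB_MAJOR = 1 };        /* hypothetical index; the post uses row 16 */
static struct cdev_entry cdevsw[] = { { nodev }, { mbwrite } };

/* Sketch of the dispatch step: pick the row by major number, call d_write. */
static void write_chardev(int dev, const char *data, size_t len)
{
    int major = (dev >> 8) & 0377;

    assert((size_t)major < sizeof cdevsw / sizeof cdevsw[0]);
    user_buf = data;
    user_left = len;
    cdevsw[major].d_write(dev);
}
```

A call such as write_chardev((MB_MAJOR << 8) | 0, "Hello\n", 6) plays the role of the write system call reaching writei: the major number selects mbwrite, which drains the buffer through cpass.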