My-Tiny.Net :: Networking with Virtual Machines



Device Drivers, Kernel Modules, and udev



This note explains device driver handling in Linux, and the associated filesystems.

Key Concepts

You can think about an operating system as having two levels. At the top is the "user space" where applications are executed, and below the user space is "kernel space" ("space" in this sense actually refers to protected memory addresses). In Linux, the GNU C Library (glibc) provides a system call interface that connects user-space applications and the kernel.

Abstract handling of devices is one of the basic features of Unix and Unix-like systems. From the user-space perspective, all hardware devices look like regular files; they can be opened, closed, read and written using the same, standard, system calls that are used to manipulate files. Every device in the system is represented by a file - even to read from memory is to read from a file.

Consider the output of ls -l. The first character of every line indicates one of the following file types:
-   regular file
l   symbolic link
d   directory
p   named pipe
s   socket
c   character special device
b   block special device
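
The same type character can be recovered programmatically. The following is a minimal Python sketch, not part of the original note; the helper names type_char_from_mode and type_char are made up for illustration, and it relies only on the standard os and stat modules.

```python
import os
import stat


def type_char_from_mode(mode: int) -> str:
    """Return the file-type character ls -l would print for a st_mode value."""
    # stat.filemode() renders a mode as e.g. 'crw-rw-rw-';
    # the first character is the file type.
    return stat.filemode(mode)[0]


def type_char(path: str) -> str:
    """Return the file-type character for a path, without following symlinks."""
    return type_char_from_mode(os.lstat(path).st_mode)


if __name__ == "__main__":
    # These paths are only examples; any that do not exist are skipped.
    for path in ("/dev/null", "/dev", "/etc/hostname"):
        if os.path.lexists(path):
            print(type_char(path), path)
```

On a typical Linux system /dev/null should be reported as a character device and /dev as a directory.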
Obviously, different devices behave and react differently: there are no keys on a hard disk and no sectors on a keyboard, although you can read from both. All of the functions used to access a specific device are jointly referred to as the device driver. Each device on the system has its own device driver, with the routines needed to open, write to, read from, and close (among other things) that particular device.

The difference lies in kernel space, where the virtual file system (VFS) decodes the file type and passes each file operation to the appropriate handler: a filesystem module in the case of a regular file or directory, or the corresponding device driver in the case of a device file.

    
Source: M. Tim Jones, Anatomy of the Linux kernel, 06 June 2007
http://www.ibm.com/developerworks/linux/library/l-linux-kernel/

A device node is a file with type c (for "character" devices, devices that bypass the kernel buffer cache) or b (for "block" devices, which go through the buffer cache). Each device file is assigned a "major number" and a "minor number". Traditionally, each device driver has a major number, and all device files for devices controlled by that driver have the same major number. It is entirely up to the driver how the minor number is interpreted; the driver documentation usually describes how the driver uses minor numbers.
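
As an illustration of how the major and minor numbers are attached to a device node, here is a small Python sketch. It is an assumption-laden example rather than anything from the original note: device_numbers and split_rdev are hypothetical helper names, and the code uses only os.stat, os.major and os.minor from the standard library.

```python
import os
import stat


def split_rdev(rdev: int):
    """Split a raw device number (st_rdev) into (major, minor)."""
    return os.major(rdev), os.minor(rdev)


def device_numbers(path: str):
    """Return (kind, major, minor) for a device node, or None for other files."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "c"  # character device
    elif stat.S_ISBLK(st.st_mode):
        kind = "b"  # block device
    else:
        return None
    major, minor = split_rdev(st.st_rdev)
    return kind, major, minor


if __name__ == "__main__":
    # /dev/null is used only as a device node that is very likely to exist.
    if os.path.exists("/dev/null"):
        print(device_numbers("/dev/null"))
```

The numbers printed are the same ones ls -l shows in place of the file size for a device node.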

Some devices have a fixed major/minor number pair assigned to them. There is a list at
http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/Documentation/devices.txt
and the definitive one is Documentation/devices.txt within the kernel source tree (Documentation/admin-guide/devices.txt in more recent kernels). Modern Linux kernels allow multiple drivers to share major numbers, but most devices that you will see are still organized on the one-major-one-driver principle.

Device files (also called device nodes) are located in the /dev/ directory. In early Linux kernel versions, /dev had one static device file for each device that might possibly be connected to the system (and controlled by a device driver). Now, with udev, device nodes are created only for devices which are actually present in the system.

Kernel modules

As noted above, all of the functions used to access a specific device are jointly referred to as the device driver. Each device driver must be statically compiled into the kernel or stored on disk as a dynamically loadable module.

Devices identify themselves by an ID, which tells what kind of device it is. Usually this consists of vendor and product IDs and other subsystem-specific values. Every device driver has a list of known IDs for devices it can handle. The program depmod reads each module file and records these IDs in the file modules.alias in /lib/modules/$(uname -r)/ for all of the available dynamically loadable modules. It also creates modules.dep, a list of dependencies, by determining what symbols each kernel module exports and what symbols it needs.
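
To make the shape of modules.alias concrete, the following Python sketch parses text in that format into (pattern, module) pairs. It is only an illustration: parse_modules_alias is a hypothetical helper, and the SAMPLE text is a hand-written stand-in for a real file, not output copied from any system.

```python
def parse_modules_alias(text: str):
    """Parse modules.alias-style text into a list of (pattern, module) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        keyword, _, rest = line.partition(" ")
        if keyword != "alias":
            continue
        # The module name is the last field; everything before it is the pattern.
        pattern, _, module = rest.strip().rpartition(" ")
        if pattern and module:
            pairs.append((pattern.strip(), module))
    return pairs


# Hand-written sample (an assumption, for illustration only).
SAMPLE = """\
# Aliases extracted from modules themselves.
alias usb:v*p*d*dc*dsc*dp*ic03isc*ip* usbhid
alias usb:v*p*d*dc*dsc*dp*ic08isc06ip50* usb_storage
"""

if __name__ == "__main__":
    for pattern, module in parse_modules_alias(SAMPLE):
        print(module, "handles", pattern)
```

Each pattern is a shell-style glob, which is what allows one module to claim a whole family of device IDs.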

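To see the module tree for the running kernel, use ls /lib/modules/$(uname -r); the module files themselves are under the kernel/ subdirectory.

Running modprobe -c prints the effective module configuration, including every module alias in the index. Counting the lines shows how many entries there are (you will be surprised!)
modprobe -c | wc -l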

To show the status of modules currently loaded in the kernel, just use lsmod

Kernel module loading

One of the first actions during bootup is to mount /dev/ with a tmpfs filesystem (tmpfs lives in memory, so its contents are wiped on reboot). All initial and static device nodes from the /lib/udev/devices directory are copied to the empty /dev/ directory. After that, the udevd daemon is started. The udev daemon reads and parses all rules from the /etc/udev/rules.d/*.rules files once at startup and keeps them in memory, and then listens to the netlink socket that the kernel uses for communicating with user space applications.

The kernel bus drivers (usb, ide, network, etc.) probe for devices. For every detected device, the kernel creates an internal device structure and the driver core sends an event to udev. The driver core uevents look like this:

   recv(4, "add@/class/input/input9/mouse2\0
           ACTION=add\0
           DEVPATH=/class/input/input9/mouse2\0
           SUBSYSTEM=input\0
           SEQNUM=1064\0
           PHYSDEVPATH=/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0\0
           PHYSDEVBUS=usb\0
           PHYSDEVDRIVER=usbhid\0
           MAJOR=13\0
           MINOR=34\0", 
        2048, 0) = 221
Note how the kernel will assign a major/minor number pair when it detects a hardware device.
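
The message above is a header followed by NUL-separated KEY=value pairs, so it is easy to take apart. The Python sketch below shows one way to do that; parse_uevent is a hypothetical helper written for this note, and RAW simply re-types the buffer from the trace above.

```python
def parse_uevent(buf: bytes):
    """Split a raw uevent message into (header, environment dict)."""
    # Fields are separated by NUL bytes; drop the empty trailing field.
    fields = [f.decode("ascii", "replace") for f in buf.split(b"\0") if f]
    header, env = fields[0], {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if sep:  # ignore anything that is not KEY=value
            env[key] = value
    return header, env


# The buffer shown in the recv() trace above.
RAW = (
    b"add@/class/input/input9/mouse2\0"
    b"ACTION=add\0"
    b"DEVPATH=/class/input/input9/mouse2\0"
    b"SUBSYSTEM=input\0"
    b"SEQNUM=1064\0"
    b"PHYSDEVPATH=/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0\0"
    b"PHYSDEVBUS=usb\0"
    b"PHYSDEVDRIVER=usbhid\0"
    b"MAJOR=13\0"
    b"MINOR=34\0"
)

if __name__ == "__main__":
    header, env = parse_uevent(RAW)
    print(header)
    print(env["SUBSYSTEM"], env["MAJOR"], env["MINOR"])
```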

If a module needs to be loaded, the kernel includes a MODALIAS argument in the event. For a USB mouse it looks like this:
     MODALIAS=usb:v046DpC03Ed2000dc00dsc00dp00ic03isc01ip02
udev calls modprobe $MODALIAS, and if modprobe can match the device alias composed by the kernel for the device with an alias provided by the module on the list created by depmod, the module will be loaded.
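
The matching step can be sketched in Python with shell-style globbing. This is an illustration of the idea only, not modprobe's actual implementation: the ALIASES table is hypothetical, the helper names are made up, and the field names in the regular expression follow the usual reading of a USB MODALIAS string (vendor, product, device release, device class/subclass/protocol, interface class/subclass/protocol).

```python
import fnmatch
import re

MODALIAS = "usb:v046DpC03Ed2000dc00dsc00dp00ic03isc01ip02"

# Hypothetical alias table, standing in for the list built by depmod.
ALIASES = [
    ("usb:v*p*d*dc*dsc*dp*ic08isc*ip*", "usb_storage"),
    ("usb:v*p*d*dc*dsc*dp*ic03isc*ip*", "usbhid"),
]

USB_MODALIAS = re.compile(
    r"usb:v(?P<vendor>[0-9A-F]{4})p(?P<product>[0-9A-F]{4})"
    r"d(?P<device_release>[0-9A-F]{4})"
    r"dc(?P<device_class>[0-9A-F]{2})dsc(?P<device_subclass>[0-9A-F]{2})"
    r"dp(?P<device_protocol>[0-9A-F]{2})"
    r"ic(?P<interface_class>[0-9A-F]{2})isc(?P<interface_subclass>[0-9A-F]{2})"
    r"ip(?P<interface_protocol>[0-9A-F]{2})"
)


def modules_for(modalias: str, aliases):
    """Return the modules whose alias pattern matches the device's MODALIAS."""
    return [module for pattern, module in aliases
            if fnmatch.fnmatchcase(modalias, pattern)]


def parse_usb_modalias(modalias: str):
    """Return the fields of a USB MODALIAS string as a dict, or None."""
    match = USB_MODALIAS.match(modalias)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(modules_for(MODALIAS, ALIASES))
    print(parse_usb_modalias(MODALIAS))
```

Interface class 03 is the USB HID class, which is why the mouse ends up with the usbhid pattern rather than the storage one.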

Kernel modules can also be managed with insmod and rmmod: insmod loads a module and rmmod unloads one. The difference between modprobe and insmod is that modprobe needs only the module name, whereas insmod needs the path to the module file. Also, insmod does not load the module's dependencies, but modprobe does.

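modprobe -l was used to display all the modules available (that can be loaded). It belonged to the older module-init-tools; the kmod-based modprobe shipped by current distributions may no longer accept this option.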

lsmod lists the currently loaded modules, displaying the name, size, use count, and list of referring modules for each. It reads /proc/modules and does not need to be run by root.
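
The same four fields can be read straight from /proc/modules. The Python sketch below is illustrative only: parse_proc_modules and LoadedModule are hypothetical names, and SAMPLE is hand-written text in the /proc/modules layout rather than output from a real machine.

```python
from typing import List, NamedTuple


class LoadedModule(NamedTuple):
    name: str
    size: int
    use_count: int
    used_by: List[str]


def parse_proc_modules(text: str) -> List[LoadedModule]:
    """Parse /proc/modules-style text into LoadedModule records."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        name, size, use_count, deps = fields[:4]
        # The dependency field is '-' when empty, otherwise 'a,b,' (trailing comma).
        used_by = [] if deps == "-" else [d for d in deps.split(",") if d]
        # The use count may be '-' on kernels built without module unloading.
        count = int(use_count) if use_count.isdigit() else 0
        modules.append(LoadedModule(name, int(size), count, used_by))
    return modules


# Hand-written sample (an assumption, for illustration only).
SAMPLE = """\
hid 147456 2 usbhid,hid_generic, Live 0x0000000000000000
usbhid 57344 0 - Live 0x0000000000000000
"""

if __name__ == "__main__":
    try:
        with open("/proc/modules") as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on systems without /proc/modules
    for mod in parse_proc_modules(text):
        print(mod.name, mod.size, mod.use_count, ",".join(mod.used_by))
```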

If you don't know why a module is needed, use modinfo to find information about it.

More modprobe examples, with good links at the bottom of the page:
http://www.thegeekstuff.com/2010/11/modprobe-command-examples/

/dev/ and /sys/

Once the proper device driver is loaded, udev processes the event (see below) and creates the device node in /dev/. For every device the kernel has detected and initialized, the kernel also creates a directory in /sys/, named after the device, that contains attribute files with device-specific properties.

Just like /proc/ uses the "proc" virtual filesystem for process information exported from "kernel space" to "user space", the /sys/ directory uses a "sysfs" virtual filesystem. You can see this with the mount command.

In /sys/class/ there is a directory for each class of device. There are also a large number of symlinks, for easy access to devices without having to know exactly which PCI and USB ports they are connected to. Particularly of note are the files named "dev", which contain the major and minor device numbers of the device, and the files named "uevent". If you write the correct string (such as add) to a uevent file, the kernel generates an event as if the device had just been plugged in, so that udev can create device nodes in a consistent fashion for all devices, both fixed and removable.
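
A "dev" attribute holds the device numbers as text in the form major:minor. The Python sketch below reads them for every entry in one class directory; parse_dev_attribute and class_device_numbers are hypothetical helper names, and /sys/class/input is used only as an example path that may or may not exist on a given machine.

```python
import os


def parse_dev_attribute(text: str):
    """Turn the contents of a sysfs 'dev' file, e.g. '13:34', into (13, 34)."""
    major, minor = text.strip().split(":")
    return int(major), int(minor)


def class_device_numbers(class_dir: str):
    """Map device name -> (major, minor) for entries that have a 'dev' file."""
    result = {}
    for name in sorted(os.listdir(class_dir)):
        dev_file = os.path.join(class_dir, name, "dev")
        if os.path.isfile(dev_file):
            with open(dev_file) as fh:
                result[name] = parse_dev_attribute(fh.read())
    return result


if __name__ == "__main__":
    class_dir = "/sys/class/input"  # example path, assumed for illustration
    if os.path.isdir(class_dir):
        for name, (major, minor) in class_device_numbers(class_dir).items():
            print(f"{name}: {major}:{minor}")
```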

udev Event Processing

Every event is matched against the set of rules provided in /etc/udev/rules.d/*.rules. These rules can add or change event environment keys, request a specific name for the device node to be created, add symlinks pointing to the node, or add programs to be run after the device node is created. udev rules can match on any property the kernel exports in /sys/ or adds to the event, and a rule may also request additional information from external programs.

The rule syntax and keys to match or import data are described in the udev man pages. An advanced example is persistent device naming, to provide stable names for all devices regardless of their order of recognition or the connection used to plug the device.
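
The idea of matching an event environment against rules can be modelled in a few lines of Python. This is a toy model under loud assumptions: it is not udev's rule syntax or its implementation, the rule dictionaries and the apply_rules helper are invented for illustration, and the symlink names are made up.

```python
import fnmatch


def apply_rules(event: dict, rules: list) -> dict:
    """Return a copy of the event environment after applying matching rules."""
    env = dict(event)
    for rule in rules:
        # A rule applies only if every match key is present and its
        # shell-style pattern matches the value in the environment.
        matched = all(
            key in env and fnmatch.fnmatchcase(env[key], pattern)
            for key, pattern in rule["match"].items()
        )
        if matched:
            env.update(rule["set"])
    return env


# Toy rules (hypothetical): one for mice, one for block devices.
RULES = [
    {"match": {"SUBSYSTEM": "input", "DEVPATH": "*/mouse*"},
     "set": {"SYMLINK": "input/example_mouse"}},
    {"match": {"SUBSYSTEM": "block"},
     "set": {"SYMLINK": "disk/example_disk"}},
]

EVENT = {
    "ACTION": "add",
    "SUBSYSTEM": "input",
    "DEVPATH": "/class/input/input9/mouse2",
}

if __name__ == "__main__":
    print(apply_rules(EVENT, RULES))
```

Only the first toy rule matches the mouse event, so only its key is added to the environment; the real rule language described in the udev man pages is considerably richer.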

If udev rule files are changed, added or removed, the daemon receives an inotify event and updates the in-memory representation of the rules.

From the user space perspective, there is no difference between a device coldplug sequence and device discovery during runtime (hotplug). Every device that is plugged in or removed causes a uevent to be sent to the udev daemon, which runs an event process to match against udev rules, create or remove the device node and symlinks as required, and possibly run specified programs to set up or clean up after the device.

A couple more applications that build on this system, providing services that allow user space applications to access and manage hardware and hardware-related events, are worth mentioning.

D-Bus (Desktop Bus) is a message bus used for inter-process communication (IPC). Applications register with the D-Bus daemon to receive notifications of events and also post event notifications that other applications may be interested in, such as a digital camera being plugged in or a laptop computer closing its lid. In this way, desktops such as GNOME and KDE can, for example, start the file browser for a newly attached USB flash drive.

HAL (Hardware Abstraction Layer or Hardware Annotation Library) used to be responsible for mediating access to hardware by desktop applications. HAL would either scan /sys/ or get notification of changes via a udev rule, and issue a broadcast message on the D-Bus IPC system to all interested processes. HAL basically maintained its own database of devices, and got into the business of guessing what devices were likely to be cameras, iPods, etc. By mid-2011 the functionality of HAL had been integrated into udev or moved to other daemons such as udisks and upower, and HAL was deprecated by most Linux distributions and desktop environments.

Event queue management

The udev daemon takes care of the right order of event execution and serializes events for devices which depend on other events. Events for child devices will be delayed until the event for the parent device has returned. That way, for example, partition events will wait for the main block device event to finish.

The current state of the event queue is visible in /dev/.udev/queue. If that directory exists, then events are currently queued or already running. Every event in this directory is represented as a symlink to the corresponding sysfs device. With the removal of the last event, the whole directory goes away.

All event processes which have failed, because of an error or because an executed program returned a failure, are represented in /dev/.udev/failed. A later successful event for the same device will remove the symlink from that directory.

Certain steps during bootup need to synchronize with the kernel event handling. This can be accomplished by watching the event queue directory. If events fail at this stage, they can be retried at a later stage by picking up the symlinks from the "failed" directory.

Debugging events

udevmonitor (udevadm monitor in later udev releases) visualizes the driver core events and the udev event processes. This is a sequence of events while connecting a USB mouse:
  UEVENT[1132632714.285362] add@/devices/pci0000:00/0000:00:1d.1/usb2/2-2
  UEVENT[1132632714.288166] add@/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0
  UEVENT[1132632714.309485] add@/class/input/input6
  UEVENT[1132632714.309511] add@/class/input/input6/mouse2
  UEVENT[1132632714.309524] add@/class/usb_device/usbdev2.12
  UDEV  [1132632714.348966] add@/devices/pci0000:00/0000:00:1d.1/usb2/2-2
  UDEV  [1132632714.420947] add@/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0
  UDEV  [1132632714.427298] add@/class/input/input6
  UDEV  [1132632714.434223] add@/class/usb_device/usbdev2.12
  UDEV  [1132632714.439934] add@/class/input/input6/mouse2 
The UEVENT lines show the events the kernel sends over netlink; the UDEV lines show the finished udev event handlers. The timestamps are in seconds, printed with microsecond resolution. The time between a UEVENT line and the matching UDEV line is the time udev took to process the event, including any time it spent queued to synchronize with other events.
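
The per-event processing time can be computed from such output. The Python sketch below pairs each UEVENT line with the UDEV line for the same action and device path; processing_times is a hypothetical helper, and SAMPLE re-types four of the lines from the listing above.

```python
import re

# Matches lines such as:  UDEV  [1132632714.348966] add@/devices/...
LINE = re.compile(r"^\s*(UEVENT|UDEV)\s*\[(\d+)\.(\d{6})\]\s+(\w+)@(\S+)")


def processing_times(lines):
    """Return {devpath: microseconds between its UEVENT and UDEV lines}."""
    started, elapsed = {}, {}
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        source, sec, usec, action, devpath = match.groups()
        # Work in integer microseconds to avoid floating-point rounding.
        stamp = int(sec) * 1_000_000 + int(usec)
        key = (action, devpath)
        if source == "UEVENT":
            started[key] = stamp
        elif key in started:
            elapsed[devpath] = stamp - started.pop(key)
    return elapsed


# Four lines taken from the monitor output shown above.
SAMPLE = """\
  UEVENT[1132632714.285362] add@/devices/pci0000:00/0000:00:1d.1/usb2/2-2
  UEVENT[1132632714.309511] add@/class/input/input6/mouse2
  UDEV  [1132632714.348966] add@/devices/pci0000:00/0000:00:1d.1/usb2/2-2
  UDEV  [1132632714.439934] add@/class/input/input6/mouse2
""".splitlines()

if __name__ == "__main__":
    for devpath, usec in processing_times(SAMPLE).items():
        print(f"{usec / 1000:.1f} ms  {devpath}")
```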

udevmonitor --env shows the complete event environment:
  UDEV  [1132633002.937243] add@/class/input/input7
  UDEV_LOG=3
  ACTION=add
  DEVPATH=/class/input/input7
  SUBSYSTEM=input
  SEQNUM=1043
  PHYSDEVPATH=/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0
  PHYSDEVBUS=usb
  PHYSDEVDRIVER=usbhid
  PRODUCT=3/46d/c03e/2000
  NAME="Logitech USB-PS/2 Optical Mouse"
  PHYS="usb-0000:00:1d.1-2/input0"
  UNIQ=""
  EV=7
  KEY=70000 0 0 0 0 0 0 0 0
  REL=103 
udev also sends messages to syslog. The default syslog priority is specified in the udev configuration file /etc/udev/udev.conf. The log priority of the running daemon can be changed with

udevcontrol log_priority=<level>

where <level> is a syslog priority name or number (udevadm control --log-priority=<level> in later udev releases).

Key Files

  /etc/udev/udev.conf    -> main udev config file
  /etc/udev/rules.d/*    -> udev event matching rules
  /lib/udev/devices/*    -> static /dev content
  /lib/udev/*            -> helper programs called from udev rules
  /dev/*                 -> dynamic udev content 


The best information and event details came from
Kay Sievers - Dec 2005 - Recent state of udev
http://vrfy.org/log/recent-state-of-udev.html
which seems to be long gone, but was once available from
http://web.archive.org/web/20090918133056/