
I/O System: From Device Drivers to Asynchronous Streams

The I/O System is the bridge between the high-speed, structured world of the CPU/RAM and the chaotic, asynchronous world of external hardware. Managing devices ranging from ultra-fast NVMe drives to slow human-interface devices requires a flexible and robust kernel architecture.

This chapter explores how the kernel manages device communication, optimizes data throughput, and provides a unified interface for diverse hardware.


1. Hardware Communication Primitives

The CPU uses several low-level mechanisms to talk to hardware controllers.

1.1 Memory-Mapped I/O (MMIO)

Modern CPUs reserve a portion of the physical address space for hardware devices.

  • The Concept: Writing to address 0xFFFF0000 might not write to RAM at all; instead, the store can be routed to a device such as a network card and interpreted as a command.
  • Benefit: The CPU can use standard load/store instructions to interact with hardware.
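The address-decoding idea can be illustrated with a toy Python model. It is only a sketch: the `ToyNIC` device, its register offsets, and the `ToyBus` decoder are hypothetical, and a real Linux driver would instead use helpers such as `ioremap()` with `readl()`/`writel()`. The model shows that the same "store" operation lands in RAM or in a device register depending only on the address.

```python
class ToyNIC:
    """Hypothetical network card exposing two registers (illustrative only)."""
    CMD = 0x0     # write: issue a command
    STATUS = 0x4  # read: 1 once a command has been accepted

    def __init__(self):
        self.commands = []
        self.status = 0

    def write(self, offset, value):
        if offset == self.CMD:
            self.commands.append(value)
            self.status = 1

    def read(self, offset):
        return self.status if offset == self.STATUS else 0


class ToyBus:
    """Routes loads/stores to RAM or to the device window by address."""
    MMIO_BASE = 0xFFFF0000  # hypothetical start of the device window
    MMIO_SIZE = 0x1000

    def __init__(self, ram_size, device):
        self.ram = bytearray(ram_size)
        self.device = device

    def _is_mmio(self, addr):
        return self.MMIO_BASE <= addr < self.MMIO_BASE + self.MMIO_SIZE

    def store(self, addr, value):
        # One "store" operation; the address alone decides where it goes.
        if self._is_mmio(addr):
            self.device.write(addr - self.MMIO_BASE, value)
        else:
            self.ram[addr] = value & 0xFF

    def load(self, addr):
        if self._is_mmio(addr):
            return self.device.read(addr - self.MMIO_BASE)
        return self.ram[addr]


nic = ToyNIC()
bus = ToyBus(ram_size=256, device=nic)
bus.store(0x10, 0x42)          # ordinary RAM write
bus.store(0xFFFF0000, 0x1)     # reaches the NIC's CMD register instead of RAM
status = bus.load(0xFFFF0004)  # reads the NIC's STATUS register
```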

1.2 Interrupts: The Asynchronous Signal

Instead of the CPU constantly checking (polling) if a device is ready, the device signals the CPU.

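  • Interrupt Vector: A number identifying the interrupt source; on x86 it is an index into the Interrupt Descriptor Table (IDT), which holds the address of the handler to run.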
  • Masking: The CPU can temporarily disable interrupts to protect critical kernel code.
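Vector dispatch and masking can be sketched as a toy model. It is not real hardware behavior. The `ToyCPU` class and vector 33 are hypothetical, and `disable()`/`enable()` only mimic the role of x86 `CLI`/`STI`. As a simplification, at most one pending request is latched per vector while interrupts are masked. All pending requests are delivered once interrupts are re-enabled.

```python
class ToyCPU:
    def __init__(self, num_vectors=256):
        self.idt = [None] * num_vectors  # vector number -> handler function
        self.interrupts_enabled = True
        self.pending = []                # vectors raised while masked

    def register(self, vector, handler):
        self.idt[vector] = handler

    def raise_irq(self, vector):
        if self.interrupts_enabled:
            self.idt[vector](vector)     # dispatch through the table
        elif vector not in self.pending:
            self.pending.append(vector)  # latch at most one request per vector

    def disable(self):                   # analogous to x86 CLI
        self.interrupts_enabled = False

    def enable(self):                    # analogous to x86 STI
        self.interrupts_enabled = True
        while self.pending:
            vector = self.pending.pop(0)
            self.idt[vector](vector)


log = []
cpu = ToyCPU()
cpu.register(33, lambda v: log.append(("keyboard", v)))  # hypothetical vector
cpu.disable()           # enter a critical section
cpu.raise_irq(33)       # held pending, not delivered
cpu.raise_irq(33)       # still a single pending request
masked_log = list(log)  # nothing delivered while masked
cpu.enable()            # leave the critical section; delivered now
```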

1.3 Direct Memory Access (DMA) and IOMMU

Moving megabytes of data through the CPU one word at a time (programmed I/O) wastes cycles that could be spent on useful work.

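  • DMA Controller: A hardware unit (or, on many modern devices, the device itself acting as a bus master) that moves data directly between a device (e.g., a disk) and RAM. The CPU only sets up the transfer and is notified by an interrupt when it completes.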
  • IOMMU: Similar to an MMU but for I/O devices. It ensures that a device can only access the memory regions explicitly assigned to it by the kernel, providing hardware-level protection.
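The IOMMU's protection role can be sketched as a per-device translation table. This is a toy model, not a real IOMMU driver. `ToyIOMMU`, `map_buffer`, `dma_write`, the device name `"nic0"`, and the 4 KiB page size are illustrative assumptions. A DMA access to an I/O virtual address (IOVA) that the kernel never mapped raises a fault instead of touching memory.

```python
PAGE = 4096  # illustrative page size


class IOMMUFault(Exception):
    pass


class ToyIOMMU:
    def __init__(self):
        self.tables = {}  # device id -> {iova_page: phys_page}

    def map_buffer(self, dev, iova, phys, length):
        # Called by the kernel/driver to grant a device access to a buffer.
        table = self.tables.setdefault(dev, {})
        for off in range(0, length, PAGE):
            table[(iova + off) // PAGE] = (phys + off) // PAGE

    def translate(self, dev, iova):
        page = self.tables.get(dev, {}).get(iova // PAGE)
        if page is None:
            raise IOMMUFault(f"device {dev}: no mapping for IOVA {iova:#x}")
        return page * PAGE + iova % PAGE


def dma_write(iommu, ram, dev, iova, data):
    # A "device" writing into RAM; every byte goes through translation.
    for i, b in enumerate(data):
        ram[iommu.translate(dev, iova + i)] = b


ram = bytearray(16 * PAGE)
iommu = ToyIOMMU()
iommu.map_buffer("nic0", iova=0x0, phys=3 * PAGE, length=PAGE)
dma_write(iommu, ram, "nic0", 0x10, b"hi")  # lands at physical 3*PAGE + 0x10

blocked = False
try:
    dma_write(iommu, ram, "nic0", 2 * PAGE, b"x")  # never mapped
except IOMMUFault:
    blocked = True
```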

2. Kernel I/O Architecture

2.1 The Device Driver Model

A Device Driver is the software component that bridges the gap between the kernel's generic I/O calls and the hardware's specific registers.

  • Character Devices: Handle streams of bytes (Keyboards, Mice, Serial Ports).
  • Block Devices: Handle data in addressable blocks (HDDs, SSDs, Flash).
  • Network Devices: Handle discrete packets.
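The character/block distinction is visible from user space in a file's mode bits. This Python sketch uses the standard `os` and `stat` modules and assumes a Unix-like system. The helper names `device_kind` and `device_numbers` are illustrative. The major number of a device node traditionally identifies the driver that handles it.

```python
import os
import stat


def device_kind(path):
    """Report whether a path is a character device, block device, or neither."""
    mode = os.stat(path).st_mode
    if stat.S_ISCHR(mode):
        return "character"
    if stat.S_ISBLK(mode):
        return "block"
    return "not a device node"


def device_numbers(path):
    """(major, minor) pair of a device node; the major selects the driver."""
    rdev = os.stat(path).st_rdev
    return os.major(rdev), os.minor(rdev)


# /dev/null is a character device; on Linux its numbers are (1, 3).
print(device_kind("/dev/null"), device_numbers("/dev/null"))
# Block devices such as /dev/sda or /dev/nvme0n1 depend on the machine.
# Network interfaces (e.g. eth0) have no /dev node; they are reached
# through sockets instead.
```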

2.2 Top Halves and Bottom Halves (Linux)

Interrupt handlers must be fast: while one runs, its interrupt line is masked (and on modern Linux, interrupts on the local CPU are disabled entirely), so other device events must wait.

  • Top Half (ISR): The immediate response. It acknowledges the interrupt, saves critical data, and schedules the "Bottom Half."
  • Bottom Half (Tasklets/Workqueues): Performs the heavy lifting (e.g., parsing a network packet or copying data) later, with interrupts re-enabled.
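The split can be modeled in Python with a queue and a worker thread. This is a user-space analogy only, not kernel code. In Linux the top half would be registered with `request_irq()`, and deferred work queued with `schedule_work()` or handled via `request_threaded_irq()`. Here `top_half` only stashes the raw data. The hypothetical `bottom_half` thread plays the workqueue's role and does the "parsing" (an uppercase conversion standing in for real work).

```python
import queue
import threading

work = queue.Queue()
processed = []


def top_half(raw_packet):
    # Runs "in interrupt context": only stash the data and defer the rest.
    work.put(raw_packet)


def bottom_half():
    # Runs later in an ordinary thread, like a workqueue item.
    while (raw := work.get()) is not None:      # None is a shutdown sentinel
        processed.append(raw.decode().upper())  # stand-in for real parsing


worker = threading.Thread(target=bottom_half)
worker.start()
for packet in (b"syn", b"ack", b"data"):
    top_half(packet)  # returns immediately; no parsing here
work.put(None)
worker.join()
```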

3. The I/O Stack and Scheduling

3.1 Block I/O Layer

When a file system requests a block, it enters the Block Layer.

  • Merging: If requests for blocks 10-15 and 16-20 arrive, the kernel merges them into a single request covering blocks 10-20 (11 blocks).
  • Sorting: The kernel sorts requests by block address to minimize disk head seek time on rotational disks.
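Sorting and merging can be sketched in a few lines of Python. The function name `merge_requests` and the inclusive `(first_block, last_block)` representation are assumptions for illustration. The real block layer also checks that merged requests have the same direction (read or write) and stay within the device's size limits.

```python
def merge_requests(requests):
    """Sort block requests by start and merge ranges that are contiguous.

    Each request is an inclusive (first_block, last_block) pair.
    """
    merged = []
    for first, last in sorted(requests):             # sorting step
        if merged and first == merged[-1][1] + 1:    # starts right after previous
            merged[-1] = (merged[-1][0], last)       # back merge
        else:
            merged.append((first, last))
    return merged


print(merge_requests([(16, 20), (10, 15)]))            # one request, 10..20
print(merge_requests([(40, 41), (10, 15), (16, 20)]))  # two requests
```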

3.2 I/O Schedulers

  1. Deadline (mq-deadline): Prevents starvation by giving each request an expiration time; once a request's deadline passes, it is served ahead of requests that would otherwise be dispatched in sector order.
  2. BFQ (Budget Fair Queuing): A more complex scheduler that gives each process a fair share of disk bandwidth, which makes it well suited to interactive desktop workloads.
  3. Kyber: A lightweight scheduler designed for fast multi-queue devices such as NVMe SSDs.
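The core deadline idea can be sketched as a toy scheduler. It is not mq-deadline's actual code. The class name, the explicit `now` argument, and the single queue (no separate read and write queues) are simplifications. Each request sits both in a FIFO with its deadline and in a heap ordered by sector. Normally the lowest sector is served next, but an expired request jumps the line.

```python
import heapq
from collections import deque


class ToyDeadlineScheduler:
    """Serve requests in sector order, unless the oldest one has expired."""

    def __init__(self, expire_after):
        self.expire_after = expire_after
        self.fifo = deque()   # (deadline, seq, sector) in arrival order
        self.by_sector = []   # min-heap of (sector, seq)
        self.done = set()     # seq numbers already dispatched
        self.seq = 0

    def submit(self, sector, now):
        self.fifo.append((now + self.expire_after, self.seq, sector))
        heapq.heappush(self.by_sector, (sector, self.seq))
        self.seq += 1

    def dispatch(self, now):
        # Each request sits in both structures; skip copies already served.
        while self.fifo and self.fifo[0][1] in self.done:
            self.fifo.popleft()
        while self.by_sector and self.by_sector[0][1] in self.done:
            heapq.heappop(self.by_sector)
        if not self.fifo:
            return None
        deadline, seq, sector = self.fifo[0]
        if deadline <= now:
            self.fifo.popleft()                          # expired: serve it first
        else:
            sector, seq = heapq.heappop(self.by_sector)  # else lowest sector
        self.done.add(seq)
        return sector


sched = ToyDeadlineScheduler(expire_after=10)
sched.submit(900, now=0)  # far-away sector, arrives first
sched.submit(10, now=1)
sched.submit(20, now=2)
order = [sched.dispatch(now=3), sched.dispatch(now=20), sched.dispatch(now=21)]
```

At `now=20`, a pure sector-order policy would pick sector 20. The expired request for sector 900 is served first instead, which is exactly the starvation guarantee.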

4. Performance: Buffering and Caching

4.1 Double Buffering

While the device fills one buffer, the application reads from the other; then the two swap roles. This overlaps device I/O with processing and hides the device's latency.
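Double buffering is essentially a bounded buffer with exactly two slots. This Python sketch uses one thread as a stand-in for the device. `device`, `application`, and the chunk contents are hypothetical. Two semaphores count empty and full buffers, so the device never overwrites a buffer the application is still reading.

```python
import threading

buffers = [None, None]
empty = threading.Semaphore(2)  # both buffers start empty
full = threading.Semaphore(0)   # no buffer is ready yet


def device(chunks):
    i = 0
    for chunk in chunks:
        empty.acquire()     # wait for a free buffer
        buffers[i] = chunk  # "DMA" the data in
        full.release()      # hand it to the application
        i ^= 1              # switch to the other buffer


def application(n, out):
    i = 0
    for _ in range(n):
        full.acquire()          # wait for a filled buffer
        out.append(buffers[i])  # process it
        buffers[i] = None
        empty.release()         # give it back to the device
        i ^= 1


chunks = [b"block0", b"block1", b"block2", b"block3"]
received = []
producer = threading.Thread(target=device, args=(chunks,))
producer.start()
application(len(chunks), received)
producer.join()
```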

4.2 Zero-Copy (Performance Optimization)

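A traditional read() followed by write() to a socket involves 4 user/kernel mode switches (two per system call) and 4 data copies: disk to page cache (DMA), page cache to user buffer (CPU), user buffer to socket buffer (CPU), and socket buffer to NIC (DMA).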

  • sendfile() / splice(): These system calls allow the kernel to transfer data directly from the Page Cache to the Socket Buffer, avoiding any copying into user-space RAM.
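On Linux, Python's `os.sendfile(out_fd, in_fd, offset, count)` is a thin wrapper around sendfile(2), so it can illustrate the zero-copy path. The helper name `send_file_zero_copy` is illustrative. A temporary file and a `socket.socketpair()` stand in for a served file and a client connection. The loop is needed because sendfile may transfer fewer bytes than requested.

```python
import os
import socket
import tempfile


def send_file_zero_copy(path, sock):
    """Send a whole file over a connected socket without a user-space copy."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # Data moves page cache -> socket inside the kernel.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break
            offset += sent
    return offset


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello, zero-copy")
    path = tmp.name

a, b = socket.socketpair()
n = send_file_zero_copy(path, a)
a.close()
data = b.recv(1024)
b.close()
os.unlink(path)
```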

5. Modern High-Performance I/O

5.1 I/O Multiplexing: epoll

How does a web server handle 100,000 connections?

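  • select/poll: On every call, the server hands the kernel the full set of sockets and asks "Which of these 100,000 sockets are ready?" The kernel re-scans all of them each time, so the cost is O(N) in the number of watched sockets.
  • epoll (Linux): Sockets are registered once. When a socket becomes ready, the kernel adds it to a "ready list," and each wait returns only the ready sockets, so the cost scales with the number of ready sockets rather than the total (O(1) per event).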
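Python's `select.epoll` (Linux only) wraps epoll_create1/epoll_ctl/epoll_wait, so the register-once, return-only-ready pattern can be shown directly. Socket pairs stand in for client connections. Only 2 of the 100 watched sockets are made readable, and `poll()` reports just those two.

```python
import select
import socket

ep = select.epoll()
pairs = [socket.socketpair() for _ in range(100)]
by_fd = {}
for reader, writer in pairs:
    reader.setblocking(False)
    ep.register(reader.fileno(), select.EPOLLIN)  # registered once, not per call
    by_fd[reader.fileno()] = reader

# Make just two of the 100 sockets readable.
pairs[7][1].send(b"seven")
pairs[42][1].send(b"forty-two")

ready = ep.poll(timeout=1)  # list of (fd, event_mask) for ready fds only
messages = sorted(by_fd[fd].recv(64) for fd, _ in ready)

for reader, writer in pairs:
    ep.unregister(reader.fileno())
    reader.close()
    writer.close()
ep.close()
```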

5.2 io_uring: The New Frontier

io_uring is a Linux interface (available since kernel 5.1) built on a pair of ring buffers in memory shared between user space and the kernel.

  • Submission Queue (SQ): User-space adds I/O requests.
  • Completion Queue (CQ): Kernel adds results.
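  • Result: Many requests can be submitted and their completions collected with a single io_uring_enter() call, and in SQPOLL mode (where a kernel thread polls the SQ) submission requires no system calls at all. This substantially reduces per-operation overhead for databases and network proxies.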
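The ring mechanics can be sketched as a toy Python model. It is not the real interface: `Ring`, `kernel_enter`, and the `(user_data, op, name)` tuples are hypothetical. A plain object stands in for the shared memory, and real programs would typically use liburing (e.g. `io_uring_submit()`, `io_uring_wait_cqe()`). The sketch shows power-of-two rings indexed by free-running head/tail counters, and several requests handled by one "enter" call.

```python
class Ring:
    def __init__(self, entries):
        assert entries & (entries - 1) == 0, "size must be a power of two"
        self.slots = [None] * entries
        self.mask = entries - 1
        self.head = 0  # advanced by the consumer
        self.tail = 0  # advanced by the producer

    def push(self, item):
        if self.tail - self.head == len(self.slots):
            return False  # ring full
        self.slots[self.tail & self.mask] = item
        self.tail += 1    # publish (real code needs a release barrier here)
        return True

    def pop(self):
        if self.head == self.tail:
            return None   # ring empty
        item = self.slots[self.head & self.mask]
        self.head += 1
        return item


def kernel_enter(sq, cq, files):
    """Stand-in for one io_uring_enter() call: consume every queued SQE."""
    handled = 0
    while (sqe := sq.pop()) is not None:
        user_data, op, name = sqe
        if op == "read":
            # Result is bytes "read", or -2 (mimicking -ENOENT) if missing.
            result = len(files[name]) if name in files else -2
        else:
            result = -22  # mimics -EINVAL for unknown ops
        cq.push((user_data, result))
        handled += 1
    return handled


sq, cq = Ring(8), Ring(8)
files = {"a.txt": b"alpha", "b.txt": b"bravo!"}
for tag, name in enumerate(["a.txt", "b.txt", "missing"]):
    sq.push((tag, "read", name))       # queuing needs no system call
handled = kernel_enter(sq, cq, files)  # one "syscall" for the whole batch
completions = []
while (cqe := cq.pop()) is not None:
    completions.append(cqe)
```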

6. Device Abstraction: ioctl and mmap

6.1 ioctl (Input/Output Control)

A "catch-all" system call used for device-specific operations that don't fit into read/write (e.g., ejecting a CD-ROM or setting the speed of a serial port).
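Ejecting a CD-ROM needs real hardware, so this sketch uses a harmless ioctl instead. It calls `FIONREAD` on a pipe, which asks the kernel how many bytes are waiting: a query with no read()/write() equivalent. It relies on Python's `fcntl.ioctl` and the `termios.FIONREAD` constant and assumes Linux.

```python
import fcntl
import os
import struct
import termios

r, w = os.pipe()
os.write(w, b"12345")

buf = bytearray(struct.calcsize("i"))  # room for one C int
fcntl.ioctl(r, termios.FIONREAD, buf)  # kernel fills buf in place
pending = struct.unpack("i", buf)[0]   # bytes waiting in the pipe

os.close(r)
os.close(w)
```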

6.2 mmap for Devices

High-speed devices (like video cards) can map their internal memory directly into the application's virtual address space. With a framebuffer device, for example, writing to a mapped address updates the corresponding screen pixel directly.
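The mmap pattern can be sketched with Python's `mmap` module. A regular temporary file stands in for a device node such as /dev/fb0, so the example runs without graphics hardware. The 4x2 geometry and the 4-byte B, G, R, unused pixel format are hypothetical. A real program would query the geometry first, e.g. with the `FBIOGET_VSCREENINFO` ioctl. Once mapped, pixels are changed with ordinary memory writes rather than write() calls.

```python
import mmap
import os
import tempfile

WIDTH, HEIGHT, BPP = 4, 2, 4  # hypothetical geometry, bytes per pixel
SIZE = WIDTH * HEIGHT * BPP

fd, path = tempfile.mkstemp()  # stand-in for opening /dev/fb0
os.ftruncate(fd, SIZE)
fb = mmap.mmap(fd, SIZE)       # shared, read/write mapping


def set_pixel(x, y, b, g, r):
    # A plain memory write; on a real framebuffer this is the pixel itself.
    off = (y * WIDTH + x) * BPP
    fb[off:off + BPP] = bytes((b, g, r, 0))


set_pixel(1, 1, 0x00, 0x00, 0xFF)  # pure red at (1, 1)
fb.flush()
fb.close()

with open(path, "rb") as f:
    snapshot = f.read()
os.close(fd)
os.unlink(path)
```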


7. Troubleshooting and Tools

Tool    | Focus         | Key Insight
--------|---------------|-----------------------------------------
iostat  | System-level  | Disk throughput and latency
iotop   | Process-level | Which app is "hogging" the disk?
lsblk   | Structure     | Lists all block devices and partitions
dmesg   | Kernel-level  | Device driver errors and hardware logs
tcpdump | Network I/O   | Captures raw packets at the kernel level

8. Summary Checklist

  • Explain the role of the IOMMU in modern systems.
  • What is the difference between a character device and a block device?
  • Why do modern kernels split interrupt handling into Top and Bottom halves?
  • How does io_uring reduce the overhead of high-frequency I/O?
  • Trace a data packet from a network wire to a user-space application.

End of Chapter 06. Continue to Chapter 07: Storage System.