I/O System: From Device Drivers to Asynchronous Streams

The I/O System is the bridge between the high-speed, structured world of the CPU/RAM and the chaotic, asynchronous world of external hardware. Managing devices ranging from ultra-fast NVMe drives to slow human-interface devices requires a flexible and robust kernel architecture.

This chapter explores how the kernel manages device communication, optimizes data throughput, and provides a unified interface for diverse hardware.

1. Hardware Communication Primitives

The CPU uses several low-level mechanisms to talk to hardware controllers.

1.1 Memory-Mapped I/O (MMIO)

Modern CPUs reserve a portion of the physical address space for hardware devices.

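The Concept: A store to an address such as 0xFFFF0000 may not reach RAM at all; instead, the bus routes it to a device register, for example to send a command to a network card.
Benefit: The CPU can use standard load/store instructions to interact with hardware, with no special I/O instructions required.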
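The address decoding described above can be modeled in a few lines. This is only a simulation sketch: the `Bus` and `NetworkCard` classes and the 4 KiB RAM size are assumptions made for illustration, and real decoding happens in hardware.

```python
class NetworkCard:
    """Hypothetical device that records every register write it receives."""

    def __init__(self):
        self.commands = []

    def write_register(self, offset, value):
        self.commands.append((offset, value))


class Bus:
    """Routes a store either to RAM or to a device, based on the address."""

    def __init__(self, ram_size, mmio_base, device):
        self.ram = bytearray(ram_size)
        self.mmio_base = mmio_base
        self.device = device

    def store(self, address, value):
        if address >= self.mmio_base:
            # The address falls in the reserved window: the device sees the write.
            self.device.write_register(address - self.mmio_base, value)
        else:
            # Ordinary memory write.
            self.ram[address] = value


MMIO_BASE = 0xFFFF0000  # example address taken from the text above
bus = Bus(ram_size=4096, mmio_base=MMIO_BASE, device=NetworkCard())
bus.store(0x10, 0x7F)        # lands in RAM
bus.store(MMIO_BASE, 0x01)   # lands in the network card's first register
```

The same `store` call serves both cases, which is the point of MMIO: the instruction is identical and only the address decides the destination.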

1.2 Interrupts: The Asynchronous Signal

Instead of the CPU constantly checking (polling) if a device is ready, the device signals the CPU.

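Interrupt Vector: A number that identifies the interrupt source. On x86 it is an index into the Interrupt Descriptor Table (IDT), which holds the entry point of each handler.
Masking: The CPU can temporarily disable (mask) interrupts to protect critical sections of kernel code; interrupts that arrive in the meantime stay pending until they are unmasked.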
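A small simulation can make vectors and masking concrete. The `InterruptController` class below is a hypothetical model written for this chapter, not a real kernel or hardware interface; vector 33 is an arbitrary example number.

```python
class InterruptController:
    """Toy model: a vector table plus a global mask flag."""

    def __init__(self):
        self.table = {}     # vector number -> handler function
        self.masked = False
        self.pending = []   # vectors raised while interrupts were masked

    def register(self, vector, handler):
        self.table[vector] = handler

    def raise_irq(self, vector):
        if self.masked:
            # Deliver later: the handler must not run inside the critical section.
            self.pending.append(vector)
        else:
            self.table[vector]()

    def mask(self):
        self.masked = True

    def unmask(self):
        self.masked = False
        # Deliver everything that arrived while masked, in arrival order.
        while self.pending:
            self.table[self.pending.pop(0)]()


events = []
controller = InterruptController()
controller.register(33, lambda: events.append("keyboard"))

controller.mask()
controller.raise_irq(33)    # held back while masked
held_back = list(events)    # still empty at this point
controller.unmask()         # the pending interrupt is delivered now
```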

1.3 Direct Memory Access (DMA) and IOMMU

Transferring megabytes of data through the CPU is a waste of cycles.

DMA Controller: A specialized hardware unit that moves data directly between a device (e.g., a disk) and RAM. The CPU only sets up the transfer and is notified by an interrupt when it completes.
IOMMU: The I/O counterpart of the MMU. It translates and checks the addresses that devices issue, so a device can only access the memory regions explicitly assigned to it by the kernel, providing hardware-level protection.
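The protection check can be sketched as a lookup of permitted windows per device. Everything here (`IOMMU`, `grant`, `dma_write`, the device names) is a hypothetical model for illustration; a real IOMMU uses page tables and also translates addresses.

```python
class IOMMU:
    """Toy model: each device may touch only the windows granted to it."""

    def __init__(self):
        self.windows = {}   # device id -> list of (start, size)

    def grant(self, device_id, start, size):
        self.windows.setdefault(device_id, []).append((start, size))

    def allows(self, device_id, address, length):
        # The whole transfer must fit inside a single granted window.
        return any(
            start <= address and address + length <= start + size
            for start, size in self.windows.get(device_id, [])
        )


def dma_write(iommu, ram, device_id, address, data):
    """Copy data into RAM on behalf of a device, if the IOMMU permits it."""
    if not iommu.allows(device_id, address, len(data)):
        raise PermissionError(f"{device_id} may not write at {address:#x}")
    ram[address:address + len(data)] = data


ram = bytearray(64)
iommu = IOMMU()
iommu.grant("disk0", start=16, size=8)
dma_write(iommu, ram, "disk0", 16, b"DATA")    # inside the granted window

try:
    dma_write(iommu, ram, "disk0", 0, b"EVIL")  # outside the window
    blocked = False
except PermissionError:
    blocked = True
```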

2. Kernel I/O Architecture

2.1 The Device Driver Model

A Device Driver is the software component that bridges the gap between the kernel's generic I/O calls and the hardware's specific registers.

Character Devices: Handle streams of bytes (Keyboards, Mice, Serial Ports).
Block Devices: Handle data in addressable blocks (HDDs, SSDs, Flash).
Network Devices: Handle discrete packets.

2.2 Top Halves and Bottom Halves (Linux)

An interrupt handler must be fast: while it runs, the interrupt line it services (and often every interrupt on that CPU) stays disabled, so other devices may be kept waiting.

Top Half (ISR): The immediate response. It acknowledges the interrupt, saves critical data, and schedules the "Bottom Half."
Bottom Half (Softirqs/Tasklets/Workqueues): Performs the heavy lifting (e.g., parsing a network packet or copying data) later, with interrupts enabled.
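The split can be sketched as a queue between a short handler and a deferred worker. The `NicDriver` class is a hypothetical user-space model, and "parsing" is reduced to decoding and upper-casing the bytes; it is not how a Linux driver is written.

```python
from collections import deque


class NicDriver:
    """Toy model of a driver that defers packet processing."""

    def __init__(self):
        self.deferred = deque()   # work scheduled by the top half
        self.parsed = []          # results produced by the bottom half

    def top_half(self, raw_packet):
        # Keep this short: grab the data and schedule the rest for later.
        self.deferred.append(raw_packet)

    def bottom_half(self):
        # Runs later, outside interrupt context, and may take its time.
        while self.deferred:
            packet = self.deferred.popleft()
            self.parsed.append(packet.decode("ascii").upper())


driver = NicDriver()
driver.top_half(b"ping")
driver.top_half(b"pong")
pending_before = len(driver.deferred)   # two packets queued, none parsed yet
driver.bottom_half()
```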

3. The I/O Stack and Scheduling

3.1 Block I/O Layer

When a file system requests a block, it enters the Block Layer.

Merging: If requests for blocks 10-15 and 16-20 arrive, the kernel merges them into a single 11-block request covering blocks 10-20.
Sorting: The kernel sorts requests by block address to minimize head seek time on rotating disks.
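Sorting and merging can be sketched together. The function below is a simplified illustration that assumes each request is an inclusive `(first_block, last_block)` pair; the name `merge_requests` is made up for this example and the real block layer tracks far more state.

```python
def merge_requests(requests):
    """Sort requests by starting block and merge adjacent or overlapping ones.

    Each request is an inclusive (first_block, last_block) tuple.
    """
    merged = []
    for first, last in sorted(requests):
        if merged and first <= merged[-1][1] + 1:
            # Touches or overlaps the previous request: extend it.
            previous_first, previous_last = merged[-1]
            merged[-1] = (previous_first, max(previous_last, last))
        else:
            merged.append((first, last))
    return merged


queue = [(16, 20), (40, 42), (10, 15)]
dispatch_order = merge_requests(queue)
```

With the example from the text, the requests for blocks 10-15 and 16-20 come out as one request covering blocks 10 through 20, dispatched ahead of the request for blocks 40-42.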

3.2 I/O Schedulers

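Deadline (mq-deadline in current kernels): Prevents starvation by giving each request an expiration time; a request that has waited past its deadline is served ahead of the sorted queue.
BFQ (Budget Fair Queuing): A complex scheduler that tries to give each process a fair share of disk bandwidth. Excellent for desktop responsiveness.
Kyber: A modern, lightweight scheduler designed for fast multi-queue devices such as NVMe SSDs; it limits how many requests are in flight in order to meet latency targets.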
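The idea behind a deadline-style scheduler can be sketched as a single decision function. This is a deliberately simplified model and not the kernel's algorithm: the `pick_next` name, the dictionary fields, and the "nearest sector" rule are all assumptions for illustration.

```python
def pick_next(requests, now, head_position):
    """Choose the next request to dispatch.

    Each request is a dict with a 'sector' and a 'deadline' (same time
    unit as 'now'). Expired requests win; otherwise take the nearest sector.
    """
    expired = [r for r in requests if r["deadline"] <= now]
    if expired:
        # Starvation guard: serve the request that has been overdue longest.
        return min(expired, key=lambda r: r["deadline"])
    # Normal case: minimize movement from the current position.
    return min(requests, key=lambda r: abs(r["sector"] - head_position))


requests = [
    {"sector": 100, "deadline": 50},
    {"sector": 10, "deadline": 5},
    {"sector": 55, "deadline": 80},
]
early_choice = pick_next(requests, now=0, head_position=50)
late_choice = pick_next(requests, now=10, head_position=50)
```

At time 0 nothing has expired, so the request closest to the head (sector 55) is chosen. At time 10 the request for sector 10 is overdue and jumps the queue even though it is farther away.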

4. Performance: Buffering and Caching

4.1 Double Buffering

While the device fills one buffer, the application reads from the other; when both are done, the two buffers swap roles. Because transfer and processing overlap, much of the device's latency is hidden.
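A minimal sketch of the buffer hand-off, using a thread as a stand-in for the device. The function name `run_double_buffered` and the two-queue design are choices made for this illustration; a real driver would swap buffers on DMA completion.

```python
import queue
import threading


def run_double_buffered(chunks):
    """Move chunks from a simulated device to a consumer using two buffers."""
    free = queue.Queue()     # buffers the device may fill
    filled = queue.Queue()   # buffers ready for the application
    for _ in range(2):       # exactly two buffers exist
        free.put(bytearray())

    def device():
        for chunk in chunks:
            buffer = free.get()   # blocks if the application holds both buffers
            buffer[:] = chunk     # "device" fills the buffer
            filled.put(buffer)
        filled.put(None)          # end-of-stream marker

    producer = threading.Thread(target=device)
    producer.start()

    results = []
    while True:
        buffer = filled.get()
        if buffer is None:
            break
        results.append(bytes(buffer))   # application consumes the data
        free.put(buffer)                # hand the buffer back to the device
    producer.join()
    return results


received = run_double_buffered([b"aaaa", b"bbbb", b"cccc"])
```

While the application is busy with one buffer, the device thread is free to fill the other, which is the overlap that hides latency.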

4.2 Zero-Copy (Performance Optimization)

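A traditional read() from a file followed by write() to a socket involves four user/kernel mode transitions (two per system call) and four data copies: device to page cache (DMA), page cache to user buffer (CPU), user buffer to socket buffer (CPU), and socket buffer to network card (DMA).

sendfile() / splice(): These system calls let the kernel move data directly from the Page Cache to the Socket Buffer, so the data never passes through user-space memory.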
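From Python, the standard library's `socket.sendfile()` method uses `os.sendfile()` where the platform supports it and falls back to ordinary `send()` calls otherwise. The sketch below assumes a local socket pair as a stand-in for a network connection; the helper names and the payload are made up for the example.

```python
import os
import socket
import tempfile


def send_file_zero_copy(path, sock):
    """Send a whole file over a stream socket without a user-space read()."""
    with open(path, "rb") as source:
        return sock.sendfile(source)   # returns the number of bytes sent


def demo(payload):
    # Write the payload to a temporary file that plays the role of a static asset.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        sent = send_file_zero_copy(path, sender)
        data = receiver.recv(4096)
    finally:
        sender.close()
        receiver.close()
        os.remove(path)
    return sent, data


sent, data = demo(b"hello from the page cache")
```

The application never holds the file's contents in a buffer of its own; it only passes file and socket handles to the kernel.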

5. Modern High-Performance I/O

5.1 I/O Multiplexing: `epoll`

How does a web server handle 100,000 connections?

select/poll: On every call, the server hands the kernel its full list of sockets ("Which of these 1,000 sockets are ready?") and the kernel scans all of them ( $O(N)$ per call).
epoll (Linux): The server registers each socket once. The kernel adds a socket to a "ready list" whenever it becomes ready, and epoll_wait() returns only those sockets, so the cost depends on the number of ready sockets rather than the number registered ( $O(1)$ per event).
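The register-once, wait-for-ready pattern can be tried with Python's `selectors` module, whose `DefaultSelector` picks epoll on Linux and another mechanism elsewhere. The function name `ready_indexes` and the use of local socket pairs instead of real client connections are assumptions for this sketch.

```python
import selectors
import socket


def ready_indexes(connection_count, active):
    """Register many sockets, make a few readable, and report which are ready."""
    selector = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(connection_count)]
    try:
        # Register every reading end exactly once.
        for index, (reader, _writer) in enumerate(pairs):
            selector.register(reader, selectors.EVENT_READ, data=index)
        # Simulate traffic on a handful of connections.
        for index in active:
            pairs[index][1].send(b"x")
        # Only sockets with pending data come back, not the whole set.
        events = selector.select(timeout=1)
        return sorted(key.data for key, _mask in events)
    finally:
        selector.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()


ready = ready_indexes(connection_count=50, active=[3, 17])
```

The loop that handles events touches two sockets here, not fifty, which is why this approach scales to very large numbers of mostly idle connections.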

5.2 `io_uring`: The New Frontier

io_uring is a Linux interface built on a pair of ring buffers in memory shared between user-space and the kernel.

Submission Queue (SQ): User-space adds I/O requests.
Completion Queue (CQ): Kernel adds results.
Result: Many requests can be submitted with a single io_uring_enter() system call, and with submission-queue polling enabled (IORING_SETUP_SQPOLL) a kernel thread picks up new requests with no system call at all. This sharply reduces per-request overhead for databases and network proxies.
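The two-queue protocol can be modeled without touching the real interface. The sketch below is a pure simulation: `Ring`, `kernel_drain`, the request tuple layout, and the in-memory `files` dictionary are all hypothetical, and the real io_uring structures differ.

```python
class Ring:
    """Fixed-size ring buffer with separate producer and consumer indexes."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0   # next slot to consume
        self.tail = 0   # next slot to produce into

    def push(self, item):
        if self.tail - self.head == len(self.slots):
            return False   # ring is full
        self.slots[self.tail % len(self.slots)] = item
        self.tail += 1
        return True

    def pop(self):
        if self.head == self.tail:
            return None    # ring is empty
        item = self.slots[self.head % len(self.slots)]
        self.head += 1
        return item


def kernel_drain(sq, cq, files):
    """Stand-in for the kernel: consume every request and post a completion."""
    request = sq.pop()
    while request is not None:
        user_data, name, offset, length = request
        cq.push((user_data, files[name][offset:offset + length]))
        request = sq.pop()


files = {"a.txt": b"hello world"}
sq, cq = Ring(4), Ring(4)
sq.push((1, "a.txt", 0, 5))   # user-space queues two reads back to back
sq.push((2, "a.txt", 6, 5))
kernel_drain(sq, cq, files)   # one "kernel" pass handles both
completions = [cq.pop(), cq.pop()]
```

Each completion carries the `user_data` tag of its request, so the application can match results to submissions even when they finish out of order.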

6. Device Abstraction: `ioctl` and `mmap`

6.1 `ioctl` (Input/Output Control)

A "catch-all" system call used for device-specific operations that don't fit into read/write (e.g., ejecting a CD-ROM or setting the speed of a serial port).
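Ejecting a disc or configuring a serial port needs real hardware, so this sketch swaps in a harmless request instead: `FIONREAD`, which asks how many bytes are waiting to be read. It assumes a Unix-like system where Python's `fcntl` and `termios` modules are available; the helper name `bytes_waiting` is made up for the example.

```python
import fcntl
import os
import struct
import termios


def bytes_waiting(fd):
    """Ask the kernel how many unread bytes are queued on a file descriptor."""
    # ioctl fills in a C int; pack a placeholder and unpack the reply.
    reply = fcntl.ioctl(fd, termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", reply)[0]


def demo():
    read_end, write_end = os.pipe()
    try:
        os.write(write_end, b"hello")
        return bytes_waiting(read_end)
    finally:
        os.close(read_end)
        os.close(write_end)


queued = demo()
```

The pattern is the same for device-specific requests: a file descriptor, a request code, and an argument whose layout is defined by the driver.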

6.2 `mmap` for Devices

High-speed devices (like video cards) can expose their internal memory to applications: the driver maps it directly into the application's virtual address space. Writing to the mapped addresses then updates the screen pixels directly, with no system call per write.
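The access pattern can be imitated with an ordinary file standing in for the device. This sketch assumes a tiny 4x2 "screen" with one byte per pixel; the names `set_pixel` and `draw_demo` and the temporary backing file are illustrative only, since a real program would map a device node provided by the driver.

```python
import mmap
import tempfile

WIDTH, HEIGHT = 4, 2   # hypothetical screen size, one byte per pixel


def set_pixel(framebuffer, x, y, value):
    """Write one pixel with a plain memory store into the mapping."""
    framebuffer[y * WIDTH + x] = value


def draw_demo():
    with tempfile.TemporaryFile() as backing:
        # The backing file plays the role of device memory.
        backing.write(bytes(WIDTH * HEIGHT))
        backing.flush()
        with mmap.mmap(backing.fileno(), WIDTH * HEIGHT) as framebuffer:
            set_pixel(framebuffer, 1, 1, 255)   # no write() call involved
            framebuffer.flush()
        backing.seek(0)
        return backing.read()


screen = draw_demo()
```

After the mapping is set up, changing a pixel is an indexed assignment rather than a system call, which is what makes this approach attractive for high-bandwidth devices.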

7. Troubleshooting and Tools

| Tool | Focus | Key Insight |
| --- | --- | --- |
| `iostat` | System-level | Disk throughput and latency |
| `iotop` | Process-level | Which app is "hogging" the disk? |
| `lsblk` | Structure | Lists all block devices and partitions |
| `dmesg` | Kernel-level | Device driver errors and hardware logs |
| `tcpdump` | Network I/O | Captures raw packets at the kernel level |

8. Summary Checklist

Explain the role of the IOMMU in modern systems.
What is the difference between a character device and a block device?
Why do modern kernels split interrupt handling into Top and Bottom halves?
How does io_uring reduce the overhead of high-frequency I/O?
Trace a data packet from a network wire to a user-space application.

End of Chapter 06. Continue to Chapter 07: Storage System.
