I/O System: From Device Drivers to Asynchronous Streams
The I/O System is the bridge between the high-speed, structured world of the CPU/RAM and the chaotic, asynchronous world of external hardware. Managing devices ranging from ultra-fast NVMe drives to slow human-interface devices requires a flexible and robust kernel architecture.
This chapter explores how the kernel manages device communication, optimizes data throughput, and provides a unified interface for diverse hardware.
1. Hardware Communication Primitives
The CPU uses several low-level mechanisms to talk to hardware controllers.
1.1 Memory-Mapped I/O (MMIO)
Modern CPUs reserve a portion of the physical address space for hardware devices.
- The Concept: Writing to address 0xFFFF0000 might not write to RAM, but instead send a command to a network card.
- Benefit: The CPU can use standard load/store instructions to interact with hardware.
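The address-decoding idea can be sketched as a toy simulation. The `Bus` and `FakeNic` classes, the single command register, and the reuse of 0xFFFF0000 as the device's base address are illustrative assumptions, not a model of any real hardware.

```python
class FakeNic:
    """Hypothetical device with one command register at offset 0."""

    def __init__(self):
        self.commands = []  # every value written to the command register

    def write(self, offset, value):
        if offset == 0:
            self.commands.append(value)

    def read(self, offset):
        # Offset 0 reads back the last command; everything else reads as zero.
        if offset == 0 and self.commands:
            return self.commands[-1]
        return 0


class Bus:
    """Routes each physical address either to RAM or to a device window."""

    def __init__(self, ram_size):
        self.ram = bytearray(ram_size)
        self.windows = []  # list of (base, size, device)

    def map_device(self, base, size, device):
        self.windows.append((base, size, device))

    def _find(self, addr):
        for base, size, device in self.windows:
            if base <= addr < base + size:
                return device, addr - base
        return None, addr

    def store(self, addr, value):
        device, offset = self._find(addr)
        if device is not None:
            device.write(offset, value)    # reaches the device, not RAM
        else:
            self.ram[addr] = value & 0xFF  # ordinary byte store into RAM

    def load(self, addr):
        device, offset = self._find(addr)
        if device is not None:
            return device.read(offset)
        return self.ram[addr]


bus = Bus(ram_size=4096)
nic = FakeNic()
bus.map_device(0xFFFF0000, 0x1000, nic)

bus.store(0x10, 0x7F)        # lands in RAM
bus.store(0xFFFF0000, 0x01)  # same operation, but it reaches the NIC
```

The caller uses the same `store` call in both cases; only the address decides where the write ends up.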
1.2 Interrupts: The Asynchronous Signal
Instead of the CPU constantly checking (polling) if a device is ready, the device signals the CPU.
- Interrupt Vector: A number identifying the interrupt source. On x86 it is an index into the Interrupt Descriptor Table (IDT), which holds the handler addresses.
- Masking: The CPU can temporarily disable interrupts to protect critical kernel code.
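Vector dispatch and masking can be illustrated with a small simulation. The `InterruptController` class, the choice to queue interrupts raised while masked, and vector number 33 are assumptions made for this sketch. Real hardware behaviour differs in many details.

```python
class InterruptController:
    """Toy model: a vector table plus a global enable/disable flag."""

    def __init__(self):
        self.table = {}      # vector number -> handler function
        self.enabled = True
        self.pending = []    # vectors raised while interrupts were masked

    def register(self, vector, handler):
        self.table[vector] = handler

    def raise_irq(self, vector):
        if self.enabled:
            self.table[vector]()         # look up the handler by vector
        else:
            self.pending.append(vector)  # deliver later, after unmasking

    def disable(self):
        self.enabled = False

    def enable(self):
        self.enabled = True
        while self.pending:
            self.table[self.pending.pop(0)]()


events = []
ic = InterruptController()
ic.register(33, lambda: events.append("keyboard"))  # 33 is a made-up vector

ic.raise_irq(33)   # handled immediately
ic.disable()       # enter a critical section
ic.raise_irq(33)   # held back while masked
masked_snapshot = list(events)
ic.enable()        # pending interrupt is delivered now
```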
1.3 Direct Memory Access (DMA) and IOMMU
Transferring megabytes of data through the CPU is a waste of cycles.
- DMA Controller: A specialized hardware unit that moves data directly between a device (e.g., a disk) and RAM. The CPU only sets up the transfer and is notified, usually by an interrupt, when it completes.
- IOMMU: Similar to an MMU but for I/O devices. It ensures that a device can only access the memory regions explicitly assigned to it by the kernel, providing hardware-level protection.
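The protection role of the IOMMU can be sketched as a bounds check on every device-initiated write. The `Iommu` class, the device name `nic0`, and the flat (start, length) windows are assumptions made for illustration. A real IOMMU translates device addresses through page tables rather than checking simple ranges.

```python
class Iommu:
    """Toy model: each device may touch only the windows granted to it."""

    def __init__(self):
        self.windows = {}  # device id -> list of (start, length)

    def grant(self, device_id, start, length):
        self.windows.setdefault(device_id, []).append((start, length))

    def check(self, device_id, addr, length):
        for start, size in self.windows.get(device_id, []):
            if start <= addr and addr + length <= start + size:
                return True
        return False


def dma_write(iommu, ram, device_id, addr, data):
    """Simulated DMA transfer from a device into RAM."""
    if not iommu.check(device_id, addr, len(data)):
        raise PermissionError(f"{device_id} may not write at {addr}")
    ram[addr:addr + len(data)] = data


ram = bytearray(64)
iommu = Iommu()
iommu.grant("nic0", start=16, length=8)

dma_write(iommu, ram, "nic0", 16, b"packet!!")  # 8 bytes, inside the window
```

A write that crosses the end of the window, or that comes from a device with no grant at all, raises `PermissionError` instead of touching RAM.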
2. Kernel I/O Architecture
2.1 The Device Driver Model
A Device Driver is the software component that bridges the gap between the kernel's generic I/O calls and the hardware's specific registers.
- Character Devices: Handle streams of bytes (Keyboards, Mice, Serial Ports).
- Block Devices: Handle data in addressable blocks (HDDs, SSDs, Flash).
- Network Devices: Handle discrete packets.
2.2 Top Halves and Bottom Halves (Linux)
Handling an interrupt must be fast because the handler typically runs with interrupts masked on that CPU, so other events have to wait until it returns.
- Top Half (ISR): The immediate response. It acknowledges the interrupt, saves critical data, and schedules the "Bottom Half."
- Bottom Half (Tasklets/Workqueues): Performs the heavy lifting (e.g., parsing a network packet or copying data) asynchronously.
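The split can be sketched with a plain queue standing in for the kernel's deferred-work mechanism. The function names and the "parsing" step (decoding and upper-casing a payload) are hypothetical. Real tasklets and workqueues are kernel C APIs and are not modelled here.

```python
from collections import deque

deferred_work = deque()  # stands in for the kernel's deferred-work queue


def top_half(raw_packet):
    """Runs in 'interrupt context': do the minimum and return quickly."""
    deferred_work.append(bytes(raw_packet))  # save the critical data
    # Acknowledging the device would happen here; scheduling the bottom
    # half is represented by the queue entry itself.


def run_bottom_halves():
    """Runs later, outside the handler: do the expensive processing."""
    results = []
    while deferred_work:
        packet = deferred_work.popleft()
        results.append(packet.decode("ascii").upper())  # stand-in for parsing
    return results


# Two interrupts arrive back to back; neither waits for parsing.
top_half(b"syn")
top_half(b"ack")
queued = len(deferred_work)
parsed = run_bottom_halves()
```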
3. The I/O Stack and Scheduling
3.1 Block I/O Layer
When a file system requests a block, it enters the Block Layer.
- Merging: If requests for blocks 10-15 and 16-20 arrive, the kernel merges them into a single 11-block request covering blocks 10-20.
- Sorting: The kernel sorts requests to minimize disk head seek time.
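Merging and sorting can be sketched in a few lines. The function name and the representation of a request as an inclusive (first_block, last_block) pair are assumptions for this example. The real block layer works on richer request structures.

```python
def merge_requests(requests):
    """Sort requests by starting block and merge adjacent or overlapping ones.

    Each request is an inclusive (first_block, last_block) pair.
    """
    merged = []
    for first, last in sorted(requests):
        if merged and first <= merged[-1][1] + 1:
            # Touches or overlaps the previous request: extend it.
            prev_first, prev_last = merged[-1]
            merged[-1] = (prev_first, max(prev_last, last))
        else:
            merged.append((first, last))
    return merged


def block_count(request):
    first, last = request
    return last - first + 1


# Requests arrive out of order; blocks 10-15 and 16-20 are adjacent.
result = merge_requests([(16, 20), (40, 42), (10, 15)])
```

Here `result` holds two requests in ascending block order, and the first one spans blocks 10 through 20.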
3.2 I/O Schedulers
- Deadline: Prioritizes avoiding starvation by setting a strict expiration time for each request.
- BFQ (Budget Fair Queuing): A complex scheduler that tries to give each process a fair share of disk bandwidth. Excellent for desktop responsiveness.
- Kyber: A modern, lightweight scheduler designed for ultra-fast NVMe devices.
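The starvation-avoidance idea behind a deadline scheduler can be sketched as follows. This is a deliberate simplification, assuming one queue and dictionary-based requests with hypothetical `sector` and `deadline` fields. The real Linux scheduler keeps separate read and write queues and dispatches in batches.

```python
def pick_next(queue, now):
    """Remove and return the next request to dispatch.

    Normally the request with the lowest sector goes first (good for seeks),
    but a request whose deadline has passed takes priority.
    """
    oldest = min(queue, key=lambda r: r["deadline"])
    if oldest["deadline"] <= now:
        chosen = oldest  # expired: serve it now to avoid starvation
    else:
        chosen = min(queue, key=lambda r: r["sector"])
    queue.remove(chosen)
    return chosen


def make_queue():
    return [
        {"sector": 900, "deadline": 5},   # far away on disk, expires early
        {"sector": 100, "deadline": 50},  # close, expires late
    ]


early = pick_next(make_queue(), now=0)   # nothing expired: lowest sector wins
late = pick_next(make_queue(), now=10)   # sector 900 has expired: it wins
```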
4. Performance: Buffering and Caching
4.1 Double Buffering
While the device is filling one buffer, the application is reading from the second. This hides the latency of the device.
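A minimal sketch of double buffering, with a thread standing in for the device. The function name, the 4-byte buffer size, and the use of two queues to hand buffers back and forth are choices made for this example.

```python
import queue
import threading


def run_double_buffered(chunks):
    """Pass chunks from a 'device' thread to the caller using two buffers."""
    free = queue.Queue()    # buffers the device may fill
    filled = queue.Queue()  # buffers the application may read
    for _ in range(2):
        free.put(bytearray(4))  # exactly two buffers exist

    def device():
        for chunk in chunks:
            buf = free.get()   # wait until one buffer is empty
            buf[:] = chunk     # the device fills it
            filled.put(buf)
        filled.put(None)       # end-of-stream marker

    consumed = []
    producer = threading.Thread(target=device)
    producer.start()
    while True:
        buf = filled.get()
        if buf is None:
            break
        consumed.append(bytes(buf))  # the application reads this buffer...
        free.put(buf)                # ...then returns it to the device
    producer.join()
    return consumed


data = run_double_buffered([b"aaaa", b"bbbb", b"cccc"])
```

While the main thread copies one buffer out, the device thread is free to fill the other, so neither side waits for a full round trip.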
4.2 Zero-Copy (Performance Optimization)
Traditional read() followed by write() to a socket involves 4 context switches and 4 data copies: two DMA transfers plus two CPU copies through a user-space buffer.
sendfile()/splice(): These system calls allow the kernel to transfer data directly from the Page Cache to the Socket Buffer, avoiding any copying into user-space RAM.
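From Python, the same system call is reachable through `os.sendfile`. The sketch below assumes a Linux system. The helper name `send_file_zero_copy` is made up for this example, and a local socket pair is assumed as a stand-in for a network connection.

```python
import os
import socket
import tempfile


def send_file_zero_copy(path, sock):
    """Send the whole file at `path` over `sock` without a user-space buffer."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # The kernel moves data from the page cache to the socket.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break
            sent += n
    return sent


# Demo: a temporary file sent across a local socket pair.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello zero-copy")

sender, receiver = socket.socketpair()
bytes_sent = send_file_zero_copy(tmp.name, sender)
received = receiver.recv(64)

sender.close()
receiver.close()
os.unlink(tmp.name)
```

Note that the Python process never calls `read()` on the file, so the payload is never copied into its own memory.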
5. Modern High-Performance I/O
5.1 I/O Multiplexing: epoll
How does a web server handle 100,000 connections?
- select/poll: The server asks the kernel "Which of these 1,000 sockets are ready?" The kernel re-scans all 1,000 on every call, so the cost is O(n) in the number of watched sockets.
- epoll (Linux): The kernel maintains a "ready list" and adds a socket to it when the socket becomes ready. The server only processes the ready ones, so each wait costs O(1) per ready socket rather than a scan of everything being watched.
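The readiness model can be tried from Python's `selectors` module, whose `DefaultSelector` uses epoll on Linux. The three local socket pairs standing in for client connections, and the index stored as `data`, are assumptions for this sketch.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # EpollSelector on Linux

# Three hypothetical "client connections", modelled as local socket pairs.
pairs = [socket.socketpair() for _ in range(3)]
for index, (server_side, _client_side) in enumerate(pairs):
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ, data=index)

# Only client 1 sends anything, so only one socket should become ready.
pairs[1][1].sendall(b"ping")

# select() returns just the ready sockets; idle ones are never visited here.
ready = [
    (key.data, key.fileobj.recv(16))
    for key, _events in sel.select(timeout=1.0)
]

sel.close()
for server_side, client_side in pairs:
    server_side.close()
    client_side.close()
```

The loop over `sel.select()` touches one entry even though three sockets are registered, which is the property that lets a server scale to very large connection counts.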
5.2 io_uring: The New Frontier
io_uring is a Linux interface that uses a shared memory ring buffer between user-space and the kernel.
- Submission Queue (SQ): User-space adds I/O requests.
- Completion Queue (CQ): Kernel adds results.
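- Result: Many requests can be submitted with a single system call, and with kernel-side submission-queue polling enabled (IORING_SETUP_SQPOLL) submission can proceed without any system call while the kernel polling thread is awake. This sharply reduces per-operation overhead for databases and network proxies.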
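The two-ring design can be illustrated with a pure-Python simulation. Nothing here calls the real io_uring interface. The `Ring` class, the request tuple layout, and `fake_kernel_drain` are assumptions invented for this sketch.

```python
class Ring:
    """Fixed-size single-producer/single-consumer ring with head and tail."""

    def __init__(self, entries):
        assert entries & (entries - 1) == 0, "size must be a power of two"
        self.slots = [None] * entries
        self.mask = entries - 1
        self.head = 0  # next slot the consumer reads
        self.tail = 0  # next slot the producer writes

    def push(self, item):
        if self.tail - self.head == len(self.slots):
            return False  # ring is full
        self.slots[self.tail & self.mask] = item
        self.tail += 1
        return True

    def pop(self):
        if self.head == self.tail:
            return None  # ring is empty
        item = self.slots[self.head & self.mask]
        self.head += 1
        return item


sq = Ring(4)  # submission queue: application produces, "kernel" consumes
cq = Ring(4)  # completion queue: "kernel" produces, application consumes


def fake_kernel_drain(storage):
    """Stand-in for the kernel: service every queued read request."""
    while True:
        request = sq.pop()
        if request is None:
            break
        user_data, offset, length = request
        cq.push((user_data, storage[offset:offset + length]))


# The application queues two reads by writing into shared memory only.
sq.push(("req-a", 0, 4))
sq.push(("req-b", 6, 3))
fake_kernel_drain(b"0123456789")
completions = [cq.pop(), cq.pop()]
```

Each completion carries the `user_data` tag of its request, which is how the application matches results to submissions when they complete out of order.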
6. Device Abstraction: ioctl and mmap
6.1 ioctl (Input/Output Control)
A "catch-all" system call used for device-specific operations that don't fit into read/write (e.g., ejecting a CD-ROM or setting the speed of a serial port).
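As a small runnable illustration, Python exposes the call as `fcntl.ioctl`. The sketch below uses the FIONREAD request, which asks how many unread bytes are queued on a descriptor. It is a generic request rather than one of the device-specific examples above, chosen because it works on an ordinary pipe. The helper name `bytes_waiting` is made up, and a Linux system is assumed.

```python
import fcntl
import os
import struct
import termios


def bytes_waiting(fd):
    """Ask the kernel how many unread bytes are queued on `fd`."""
    # The kernel writes a C int into the buffer that is passed in.
    raw = fcntl.ioctl(fd, termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]


read_end, write_end = os.pipe()
before = bytes_waiting(read_end)
os.write(write_end, b"hello")
after = bytes_waiting(read_end)

os.close(read_end)
os.close(write_end)
```

The operation is neither a read nor a write of the data stream, which is the kind of side-channel query ioctl exists for.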
6.2 mmap for Devices
High-speed devices (like Video Cards) can map their internal memory directly into the application's virtual address space. Writing to a specific memory address then updates the screen pixels directly.
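The mechanism can be tried without special hardware by mapping an ordinary file. In this sketch a temporary file stands in for device memory, and the 4x2 one-byte-per-pixel "framebuffer" layout is an assumption for illustration.

```python
import mmap
import tempfile

WIDTH, HEIGHT = 4, 2  # hypothetical framebuffer: one byte per pixel


def draw_and_read_back():
    """Set one pixel through a memory mapping, then read the backing store."""
    with tempfile.TemporaryFile() as backing:
        backing.truncate(WIDTH * HEIGHT)  # zero-filled "video memory"
        with mmap.mmap(backing.fileno(), WIDTH * HEIGHT) as fb:
            x, y = 2, 1
            fb[y * WIDTH + x] = 0xFF      # a plain memory store, no write()
            fb.flush()
        backing.seek(0)
        return backing.read()


pixels = draw_and_read_back()
```

With a real device the mapping would target the device's memory, for example through a device node, and the store would change what is on screen.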
7. Troubleshooting and Tools
| Tool | Focus | Key Insight |
|---|---|---|
| iostat | System-level | Disk throughput and latency |
| iotop | Process-level | Which app is "hogging" the disk? |
| lsblk | Structure | Lists all block devices and partitions |
| dmesg | Kernel-level | Device driver errors and hardware logs |
| tcpdump | Network I/O | Captures raw packets at the kernel level |
8. Summary Checklist
- Explain the role of the IOMMU in modern systems.
- What is the difference between a character device and a block device?
- Why do modern kernels split interrupt handling into Top and Bottom halves?
- How does io_uring reduce the overhead of high-frequency I/O?
- Trace a data packet from a network wire to a user-space application.
End of Chapter 06. Continue to Chapter 07: Storage System.