Storage System: From Magnetic Platters to NVMe Fabrics
The Storage System manages the persistence of data across diverse physical media. Over the last decade, storage has undergone a revolution, shifting from mechanical rotating disks to semiconductor-based flash, and now to NVMe (Non-Volatile Memory Express) devices and fabrics.
This chapter explores the physical characteristics of storage media, the algorithms used to optimize access, and the architectural patterns for redundancy and scaling.
1. Hard Disk Drives (HDD): The Mechanical Legacy
Despite the rise of SSDs, HDDs remain the backbone of mass storage (e.g., cloud cold storage and backup systems).
1.1 Physical Structure
An HDD is a high-precision mechanical device:
- Platters: Magnetically coated disks spinning at 5,400 to 15,000 RPM.
- Arm and Read/Write Head: Moves across the spinning platters to read/write data.
- Sectors and Blocks: Historically 512 bytes, now mostly 4KB (Advanced Format).
1.2 Performance Metrics (The Mechanical Bottleneck)
- Seek Time: The time to move the arm to the correct track.
- Rotational Latency: The time for the data to spin under the head (Average is half a rotation).
- Transfer Rate: The speed at which data is moved from the magnetic surface to the controller.
- Rule of Thumb: HDDs are excellent for Sequential I/O but abysmal for Random I/O, because every random access pays a seek plus rotational latency before any data is transferred.
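The metrics above can be turned into a quick back-of-the-envelope estimate. The sketch below computes average rotational latency (half a rotation) and a rough random IOPS figure. The function names and the seek times in the demo are illustrative assumptions, not vendor data, and transfer time and caching are ignored.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: half of one full rotation."""
    ms_per_rotation = 60_000.0 / rpm
    return ms_per_rotation / 2.0


def estimated_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random IOPS: one request per (seek + average rotational latency)."""
    service_time_ms = avg_seek_ms + avg_rotational_latency_ms(rpm)
    return 1000.0 / service_time_ms


if __name__ == "__main__":
    # Seek times here are assumed, illustrative values.
    for rpm, seek_ms in [(5400, 12.0), (7200, 9.0), (15000, 4.0)]:
        latency = avg_rotational_latency_ms(rpm)
        iops = estimated_random_iops(rpm, seek_ms)
        print(f"{rpm} RPM: {latency:.2f} ms rotational latency, ~{iops:.0f} random IOPS")
```

Even with optimistic seek times, the estimate stays in the range of a few hundred operations per second at best, which is why random workloads suit HDDs so poorly.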
2. Solid-State Drives (SSD): The Flash Revolution
SSDs use NAND flash memory. Unlike HDDs, they have no moving parts, but they introduce a complex management layer.
2.1 The "Erase-Before-Write" Problem
Flash memory can be read and written at the Page level (e.g., 8KB) but can only be erased at the Block level (e.g., 256 pages).
- Result: You cannot overwrite a page in place. The block containing it must be erased first, so in practice an SSD writes updated data to a fresh page and marks the old page invalid.
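To make the constraint concrete, here is a toy model of a single flash block. The class and function names are hypothetical, and `naive_overwrite` shows the expensive approach that real SSDs avoid through the FTL described in section 2.2: changing one page by erasing and reprogramming the whole block.

```python
PAGES_PER_BLOCK = 256  # matches the example figure in the text


class FlashBlock:
    """Toy NAND block: each page is programmed once; all pages are erased together."""

    def __init__(self, pages: int = PAGES_PER_BLOCK):
        self.pages = [None] * pages  # None means "erased, ready to program"
        self.erase_count = 0

    def program(self, page_no: int, data: bytes) -> None:
        if self.pages[page_no] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page_no] = data

    def erase(self) -> None:
        self.pages = [None] * len(self.pages)
        self.erase_count += 1


def naive_overwrite(block: FlashBlock, page_no: int, data: bytes) -> int:
    """Update one page by copying the block out, erasing it, and writing it back.

    Returns the number of pages programmed to change a single page.
    """
    saved = list(block.pages)  # copy to controller RAM
    saved[page_no] = data
    block.erase()
    programmed = 0
    for i, content in enumerate(saved):
        if content is not None:
            block.program(i, content)
            programmed += 1
    return programmed
```

On a full block, `naive_overwrite` programs all 256 pages and spends one erase cycle to change a single page, which is the cost that out-of-place writes are designed to avoid.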
2.2 The Flash Translation Layer (FTL)
A specialized controller inside the SSD performs several critical tasks:
- LBA Mapping: Maps logical addresses from the OS to physical flash locations.
- Wear Leveling: Spreads erase cycles evenly across the flash so that no block wears out far ahead of the rest, by steering new writes (and, when needed, relocating existing data) to less-worn blocks.
- Garbage Collection (GC): Periodically moves valid pages from a fragmented block to a new one so the old block can be erased.
- Over-provisioning: SSDs reserve extra capacity (e.g., 10%) specifically for GC and wear leveling.
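The four tasks above fit together in a small simulation. The following is a minimal sketch of a page-mapped FTL, assuming a tiny device, a greedy GC policy, and exactly one reserved spare block as the over-provisioned space. All names (`ToyFTL`, `l2p`, and so on) are hypothetical, and real controllers are far more sophisticated.

```python
class ToyFTL:
    """Page-mapped FTL sketch: out-of-place writes, greedy GC, one spare block."""

    def __init__(self, num_blocks: int = 4, pages_per_block: int = 4):
        self.ppb = pages_per_block
        # A physical page is None (free), ("valid", lba, data), or ("invalid",).
        self.blocks = [[None] * pages_per_block for _ in range(num_blocks)]
        self.erase_counts = [0] * num_blocks
        self.spare = num_blocks - 1  # over-provisioned block reserved for GC
        self.capacity = (num_blocks - 1) * pages_per_block  # logical pages exposed
        self.l2p = {}  # LBA mapping: logical page -> (block, page)
        self.host_writes = 0
        self.flash_writes = 0

    def read(self, lba: int):
        b, p = self.l2p[lba]
        return self.blocks[b][p][2]

    def write(self, lba: int, data) -> None:
        if not 0 <= lba < self.capacity:
            raise IndexError("LBA out of range")
        self.host_writes += 1
        if lba in self.l2p:  # the old copy becomes stale; it is not overwritten
            b, p = self.l2p.pop(lba)
            self.blocks[b][p] = ("invalid",)
        slot = self._find_free()
        if slot is None:
            self._collect_garbage()
            slot = self._find_free()
        self._program(slot, lba, data)

    def _program(self, slot, lba, data) -> None:
        b, p = slot
        self.blocks[b][p] = ("valid", lba, data)
        self.l2p[lba] = slot
        self.flash_writes += 1

    def _find_free(self):
        # Simplified wear leveling: prefer the least-erased block with a free page.
        candidates = [b for b in range(len(self.blocks))
                      if b != self.spare and None in self.blocks[b]]
        if not candidates:
            return None
        b = min(candidates, key=lambda i: self.erase_counts[i])
        return b, self.blocks[b].index(None)

    def _collect_garbage(self) -> None:
        # Greedy victim choice: the non-spare block with the most invalid pages.
        victim = max((b for b in range(len(self.blocks)) if b != self.spare),
                     key=lambda b: self.blocks[b].count(("invalid",)))
        dest = self.spare
        for page in self.blocks[victim]:
            if page is not None and page[0] == "valid":
                free_idx = self.blocks[dest].index(None)
                self._program((dest, free_idx), page[1], page[2])
        self.blocks[victim] = [None] * self.ppb  # block erase
        self.erase_counts[victim] += 1
        self.spare = victim  # the erased victim becomes the new spare


if __name__ == "__main__":
    ftl = ToyFTL()
    for i in range(100):
        ftl.write(i % ftl.capacity, f"v{i}")
    print("write amplification:", ftl.flash_writes / ftl.host_writes)
    print("erase counts:", ftl.erase_counts)
```

The ratio of `flash_writes` to `host_writes` is the write amplification: the extra page copies that GC performs on top of what the host asked for. With only one spare block, GC reclaims little space per pass, which hints at why more over-provisioning tends to reduce write amplification.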
3. NVMe: Optimized for Parallelism
Old storage protocols (SATA/SAS) were designed for slow HDDs. NVMe (Non-Volatile Memory Express) was built specifically for the massive parallelism of flash.
- Queue Depth: SATA (AHCI) supports 1 queue with 32 commands. NVMe supports up to about 64K I/O queues (65,535), each holding up to 64K commands.
- Parallelism: Multiple CPU cores can submit I/O requests simultaneously without locking each other out.
- Direct Link: NVMe connects directly to the PCIe bus, bypassing the slow legacy storage controllers.
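The per-core queue idea can be illustrated with a toy model. This is only a conceptual sketch: `QueuePair` and `ToyNvmeDevice` are hypothetical names, the default queue depth is arbitrary, and the round-robin servicing loop is a simplification of how a controller arbitrates between queues.

```python
from collections import deque


class QueuePair:
    """One submission/completion queue pair, owned by a single CPU core."""

    def __init__(self, depth: int):
        self.depth = depth
        self.submission = deque()
        self.completion = deque()

    def submit(self, command) -> bool:
        if len(self.submission) >= self.depth:
            return False  # queue full; the caller must retry later
        self.submission.append(command)
        return True


class ToyNvmeDevice:
    """Each core gets its own queue pair, so cores never contend for a shared lock."""

    def __init__(self, num_cores: int, depth: int = 64):
        self.queues = [QueuePair(depth) for _ in range(num_cores)]

    def submit(self, core: int, command) -> bool:
        return self.queues[core].submit(command)

    def process_round(self) -> int:
        """Service one command from every non-empty queue (round-robin)."""
        done = 0
        for qp in self.queues:
            if qp.submission:
                qp.completion.append(("ok", qp.submission.popleft()))
                done += 1
        return done
```

Because each core only touches its own `QueuePair`, submissions from different cores do not serialize on one shared structure, in contrast to the single 32-entry queue of SATA.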
4. RAID: Redundancy and Performance
RAID (Redundant Array of Independent Disks) combines multiple disks into a single logical volume.
4.1 Common RAID Levels
| Level | Name | Performance | Fault Tolerance | Best Use Case |
|---|---|---|---|---|
| 0 | Striping | Excellent (Read/Write) | None | Temp data, Scratch space |
| 1 | Mirroring | Good (Read), Fair (Write) | 1 Disk | OS Boot drives |
| 5 | Parity | Fair (Write penalty) | 1 Disk | General storage |
| 6 | Double Parity | Fair (High write penalty) | 2 Disks | Large data arrays |
| 10 | Stripe of Mirrors | Excellent | At least 1 disk; up to 1 per mirror pair | Databases |
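The single-parity scheme in the table (RAID 5) is commonly built on XOR: the parity block is the XOR of the data blocks in a stripe, so any one missing block can be rebuilt from the survivors. The sketch below shows that reconstruction along with a usable-capacity helper. The function names are hypothetical, disks are assumed to be equal in size, RAID 10 is assumed to use two-way mirrors, and the second parity of RAID 6 is not modeled.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def usable_capacity(level: int, disks: int, disk_size: int) -> int:
    """Usable capacity for the RAID levels in the table, in the units of disk_size."""
    if level == 0:
        return disks * disk_size
    if level == 1:
        return disk_size  # every disk holds the same data
    if level == 5:
        return (disks - 1) * disk_size  # one disk's worth of parity
    if level == 6:
        return (disks - 2) * disk_size  # two disks' worth of parity
    if level == 10:
        return (disks // 2) * disk_size  # half the disks are mirror copies
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]  # data blocks on three disks
    parity = xor_blocks(stripe)           # stored on a fourth disk

    # Simulate losing the second disk and rebuilding it from the rest.
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    print(rebuilt == stripe[1])
```

The same XOR property explains the write penalty listed for RAID 5: updating one data block also requires updating the parity block for that stripe.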
4.2 Software RAID vs. Hardware RAID
- Hardware RAID: Uses a dedicated controller card with its own RAM and battery backup. The RAID logic is invisible to the OS, which sees only a single logical disk.
- Software RAID (Linux mdadm): The CPU handles the RAID logic. It is more flexible and cheaper but uses CPU cycles.
5. Logical Volume Management (LVM)
LVM adds a layer of abstraction between the physical disk and the file system.
5.1 The LVM Hierarchy
- Physical Volumes (PV): The raw disks or disk partitions.
- Volume Groups (VG): A pool of multiple PVs.
- Logical Volumes (LV): The "virtual partitions" created from the pool.
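- Benefit: You can resize an LV or add a new disk to the VG without rebooting or reformatting. An LV is a collection of fixed-size extents drawn from the pool, so growing it only means allocating more extents (the file system on top must then be grown to use them).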
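The hierarchy can be modeled in a few lines. This sketch treats a volume group as a pool of fixed-size extents, with hypothetical class and method names. The 4 MiB extent size is an assumption made for the example, and the model ignores which PV each extent lives on.

```python
class VolumeGroup:
    """Toy VG: a pool of extents contributed by PVs and handed out to LVs."""

    def __init__(self, name: str, extent_mb: int = 4):
        self.name = name
        self.extent_mb = extent_mb
        self.pvs = {}  # PV device name -> number of extents it contributes
        self.lvs = {}  # LV name -> number of extents allocated to it

    @property
    def free_extents(self) -> int:
        return sum(self.pvs.values()) - sum(self.lvs.values())

    def add_pv(self, device: str, size_mb: int) -> None:
        """Grow the pool with a new disk or partition (round down to whole extents)."""
        self.pvs[device] = size_mb // self.extent_mb

    def _extents_for(self, size_mb: int) -> int:
        return -(-size_mb // self.extent_mb)  # ceiling division

    def create_lv(self, name: str, size_mb: int) -> None:
        needed = self._extents_for(size_mb)
        if needed > self.free_extents:
            raise ValueError("not enough free extents in the volume group")
        self.lvs[name] = needed

    def extend_lv(self, name: str, extra_mb: int) -> None:
        """Grow an existing LV by allocating more extents; no data is moved."""
        needed = self._extents_for(extra_mb)
        if needed > self.free_extents:
            raise ValueError("not enough free extents in the volume group")
        self.lvs[name] += needed


if __name__ == "__main__":
    vg = VolumeGroup("vg0")
    vg.add_pv("/dev/sda1", 100)
    vg.create_lv("data", 80)
    vg.add_pv("/dev/sdb1", 100)  # add a second disk to the pool
    vg.extend_lv("data", 40)     # now the LV can grow
    print(vg.lvs, vg.free_extents)
```

Because extending an LV is only bookkeeping over extents, it can happen while the volume is in use, which is the property the benefit above relies on.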
6. Advanced Storage Concepts
6.1 Storage Area Networks (SAN) vs. Network Attached Storage (NAS)
- NAS: File-level access (e.g., NFS, SMB). It's like a remote folder.
- SAN: Block-level access (e.g., iSCSI, Fibre Channel). It's like a remote hard drive.
6.2 Object Storage (The Cloud Standard)
Object stores such as Amazon S3 or Ceph's object layer do not organize data as a hierarchy of files and directories; they use a flat namespace of objects.
- Key-Value Access: You store an object (Data + Metadata) and retrieve it via a unique ID.
- Scalability: Can grow to exabytes across thousands of machines.
- Consistency: Depends on the implementation. Some systems offer only "Eventual Consistency", while others (Amazon S3 since late 2020, for example) provide "Strong Consistency" for reads after writes.
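The key-value access model is easy to sketch in memory. The class below is a hypothetical illustration, not the S3 or Ceph API: objects live in a flat dictionary keyed by name, carry their own metadata, and "folders" are nothing more than shared key prefixes. Returning a SHA-256 digest as an identifier is an assumption made for the example.

```python
import hashlib


class ToyObjectStore:
    """Flat namespace of objects: key -> (data, metadata)."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes, metadata=None) -> str:
        """Store an object and return a content digest the caller can verify later."""
        self._objects[key] = (data, dict(metadata or {}))
        return hashlib.sha256(data).hexdigest()

    def get(self, key: str):
        """Return (data, metadata) for the key; raises KeyError if it is absent."""
        return self._objects[key]

    def list_keys(self, prefix: str = ""):
        """Emulate directory listing by filtering keys on a prefix."""
        return sorted(k for k in self._objects if k.startswith(prefix))


if __name__ == "__main__":
    store = ToyObjectStore()
    store.put("photos/2024/cat.jpg", b"\xff\xd8", {"content-type": "image/jpeg"})
    store.put("logs/app.log", b"started\n")
    print(store.list_keys("photos/"))
```

The slashes in the keys carry no structural meaning to the store; a single flat key space is part of what lets real systems partition objects across many machines.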
7. Storage Monitoring and Benchmarking
| Tool | Focus | Use Case |
|---|---|---|
| fio | Benchmarking | The industry standard for stress-testing I/O |
| smartctl | Health | Reads internal disk health (S.M.A.R.T.) |
| pvs / vgs / lvs | Management | Inspects LVM structures |
| nvme-cli | Diagnostics | In-depth info for NVMe drives |
| iostat -x 1 | Latency | Real-time disk utilization and wait times |
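Utilization and wait times of the kind iostat reports can be derived from two snapshots of the Linux kernel's per-disk counters in /proc/diskstats. The sketch below parses one line of that file and computes rough per-interval figures. The helper names are hypothetical, the sample lines in the demo are made-up numbers, and the calculation is a simplification rather than a reimplementation of iostat.

```python
def parse_diskstats_line(line: str) -> dict:
    """Extract the counters needed here from one /proc/diskstats line."""
    f = line.split()
    return {
        "name": f[2],
        "reads": int(f[3]),      # reads completed
        "read_ms": int(f[6]),    # total time spent reading (ms)
        "writes": int(f[7]),     # writes completed
        "write_ms": int(f[10]),  # total time spent writing (ms)
        "io_ms": int(f[12]),     # time the device had I/O in flight (ms)
    }


def interval_stats(before: dict, after: dict, interval_ms: float) -> dict:
    """Derive IOPS, average wait, and utilization from two counter snapshots."""
    ios = (after["reads"] - before["reads"]) + (after["writes"] - before["writes"])
    wait_ms = ((after["read_ms"] - before["read_ms"])
               + (after["write_ms"] - before["write_ms"]))
    return {
        "iops": ios / (interval_ms / 1000.0),
        "await_ms": wait_ms / ios if ios else 0.0,
        "util_pct": 100.0 * (after["io_ms"] - before["io_ms"]) / interval_ms,
    }


if __name__ == "__main__":
    # Two synthetic samples taken one second apart (made-up numbers).
    t0 = parse_diskstats_line("8 0 sda 1000 0 8000 500 2000 0 16000 1500 0 900 2000")
    t1 = parse_diskstats_line("8 0 sda 1100 0 8800 600 2100 0 16800 1900 0 1400 2500")
    print(interval_stats(t0, t1, interval_ms=1000))
```

On a real system the two lines would come from reading /proc/diskstats twice with a known delay between reads.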
8. Summary Checklist
- Why is random I/O slow on an HDD but fast on an SSD?
- Explain the role of "Garbage Collection" in an SSD.
- What is the "Write Hole" problem in RAID 5/6?
- How does LVM allow for online partition resizing?
- SAN vs. NAS: which one gives you a block device?
End of Chapter 07. Continue to Chapter 08: Security & Protection.