Storage System: From Magnetic Platters to NVMe Fabrics

The Storage System manages the persistence of data across diverse physical media. Over the last decade, storage has shifted from mechanical rotating disks to semiconductor-based flash, and now to NVMe (Non-Volatile Memory Express) devices and fabrics.

This chapter explores the physical characteristics of storage media, the algorithms used to optimize access, and the architectural patterns for redundancy and scaling.

1. Hard Disk Drives (HDD): The Mechanical Legacy

Despite the rise of SSDs, HDDs remain the backbone of mass storage, such as cloud cold storage and backup systems, because of their low cost per terabyte.

1.1 Physical Structure

An HDD is a high-precision mechanical device:

Platters: Magnetic-coated disks spinning at 5400 to 15000 RPM.
Arm and Read/Write Head: Moves across the spinning platters to read/write data.
Sectors and Blocks: The sector is the smallest addressable unit on the disk. Historically 512 bytes, now mostly 4 KB (Advanced Format).

1.2 Performance Metrics (The Mechanical Bottleneck)

Seek Time: The time to move the arm to the correct track.
Rotational Latency: The time for the data to spin under the head (Average is half a rotation).
Transfer Rate: The speed at which data is moved from the magnetic surface to the controller.
Rule of Thumb: HDDs are excellent for Sequential I/O but abysmal for Random I/O, because every random access pays a seek plus rotational latency before any data is transferred.
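The metrics above can be turned into a rough back-of-the-envelope calculation. The sketch below derives average rotational latency from spindle speed and estimates a ceiling for random IOPS. The function names are illustrative, and the model ignores transfer time, caching, and command queuing.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one full rotation, in milliseconds."""
    seconds_per_rotation = 60.0 / rpm
    return seconds_per_rotation / 2 * 1000.0


def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random-I/O ceiling: one seek plus half a rotation per request."""
    service_time_ms = avg_seek_ms + avg_rotational_latency_ms(rpm)
    return 1000.0 / service_time_ms
```

For a 7200 RPM drive, `avg_rotational_latency_ms(7200)` is about 4.17 ms. With an assumed (not measured) average seek of 9 ms, `random_iops(7200, 9.0)` comes out at roughly 76 requests per second, which is why random workloads hurt so much on spinning disks.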

2. Solid-State Drives (SSD): The Flash Revolution

SSDs use NAND flash memory. Unlike HDDs, they have no moving parts, but they introduce a complex management layer.

2.1 The "Erase-Before-Write" Problem

Flash memory can be read and written at the Page level (e.g., 8KB) but can only be erased at the Block level (e.g., 256 pages).

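Result: You cannot overwrite a page in place. A page can only be programmed again after its whole block has been erased, so in practice the SSD writes updated data to a fresh page and marks the old page as stale.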
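The constraint can be illustrated with a toy model of a single flash block. This is a minimal sketch, not how any real controller is implemented: `FlashBlock` and `overwrite_in_place` are hypothetical names, and the naive rewrite shown here is exactly what real SSDs avoid.

```python
class FlashBlock:
    """Toy model of one NAND block: a page is programmable once per erase cycle."""

    def __init__(self, pages_per_block: int = 256):
        # None marks an erased (programmable) page.
        self.pages = [None] * pages_per_block
        self.erase_count = 0

    def program(self, page_no: int, data: bytes) -> None:
        if self.pages[page_no] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page_no] = data

    def erase(self) -> None:
        # Erase always covers the whole block and consumes one wear cycle.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1


def overwrite_in_place(block: FlashBlock, page_no: int, data: bytes) -> int:
    """Naive read-modify-erase-write; returns how many pages had to be rewritten."""
    snapshot = list(block.pages)   # read every page out of the block
    snapshot[page_no] = data       # change the one page we care about
    block.erase()
    written = 0
    for i, content in enumerate(snapshot):
        if content is not None:
            block.program(i, content)
            written += 1
    return written
```

Updating one page in a block that holds three programmed pages rewrites all three and costs one erase cycle. That extra work is a simple form of write amplification.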

2.2 The Flash Translation Layer (FTL)

A specialized controller inside the SSD performs several critical tasks:

LBA Mapping: Maps logical addresses from the OS to physical flash locations.
Wear Leveling: Spreads program/erase cycles evenly across the flash so that no block wears out early, for example by steering new writes to the least-worn blocks and relocating rarely changed data.
Garbage Collection (GC): Periodically copies the still-valid pages out of a block that is mostly stale into a fresh block, so the old block can be erased and reused.
Over-provisioning: SSDs reserve extra capacity (e.g., 10%) specifically for GC and wear leveling.
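The four tasks above fit together in a small simulation. The following is a simplified sketch under stated assumptions, not a description of any vendor's firmware: `ToyFTL` is a hypothetical page-mapped design that hides two whole blocks from the host as over-provisioning, picks the block with the fewest valid pages as the GC victim, and opens the least-erased free block as a crude form of wear leveling.

```python
class ToyFTL:
    """Page-mapped FTL sketch: out-of-place writes plus greedy garbage collection."""

    def __init__(self, num_blocks: int = 8, pages_per_block: int = 4):
        if num_blocks < 3:
            raise ValueError("need at least 3 blocks")
        self.num_blocks, self.ppb = num_blocks, pages_per_block
        # Over-provisioning: two blocks are hidden from the host as GC headroom.
        self.logical_pages = (num_blocks - 2) * pages_per_block
        # flash[b][p] holds (lba, data) for a programmed page, or None if erased.
        self.flash = [[None] * pages_per_block for _ in range(num_blocks)]
        self.erase_counts = [0] * num_blocks
        self.mapping = {}                        # LBA -> (block, page)
        self.free_blocks = list(range(1, num_blocks))
        self.active, self.next_page = 0, 0       # block currently being filled

    def _valid_pages(self, block):
        # A page is valid only if the mapping still points at it.
        return [entry for p, entry in enumerate(self.flash[block])
                if entry is not None and self.mapping.get(entry[0]) == (block, p)]

    def _append(self, lba, data):
        if self.next_page == self.ppb:
            # Wear leveling (simplified): open the least-erased free block.
            self.active = min(self.free_blocks, key=lambda b: self.erase_counts[b])
            self.free_blocks.remove(self.active)
            self.next_page = 0
        self.flash[self.active][self.next_page] = (lba, data)
        self.mapping[lba] = (self.active, self.next_page)
        self.next_page += 1

    def _gc(self):
        in_use = [b for b in range(self.num_blocks) if b not in self.free_blocks]
        victim = min(in_use, key=lambda b: len(self._valid_pages(b)))
        for lba, data in self._valid_pages(victim):
            self._append(lba, data)              # relocate still-valid data
        self.flash[victim] = [None] * self.ppb   # erase the whole block
        self.erase_counts[victim] += 1
        self.free_blocks.append(victim)

    def write(self, lba, data):
        if not 0 <= lba < self.logical_pages:
            raise IndexError("LBA outside logical capacity")
        self.mapping.pop(lba, None)              # the old copy becomes stale
        if self.next_page == self.ppb and len(self.free_blocks) <= 1:
            self._gc()                           # reclaim space before running out
        self._append(lba, data)

    def read(self, lba):
        block, page = self.mapping[lba]
        return self.flash[block][page][1]
```

Because the host sees fewer pages than the flash physically holds, at least one full block always contains a stale page when GC runs, so collection always frees space. Real controllers reserve far less than two blocks in eight and use more elaborate victim and wear-leveling policies.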

3. NVMe: Optimized for Parallelism

Old storage protocols (SATA/SAS) were designed for slow HDDs. NVMe (Non-Volatile Memory Express) was built specifically for the massive parallelism of flash.

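Queue Depth: SATA (AHCI) supports 1 queue with 32 commands. NVMe supports up to 65,535 I/O queues with up to 65,536 commands each (often rounded to "64K queues of 64K commands"), although real devices typically expose far fewer.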
Parallelism: Multiple CPU cores can submit I/O requests simultaneously without locking each other out.
Direct Link: NVMe devices attach directly to the PCIe bus, bypassing the AHCI/SAS host bus adapter layer that legacy drives sit behind.
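The queueing model can be sketched in a few lines. This is an illustrative toy, not driver code: `QueuePairs` is a hypothetical class in which each CPU core owns one submission queue and the controller drains the queues round-robin. Completion queues, doorbells, and interrupts are left out.

```python
from collections import deque

SATA_QUEUES, SATA_DEPTH = 1, 32                    # AHCI/NCQ limits
NVME_MAX_QUEUES, NVME_MAX_DEPTH = 65_535, 65_536   # specification maxima


class QueuePairs:
    """Toy model: one submission queue per CPU core, drained round-robin."""

    def __init__(self, num_cores: int, depth: int):
        self.depth = depth
        self.queues = [deque() for _ in range(num_cores)]

    def submit(self, core: int, command: str) -> bool:
        # Each core touches only its own queue, so no cross-core lock is needed.
        q = self.queues[core]
        if len(q) >= self.depth:
            return False           # queue full; the caller must retry later
        q.append(command)
        return True

    def drain_round_robin(self) -> list:
        """Controller side: take one command per queue in turn until all are empty."""
        order = []
        while any(self.queues):
            for q in self.queues:
                if q:
                    order.append(q.popleft())
        return order
```

With a single shared queue, every core would contend for the same 32 slots. With per-core queues, a submission on core 0 never blocks a submission on core 1, which is the property the text attributes to NVMe.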

4. RAID: Redundancy and Performance

RAID (Redundant Array of Independent Disks) combines multiple disks into a single logical volume.

4.1 Common RAID Levels

Level	Name	Performance	Fault Tolerance	Best Use Case
0	Striping	Excellent (Read/Write)	None	Temp data, Scratch space
1	Mirroring	Good (Read), Fair (Write)	1 Disk	OS Boot drives
5	Parity	Fair (Write penalty)	1 Disk	General storage
6	Double Parity	Fair (High write penalty)	2 Disks	Large data arrays
10	Stripe of Mirrors	Excellent	High (at least 1 disk; up to 1 per mirror pair)	Databases
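The parity levels in the table rely on XOR: the parity block is the XOR of the data blocks in a stripe, so any single missing block can be recomputed from the survivors. The sketch below shows single-parity reconstruction (the RAID 5 case) and a usable-capacity helper. Function names are illustrative, the RAID 10 figure assumes two-way mirrors, and RAID 6's second parity uses different math that is not shown.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def usable_disks(level: int, n: int) -> float:
    """Usable capacity, in units of one disk, for n identical disks."""
    if level == 0:
        return n
    if level == 1:
        return 1          # every disk holds the same data
    if level == 5:
        return n - 1      # one disk's worth of parity
    if level == 6:
        return n - 2      # two disks' worth of parity
    if level == 10:
        return n / 2      # assumes two-way mirrors
    raise ValueError(f"unsupported RAID level: {level}")


# One stripe across three data disks, with parity on a fourth.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Disk 1 fails: XOR the surviving data blocks with the parity to rebuild it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

The same XOR also explains the write penalty: changing one data block means the parity block must be read, recomputed, and written as well.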

4.2 Software RAID vs. Hardware RAID

Hardware RAID: Uses a dedicated controller card with its own RAM and battery backup. The RAID logic is invisible to the OS, which simply sees the array as a single disk.
Software RAID (Linux mdadm): The CPU handles the RAID logic. It is more flexible and cheaper but uses CPU cycles.

5. Logical Volume Management (LVM)

LVM adds a layer of abstraction between the physical disk and the file system.

5.1 The LVM Hierarchy

Physical Volumes (PV): Whole disks or disk partitions that have been initialized for use by LVM.
Volume Groups (VG): A pool of multiple PVs.
Logical Volumes (LV): The "virtual partitions" created from the pool.

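Benefit: You can grow an LV or add a new disk to the VG while the system is running, without rebooting or reformatting. The file system on the LV must be resized as well, and shrinking depends on the file system (XFS, for example, cannot be shrunk).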
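The hierarchy can be modeled as a pool of fixed-size extents. The sketch below is a toy model, not the LVM implementation or its API: `VolumeGroup` is a hypothetical class, and the PV names in the usage note are made up. It uses 4 MiB extents, which is LVM's default extent size.

```python
class VolumeGroup:
    """Toy model of LVM allocation: PVs contribute extents, LVs consume them."""

    def __init__(self, extent_mib: int = 4):
        self.extent_mib = extent_mib
        self.free = []     # unallocated extents as (pv_name, extent_index)
        self.lvs = {}      # lv_name -> list of (pv_name, extent_index)

    def add_pv(self, name: str, size_mib: int) -> None:
        count = size_mib // self.extent_mib
        self.free.extend((name, i) for i in range(count))

    def _take(self, size_mib: int) -> list:
        needed = -(-size_mib // self.extent_mib)   # round up to whole extents
        if needed > len(self.free):
            raise ValueError("not enough free extents in the volume group")
        taken, self.free = self.free[:needed], self.free[needed:]
        return taken

    def create_lv(self, name: str, size_mib: int) -> None:
        self.lvs[name] = self._take(size_mib)

    def extend_lv(self, name: str, extra_mib: int) -> None:
        # New extents may come from any PV; the LV just gets a longer extent list.
        self.lvs[name].extend(self._take(extra_mib))

    def lv_size_mib(self, name: str) -> int:
        return len(self.lvs[name]) * self.extent_mib
```

For example, after `add_pv("sda1", 100)` and `create_lv("data", 80)`, a call to `extend_lv("data", 40)` fails for lack of free extents. Once `add_pv("sdb1", 100)` has enlarged the pool, the same call succeeds and the LV spans both PVs. Because the file system only sees one contiguous logical device, the mapping can change underneath it without reformatting.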

6. Advanced Storage Concepts

6.1 Storage Area Networks (SAN) vs. Network Attached Storage (NAS)

NAS: File-level access (e.g., NFS, SMB). It's like a remote folder.
SAN: Block-level access (e.g., iSCSI, Fibre Channel). It's like a remote hard drive.

6.2 Object Storage (The Cloud Standard)

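Object stores such as Amazon S3 or Ceph's object layer do not expose a hierarchical tree of files and directories. Objects live in a flat namespace, and any "folders" are just a naming convention within the keys.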

Key-Value Access: You store an object (Data + Metadata) and retrieve it via a unique ID.
Scalability: Can grow to exabytes across thousands of machines.
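Consistency: Varies by implementation. Some systems offer only eventual consistency, while others are strongly consistent (Amazon S3, for example, has provided strong read-after-write consistency since late 2020).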
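The access model can be sketched as an in-memory store. This is an illustrative toy and deliberately not the S3 or Ceph API: `ToyObjectStore`, its methods, and the `etag` field (a SHA-256 content hash here) are assumptions made for the example.

```python
import hashlib


class ToyObjectStore:
    """In-memory sketch of flat key-value object access."""

    def __init__(self):
        self._objects = {}   # key -> (data, metadata)

    def put(self, key: str, data: bytes, **metadata) -> str:
        # A content hash serves as a simple version tag for the object.
        etag = hashlib.sha256(data).hexdigest()
        self._objects[key] = (data, {**metadata, "etag": etag, "size": len(data)})
        return etag

    def get(self, key: str):
        """Return (data, metadata); raises KeyError for an unknown key."""
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list:
        # "Directories" are only a naming convention: listing filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))
```

An object is always written and read whole, together with its metadata. There is no seek or partial in-place update in this model, which is part of what lets real object stores spread data across many machines.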

7. Storage Monitoring and Benchmarking

Tool	Focus	Use Case
`fio`	Benchmarking	The industry standard for stress-testing I/O
`smartctl`	Health	Reads internal disk health (S.M.A.R.T.)
`pvs / vgs / lvs`	Management	Inspects LVM structures
`nvme-cli`	Diagnostics	In-depth info for NVMe drives
`iostat -x 1`	Latency	Real-time disk utilization and wait times

8. Summary Checklist

Why is random I/O slower on an HDD but fast on an SSD?
Explain the role of "Garbage Collection" in an SSD.
What is the "Write Hole" problem in RAID 5/6?
How does LVM allow for online partition resizing?
SAN vs NAS: which protocol gives you a block device?

End of Chapter 07. Continue to Chapter 08: Security & Protection.
