
Storage System: From Magnetic Platters to NVMe Fabrics

The Storage System manages the persistence of data across diverse physical media. Over the last decade, storage has undergone a revolution, shifting from mechanical rotating disks to semiconductor-based flash and now to NVMe (Non-Volatile Memory Express) devices and fabrics.

This chapter explores the physical characteristics of storage media, the algorithms used to optimize access, and the architectural patterns for redundancy and scaling.


1. Hard Disk Drives (HDD): The Mechanical Legacy

Despite the rise of SSDs, HDDs remain the backbone of mass storage, such as cloud cold storage and backup systems, because they offer the lowest cost per terabyte.

1.1 Physical Structure

An HDD is a high-precision mechanical device:

  • Platters: Magnetic-coated disks spinning at 5400 to 15000 RPM.
  • Arm and Read/Write Head: Moves across the spinning platters to read/write data.
  • Sectors and Blocks: Historically 512 bytes, now mostly 4KB (Advanced Format).

1.2 Performance Metrics (The Mechanical Bottleneck)

  • Seek Time: The time to move the arm to the correct track.
  • Rotational Latency: The time for the data to spin under the head (Average is half a rotation).
  • Transfer Rate: The speed at which data is moved from the magnetic surface to the controller.
  • Rule of Thumb: HDDs are excellent for Sequential I/O but abysmal for Random I/O due to mechanical movement.
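
The metrics above can be turned into a quick back-of-the-envelope estimate. The sketch below computes average rotational latency (half a rotation) from the spindle speed and derives a rough ceiling on random IOPS. The 8.5 ms average seek time is an illustrative assumption, not a figure from this chapter, and the function names are hypothetical.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms: half of one full rotation."""
    ms_per_rotation = 60_000.0 / rpm
    return ms_per_rotation / 2.0


def estimated_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random-I/O ceiling: one seek plus half a rotation per request.

    Ignores transfer time, caching, and command queuing.
    """
    return 1000.0 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


ASSUMED_SEEK_MS = 8.5  # illustrative assumption, not a measured value

for rpm in (5400, 7200, 15000):
    latency = avg_rotational_latency_ms(rpm)
    iops = estimated_random_iops(rpm, ASSUMED_SEEK_MS)
    print(f"{rpm:>5} RPM: {latency:.2f} ms rotational latency, ~{iops:.0f} random IOPS")
```

Under this assumption, even a 15000 RPM drive stays below 100 random IOPS, which is why random workloads are the weak spot of mechanical disks.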

2. Solid-State Drives (SSD): The Flash Revolution

SSDs use NAND flash memory. Unlike HDDs, they have no moving parts, but they introduce a complex management layer.

2.1 The "Erase-Before-Write" Problem

Flash memory can be read and written at the Page level (e.g., 8KB) but can only be erased at the Block level (e.g., 256 pages).

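  • Result: You cannot overwrite a page in place. A page must be erased before it can be written again, and erasing always affects the whole block. SSDs therefore write updated data to a fresh page and mark the old page as invalid.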

2.2 The Flash Translation Layer (FTL)

A specialized controller inside the SSD performs several critical tasks:

  1. LBA Mapping: Maps logical addresses from the OS to physical flash locations.
  2. Wear Leveling: Spreads erase cycles evenly across all flash blocks, so that no block wears out long before the others. Frequently written data is steered toward blocks with fewer erase cycles.
  3. Garbage Collection (GC): Copies the still-valid pages out of a block that contains many invalid pages into a fresh block, so the old block can be erased and reused.
  4. Over-provisioning: SSDs reserve extra capacity (e.g., 10%) that is hidden from the OS and used as working space for GC and wear leveling.
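
The interplay of mapping, out-of-place writes, and garbage collection can be sketched with a toy model. This is a minimal illustration under simplifying assumptions (tiny geometry, a greedy "most invalid pages" victim policy, wear leveling reduced to "prefer the least-erased block"). The class `ToyFTL` and all of its method names are hypothetical; real controllers are far more elaborate.

```python
class ToyFTL:
    """Toy flash translation layer: out-of-place writes plus greedy GC."""

    def __init__(self, num_blocks: int = 3, pages_per_block: int = 4):
        self.pages_per_block = pages_per_block
        # Each physical page is "free", "valid", or "invalid" (stale).
        self.state = [["free"] * pages_per_block for _ in range(num_blocks)]
        self.data = [[None] * pages_per_block for _ in range(num_blocks)]
        self.l2p = {}                       # LBA -> (block, page)
        self.erase_count = [0] * num_blocks

    def _find_free_page(self, exclude=None):
        """Pick a free page in the least-erased block (naive wear leveling)."""
        candidates = [b for b, pages in enumerate(self.state)
                      if b != exclude and "free" in pages]
        if not candidates:
            return None
        block = min(candidates, key=lambda b: self.erase_count[b])
        return block, self.state[block].index("free")

    def _program(self, loc, lba, value):
        """Write to a free page and mark the previous copy, if any, invalid."""
        old = self.l2p.get(lba)
        if old is not None:
            self.state[old[0]][old[1]] = "invalid"
        block, page = loc
        self.state[block][page] = "valid"
        self.data[block][page] = (lba, value)
        self.l2p[lba] = loc

    def write(self, lba, value):
        loc = self._find_free_page()
        if loc is None:                     # out of free pages: reclaim space
            self.garbage_collect()
            loc = self._find_free_page()
        if loc is None:
            raise RuntimeError("no free pages left")
        self._program(loc, lba, value)

    def read(self, lba):
        block, page = self.l2p[lba]
        return self.data[block][page][1]

    def garbage_collect(self):
        """Move valid pages out of the block with the most invalid pages, then erase it."""
        victim = max(range(len(self.state)),
                     key=lambda b: self.state[b].count("invalid"))
        if self.state[victim].count("invalid") == 0:
            return                          # nothing to reclaim
        for page, status in enumerate(list(self.state[victim])):
            if status == "valid":
                lba, value = self.data[victim][page]
                loc = self._find_free_page(exclude=victim)
                if loc is None:
                    raise RuntimeError("no spare pages for GC")
                self._program(loc, lba, value)
        # Erase happens at block granularity, never per page.
        self.state[victim] = ["free"] * self.pages_per_block
        self.data[victim] = [None] * self.pages_per_block
        self.erase_count[victim] += 1


ftl = ToyFTL()
ftl.write(7, "cold")                 # written once
for i in range(3):
    ftl.write(0, f"hot-{i}")         # each rewrite lands on a new page
ftl.garbage_collect()                # relocates the two valid pages, erases block 0
print(ftl.read(7), ftl.read(0), ftl.erase_count)
```

Note that GC needs free pages outside the victim block to relocate valid data. That spare room is exactly what over-provisioning guarantees.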

3. NVMe: Optimized for Parallelism

Old storage protocols (SATA/SAS) were designed for slow HDDs. NVMe (Non-Volatile Memory Express) was built specifically for the massive parallelism of flash.

  • Queue Depth: SATA (AHCI) supports 1 queue with 32 commands. NVMe supports up to roughly 64K (65,535) I/O queues with up to 64K (65,536) commands each.
  • Parallelism: Multiple CPU cores can submit I/O requests simultaneously, because each core can own its own submission and completion queue pair and no shared lock is needed.
  • Direct Link: NVMe devices attach directly to the PCIe bus, bypassing legacy storage host controllers such as AHCI host bus adapters.
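
To put the queue figures in perspective, the short calculation below compares the theoretical number of outstanding commands per device. It uses the protocol maximums of 65,535 I/O queues and 65,536 entries per queue (commonly rounded to "64K"). Real drives and drivers typically expose far fewer queues, so treat the result as a ceiling, not a typical configuration.

```python
# Protocol-level maximums; actual devices usually expose far fewer queues.
SATA_QUEUES, SATA_DEPTH = 1, 32              # AHCI with Native Command Queuing
NVME_QUEUES, NVME_DEPTH = 65_535, 65_536     # NVMe I/O queue limits


def max_outstanding(queues: int, depth: int) -> int:
    """Theoretical number of commands in flight at once."""
    return queues * depth


sata_max = max_outstanding(SATA_QUEUES, SATA_DEPTH)
nvme_max = max_outstanding(NVME_QUEUES, NVME_DEPTH)
print(f"SATA: {sata_max:,} commands in flight")
print(f"NVMe: {nvme_max:,} commands in flight")
```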

4. RAID: Redundancy and Performance

RAID (Redundant Array of Independent Disks) combines multiple disks into a single logical volume.

4.1 Common RAID Levels

Level  Name               Performance                 Fault Tolerance                      Best Use Case
0      Striping           Excellent (Read/Write)      None                                 Temp data, Scratch space
1      Mirroring          Good (Read), Fair (Write)   1 Disk                               OS Boot drives
5      Parity             Fair (Write penalty)        1 Disk                               General storage
6      Double Parity      Fair (High write penalty)   2 Disks                              Large data arrays
10     Stripe of Mirrors  Excellent                   1 Disk per mirror pair (at least 1)  Databases
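
The single-disk fault tolerance of RAID 5 comes from XOR parity: the parity block is the XOR of the data blocks in a stripe, so any one missing block can be recomputed from the survivors. The sketch below shows this for one stripe across three data disks. The helper `xor_blocks` and the sample contents are hypothetical, and real arrays also rotate the parity position across disks, which is omitted here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second disk, then rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

The same property explains the write penalty: updating one data block also requires reading and rewriting the parity block.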

4.2 Software RAID vs. Hardware RAID

  • Hardware RAID: Uses a dedicated controller card with its own RAM and battery backup. The OS sees only the resulting logical volume, not the individual member disks.
  • Software RAID (Linux mdadm): The CPU handles the RAID logic. It is more flexible and cheaper but uses CPU cycles.

5. Logical Volume Management (LVM)

LVM adds a layer of abstraction between the physical disk and the file system.

5.1 The LVM Hierarchy

  1. Physical Volumes (PV): The raw disk partitions.
  2. Volume Groups (VG): A pool of multiple PVs.
  3. Logical Volumes (LV): The "virtual partitions" created from the pool.
  • Benefit: You can grow an LV or add a new disk to the VG without rebooting or reformatting. Shrinking is also possible, but only if the file system on the LV supports it.
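
The hierarchy can be modeled as a pool of fixed-size extents. The sketch below is a toy model, not how LVM is implemented: the class `VolumeGroup`, its methods, and the device names are hypothetical. It uses a 4 MB extent size, which matches LVM's default physical extent size. It shows why growing an LV is cheap: the LV is just a list of extents, and extending it appends more extents, possibly from a different PV.

```python
class VolumeGroup:
    """Toy model of a VG as a pool of fixed-size extents."""

    def __init__(self, name: str, extent_mb: int = 4):
        self.name = name
        self.extent_mb = extent_mb
        self.free = []   # unallocated extents as (pv_name, extent_index)
        self.lvs = {}    # LV name -> ordered list of extents

    def add_pv(self, pv_name: str, size_mb: int) -> None:
        """Add a physical volume; its space joins the shared pool."""
        count = size_mb // self.extent_mb
        self.free.extend((pv_name, i) for i in range(count))

    def extend_lv(self, lv_name: str, extra_mb: int) -> None:
        """Grow an LV by taking extents from the pool (rounded up)."""
        needed = -(-extra_mb // self.extent_mb)  # ceiling division
        if needed > len(self.free):
            raise ValueError("not enough free extents in volume group")
        self.lvs.setdefault(lv_name, []).extend(self.free[:needed])
        del self.free[:needed]

    def create_lv(self, lv_name: str, size_mb: int) -> None:
        self.extend_lv(lv_name, size_mb)

    def lv_size_mb(self, lv_name: str) -> int:
        return len(self.lvs[lv_name]) * self.extent_mb


vg = VolumeGroup("vg0")
vg.add_pv("sda1", 100)            # 25 extents
vg.create_lv("data", 80)          # uses 20 extents, 5 remain free

try:
    vg.extend_lv("data", 40)      # needs 10 extents, only 5 are free
except ValueError as err:
    print("extend failed:", err)

vg.add_pv("sdb1", 100)            # add a second disk to the pool
vg.extend_lv("data", 40)          # now succeeds
pvs_used = {pv for pv, _ in vg.lvs["data"]}
print(vg.lv_size_mb("data"), sorted(pvs_used))
```

After the second disk is added, the LV spans both PVs without the file system on top needing to know where its blocks physically live.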

6. Advanced Storage Concepts

6.1 Storage Area Networks (SAN) vs. Network Attached Storage (NAS)

  • NAS: File-level access (e.g., NFS, SMB). It's like a remote folder.
  • SAN: Block-level access (e.g., iSCSI, Fibre Channel). It's like a remote hard drive.

6.2 Object Storage (The Cloud Standard)

Object stores such as Amazon S3 or Ceph's object layer (RADOS) do not expose a hierarchical file system. Data lives in a flat namespace of objects.

  • Key-Value Access: You store an object (Data + Metadata) and retrieve it via a unique ID.
  • Scalability: Can grow to exabytes across thousands of machines.
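  • Consistency: Guarantees vary by implementation. Some systems are strongly consistent (a read after a successful write returns the new data), while others are eventually consistent (replicas may briefly return stale data).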
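
The key-value access model can be illustrated with an in-memory sketch. This is not the API of S3 or Ceph: the class `ToyObjectStore` and its methods are hypothetical and only show the shape of the interface, where each object is data plus metadata stored under a unique key in a flat namespace.

```python
import hashlib


class ToyObjectStore:
    """In-memory sketch of an object store: flat keys, data plus metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes, **metadata) -> str:
        """Store or replace a whole object; returns a content digest."""
        digest = hashlib.sha256(data).hexdigest()
        self._objects[key] = {"data": data, "metadata": dict(metadata), "digest": digest}
        return digest

    def get(self, key: str):
        obj = self._objects[key]
        return obj["data"], obj["metadata"]

    def list_keys(self, prefix: str = ""):
        """Keys are flat strings; a '/' is just a character, not a directory."""
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("logs/2024/app.log", b"started\n", content_type="text/plain")
store.put("logs/2024/db.log", b"ready\n", content_type="text/plain")
store.put("images/cat.png", b"\x89PNG", content_type="image/png")

data, meta = store.get("logs/2024/app.log")
print(data, meta)
print(store.list_keys("logs/"))
```

Listing by prefix gives the appearance of folders, but there is no directory structure to create, rename, or traverse.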

7. Storage Monitoring and Benchmarking

Tool             Focus         Use Case
fio              Benchmarking  The industry standard for stress-testing I/O
smartctl         Health        Reads internal disk health (S.M.A.R.T.)
pvs / vgs / lvs  Management    Inspects LVM structures
nvme-cli         Diagnostics   In-depth info for NVMe drives
iostat -x 1      Latency       Real-time disk utilization and wait times
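
As an illustration of how fio is typically driven, the sketch below assembles a random-read job as a command line. The helper `build_fio_command`, the job name, and the target path are hypothetical, and the parameter values are arbitrary starting points. The block only prints the command; it assumes a Linux host (for the libaio engine) if you choose to run it.

```python
import shlex


def build_fio_command(target, rw="randread", block_size="4k",
                      size="256m", iodepth=32, runtime_s=30):
    """Assemble a fio invocation for a single job (does not run it)."""
    return [
        "fio",
        "--name=sketch",               # arbitrary job name
        f"--filename={target}",        # file or device under test
        f"--rw={rw}",                  # access pattern, e.g. randread or write
        f"--bs={block_size}",          # I/O size per request
        f"--size={size}",              # amount of data to cover
        f"--iodepth={iodepth}",        # requests kept in flight
        "--ioengine=libaio",           # Linux asynchronous I/O engine
        "--direct=1",                  # bypass the page cache
        f"--runtime={runtime_s}",
        "--time_based",                # run for the full runtime
        "--output-format=json",
    ]


cmd = build_fio_command("/tmp/fio-sketch.bin")  # hypothetical scratch file
print(shlex.join(cmd))
```

Never point a write workload at a device or file that holds data you need, since fio will overwrite it.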

8. Summary Checklist

  • Why is random I/O slow on an HDD but fast on an SSD?
  • Explain the role of "Garbage Collection" in an SSD.
  • What is the "Write Hole" problem in RAID 5/6?
  • How does LVM allow for online partition resizing?
  • SAN vs NAS: which one gives you a block device, and over which protocols?

End of Chapter 07. Continue to Chapter 08: Security & Protection.