File System

The file system provides a logical way for users and applications to store, organize, and access data on persistent storage.

Files and Directories

  • File: A logical unit of storage, representing a sequence of bytes.
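  • Directory: A special file that maps filenames to the files they name. In ext4-style file systems, each entry pairs a name with an inode number; the rest of the file's metadata lives in the inode.
  • File Descriptor (FD): A small non-negative integer that indexes a per-process table of open files maintained by the kernel. When a process opens a file, the kernel returns an FD, which the process then passes to later calls such as read, write, and close.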
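
To make the file descriptor idea concrete, here is a minimal Python sketch using the low-level os.open, os.read, and os.close calls, which expose the integer FD directly. The helper name read_via_fd and the file demo.txt are hypothetical and exist only for illustration.

```python
import os
import tempfile


def read_via_fd(path, n=64):
    """Open path, read up to n bytes, and return (fd_number, data)."""
    fd = os.open(path, os.O_RDONLY)  # the kernel hands back an integer FD
    try:
        data = os.read(fd, n)        # later calls refer to the file by that integer
    finally:
        os.close(fd)                 # frees the slot in this process's FD table
    return fd, data


# Demo on a throwaway file (hypothetical name) in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "wb") as f:
        f.write(b"hello, file system")
    fd_number, payload = read_via_fd(demo_path)
    print(fd_number, payload)
```

The exact FD value depends on what the process already has open; 0, 1, and 2 are conventionally standard input, output, and error, so a freshly opened file usually gets a slightly larger number.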

File System Implementation (Linux/ext4 style)

Unix-style file systems such as ext4 are built around three primary on-disk structures:

Inode (Index Node)

The core metadata structure for a file. It contains:

  • File type (regular, directory, etc.)
  • Permissions (Owner, Group, Others)
  • File size
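  • Timestamps: last access (atime), last modification (mtime), and last inode change (ctime); ext4 also records creation time (crtime)
  • Data block pointers (direct, indirect, double indirect, and in ext2/ext3 also triple indirect; ext4 uses extent trees by default)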

Note: The inode does not store the filename; filenames are stored in directories.
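
One way to observe that names live in directories rather than in the inode is to create a hard link and compare what os.stat reports for both names. The sketch below assumes a POSIX-like system whose temporary directory supports hard links; inode_summary, a.txt, and b.txt are hypothetical names.

```python
import os
import stat
import tempfile


def inode_summary(path):
    """Return a few inode fields for path, as reported by stat()."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,                         # inode number
        "is_regular": stat.S_ISREG(st.st_mode),     # file type
        "permissions": stat.filemode(st.st_mode),   # e.g. "-rw-r--r--"
        "size": st.st_size,                         # file size in bytes
        "links": st.st_nlink,                       # directory entries pointing here
    }


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "a.txt")
    alias = os.path.join(tmp, "b.txt")
    with open(original, "w") as f:
        f.write("data")
    os.link(original, alias)  # second directory entry for the same inode
    info_a = inode_summary(original)
    info_b = inode_summary(alias)
    print(info_a)
    print(info_b)
```

Both names should report the same inode number and a link count of 2, which is only possible because the filename is not part of the inode.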

Superblock

Contains metadata about the entire file system, such as:

  • Total number of blocks and inodes
  • Number of free blocks and inodes
  • Block size (e.g., 4 KB)
  • Mount status
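
Several of these totals can be queried from user space. The sketch below uses os.statvfs, which is available on Unix-like systems and returns the kernel's view of a mounted file system rather than the raw on-disk superblock. The helper name fs_summary and the dictionary keys are hypothetical.

```python
import os


def fs_summary(path="/"):
    """Summarize block and inode counts for the file system holding path."""
    vfs = os.statvfs(path)
    return {
        "block_size": vfs.f_frsize,    # unit in which the block counts are expressed
        "total_blocks": vfs.f_blocks,
        "free_blocks": vfs.f_bfree,
        "total_inodes": vfs.f_files,
        "free_inodes": vfs.f_ffree,
    }


summary = fs_summary("/")
for name, value in summary.items():
    print(f"{name}: {value}")
```

Some file systems allocate inodes dynamically and may report zero for the inode counts, so those two fields are best treated as informational.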

Data Block

The actual storage area where the file's content is kept.
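
Block size and the inode's block pointers together bound how large a file can grow. The following sketch works through the arithmetic for the three pointer levels listed under the inode. It assumes 4 KB blocks, 4-byte block numbers, and 12 direct pointers (the ext2-style layout); these figures are illustrative assumptions, and ext4's extent-based mapping has different limits.

```python
def max_file_size(block_size=4096, pointer_size=4, direct=12):
    """Bytes addressable through direct, single, and double indirect pointers."""
    ptrs_per_block = block_size // pointer_size       # 1024 with the defaults
    direct_bytes = direct * block_size                # 12 blocks
    single_bytes = ptrs_per_block * block_size        # one block full of pointers
    double_bytes = ptrs_per_block ** 2 * block_size   # pointers to pointer blocks
    return direct_bytes + single_bytes + double_bytes


limit = max_file_size()
print(limit)              # 4299210752 bytes
print(limit / 2 ** 30)    # just over 4 GiB
```

With these assumptions the double indirect level dominates: it alone covers 4 GiB, while the direct pointers cover only 48 KiB. That is why small files are cheap to reach and large files need the extra levels.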

Reliability and Journaling

A major challenge for file systems is maintaining consistency after a system crash (e.g., during a write operation).

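  • Journaling: A technique where the FS first records a pending change in a dedicated area (the journal) and marks it committed before applying it to the main FS area. Depending on the mode, the journal covers metadata only (ext4's default) or both metadata and data. After a crash, the FS replays the committed journal entries and discards incomplete ones, which restores consistency without scanning the whole disk.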
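
The log-then-apply idea can be sketched with a toy in-memory model. This is not how ext4's journal is implemented; the class JournaledStore, its methods, and the keys such as inode_7.size are hypothetical and only illustrate that recovery applies committed transactions and ignores uncommitted ones.

```python
class JournaledStore:
    """Toy key-value 'disk' with a write-ahead journal (illustrative only)."""

    def __init__(self):
        self.journal = []    # records: ("write", txid, key, value) or ("commit", txid)
        self.disk = {}       # stands in for the main file system area
        self._next_tx = 0

    def begin(self):
        """Start a transaction and return its id."""
        self._next_tx += 1
        return self._next_tx

    def log_write(self, txid, key, value):
        """Record an intended change in the journal; the disk is not touched yet."""
        self.journal.append(("write", txid, key, value))

    def commit(self, txid):
        """Mark the transaction as complete in the journal."""
        self.journal.append(("commit", txid))

    def replay(self):
        """Recovery: apply writes from committed transactions, skip the rest."""
        committed = {rec[1] for rec in self.journal if rec[0] == "commit"}
        for rec in self.journal:
            if rec[0] == "write" and rec[1] in committed:
                _, _, key, value = rec
                self.disk[key] = value
        self.journal.clear()


store = JournaledStore()

t1 = store.begin()
store.log_write(t1, "inode_7.size", 4096)
store.log_write(t1, "bitmap.block_42", "used")
store.commit(t1)

t2 = store.begin()
store.log_write(t2, "inode_9.size", 8192)
# Simulated crash: t2 never reaches commit.

store.replay()
recovered = dict(store.disk)
print(recovered)
```

After replay, only the two changes from the committed transaction appear on the toy disk. The half-finished transaction leaves no trace, which is the consistency property journaling is meant to provide.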

Common File Systems

  • ext4 (Linux): A stable, high-performance journaling file system.
  • XFS (Linux): Scalable, high-performance FS used by default in many enterprise distributions (e.g., RHEL).
  • NTFS (Windows): Proprietary journaling FS with advanced security (ACLs) and compression.
  • APFS (macOS): Optimized for SSDs, supporting snapshots and space sharing.
  • ZFS: Advanced FS with features like copy-on-write, snapshots, and data integrity verification (checksums).

Performance Optimization

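  • Page Cache: The OS uses otherwise free RAM to cache recently accessed file data, so repeated reads can be served from memory instead of disk.
  • Read-ahead: When the kernel detects sequential access, it prefetches the next blocks into the page cache before the application asks for them.
  • Write-back Caching: Writes are first cached in memory and flushed to disk later, which improves responsiveness. Data that has not yet been flushed can be lost in a crash unless the application forces it out (e.g., with fsync).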
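
Because of write-back caching, a write call that has returned does not mean the data is on disk. A minimal sketch of forcing data out from Python follows; the helper name durable_write and the file record.bin are hypothetical. It does not fsync the parent directory, which a fully crash-safe file creation would typically also need.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path and ask the kernel to flush it to storage."""
    with open(path, "wb") as f:
        f.write(data)           # lands in Python's user-space buffer
        f.flush()               # moves it into the kernel's page cache
        os.fsync(f.fileno())    # asks the kernel to write it to the device


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "record.bin")
    durable_write(target, b"committed")
    with open(target, "rb") as f:
        readback = f.read()     # most likely served from the page cache
    print(readback)
```

Calling fsync on every write gives up much of the benefit of write-back caching, so it is usually reserved for data that must survive a crash, such as database commit records.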