
TLPI - Chapter 14
File Systems
Shu-Yu Fu (shuyufu@gmail.com)

In This Chapter
● The majority of this chapter is concerned with file
systems, which are organized collections of files
and directories. We explain a range of file-system
concepts, sometimes using the traditional Linux
ext2 file system as a specific example. We also
briefly describe some of the journaling file systems
available on Linux.
● We conclude the chapter with a discussion of the
system calls used to mount and unmount a file
system, and the library functions used to obtain
information about mounted file systems.

Device Special Files (Devices)
● A device special file (normally found in the /dev directory)
corresponds to a device on the system.
● A device driver is a unit of kernel code that
implements a set of operations (open(),
close(), read(), write(), and so on) for a device.
○ Character devices handle data on a character-by-
character basis.
○ Block devices handle data a block at a time.
○ # ls -l /dev
drwxr-xr-x 2 root root 1024 Oct 28 18:53 block
crw-r----- 1 root root 252, 0 Oct 25 15:55 cmem
crw-r----- 1 root root 5, 1 Oct 28 18:53 console
drwxr-xr-x 3 root root 1024 Oct 28 18:53 disk
brw-r----- 1 root root 3, 0 Oct 25 15:55 hda
brw-r----- 1 root root 3, 1 Oct 25 15:55 hda1
...

Device Special Files (Devices) (cont.)
● Each device file has a major ID number and
a minor ID number (recorded in the i-node).
○ The major ID identifies the general class of device
○ The minor ID uniquely identifies a particular device
● On Linux 2.4 and earlier, both major and
minor IDs are represented using just 8 bits.
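● On Linux 2.6, the major and minor device
IDs are represented using more bits (12 bits
and 20 bits, respectively).
● The mknod command (or, in a privileged program, the
mknod() system call) creates a device file. Historically,
mknod() also created FIFOs and directories; today
mkfifo() and mkdir() are used for those.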

Disk Drives
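● Information on a disk is located on concentric tracks;
each track is divided into sectors, and each sector
consists of a series of physical blocks (typically 512 bytes).
● Although modern disks are fast, reading and writing
information on the disk still takes significant
time:
a. the disk head must move to the appropriate track (seek time)
b. the drive must wait until the appropriate sector rotates
under the head (rotational latency)
c. the required blocks must be transferred (transfer
time)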
● Related topics
a. Differences in speed between the inner and outer tracks of a hard disk
b. Zone bit recording
c. Constant angular velocity
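Here is a worked example of the rotational-latency term. It assumes a 7,200 RPM disk, which is an illustrative figure not taken from the slides. One revolution takes 60/7200 s, and on average the head waits half a revolution for the target sector.

```latex
T_{\text{rev}} = \frac{60\ \text{s}}{7200} \approx 8.33\ \text{ms},
\qquad
\bar{t}_{\text{rot}} = \frac{T_{\text{rev}}}{2} \approx 4.17\ \text{ms}
```

So each random access costs several milliseconds before any data is transferred. This is why keeping a file's blocks close together matters for performance.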

Disk Partitions
● Each disk is divided into one or more
partitions.
● Each partition is treated by the kernel as a
separate device residing under the /dev
directory. A disk partition usually contains
one of the following:
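○ a file system
○ a data area (accessed as a raw device, e.g., by some
database management systems)
○ a swap area, created with the mkswap(8) command and
enabled or disabled with swapon()/swapoff() (or the
swapon(8)/swapoff(8) commands)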
● # cat /proc/partitions
● # cat /proc/swaps

File Systems
● A file system is created using the mkfs
command.
● Linux supports a wide variety of file systems.
● # cat /proc/filesystems
● We use ext2 (successor to ext) as an example
at various points later in this chapter

File-system Structure
● The basic unit for allocating space in a file
system is a logical block (of size 1024, 2048,
or 4096 bytes), which is some multiple of
contiguous physical blocks on the disk
device.
● FIBMAP ioctl() operation allows you to
determine the physical location of a specified
block of a file.

File-system Structure (cont.)
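● Boot block: always the first block in a file system. It is not
used by the file system itself; rather, it contains information
used to boot the operating system.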
● Superblock contains parameter information:
○ the size of the i-node table;
○ the size of logical blocks in this file system; and
○ the size of the file system in logical blocks.
● I-node table (also called the i-list): each file
or directory in the file system has a unique
entry in the i-node table.
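● Data blocks: the blocks of data that form the files
and directories residing in the file system; they take
up the great majority of the file system's space.
● The actual ext2 layout is more complex than this
simplified structure; for example, ext2 divides the
file system into block groups.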

I-nodes
● I-nodes are identified numerically by their
sequential location in the i-node table.
bobby@bobby-Veriton-M490:/lib$ ls -li
total 2064
147849218 drwxr-xr-x 2 root root 4096 Oct 29 09:10 apparmor
147849240 lrwxrwxrwx 1 root root 21 Jun 4 09:38 cpp -> /etc/alternatives/cpp
147849244 -rw-r--r-- 1 root root 42680 Apr 11 2012 libbrlapi.so.0.5.6

● The information maintained in an i-node
includes:
○ file type, owner, group, and access permissions for
three categories of user (owner, group, and other);
three timestamps: last access (ls -lu), last
modification (ls -l), and last status change (ls -lc);
number of hard links; size of the file; number of
blocks actually allocated; and pointers to the data
blocks.

I-nodes and Data Block Pointers in
ext2
● ext2 does not necessarily store the data blocks of a
file contiguously, which lets the file system
use space efficiently.
● To locate a file's data blocks, the kernel
maintains a set of pointers in the i-node.
● One benefit is that files can have holes: a block that
was never written needs no disk block, and its pointer
is simply marked with the value 0.

The Virtual File System (VFS)
● The virtual file system is an abstraction layer
for file-system operations.
○ The VFS defines a generic interface for file-system
operations.
○ Each file system provides an implementation for the
VFS interfaces.
● Naturally, some file systems don't support all
of the VFS operations.
○ the underlying file system passes an error code back
to the VFS layer indicating the lack of support.

Journaling File Systems
● ext2 suffers from a classic limitation of
such file systems: after a system crash, a file-system
consistency check (fsck) must be performed
(possibly taking several hours) in order
to ensure the integrity of the file system.
● Journaling file systems eliminate the need
for lengthy file-system consistency checks
after a system crash.
○ The most notable disadvantage of journaling is that it
adds time to file updates, though good design can
make this overhead low.
● Examples: ext4 and btrfs (btrfs achieves crash
consistency with copy-on-write rather than a traditional journal)

Single Directory Hierarchy and
Mount Points
● All files from all file systems reside under a
single directory tree rooted at / (slash).
● Other file systems are mounted at directories
within this tree, called mount points.
○ # mount device directory
■ # cat /proc/mounts
○ # umount directory
○ Linux (2.4.19 and later) supports per-process mount
namespaces
■ # cat /proc/self/mounts

Mounting and Unmounting File
Systems
● The mount() and umount() system calls
allow a process to mount and unmount file
systems.
● The mount and umount commands
automatically maintain the file /etc/mtab,
which lists mounted file systems along with their
file-system-specific options; the mount() and
umount() system calls do not update it.
● The /etc/fstab file, maintained by the
administrator, contains descriptions of all of
the available file systems, and is used by the
mount, umount, and fsck commands.

Mounting and Unmounting File
Systems (cont.)
● The /proc/mounts, /etc/mtab, and
/etc/fstab files share a common format;
the getfsent() and getmntent() functions
can be used to read records from these
files.
● Each record has six fields, for example: /dev/sda9 /boot ext3 rw 0 0
○ the name of the mounted device
○ the mount point for the device
○ the file-system type
○ mount flags
○ a number used to control the operation of file-system backups by dump(8)
○ a number used to control the order in which fsck(8) checks file systems at system boot time

Mounting a File System: mount()
● #include <sys/mount.h>
● int mount(const char *source, const char *target, const char *fstype,
unsigned long mountflags, const void *data);
○ MS_NOATIME: don't update the last-access time of files
○ MS_NODIRATIME: don't update the last-access time of directories
● mount("/dev/md0", "/opt/media/volume0", "ext4", MS_NOATIME |
MS_NODIRATIME, NULL);
● The final mount() argument, data, is a pointer to a buffer
of information whose interpretation depends on the file
system.
● The options accepted by each file system are described under
Documentation/filesystems in the Linux kernel source tree.

Unmounting a File System: umount()
and umount2()
● #include <sys/mount.h>
● int umount(const char *target);
● int umount2(const char *target, int flags);
● umount2() allows finer control over the
unmount operation via the flags argument.
○ MNT_DETACH (lazy unmount)
○ MNT_EXPIRE

Advanced Mount Features
● Mounting a File System at Multiple Mount
Points
○ # mkdir /mnt/a /mnt/b
○ # mount /dev/md0 /mnt/a
○ # mount /dev/md0 /mnt/b
● Stacking Multiple Mounts on the Same
Mount Point (chroot() jails)
○ # mkdir /mnt/a
○ # mount /dev/md0 /mnt/a
○ # mount -t tmpfs tmpfs /mnt/a
● Mount Flags that are Per-Mount Options
○ # mount /dev/md0 -o noexec /mnt/b

Advanced Mount Features (cont.)
● Bind Mounts (handy uses of mount --bind)
○ # mkdir /mnt/a /mnt/b
○ # touch /mnt/a/x
○ # mount --bind /mnt/a /mnt/b
● Recursive Bind Mounts
○ # mkdir top src1 src2 dir1 dir2
○ # touch src1/aaa src2/bbb
○ # mount --bind src1 top
○ # mkdir top/sub
○ # mount --bind src2 top/sub
○ # mount --bind top dir1
○ # mount --rbind top dir2

A Virtual Memory File System: tmpfs
● Linux supports the notion of virtual file
systems that reside in memory.
● tmpfs uses not only RAM but also swap
space if RAM is exhausted. By default, a
tmpfs file system is permitted to grow to half
the size of RAM.
● # mount source target -t tmpfs -o size=1m
● tmpfs also serves two special purposes:
○ an invisible tmpfs file system, mounted internally by
the kernel, implements System V shared memory and
shared anonymous memory mappings
○ a tmpfs mounted at /dev/shm is used for the glibc
implementation of POSIX shared memory and POSIX semaphores

Obtaining Information About a File
System: statvfs()
● #include <sys/statvfs.h>
● int statvfs(const char *pathname, struct statvfs *statvfsbuf);
● int fstatvfs(int fd, struct statvfs *statvfsbuf);
● struct statvfs {
      unsigned long f_bsize;    /* File-system block size (in bytes) */
      unsigned long f_frsize;   /* Fundamental file-system block size (in bytes) */
      fsblkcnt_t    f_blocks;   /* Total number of blocks in file system (in units of 'f_frsize') */
      fsblkcnt_t    f_bfree;    /* Total number of free blocks */
      fsblkcnt_t    f_bavail;   /* Number of free blocks available to unprivileged process */
      fsfilcnt_t    f_files;    /* Total number of i-nodes */
      fsfilcnt_t    f_ffree;    /* Total number of free i-nodes */
      fsfilcnt_t    f_favail;   /* Number of i-nodes available to unprivileged process (set to 'f_ffree' on Linux) */
      unsigned long f_fsid;     /* File-system ID */
      unsigned long f_flag;     /* Mount flags */
      unsigned long f_namemax;  /* Maximum length of filenames on this file system */
  };
● The fsblkcnt_t and fsfilcnt_t data types are integer types.
● For most Linux file systems, f_bsize and f_frsize have the same value. On file systems that support the notion
of block fragments, f_frsize is the size of a fragment and f_bsize is the size of a whole block.
● If the file system has reserved blocks, the difference between f_bfree and f_bavail tells us
how many blocks are reserved.
● The f_flag field is a bit mask of the flags used to mount the file system; however, the corresponding constants have names
starting with ST_ instead of MS_ (e.g., ST_RDONLY, ST_NOSUID).
● The f_fsid field is used on some UNIX implementations to return a unique identifier for the file system. For most Linux
file systems, this field contains 0.
