[TOC]

文件系统

硬件抽象

第一层抽象：从一块完整磁盘到分区，每个分区可视为一块磁盘

MBR

MBR (主引导记录) 位于硬盘的 0 磁道、0 柱面、1 扇区中，主要记录了启动引导程序和磁盘的分区表。

enter image description here

第二层抽象：每个分区切成等份的小块(512Byte),从0开始编号

第三层抽象: 对编号进行区域划分（文件系统决定的）,从这一步开始就是不同文件系统发挥了

Minix1.0 文件系统

minix1.0版本

Ext2(Linux)

superblock + inode table + data area
enter image description here

superblock：文件系统本身结构信息 ( file system type 、the size of each area )
inode table：单个文件的属性 (size、owner 、last modified time 、sequence of data area blocks)
data area：data content

FAT, NTFS(Windows)

存储抽象:文件

理解 inode

This is important：Each inode is identified by its position in the inode table. For example， inode 2 is the third struct in the inode table of the filesystem.

总结: Linux文件 = 文件系统类型 + inode (存储属性) + data(存储实际内容)

理解文件存储

enter image description here

文件名是否存在 inode 结构体上？

文件名并不存储于 inode 结构体上。
Files are entries in the inode table with contents stored in the data region.

理解大文件存储

每个inode结构的大小是固定的，它是如何做到存储有大量 data block 的文件的？

参考1: 三级间接存储法

管理抽象:目录

目录是一种特殊的文件，它的内容是其他文件和目录的列表
the names in a directory refer to files and to other directories

”文件在目录中“ 如何理解?

Directories contain references to files. Each of these references is called link (contains a i-node number and a filename). it is a entry to the real file.

It means that parent directory cantains a link to the inode for that subdirectory.

”.“ 如何理解？

Every directory has an link . for its own inode. the kernel installs it when create a directory.

“..” 如何理解?

Dotdot is a reserved link to subdirectory’s parent directory. the kernel installs it when create a directory. Root / directory‘s dotdot entry is a references to itself.

Mutilple links , Harr link(硬连接), link counts

If a file has 2 links, which one is the original file and which is the link to it?
In the Unix directory structure , these two links are same. they are called hard links to the file.
The file is an inode and a bunch of data blocks; A link is a reference to an inode.
The kernel records the number of links which is called link count to a file in file’s inode struct.

Filename

In the Unix file System , files actually do not have names. Files only have inode number. Links have names.

Create File

Store Properties 属性
Store Data
Record Allocation 记录分配的块到属性
Add Filename to Directory

Read File

$ cat userlist

Search directory . find filename
get the i-node number and locate to it
get a list of data blocks’s number and check permission
read the data blocks one by one

directory system call

mkdir()  // create a directory
rmdir()  // delete a empty directory
unlink() // delete a file entry and update the link count
link()   // add a file entry and update the link count
rename() // update file entry name, mv command uses this
chdir()  // change the current directory of a process

Each running process on Unix has a current directory. Internally , the process keeps a variable that stores the inode number of the current directory. When you say “change into a new directory” , it means you just change the value of that variable.

pwd

$ pwd                                                                
/home/link/workspace/md

Where is that long path /home/link/workspace/md stored ?

How does pwd know the current directory is called md ?

How does pwd know the parent of md is called workspace ?

the answer is simple : Follow the links and read the directories . and .. , pwd climbs up the tree , directory by directory. noting each step’s filename until it reachs the top of the tree.

How do we know when we reach the top of tree?

In the root directory of a Unix file system. . and .. point to the same inode.

spwd stops when it reaches the root of the file system.The root of this filesystem. but not the root of the entire tree on this computer.

文件系统抽象:文件树

Root Filesystem

In Unix , every file on the system is located somewhere in a single tree directories. There are no separate drivers or volumes. In fact, directories on physically separate disks and partitions are seamlessly subsumed (无缝融入) into this single root tree (单目录树/).
so we only think about directories and files when writing programs. no partitions and volumes.

The user sees a seamless(无缝) tree of directories.In reality there are two trees，one on disk 1 and one on disk 2.Each tree has a root directory.One filesystem is designated the root filesystem：the top of this tree really is the root of the entire tree.The other filesystem is attached to some subdirectory of the root filesystem. Internally，the kernel associates a pointer to the other filesystem with a directory on the root filesystem.

mount

Unix uses the phrase to mount a filesystem in the sense of to mount a butterfly or to mount a picture that is，to pin it to some existing support.The root directory of the subtree is pinned on to a directory on the root filesystem.The directory to which the subtree is attached is called the mount point for that second system.

The filesystem on /dev/hda6 is attached to the root filesystem at the /home directory. Thus，when a user chdir from / to /home ， she crosses from one filesystem to another. When our spwd winds its way up the tree，it stops at /home because it has reached the top of its filesystem.

mount 这里的细节还没有讲到，要自己查资料补充。

Symbolic links

硬连接可以跨文件系统么?

不能，硬连接的本质是同一个文件的inode number 被写入到了多个目录中，拥有多个filename。如下图，有相同 inode number 402 的两个文件分别位于两个文件系统中，而不是指向同一个文件。当然 link() rename() 系统调用也禁止了这种调用。

目录有硬连接么？

目录也是文件，从实现上是完全可以的，比如 .. 就是父目录的硬连接。但 Linux 保留了这个特例，而不允许创建普通目录的硬连接。
原因是：Linux 的文件系统就是按照树形结构设计的，如果加入了目录的硬连接，会使树变成图结构，由于硬连接都是相互等价的，除filename外属性完全一致，根本无法区分出谁是本体。所以，在遍历一个目录时，会出现无限递归。当然可以通过记录下每一个遍历过的节点，避免无限递归。但为了支持这么一个小小的特性而将设计复杂度和操作复杂度，是得不偿失的。
PS: Linux的目录结构虽然经常被称为目录树，但其实Linux的目录结构更应该算是一个不带回环的图结构 DAG (Directed acyclic graph) 有向无环图。

符号连接

符号连接是目录连接实现的替代方案。

the symlink is a file with its own inode :

the flag file-type in the "inode" that tells to the system this file is a "symbolic link"
file-content : path to the target - in other words: a symbolic link is simply a file which contains a filename with a flag in the inode.

符号连接不会造成路径的回环的本质原因是：符号连接本身是另外一个文件(独立的inode)，在遍历的时候可以单独处理（比如当成普通文件，设置最大递归数等).

注意符号连接的回环处理

If the component is found and is a symbolic link (symlink), we first resolve this symbolic link (with the current lookup directory as starting lookup directory).

Upon error, that error is returned.
If the result is not a directory, an ENOTDIR error is returned.
If the resolution of the symlink is successful and returns a directory, we set the current lookup directory to that directory, and go to the next component.
Note that the resolution process here involves[包含] recursion.In order to protect the kernel against stack overflow, and also to protect against denial of service, there are limits on the maximum recursion depth, and on the maximum number of symbolic links followed. An ELOOP error is returned when the maximum is exceeded ("Too many levels of symbolic links").

注意！当符号连接是链接到当前目录或者当前目录的父目录的时候，如果程序没有设置最大递归数，就会死循环，让进程奔溃！

symlink(char *actualpath, char *sympath)  // 创建一个符号连接
readlink() // 获取一原始文件的名字
stat()　　  // 返回文件的信息，如果是符号连接，则自动解开，返回的是被引用文件的信息
lstat()    // 返回该符号链接的本身的有关信息，而不是由该符号链接引用文件的信息

文件系统的实现(下)

回顾和展望

以上都是基于Linux经典文件系统Ext2的设计作为参考进行的分享
更多一个分享可以从Ext2的进化，以及和其他的文件系统的设计区别这个方向