How hard disks work

Disk sector: http://en.wikipedia.org/wiki/Disk_sector

Disk seek: http://en.wikipedia.org/wiki/Disk_seek

Block: http://en.wikipedia.org/wiki/Block_(data_storage)

Internal fragmentation: http://en.wikipedia.org/wiki/Internal_fragmentation

How the hard disks work: http://www.youtube.com/watch?v=zp-VBlLuDic

Partitioning: http://en.wikipedia.org/wiki/Hard_disk_drive_partitioning
Filesystem types

* Physical filesystems
1. ext, ext2, ext3
2. reiserfs
3. xfs
4. tmpfs

Physical filesystems work with local physical disks or memory.

* Network filesystems
1. NFS
2. CIFS

Network filesystems work with remote physical disks or memory but present the information as if it were local.

* Pseudo filesystems
1. ProcFS
2. SysFS

These filesystems do not work with real files but only with information that is represented as files. Such filesystems do not follow the rules that apply to the other filesystem types.

* Cluster filesystems
1. OCFS2
2. GFS, GFS2
3. Gfarm

Cluster filesystems are shared between two or more machines to provide failover or High Performance Computing (HPC) to the systems.

* Filesystems in User Space (FUSE)
1. EncFS
2. BluetoothFS
3. CvsFS

How the filesystems work in the kernel

* Kernel model:

 
User Process
|
System call
|
System call interface
|
VFS Layer
|
/--------------------------........
| | | |
ext2 ext3 xfs reiserfs ........
| | | |
\--------------------------........
|
buffer cache
|
device driver
|
disk controller


1. When you request a file with a command (for example, cat /etc/motd), the command issues a system call (open, which the C library function fopen wraps) to the system call interface.
2. The system call interface calls the VFS layer functions and asks for the file /etc/motd located on diskX.
3. The VFS layer uses its internal table of currently loaded and mounted filesystems to decide which filesystem functions should be executed to deliver this information.
4. Once the filesystem functions are chosen, they request the data for this file from the buffer cache functions.
5. In turn, the buffer cache functions request this information from the device driver.
6. The device driver then asks the disk controller (note that you never access the disk directly; you always access the disk controller) for this piece of information, which is located using the information provided by the filesystem functions.
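The first step of this path can be tried from user space. The following is a minimal sketch in Python, whose os.open, os.read and os.close functions are thin wrappers around the corresponding system calls. The helper name read_file_via_syscalls and the temporary demo file are made up for illustration; everything below the system call interface (VFS, filesystem code, buffer cache, driver) stays invisible to the program.

```python
import os
import tempfile


def read_file_via_syscalls(path, chunk_size=4096):
    """Read a whole file using only open/read/close style calls.

    Hypothetical helper for illustration: each os.* call below enters
    the kernel through the system call interface, and the VFS layer
    picks the filesystem code that serves the request.
    """
    fd = os.open(path, os.O_RDONLY)       # open(2)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)   # read(2)
            if not chunk:                     # empty result means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)                      # close(2)


# Demo on a temporary file (a stand-in for /etc/motd, which may not exist).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"message of the day\n")
    demo_path = tmp.name

demo_data = read_file_via_syscalls(demo_path)
os.unlink(demo_path)
```

The same function works on /etc/motd where that file exists; the program never needs to know which filesystem the file lives on.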

Filesystems in depth
Ext2

* Introduced in 1993
* Replacement for the Extended file system
* Developed by Rémy Card (the developer of the extended file system)
* Kernels < 2.6 and all 32-bit systems are limited to 2 TB
* Kernels >= 2.6 are limited to 64 TB
* 2.4 kernels have patches backported from 2.6
* Standard functions
1. Regular files
2. Directories
3. Device files
4. Symbolic links
5. File permissions
6. Long file names (255 characters, extendable to 1012)
7. 5% of the blocks reserved for the root user
* Advanced features
1. Different block sizes (1024, 2048 and 4096 bytes)
2. File attributes
3. Hard links
4. Fast symbolic links (max 60 characters)
5. New attributes added to the superblock
   1. Mount count and maximum mount count
   2. Last filesystem check time and maximum interval between checks
   3. Three options for the error behavior
      1. Continue normal operation
      2. Remount the filesystem read-only
      3. Generate a kernel panic in order to reboot and force a filesystem check
   4. Mount options are provided to change the above error behavior
6. New userland tools (such as tune2fs) are introduced to help manage the new superblock attributes
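To make the superblock attributes above concrete, here is a hedged sketch that decodes them from a raw filesystem image. It assumes the usual ext2 on-disk layout (superblock at byte 1024, little-endian fields, mount count at offset 52, magic number 0xEF53); those offsets come from my reading of the ext2 headers and should be checked against them before being relied on. The function name read_tunables and the synthetic image are invented for the example.

```python
import struct

SUPERBLOCK_OFFSET = 1024      # assumed: superblock starts 1024 bytes into the device
EXT2_MAGIC = 0xEF53           # assumed ext2 magic number
TUNABLES_OFFSET = 52          # assumed offset of s_mnt_count inside the superblock

# s_mnt_count, s_max_mnt_count, s_magic, s_state, s_errors,
# s_minor_rev_level, s_lastcheck, s_checkinterval
TUNABLES = struct.Struct("<HhHHHHII")

ERROR_BEHAVIOR = {1: "continue", 2: "remount read-only", 3: "panic"}


def read_tunables(image):
    """Decode mount counters, check interval and error behavior (sketch)."""
    (mnt_count, max_mnt_count, magic, _state, errors,
     _minor_rev, last_check, check_interval) = TUNABLES.unpack_from(
        image, SUPERBLOCK_OFFSET + TUNABLES_OFFSET)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2 superblock")
    return {
        "mount_count": mnt_count,
        "max_mount_count": max_mnt_count,
        "last_check": last_check,              # seconds since the epoch
        "check_interval": check_interval,      # seconds
        "on_error": ERROR_BEHAVIOR.get(errors, "unknown"),
    }


# Synthetic image built in memory, so no real disk is needed.
image = bytearray(2048)
TUNABLES.pack_into(image, SUPERBLOCK_OFFSET + TUNABLES_OFFSET,
                   7, 20, EXT2_MAGIC, 1, 2, 0, 1192544580, 15552000)
tunables = read_tunables(bytes(image))
```

On a real system these values are what tune2fs displays and changes; the sketch only reads them.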
* The physical structure of the Ext2 Filesystem

The filesystem is made of block groups in the following order:

| BOOT | BLOCK | BLOCK | BLOCK | ... | BLOCK |
| SECTOR | GROUP 1 | GROUP 2 | GROUP 3 | ... | GROUP N |

Each block group contains a redundant copy of the superblock and the filesystem descriptors, and also a part of the filesystem (a block bitmap, an inode bitmap, a piece of the inode table, and data blocks).

The structure of a block group is represented in this table:

| Super | Filesystem  | Block  | Inode  | Inode | Data   |
| Block | Descriptors | Bitmap | Bitmap | Table | Blocks |
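Because every block group carries its own piece of the inode table, an inode number is enough to find the group that stores it. A minimal sketch, assuming inode numbers start at 1 and every group holds the same number of inodes; the function name locate_inode and the value 2048 are made up for the example, since the real count is stored in the superblock.

```python
def locate_inode(inode_no, inodes_per_group):
    """Return (block group, index inside that group's inode table)."""
    if inode_no < 1:
        raise ValueError("inode numbers start at 1")
    index = inode_no - 1                     # inode 1 is the first table slot
    return index // inodes_per_group, index % inodes_per_group


INODES_PER_GROUP = 2048                      # hypothetical value for illustration

first = locate_inode(1, INODES_PER_GROUP)
next_group = locate_inode(2049, INODES_PER_GROUP)
somewhere = locate_inode(5000, INODES_PER_GROUP)
```

Here inode 1 lands in group 0, inode 2049 is the first inode of group 1, and inode 5000 is entry 903 of group 2.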

In Ext2fs, directories are managed as linked lists of variable length entries. Each entry contains the inode number, the entry length, the file name and its length. By using variable length entries, it is possible to implement long file names without wasting disk space in directories. The structure of a directory entry is shown in this table:

| INODE NUMBER | ENTRY LENGTH | NAME LENGTH | FILENAME |

As an example, the next table represents the structure of a directory containing three files: file1, long_file_name, and f2:

| i1 | 16 | 05 | file1 |
| i2 | 40 | 14 | long_file_name |
| i3 | 12 | 02 | f2 |
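The linked-list behavior comes from the entry length: adding it to the current offset gives the start of the next entry. The sketch below packs the three entries from the table into one buffer and walks it. It assumes the original ext2 entry header (32-bit inode number, 16-bit entry length, 16-bit name length, little-endian); the helper names pack_entry and walk_directory and the inode numbers 1 to 3 are invented for illustration.

```python
import struct

# inode number, entry length, name length (assumed original ext2 layout)
HEADER = struct.Struct("<IHH")


def pack_entry(inode, entry_len, name):
    """Build one directory entry, padded with zero bytes to entry_len."""
    raw = name.encode("ascii")
    body = HEADER.pack(inode, entry_len, len(raw)) + raw
    if len(body) > entry_len:
        raise ValueError("entry length too small for this name")
    return body.ljust(entry_len, b"\0")


def walk_directory(block):
    """Follow the entry lengths through the block, like a linked list."""
    entries = []
    offset = 0
    while offset < len(block):
        inode, entry_len, name_len = HEADER.unpack_from(block, offset)
        if entry_len < HEADER.size:
            raise ValueError("corrupt entry length")
        start = offset + HEADER.size
        name = block[start:start + name_len].decode("ascii")
        if inode != 0:                       # inode 0 marks an unused entry
            entries.append((inode, entry_len, name))
        offset += entry_len                  # jump to the next entry
    return entries


block = (pack_entry(1, 16, "file1")
         + pack_entry(2, 40, "long_file_name")
         + pack_entry(3, 12, "f2"))
entries = walk_directory(block)
```

Note that the entry length can be larger than the header plus the name, as with long_file_name above; the extra space is simply skipped when walking the list.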

* Performance optimizations

The Ext2fs kernel code contains many performance optimizations, which tend to improve I/O speed when reading and writing files.

Ext2fs takes advantage of the buffer cache management by performing readaheads: when a block has to be read, the kernel code requests the I/O on several contiguous blocks. This way, it tries to ensure that the next block to read will already be loaded into the buffer cache. Readaheads are normally performed during sequential reads on files and Ext2fs extends them to directory reads, either explicit reads (readdir(2) calls) or implicit ones (namei kernel directory lookup).
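The effect of readahead can be shown with a toy model. This is only a sketch of the idea, not the kernel's algorithm: the class name ReadAheadCache, the window of 4 blocks and the fake device function are all assumptions made for the example.

```python
class ReadAheadCache:
    """Toy buffer cache that fetches several contiguous blocks per miss."""

    def __init__(self, read_block, window=4):
        self.read_block = read_block     # function: block number -> bytes
        self.window = window             # blocks fetched per device request
        self.cache = {}
        self.device_requests = 0

    def get(self, block_no):
        if block_no not in self.cache:
            # Cache miss: request the wanted block plus the blocks after it.
            for n in range(block_no, block_no + self.window):
                self.cache[n] = self.read_block(n)
            self.device_requests += 1
        return self.cache[block_no]


def fake_device(block_no):
    """Stand-in for the disk: every block holds its own number."""
    return bytes([block_no])


cache = ReadAheadCache(fake_device, window=4)
data = [cache.get(n) for n in range(8)]      # sequential read of 8 blocks
```

Reading eight blocks in order costs two device requests instead of eight, because each miss also loads the three blocks that follow.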

Ext2fs also contains many allocation optimizations. Block groups are used to cluster together related inodes and data: the kernel code always tries to allocate data blocks for a file in the same group as its inode. This is intended to reduce the disk head seeks made when the kernel reads an inode and its data blocks.

When writing data to a file, Ext2fs preallocates up to 8 adjacent blocks when allocating a new block. Preallocation hit rates are around 75% even on very full filesystems. This preallocation achieves good write performance under heavy load. It also allows contiguous blocks to be allocated to files, which speeds up future sequential reads.

These two allocation optimizations produce a very good locality of:

1. related files through block groups
2. related blocks through the 8-bit clustering of block allocations.
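The preallocation described above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the Ext2fs allocator: the bitmap is a plain Python list, and the names BlockBitmap, FileWriter and the goal block are invented for illustration.

```python
PREALLOC = 8   # up to 8 adjacent blocks per allocation, as described above


class BlockBitmap:
    """Toy block bitmap: True means the block is in use."""

    def __init__(self, nblocks):
        self.used = [False] * nblocks

    def allocate(self, goal):
        """Take the first free block at or after goal, then reserve the
        free blocks directly behind it (at most PREALLOC blocks in total)."""
        for start in range(goal, len(self.used)):
            if not self.used[start]:
                break
        else:
            raise OSError("no free block at or after the goal")
        self.used[start] = True
        reserved = []
        nxt = start + 1
        while (len(reserved) < PREALLOC - 1 and nxt < len(self.used)
               and not self.used[nxt]):
            self.used[nxt] = True
            reserved.append(nxt)
            nxt += 1
        return start, reserved


class FileWriter:
    """Appends blocks to a file, using preallocated blocks first."""

    def __init__(self, bitmap, goal):
        self.bitmap = bitmap
        self.goal = goal
        self.blocks = []
        self.reserved = []

    def append_block(self):
        if self.reserved:                        # preallocation hit
            block = self.reserved.pop(0)
        else:                                    # miss: go back to the bitmap
            block, self.reserved = self.bitmap.allocate(self.goal)
        self.blocks.append(block)
        self.goal = block + 1
        return block


bitmap = BlockBitmap(32)
bitmap.used[3] = True                            # pretend another file owns block 3
writer = FileWriter(bitmap, goal=0)
for _ in range(5):
    writer.append_block()
```

The file ends up on blocks 0, 1, 2, 4 and 5: contiguous except where another file already held block 3, which is the locality the preallocation is meant to produce.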

References:

1. Design and Implementation of the Second Extended Filesystem

Ext3

References:

1. http://en.wikipedia.org/wiki/Ext3
2. http://en.wikipedia.org/wiki/Journaling_file_system

ReiserFS
XFS
TmpFS
ProcFS
SysFS
Other useful stuff

http://en.wikipedia.org/wiki/Comparison_of_file_systems

Last modified: Tuesday, 16 October 2007, 02:23 PM