Many embedded devices, such as smartphones and routers, have a “factory reset” function. By analogy with the “backup” habit everyone knows from PCs, it might seem necessary to keep a backup of the entire factory system in read-only ROM. If that were the case, every factory reset would have to copy the ROM contents to Flash storage, which would waste a lot of storage space and take a long time. In fact, a factory reset is little more than a reboot, and the Flash storage of the freshly reset system is essentially empty.

(Thanks to BW’s comment: factory reset on Android does not use differential technology but simply clears the data partition, and modifications to the /system partition are not reverted. I got this wrong.)


Virtual Machine Snapshot

Where is a similar “factory reset” function needed in the pure software world? Anyone who has played with virtual machines probably knows the “snapshot” function: a snapshot can be taken in a few seconds, and whenever the virtual machine breaks, it can be restored to the snapshot state in a few seconds. The secret of disk snapshots lies in “differential disks”. When the “snapshot” command is executed, the original virtual disk is frozen and a new virtual differential disk is created.

When reading a disk block, if the block is already in the differential disk, it is read from there; otherwise it is read from the original disk. A write always goes to the differential disk. The original disk therefore stays read-only, and all writes end up in the differential disk. Restoring the snapshot only requires clearing the differential disk, so it takes very little time (the few seconds it actually takes are spent restoring the memory state).
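The read and write rules above can be sketched in a few lines of Python. This is only an illustration, not how any particular hypervisor stores its disks: the class name `DifferentialDisk`, the in-memory dictionary used as the differential disk, and the three sample blocks are all assumptions made for the example.

```python
class DifferentialDisk:
    """Illustrative copy-on-write overlay on top of a frozen base disk."""

    def __init__(self, base_blocks):
        self.base = tuple(base_blocks)  # frozen original disk, never modified
        self.diff = {}                  # block number -> data written after the snapshot

    def read(self, n):
        # Prefer the differential disk; fall back to the original disk.
        if n in self.diff:
            return self.diff[n]
        return self.base[n]

    def write(self, n, data):
        # Writes never touch the original disk.
        if not 0 <= n < len(self.base):
            raise IndexError(n)
        self.diff[n] = data

    def revert(self):
        # Restoring the snapshot is just emptying the differential disk.
        self.diff.clear()


# Hypothetical three-block disk used only for demonstration.
disk = DifferentialDisk([b"boot", b"kernel", b"data"])
disk.write(2, b"user files")
after_write = disk.read(2)
disk.revert()
after_revert = disk.read(2)
```

Because `revert` only clears a dictionary, its cost does not depend on the size of the base disk, which is the property the snapshot restore relies on.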

[Figure: a virtual machine snapshot tree]

Tinkerers may have arranged their snapshots into a tree (as shown in the figure above; doesn’t it look like a version control system?). The Now arrow marks the current state. Every read of a block has to check, in turn, the current differential disk, the Windows XP - SP3 differential disk, the Windows XP - SP2 differential disk, and the Windows XP - RTM master disk; a write always goes to the current differential disk. This looks fun, but because a read may have to check several layers, read performance drops. Virtual machine snapshots are therefore not suitable as a “backup”.
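To make the cost of a snapshot chain concrete, here is a small Python sketch that counts how many layers a read has to check. The layer contents are invented for illustration, and the functions `read_block` and `write_block` are hypothetical helpers, not part of any virtualization product.

```python
def read_block(chain, n):
    """Return (data, lookups) for block n.

    chain is ordered newest first: [current diff, ..., master disk].
    Each differential layer is a dict holding only the blocks written to it;
    the master disk (last entry) holds every block.
    """
    for lookups, layer in enumerate(chain, start=1):
        if n in layer:
            return layer[n], lookups
    raise KeyError(n)


def write_block(chain, n, data):
    # Writes always land in the current (first) differential disk.
    chain[0][n] = data


# Made-up block contents for the four layers named in the text.
rtm = {0: "rtm-0", 1: "rtm-1", 2: "rtm-2"}  # Windows XP - RTM master disk
sp2 = {1: "sp2-1"}                           # Windows XP - SP2 differential disk
sp3 = {2: "sp3-2"}                           # Windows XP - SP3 differential disk
now = {}                                     # current differential disk
chain = [now, sp3, sp2, rtm]

write_block(chain, 2, "now-2")
untouched = read_block(chain, 0)   # only present in the master disk
patched = read_block(chain, 1)     # last written in the SP2 layer
fresh = read_block(chain, 2)       # just written to the current layer
```

A block that was never modified is the worst case: the read falls through every differential layer before reaching the master disk, so deeper snapshot trees mean slower reads.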

Differential Disk Used for Factory Reset

The “factory reset” of some embedded devices is implemented in this way. Linux’s Device Mapper can “overlay” one block device on another. The underlying block device holds a compressed read-only file system (squashfs) that supports random access (note that not all compression formats do; with some archive formats, for example, extracting a single file requires decompressing many other files first), and the upper block device is readable and writable.


Typically, the read-only file system at the bottom and the read-write file system at the top are each stored as a single file (similar to a virtual disk). During system startup they are mounted as file systems, and the system then chroots into the result. For the specific commands, see: http://www.ibm.com/developerworks/cn/linux/1306_qinzl_squashfs/

When the user wants to restore the factory settings, it is enough to clear the read-write file system on top. Modifying the read-only file system at the bottom requires “flashing” (rewriting the ROM). (Android does not seem to use differential disks; it just mounts the data partition at /data, so formatting that partition restores the factory settings.)

It is no longer known where differential disk technology was first used, but the LiveCD can be counted as an early application. A Linux distribution boots from a read-only CD whose root directory is a squashfs, and the upper layer is a ramdisk, which uses memory as a disk. For technical details about ramdisks, Linux enthusiasts can refer to Zhao Lei’s “Writing a Block Device Driver” (or read the ramdisk source code in the kernel directly), and Windows enthusiasts can refer to Chapter 5, “Virtual Disk”, of “Fishing Alone in the Cold River: Windows Kernel Security Programming” (or read the ramdisk example in the WDK directly).

The restoration card can also be counted as an early application. It usually reserves a special partition on the hard disk for storing the differential content. Writes to the protected partition leave the original partition content completely unchanged, so simply clearing the differential partition has the effect of a “restoration”. Incidentally, the restoration card relies on a file system filter driver to intercept disk read and write requests and hand them to the card for processing, which requires support from the operating system. USB, network and other boot methods must therefore be disabled in the BIOS; otherwise a user could boot an operating system without the filter driver, and the restoration card could no longer protect the disk.

Differential Disks for Multi-user Shared Systems

Differential disks are useful for more than saving state and recovering quickly. For instance, when a teaching assistant needs to set up a development environment on a server for each student in a class, the students should ideally not interfere with each other. The methods I’ve seen include:

  • Only allowing modifications to one’s own home directory. This makes software installation troublesome.
  • Assigning a virtual machine to each person. A “master disk” is prepared first and then copied once per student. This approach wastes a lot of disk space: a Linux environment set up for development takes at least several GB. Fortunately, schools use Linux; with Windows, a 2 TB disk might not even hold the virtual machine images for one class (100 people). This method wastes memory as well as disk space. Everyone uses the same standard C library, yet each copy is loaded into memory separately and cannot be shared, which greatly reduces the hit rate of the file cache.

With a “differential disk”, things become simple. Prepare one master disk, set it to read-only, and create a differential disk for each student. Each virtual machine uses its own differential disk, so everyone shares the development environment on the master disk and can freely modify their own virtual system. However, each virtual machine’s operating system still maintains its own memory cache, which cannot be shared.
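A back-of-the-envelope comparison shows the scale of the saving. Only the class size of 100 comes from the text; the 5 GB master disk and the 0.5 GB of changes per student are assumed figures chosen purely for illustration.

```python
def full_copies_gb(students, master_gb):
    # Every student gets a complete copy of the master disk.
    return students * master_gb


def shared_master_gb(students, master_gb, diff_gb):
    # One read-only master disk plus one differential disk per student.
    return master_gb + students * diff_gb


students = 100    # class size mentioned in the text
master_gb = 5     # ASSUMED size of the development environment
diff_gb = 0.5     # ASSUMED average size of each student's changes

copies = full_copies_gb(students, master_gb)
shared = shared_master_gb(students, master_gb, diff_gb)
```

Under these assumptions the full copies need 500 GB while the shared master needs 55 GB; the real ratio depends entirely on how much each student actually changes.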


In fact, the diskless systems in college computer labs and the library inquiry machines are both implemented with differential disk technology, although one runs Windows and the other Linux. Since the lab PCs do not have much memory and users may write a lot of data at once, the differential disk of the diskless system is stored on the server, like the master disk, and is cleared on reboot. The library inquiry machine has a relatively simple function, so its differential disk lives directly in memory. Both the lab PCs and the library inquiry machines are independent physical machines, so sharing memory caches is naturally not a concern.

Overlayfs: Differential File System

If all users run on a single physical machine, as in the teaching assistant example above, we would like them to share memory caches. The “differential disk” scheme above works at the block device level, and each differential block device has its own cache, which cannot be shared. If we drop the virtual machines and instead apply the differential idea at the file system level within one operating system, can we solve the cache sharing problem?

The answer is yes. Overlayfs is a simple and elegant implementation. It is a stacked file system implemented inside the Linux kernel on top of VFS (it is not built on FUSE, the user-mode file system framework). Due to its simplicity, it should also be relatively easy to port to other platforms (although I haven’t found any ports).

Overlayfs requires two source directories (a read-only directory and a differential directory) and a target mount directory. It does not matter whether these directories are on the same file system. For example, suppose the master system with the development environment installed is in /opt/base and we need to create virtual environments for three students: the “differential disks” are in /opt/diff/1, /opt/diff/2, /opt/diff/3, and the root directories the students see are /opt/chroot/1, /opt/chroot/2, /opt/chroot/3. The following commands need to be executed (they use the overlayfs syntax current when this was written; mainline kernels since 3.18 name the file system type overlay and additionally require a workdir option):

mount -t overlayfs overlayfs -olowerdir=/opt/base,upperdir=/opt/diff/1 /opt/chroot/1
mount -t overlayfs overlayfs -olowerdir=/opt/base,upperdir=/opt/diff/2 /opt/chroot/2
mount -t overlayfs overlayfs -olowerdir=/opt/base,upperdir=/opt/diff/3 /opt/chroot/3

When student 1 logs in, running “cd /opt/chroot/1; chroot .” enters student 1’s virtual environment. If processes, networking and so on also need to be isolated, operating-system-level virtualization such as LXC or OpenVZ can be used.

How does Overlayfs work?

  • When reading a directory, it reads the contents of the corresponding paths in upperdir and lowerdir and merges them (for names that exist in both, the upperdir entry wins).
  • When opening a file, it saves pointers to the file in upperdir and lowerdir for later reads and writes. If the file is opened with a “write” flag and does not exist in upperdir, the corresponding file is first copied from lowerdir to upperdir. For a large file, opening it in “write” mode therefore takes a long time.
  • When reading a file (which is already open at this point), if the file exists in upperdir, it calls the read function of that file in upperdir; otherwise, it calls the read function of the file in lowerdir.
  • When writing to a file (which is already open at this point), it calls the write function of the file in upperdir.
  • When deleting a file, if it exists in upperdir, it is deleted directly; if it exists in lowerdir, a symbolic link to a special path is created at the corresponding path in upperdir and an extended attribute (xattr) is set on it. When Overlayfs later sees this tombstone file (the symbolic link), it knows that the path has been “deleted”.

OpenWRT (the de facto standard for open-source wireless routers) separates the ROM from the read-write root file system using an Overlayfs composed of a read-only squashfs and a read-write JFFS2.
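The rules in the list above can be modelled in a short Python sketch. This is a toy model under simplifying assumptions, not the kernel implementation: the two layers are flat dictionaries rather than directory trees, the hypothetical `WHITEOUT` object stands in for the tombstone symbolic link and its xattr, and the class and method names are invented for the example.

```python
WHITEOUT = object()  # stand-in for the tombstone symlink plus xattr


class Overlay:
    def __init__(self, lower):
        self.lower = dict(lower)  # read-only layer: path -> bytes
        self.upper = {}           # differential layer: path -> bytes or WHITEOUT
        self.copy_ups = 0         # how many files were copied from lower to upper

    def listdir(self):
        # Merge both layers; a whiteout in upper hides the name entirely.
        names = set(self.lower) | set(self.upper)
        return sorted(n for n in names if self.upper.get(n) is not WHITEOUT)

    def read(self, path):
        # upper wins; fall back to lower only if upper has no entry at all.
        if path in self.upper:
            if self.upper[path] is WHITEOUT:
                raise FileNotFoundError(path)
            return self.upper[path]
        if path in self.lower:
            return self.lower[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        # Opening for write copies the whole file up first (copy-up),
        # after which every write goes to the upper layer only.
        current = self.upper.get(path)
        if current is None:
            if path in self.lower:
                self.upper[path] = self.lower[path]
                self.copy_ups += 1
            else:
                self.upper[path] = b""  # brand-new file
        elif current is WHITEOUT:
            self.upper[path] = b""      # re-created after a delete
        self.upper[path] += data

    def delete(self, path):
        if self.upper.get(path) is WHITEOUT:
            raise FileNotFoundError(path)
        if path in self.lower:
            self.upper[path] = WHITEOUT  # hide the lower file with a tombstone
        elif path in self.upper:
            del self.upper[path]         # exists only in upper: delete directly
        else:
            raise FileNotFoundError(path)


# Hypothetical file names and contents, used only for demonstration.
fs = Overlay({"libc.so": b"C library", "motd": b"welcome"})
fs.append("motd", b" student 1")
fs.delete("libc.so")
visible = fs.listdir()
```

The lower layer is never modified, so emptying `fs.upper` brings back the original view, which is the same “clear the differential layer” reset used throughout this article.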


Some people used to complain about the hassle of driver development: if a bug crashes the system every time, you have to reboot into a LiveCD and delete the faulty driver file before the system boots normally again. With the Overlayfs approach, you keep a well-functioning system in the lowerdir, use a ramdisk as the upperdir, and perform the risky operations inside the overlay. Whatever goes wrong (as long as you don’t break VFS), everything is back to normal after a reboot.

File System Snapshot: A Shortcut for Backup

If we want to make backups on non-virtualized machines, a snapshot at the file system layer is a good choice. Modern copy-on-write file systems such as btrfs support snapshots (mainline ext4 does not provide them natively). As shown in the figure below:

  • Assume that initially, there are three directories A, B, D and three files C, E, F in the file system, forming a tree, and the superblock (tree root) points to A. Take a snapshot.
  • An operation modifies C into C’. Every node (inode) on the path from the modified node up to the tree root has to be copied into a new version: the tree root now points to A’, A’ points to the copied B’ and the original D, and B’ points to the copied C’ and the original E. Take a snapshot.
  • An operation modifies E into E’, so new versions have to be copied again: the tree root points to A’’, A’’ points to B’’ (copied from B’) and the original D, and B’’ points to the existing C’ and the copied E’.

[Figure: the file system tree after each of the three steps]

Unlike virtual machine differential disks, file system snapshots generally have no impact on read and write performance, but each snapshot occupies some disk space to store the changed files and the metadata on the path from them up to the root.
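The A, B, C walk-through above is an instance of path copying, which can be sketched in Python with immutable nodes. This is a simplified model, not the on-disk format of any real file system: the `Node` class, the `modify` and `find` helpers, and the file contents are all assumptions for the example, and D is assumed to be the parent of F.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    data: str = ""
    children: tuple = ()


def modify(node, path, new_data):
    """Return a new version of node in which the file at path holds new_data.

    Only the nodes along path are copied; every other subtree is shared
    with the previous version.
    """
    if not path:
        return Node(node.name, new_data, node.children)
    head, rest = path[0], path[1:]
    new_children = tuple(
        modify(child, rest, new_data) if child.name == head else child
        for child in node.children
    )
    return Node(node.name, node.data, new_children)


def find(node, path):
    # Walk down from node following the child names in path.
    for name in path:
        node = next(c for c in node.children if c.name == name)
    return node


c, e, f = Node("C", "c v1"), Node("E", "e v1"), Node("F", "f v1")
b = Node("B", children=(c, e))
d = Node("D", children=(f,))
snap0 = Node("A", children=(b, d))         # first snapshot: remember the old root

snap1 = modify(snap0, ["B", "C"], "c v2")  # C becomes C', root becomes A'
live = modify(snap1, ["B", "E"], "e v2")   # E becomes E', root becomes A''
```

Taking a snapshot here is nothing more than keeping a reference to an old root, and reads from any version follow ordinary pointers, which is why this scheme does not slow reads down the way a chain of differential disks does.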

Virtual machine differential disks, Overlayfs, and file system snapshots are powerful time machines. Used at the right moment, they may solve problems that would otherwise leave you scratching your head. Differential storage embodies the DRY (Don’t Repeat Yourself) principle: we all know that code should not be copied and pasted, and neither should the file system!

References:

  1. http://technet.microsoft.com/en-us/library/cc720381(v=ws.10).aspx
  2. http://www.ibm.com/developerworks/cn/linux/1306_qinzl_squashfs/
  3. Taobao “Stacked File System - Overlayfs”: http://wenku.baidu.com/view/2c82473ca32d7375a41780ab.html
  4. http://wiki.openwrt.org/doc/techref/flash.layout
