Random stuff

Things that could be useful some day.

Theodore Ts'o rant about file overwriting

Source: Launchpad

| OK, so let me explain what's going on a bit more explicitly. There are
| application programmers who are rewriting application files like this:
|
| 1.a) open and read file ~/.kde/foo/bar/baz
| 1.b) fd = open("~/.kde/foo/bar/baz", O_WRONLY|O_TRUNC|O_CREAT) --- this truncates the file
| 1.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
| 1.d) close(fd)
|
| Slightly more sophisticated application writers will do this:
|
| 2.a) open and read file ~/.kde/foo/bar/baz
| 2.b) fd = open("~/.kde/foo/bar/baz.new", O_WRONLY|O_TRUNC|O_CREAT)
| 2.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
| 2.d) close(fd)
| 2.e) rename("~/.kde/foo/bar/baz.new", "~/.kde/foo/bar/baz")
|
| What emacs (and very sophisticated, careful application writers) will do is this:
|
| 3.a) open and read file ~/.kde/foo/bar/baz
| 3.b) fd = open("~/.kde/foo/bar/baz.new", O_WRONLY|O_TRUNC|O_CREAT)
| 3.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
| 3.d) fsync(fd) --- and check the error return from the fsync
| 3.e) close(fd)
| 3.f) rename("~/.kde/foo/bar/baz", "~/.kde/foo/bar/baz~") --- this is optional
| 3.g) rename("~/.kde/foo/bar/baz.new", "~/.kde/foo/bar/baz")
|
| The fact that series (1) and (2) works at all is an accident. Ext3 in its
| default configuration happens to have the property that 5 seconds after (1)
| and (2) completes, the data is safely on disk. (3) is the ***only*** thing
| which is guaranteed not to lose data. For example, if you are using laptop
| mode, the 5 seconds is extended to 30 seconds.
|
| Now the one downside with (3) is that fsync() is a heavyweight operation.
| If your application is stupid, and has hundreds of dot files in your home
| directory, each one taking up a 4k disk block even though it is only storing
| 4 to 12 bytes of data in each singleton dot file, and you have to repeat (3)
| for each of your one hundred dot files --- and worse yet, your application
| for some stupid, unknown reason is writing all of these hundred+ dot
| files every few seconds, then (3) will be very painful. But it is painful
| because the application is stupidly written --- not for any fundamental
| filesystem fault. It's like if you had a robot which was delivering mail to
| mail box numbers 1, 2, 3, 4, 5, and crossing the street for each mail box;
| on a busy road, this is unsafe, and the robot was getting run over when it
| kept on jaywalking --- so you can tell the robot to only cross at
| crosswalks, when the "walk" light is on, which is safe, but slow --- OR, you
| could rewrite the robot's algorithsm so it delieveres the mail more
| intelligently (i.e., one side of the street, and then cross, safely at the
| crosswalk, and then do the other side of the street).
|
| Is that clear? The file system is not "truncating" files. The application is
| truncating the files, or is constantly overwriting the files using the rename
| system call. This is a fundamentally unsafe thing to do, and ext3 just happened
| to paper things over. But *both* XFS and ext4 does delayed allocation, which
| means that data blocks don't get allocated right away, and they don't get
| written right away. Btrfs will be doing delayed allocation as well; all modern
| filesystems will do this, because it's how you get better performance.
| Applications are expected to use fsync() or fdatasync(), and if that impacts
| their performance too much, to use a single berkdb or other binary database
| file, and not do something stupid with hundreds of tiny text files that only
| hold a few bytes of data in each text file.