fc fs筆記

My goal for the fsync tree log was to make it just do the right thing most of the time. We mostly got there, thanks to a ton of fixes and test cases from Filipe.

fsync(some file) -- all the names for this file will exist, without having to fsync the directory.

fsync(some dir) -- ugh, don't fsync the directory. But if you do, all the files/subdirs will exist, and unlinks will be done and renames will be included. This is slow and may require a full FS commit, which is why we don't want dirs fsunk.
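To make the file case above concrete, here is a minimal Python sketch that creates a file and calls fsync only on the file descriptor, never on the parent directory. The helper name `write_and_fsync` is hypothetical. Whether the file name actually survives a crash depends on the filesystem honoring the guarantee described above, which the code itself cannot check.

```python
import os


def write_and_fsync(path: str, data: bytes) -> None:
    """Create (or truncate) `path`, write `data`, and fsync the file only.

    Hypothetical helper: it relies on the filesystem persisting the file's
    name as part of fsync(file), as described in the note above.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush file data and metadata; no directory fsync
    finally:
        os.close(fd)
```

An application that calls `write_and_fsync("state.json", payload)` is leaning on exactly the "do the right thing" behavior the quote describes.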

fc fs筆記

https://www.spinics.net/lists/linux-btrfs/msg77340.html

What ext4 does is this:

fsync(some file) -- for a newly created file, the filename that it was created under will exist. If the file has a hard link added, the hard link is not guaranteed to be written to disk.

fsync(some dir) -- all changes to file names in the directory will exist after the crash. It does *not* guarantee that any data changes to any of the files in the directory will persist after a crash.

It seems to me that it would be desirable if all of the major file systems have roughly the same minimum guarantee for fsync(2), so that application writers don't have to make file-system specific assumptions. In general the goal ought to be "the right thing" should happen.

The reason why ext4 doesn't sync all possible hard link names is that (a) that's not a common requirement for most applications, and (b) it's too hard to find all of the directories which might contain a hard link to a particular file. But otherwise, the semantics seem to largely match up with what Chris has suggested for btrfs.
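Since fsync on the file is not guaranteed to persist a newly added hard link, a cautious application can fsync the directory that holds the new name. The sketch below shows that pattern. The helper names `fsync_dir` and `link_durably` are hypothetical, and the code assumes Linux, where a directory can be opened read-only and passed to fsync.

```python
import os


def fsync_dir(dirpath: str) -> None:
    """fsync a directory so that name changes inside it reach the disk."""
    fd = os.open(dirpath, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def link_durably(src: str, dst: str) -> None:
    """Add hard link `dst` for `src`, then fsync the directory holding `dst`.

    Hypothetical helper: it persists the new name only. It says nothing
    about the file's data, which needs its own fsync on the file.
    """
    os.link(src, dst)
    fsync_dir(os.path.dirname(os.path.abspath(dst)))
```

This costs a directory fsync, which the btrfs note earlier in the channel warns can be slow, so it is best reserved for names that really must survive a crash.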

Re: Inconsistent behavior of fsync in btrfs (Linux BTRFS, www.spinics.net)

fc fs筆記

https://lore.kernel.org/lkml/2911ac5cd20b46e397be506268718d74@paragon-software.com/ NTFS Read-Write driver from Paragon

fc fs筆記

From: Zygo Blaxell @ 2020-08-28 4:36 UTC To: Eric Wong; +Cc: kreijack, linux-btrfs

> > Note that add/remove is orders of magnitude slower than replace. Replace might take hours or even a day or two on a huge spinning drive. Add/remove might take months, though if you have 8-year-old disks then it's probably a few days, weeks at most.

> Btw, any explanation or profiling done on why remove is so much slower than replace? Especially since btrfs raid1 ought to be fairly mature at this point (and I run recent stable kernels).

They do different things.

Replace just computes the contents of the filesystem the same way scrub does: except for the occasional metadata seek, it runs at wire speeds because it reads blocks in order from one disk and writes in order on the other disk, 99.999% of the time.

Remove makes a copy of every extent, updates every reference to the extent, then deletes the original extents. Very seek-heavy --including seeks between reads and writes on the same drive--and the work is roughly proportional to the number of reflinks, so dedupe and snapshots push the cost up. About the only advantage of remove (and balance) is that it consists of 95% existing btrfs read and write code, and it can handle any relocation that does not require changing the size or content of an extent (including all possible conversions).

Arguably this isn't necessary. Remove could copy a complete block group, the same way replace does but to a different offset on each drive, and simply update the chunk tree with the new location of the block group at the end. Trouble is, nobody's implemented this approach in btrfs yet. It would be a whole new code path with its very own new bugs to fix.

https://lore.kernel.org/linux-btrfs/20200829184610.GW5890@hungrycats.org/T/#m3d45dd2d29692650a7b76e13e1819edf87455e05

fc fs筆記

From: Zygo Blaxell @ 2020-08-28 20:56 UTC To: Andrei Borzenkov; +Cc: Eric Wong, kreijack, linux-btrfs

> > Replace just computes the contents of the filesystem the same way scrub does: except for the occasional metadata seek, it runs at wire speeds because it reads blocks in order from one disk and writes in order on the other disk, 99.999% of the time.

> Does it write them to the same absolute disk locations? IOW - is it possible to use smaller disk for replace or it must be at least as large as original disk?

Replace writes data to the locations recorded in the chunk tree, i.e. the original disk locations on the missing disk.

In theory, you can resize the offline disk to be smaller than the replacement disk, then run btrfs replace. In practice, only some of the methods work (e.g. you must specify device ID and not device name when replacing) and only on recent kernel versions.

btrfs dev remove is equivalent to 'btrfs fi resize <devid>:0' followed by "remove empty device <devid>", so the performance will be very similar for the portion of the data that is resized; however, a combination of resize and replace is still much faster than device remove, which does it the slow way for all of the data.
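The two routes compared above can be written down as command lines. The Python sketch below only builds the argument lists and runs nothing. The function names, device ID, size, device path and mount point are all hypothetical, and the subcommands are `btrfs filesystem resize`, `btrfs replace start` and `btrfs device remove` as documented in btrfs-progs. Following the advice above, the device is referred to by numeric device ID rather than by name.

```python
from typing import List


def resize_then_replace_cmds(
    devid: int, shrink_to: str, new_dev: str, mnt: str
) -> List[List[str]]:
    """Build (but do not run) the resize-then-replace sequence.

    All argument values are hypothetical placeholders chosen by the caller.
    """
    return [
        # Shrink the old device so that it fits on the replacement disk.
        ["btrfs", "filesystem", "resize", f"{devid}:{shrink_to}", mnt],
        # Replace by device ID, not by device name.
        ["btrfs", "replace", "start", str(devid), new_dev, mnt],
    ]


def remove_cmd(devid: int, mnt: str) -> List[str]:
    """Build the slower alternative: relocate everything off the device."""
    return ["btrfs", "device", "remove", str(devid), mnt]


if __name__ == "__main__":
    for argv in resize_then_replace_cmds(2, "900g", "/dev/sdz", "/mnt"):
        print(" ".join(argv))
    print(" ".join(remove_cmd(2, "/mnt")))
```

Keeping the commands as data makes it easy to review them before handing them to something like `subprocess.run`. As the message notes, whether the resize step works on an offline device depends on the kernel version.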

https://lore.kernel.org/linux-btrfs/20200829184610.GW5890@hungrycats.org/T/#m1f2cdf6f67b4361329d98699d1a163c69f9e7ac1

fc fs筆記

https://lwn.net/Articles/789623/ The Linux "copy problem"

In a filesystem session on the third day of the 2019 Linux Storage,
Filesystem, and Memory-Management Summit (LSFMM), Steve French wanted to
talk about copy operations. Much of the development work that has gone on
in the Linux filesystem world over the…

fc fs筆記

FFS.pdf

https://people.eecs.berkeley.edu/~brewer/cs262/FFS.pdf A Fast File System for UNIX

fc fs筆記

10.1.1.55.2970.pdf

62.6 KB

Extent-like Performance from a UNIX File System

fc fs筆記

10.1.1.52.6474.pdf

122.8 KB

An Implementation of a Log-Structured File System for UNIX

fc fs筆記

Forwarded from farseerfc 😂

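A plan that has not been implemented yet... The difficulty with built-in encryption in btrfs is that per-file or per-subvol encryption cannot be made orthogonal to reflink. btrfs has only one extent tree, so under the current design encryption would have to cover the whole pool; it cannot be applied to just some subvols. And whole-pool encryption has no real advantage over the existing dm-crypt approach.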

fc fs筆記

bit.ly/whyswap Misconceptions and clarifications about Linux swap

chrisdown.name

In defence of swap: common misconceptions

tl;dr: Having swap is a reasonably important part of a well functioning system. Without it, sane memory management becomes harder to achieve.
