Title: My BTRFS cheatsheet
Author: Solène
Date: 29 August 2022
Tags: btrfs linux
Description: 

# Introduction

I recently switched my home "NAS" (single disk!) to BTRFS.  It's a
different ecosystem with many features and commands, so I had to write
a bit about it to remember the various possibilities...

BTRFS is an advanced file-system supported by Linux; it's roughly
comparable to ZFS.

# Layout

A BTRFS file-system can be made of multiple disks, aggregated as a
mirror or "concatenated", and it can be split into subvolumes which
may each have specific settings.

Snapshots and quotas apply to subvolumes, so it's important to plan
the layout before creating BTRFS subvolumes; for most cases, one may
want dedicated subvolumes for /home and /var.
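As a sketch, creating dedicated subvolumes could look like this (the
mount point and subvolume names are examples, adjust them to your
layout):

```shell
# create subvolumes on a BTRFS file-system mounted at /mnt
btrfs subvolume create /mnt/home
btrfs subvolume create /mnt/var

# list the existing subvolumes
btrfs subvolume list /mnt
```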

# Snapshots / Clones

It's possible to take an instant snapshot of a subvolume, which can be
used as a backup.  Snapshots can be browsed like any other directory. 
They exist in two flavors: read-only and writable.  ZFS users will
recognize writable snapshots as "clones" and read-only ones as regular
ZFS snapshots.

Snapshots are an effective way to make a backup and to roll back
changes in a second.
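For example (paths are examples), taking and removing snapshots of a
subvolume:

```shell
# writable snapshot of the subvolume /mnt/home
btrfs subvolume snapshot /mnt/home /mnt/home_snap

# read-only snapshot (-r), suitable as a source for btrfs send
btrfs subvolume snapshot -r /mnt/home /mnt/home_snap_ro

# drop a snapshot once it's no longer needed
btrfs subvolume delete /mnt/home_snap
```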

# Send / Receive

A raw file-system stream can be sent / received over the network (or
anything supporting a pipe) to allow incremental differential backups.
This is a very effective way to do incremental backups without having
to scan the entire file-system each time you run your backup.
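A minimal sketch of a full and an incremental transfer; the host name
and paths are examples, and the receiving side must itself be a BTRFS
file-system:

```shell
# full transfer of a read-only snapshot to another machine
btrfs send /mnt/home_snap_ro | ssh nas "btrfs receive /backup"

# incremental transfer: only the differences between two snapshots
# are sent, -p designates the parent already present on the remote
btrfs send -p /mnt/snap_old /mnt/snap_new | ssh nas "btrfs receive /backup"
```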

# Deduplication

I covered deduplication with bees, but one can also use the program
"duperemove" (it works on XFS too!).  They work a bit differently, but
in the end they have the same purpose: bees operates on the whole
BTRFS file-system, while duperemove operates on files, so they cover
different use cases.
duperemove GitHub project page
Bees GitHub project page
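A minimal duperemove run could look like this (the path is an example;
check the man page for the exact flags shipped with your version):

```shell
# -d actually submits the deduplication requests (otherwise it only
# reports duplicates), -r recurses into the directory
duperemove -dr /mnt/data
```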

# Compression

BTRFS supports on-the-fly compression per subvolume, meaning the
content of each file is stored compressed and decompressed on demand. 
Depending on the files, this can result in better performance because
less content is stored on the disk, making you less likely to be I/O
bound, and it also improves storage efficiency.  This is really
content dependent: you won't gain anything on already-compressed
formats like pictures/videos/music, but if you have a lot of text and
source files, you can achieve great ratios.

From my experience, compression is always helpful for a regular user
workload, and newer algorithms are smart enough to skip data that
wouldn't compress well.

There is a program named compsize that reports compression statistics
for a file/directory.  It's very handy to know whether the compression
is beneficial and to which extent.
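Enabling compression and checking the result could look like this (the
device, mount point, and paths are examples):

```shell
# mount the file-system with zstd compression enabled
mount -o compress=zstd /dev/sdb1 /mnt

# or set the property per subvolume / directory; only newly
# written data gets compressed
btrfs property set /mnt/home compression zstd

# report compression statistics for a directory
compsize /mnt/home
```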
compsize GitHub project page

# Defragmentation

Fragmentation is a real thing and not specific to Windows; it matters
a lot for mechanical hard drives but not really for SSDs.

Fragmentation happens when you create files on your file-system and
delete them: this happens very often due to cache directories, updates,
and regular operations on a live file-system.

When you delete a file, this creates a "hole" of free space; after
some time, you may want to gather all these small parts of free space
into big chunks.  This matters for mechanical disks as the physical
location of data is tied to raw performance.  The defragmentation
process simply reorganizes data physically, ordering file chunks and
free space into contiguous blocks.

Defragmentation can also be used to force compression of a subvolume,
for example if you want to change the compression algorithm or enable
compression after the files were saved.

The command line is: btrfs filesystem defragment
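As a sketch (example path), recompressing a whole directory tree with
zstd while defragmenting it:

```shell
# -r recurses into the directory, -czstd rewrites the data
# compressed with zstd
btrfs filesystem defragment -r -czstd /mnt/home
```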

# Scrubbing

The scrubbing feature is one of the most valuable features provided by
BTRFS and ZFS.  Each piece of data in these file-systems is associated
with its checksum in some metadata index, which means you can check
the integrity of every file by comparing its current content with the
checksum known in the index.

Scrubbing costs a lot of I/O and CPU because the checksum of all the
data must be computed, but it guarantees that the stored data is
valid.  In case of a corrupted file, if the file-system is composed of
multiple disks (raid1 / raid5), it can be repaired from mirrored
copies; this should work most of the time because such file corruption
is often related to the drive itself, so the other drives shouldn't be
affected.

Scrubbing can be started / paused / resumed, which is handy if you
need to run heavy I/O and don't want the scrubbing process to slow it
down.  While the scrub commands accept a device or a path, the path
parameter is only used to find the related file-system; it won't just
scrub the files in that directory.

The command line is: btrfs scrub
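A typical scrub session on a file-system mounted at /mnt (example
mount point):

```shell
# start a scrub in the background
btrfs scrub start /mnt

# check the progress and the number of errors found
btrfs scrub status /mnt

# pause and resume around heavy I/O
btrfs scrub pause /mnt
btrfs scrub resume /mnt
```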

# Rebalancing

When you aggregate multiple disks into one BTRFS file-system, some
files are written to one disk and some to another; after a while, one
disk may contain more data than the others.

The purpose of rebalancing is to redistribute data across the disks
more evenly.
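The command for this is btrfs balance; as a sketch (example mount
point):

```shell
# rebalance the whole file-system mounted at /mnt (can be very long)
btrfs balance start /mnt

# only rewrite data chunks that are less than 50% full, a much
# cheaper operation that is often enough
btrfs balance start -dusage=50 /mnt

# follow the progress
btrfs balance status /mnt
```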

# Swap file

You can't create a swap file on a BTRFS file-system without a tweak:
the file must not be copy-on-write.  You can create it in a directory
carrying the special "no COW" attribute, set with "chattr +C
/tmp/some_directory"; the file inherits the "no COW" flag and can then
be moved anywhere.

If you try to use a swap file with COW enabled on it, swapon will
report a weird error, but you get more details in the dmesg output.
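Putting the steps above together could look like this (the directory,
file name, and 4 GB size are examples):

```shell
# empty directory with the "no COW" attribute; files created
# inside inherit it
mkdir /var/swap
chattr +C /var/swap

# allocate the swap file inside that directory, then activate it
dd if=/dev/zero of=/var/swap/swapfile bs=1M count=4096
chmod 600 /var/swap/swapfile
mkswap /var/swap/swapfile
swapon /var/swap/swapfile
```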

# Converting

It's possible to convert an ext2/3/4 file-system into BTRFS; obviously
it must not be in use during the conversion.  The conversion can be
rolled back up to a certain point: operations such as defragmenting or
rebalancing make the rollback impossible.
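This is done with the btrfs-convert tool; a sketch on an example
device:

```shell
# convert an unmounted ext4 partition to BTRFS in place
btrfs-convert /dev/sdb1

# roll back to the original ext4 file-system, as long as the
# saved image is still intact
btrfs-convert -r /dev/sdb1
```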