Title: Linux NILFS file system: automatic continuous snapshots
Author: Solène
Date: 05 October 2022
Tags: linux filesystem nilfs
Description: In this article, I present the Linux file system NILFS and
its automatic continuous snapshotting system.

# Introduction

Today, I'd like to share a special Linux file system that I really
enjoy.  It's called NILFS and was merged into Linux in 2009, so it's
not really a new player.  Despite being stable and used in production,
it never became popular.

In this file system, there is a unique system of continuous checkpoint
creation.  A checkpoint is a snapshot of your system at a given point
in time, but it can be deleted automatically if some disk space must be
reclaimed.  A checkpoint can be transformed into a snapshot that will
never be removed.

This mechanism works very well for workstations or file servers that
have no redundancy and are only backed up daily or weekly, which
leaves room for unrecoverable mistakes.

* NILFS project official website
* Wikipedia page about NILFS

# NILFS concepts

NILFS is a Copy-On-Write (CoW) file system, which means that when you
change a file, the original chunk on the disk isn't modified; a new
chunk is created with the new content instead.  This plays well with
keeping a history of the files.

In my experience, it performs very well on SSD devices on a desktop
system, even during heavy I/O operations.

The continuous checkpoint creation system may be very confusing, so
I'll explain how to learn about this mechanism and how to tame it.

# Garbage collection

The concept of a garbage collector may seem obvious to most people,
but if it doesn't ring a bell, let me give a quick explanation.  In
computer science, a garbage collector is a task that looks for unused
memory and makes it available again.

On NILFS, as a checkpoint is created every few seconds, used data is
never freed and one would run out of disk space pretty quickly.  This
is where the `nilfs_cleanerd` program, the garbage collector, comes
in: it looks at the oldest checkpoints and deletes them to reclaim
disk space under certain conditions.  Its default strategy is to keep
checkpoints as long as possible, until it needs to make some room to
avoid issues.  This may not suit a workload creating a lot of files,
which is why it can be tuned very precisely.  For most desktop users,
the defaults should work fine.

The garbage collector is automatically started on a volume upon mount.
You can use the command `nilfs-clean` to control that daemon: reload
its configuration, stop it, etc.

When you delete a file on a NILFS file system, it doesn't free up any
disk space because the file is still available in a previous
checkpoint.  You need to wait for the corresponding checkpoints to be
removed before the space is freed.

# How to find the current size of your data set

As the output of `df` for a NILFS file system reports the real space
used on the disk by your data AND the snapshots/checkpoints, it can't
tell you how much space your current data set really uses.

In order to figure out the current disk usage (without accounting for
older checkpoints/snapshots), we will use the command `lscp` to look at
the number of blocks contained in the most recent checkpoint.  NILFS
uses 4096-byte blocks by default, so we multiply the block count by
4096 to get bytes, then turn that total into gigabytes by dividing
three times by 1024 (bytes -> kilobytes -> megabytes -> gigabytes).

```shell
lscp | awk 'END { print $(NF-1)*4096/1024/1024/1024 }'
```

This number is the current size of what you have on the partition.
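
If your file system was created with a different block size, the same
arithmetic can be wrapped in a small function.  This is only a minimal
sketch: `nilfs_live_size_gib` is a hypothetical helper name, and it
assumes, like the command above, that the block count is the
second-to-last column of the last `lscp` line.

```shell
# Hypothetical helper: convert the block count of the most recent
# checkpoint into gigabytes (1024-based).
# Reads lscp output on stdin.
# $1 = block size in bytes (assumed to be 4096 when omitted).
nilfs_live_size_gib() {
    awk -v bs="${1:-4096}" '
        END { printf "%.2f\n", $(NF-1) * bs / 1024 / 1024 / 1024 }
    '
}

# Usage on a mounted NILFS volume:
#   lscp | nilfs_live_size_gib 4096
```

Passing the block size as an argument keeps the calculation honest if
the volume doesn't use the default value.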

# Create a checkpoint / snapshot

It's possible to create a snapshot of your current system state using
the command `mkcp`.

```
mkcp --snapshot
```

Or you can turn an existing checkpoint into a snapshot using the
command `chcp`.

```
chcp ss /dev/sda1 28579
```

The opposite operation (snapshot to checkpoint) can be done using
`chcp cp`.

# How to recover files after a big mistake

Let's say you deleted some important work in progress, and you have no
backup and no other way to retrieve it.  Fortunately, you are using
NILFS and a checkpoint was created every few seconds, so the files are
still there and within reach!

The first step is to pause the garbage collector to avoid losing the
files: `nilfs-clean --suspend`.  After this, we can take our time to
think about the next steps without having to worry.

The next step is to list the checkpoints using the command `lscp` and
find a date/time at which the files still existed, preferably in their
latest version.  The best pick is the checkpoint created right before
the deletion.
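
When the list is long, picking that checkpoint by eye gets tedious.
Here is a minimal sketch of a filter that does it for you.  The name
`last_checkpoint_before` is hypothetical, and the function assumes that
`lscp` prints its rows from oldest to newest with the checkpoint
number, date and time as the first three columns.

```shell
# Hypothetical helper: print the number of the newest checkpoint created
# strictly before the given "YYYY-MM-DD HH:MM:SS" timestamp.
# Reads lscp output on stdin (assumed columns: CNO DATE TIME ...).
last_checkpoint_before() {
    awk -v limit="$1" '
        # Data rows start with a numeric checkpoint number; this skips the header.
        # Timestamps in this format sort correctly as plain strings.
        $1 ~ /^[0-9]+$/ && ($2 " " $3) < limit { cno = $1 }
        END { if (cno != "") print cno }
    '
}

# Usage on a mounted NILFS volume (the timestamp is an example):
#   lscp | last_checkpoint_before "2022-10-05 10:00:00"
```

It prints nothing if every checkpoint is newer than the timestamp,
which means the files were already gone from all remaining
checkpoints.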

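Then, we can mount the checkpoint (let's say number 12345 for the
example) on a different directory.  NILFS only mounts checkpoints that
are marked as snapshots, so if `lscp` shows it in `cp` mode, convert it
first with `chcp ss /dev/sda1 12345`.  The mount command is the
following: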

```shell
mount -t nilfs2 -r -o cp=12345 /dev/sda1 /mnt
```

If it went fine, you should be able to browse the data in `/mnt` to
recover your files.

Once you have finished recovering your files, run `umount /mnt` and
resume the garbage collector with `nilfs-clean --resume`.
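
To keep the whole procedure in one place, here is a sketch that prints
the commands in order instead of running them, so you can review each
one before typing it as root.  `nilfs_recovery_plan` is a hypothetical
helper, and the device, checkpoint number and mount point are only
placeholders.  The `chcp ss` step assumes the checkpoint is still in
`cp` mode, and the final `chcp cp` turns it back into a regular
checkpoint so the garbage collector can reclaim it later.

```shell
# Hypothetical helper: print the recovery commands for review,
# without executing any of them.
# $1 = NILFS device, $2 = checkpoint number, $3 = mount point
nilfs_recovery_plan() {
    dev=$1 cno=$2 mnt=$3
    cat <<EOF
nilfs-clean --suspend $dev
chcp ss $dev $cno
mount -t nilfs2 -r -o cp=$cno $dev $mnt
# ... copy the files you need out of $mnt ...
umount $mnt
chcp cp $dev $cno
nilfs-clean --resume $dev
EOF
}

# Example values taken from this article
nilfs_recovery_plan /dev/sda1 12345 /mnt
```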

# Going further

Here is a list of extra pieces you may want to read to learn more about
nilfs2:

* nilfs_cleanerd and nilfs_cleanerd.conf man pages to tune the garbage
collector
* man pages for lscp / mkcp / rmcp / chcp to manage snapshots and
checkpoints manually