commonalities and differences   
===================================

Just to clarify: this post is about a Unix tool, and will not help you
find your soulmate :P

There is a quite useful Unix tool that has been practically forgotten.
The tool I am talking about is `comm(1)`, and the reason it remains
mostly unused is that another tool called `diff(1)` covers some of its
use cases [1].

comm(1) compares two sorted files line by line. For instance, it can be
used to quickly check if your backup contains all the files you wanted
to put in there. Imagine you have created a backup of your gopher folder
in a tar.gz archive. Let's get the sorted list of files in the backup:

  $ tar -ztf mybackup.tar.gz | sort > backup_list.txt

and then the list of files currently contained in your gopher dir:

  $ cd $HOME/gopher; find ./ | sort > gopherdir_list.txt
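One caveat worth a small sketch: comm(1) expects both inputs to be sorted
in the same collating order that comm(1) itself uses, and sort(1) orders
lines differently depending on the locale. Pinning the locale with
LC_ALL=C on both sides is one way to keep them in agreement. The file
names and entries below are made up for illustration, they are not my
real listings:

```shell
# Hypothetical scratch files, only for illustration.
tmp=$(mktemp -d)

# Sort both lists in the C locale (plain byte order: "B" sorts before "a").
printf '%s\n' ./b ./a ./B | LC_ALL=C sort > "$tmp/one.txt"
printf '%s\n' ./a ./c ./B | LC_ALL=C sort > "$tmp/two.txt"

# Run comm(1) under the same locale that was used for sorting.
LC_ALL=C comm -3 "$tmp/one.txt" "$tmp/two.txt"
```

If the two lists were sorted under different locales, comm(1) may pair
lines incorrectly, and some implementations complain that the input is
not in sorted order.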

I did this with my backup on republic, and then ran comm(1):

  $ comm -3 backup_list.txt gopherdir_list.txt 
          ./phlog
  ./phlog/
          ./phlog/.20190227_comm.txt.swp
          ./phlog/20190227_comm.txt
          ./phlog/phlogroll
  ./phlog/phlogroll/
          ./stuff
  ./stuff/
  $

What's happening here? By default comm(1) prints three columns: lines
unique to the first file, lines unique to the second file, and lines
common to both. The -3 option suppresses the third column, so the first
column shows the entries unique to backup_list.txt and the second column
(indented by a TAB) those unique to gopherdir_list.txt. There is
something off though: comm(1) is treating "./phlog" and "./phlog/" as
distinct entries. This is because tar(1) lists directories with a
trailing slash, while find(1) does not. Easy to solve:

  $ cd $HOME/gopher; find ./ | sed 's:/$::' | sort > gopherdir_list.txt
  $ tar -ztf mybackup.tar.gz | sed 's:/$::' | sort > backup_list.txt

and then:

  $ comm -3 backup_list.txt gopherdir_list.txt
          ./phlog/.20190227_comm.txt.swp
          ./phlog/20190227_comm.txt
  $

This means that the second file (gopherdir_list.txt) contains two files
that are not present in the first file (backup_list.txt). Well, those
files could not be in my backup, since one of them is this post, and the
other one is the corresponding swp file created by vim(1) while I edit
it :P
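The whole check can be wrapped into a small function. This is only a
sketch under a few assumptions: the name list_missing is made up, the
archive is assumed to be gzip-compressed and to store paths relative to
the directory with a leading "./" (as in the listings above), and both
lists are sorted in the C locale so that sort(1) and comm(1) agree:

```shell
# list_missing ARCHIVE DIR   (hypothetical helper, not a standard tool)
# Prints the paths found under DIR that are absent from ARCHIVE.
list_missing() {
    archive=$1
    dir=$2
    work=$(mktemp -d) || return 1

    # Normalise both listings: drop trailing slashes, sort in the C locale.
    tar -ztf "$archive" | sed 's:/$::' | LC_ALL=C sort > "$work/backup"
    (cd "$dir" && find . | sed 's:/$::' | LC_ALL=C sort) > "$work/current"

    # -1 hides the lines unique to the backup and -3 hides the common
    # lines, so only the lines unique to the directory listing remain.
    LC_ALL=C comm -13 "$work/backup" "$work/current"

    rm -rf "$work"
}
```

It would be called as something like

  $ list_missing mybackup.tar.gz $HOME/gopher

and prints one path per line, with no leading TAB, since the columns
before it are suppressed.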

comm(1) can also report only the lines unique to one of the input files,
or only those present in both, by suppressing columns with the -1, -2
and -3 options. As always, man(1) is your best friend.
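A tiny sketch of the column options, using two made-up fruit lists
(already in sorted order) instead of real file listings:

```shell
# Hypothetical scratch files, only for illustration.
tmp=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmp/a.txt"
printf '%s\n' banana cherry date  > "$tmp/b.txt"

# No options: three TAB-separated columns
# (only in a.txt, only in b.txt, in both).
comm "$tmp/a.txt" "$tmp/b.txt"

# Suppress columns 2 and 3: lines only in a.txt (prints "apple").
comm -23 "$tmp/a.txt" "$tmp/b.txt"

# Suppress columns 1 and 3: lines only in b.txt (prints "date").
comm -13 "$tmp/a.txt" "$tmp/b.txt"

# Suppress columns 1 and 2: lines present in both
# (prints "banana" and "cherry").
comm -12 "$tmp/a.txt" "$tmp/b.txt"
```

Each option removes one column, so combining two of them leaves a plain
list that is easy to feed to other tools.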

 -+-+-+-

comm(1)   appeared in UNIXv4 (1973)
find(1)   appeared in UNIXv1 (1971)
sort(1)   appeared in UNIXv1 (1971)
diff(1)   appeared in UNIXv5 (1974)
sed(1)    appeared in UNIXv7 (1979)
tar(1)    appeared in UNIXv7 (1979) and is not part of POSIX 

 -+-+-+-

[1] We will most probably talk about diff(1) in the future...