This is a text-only version of the following page on https://raymii.org:
---
Title       : 	Generate hashes of files with rhash for archival storage
Author      : 	Remy van Elst
Date        : 	11-12-2015
URL         : 	https://raymii.org/s/blog/Generate_hashes_of_files_with_rhash_for_archival_storage.html
Format      : 	Markdown/HTML
---



Recently I had to archive a large amount of files to archival storage. To save
space and reduce the amount of files I decided to create archives with tar. The
files will be stored to tapes and DVD's, and will be restored in full, so random
access times are not an issue, therefore the tar.gz choice.

I do want to make sure that when the files need to be restored they still are
correct. I first dabbled with some long shell commands to create checksums and
verify them, but then I found the `rhash` tool in the repositories. It allows
you to create checksums of files and folders, recursively, with all sorts of
checksums, like CRC, MD5, SHA1 and many more. It also makes bulk validation very
simple.

This small article shows you how to create an archive file with the checksums
included and shows you how to validate these checksums later on.

<p class="ad"> <b>Recently I removed all Google Ads from this site due to their invasive tracking, as well as Google Analytics. Please, if you found this content useful, consider a small donation using any of the options below:</b><br><br> <a href="https://leafnode.nl">I'm developing an open source monitoring app called  Leaf Node Monitoring, for windows, linux & android. Go check it out!</a><br><br> <a href="https://github.com/sponsors/RaymiiOrg/">Consider sponsoring me on Github. It means the world to me if you show your appreciation and you'll help pay the server costs.</a><br><br> <a href="https://www.digitalocean.com/?refcode=7435ae6b8212">You can also sponsor me by getting a Digital Ocean VPS. With this referral link you'll get $100 credit for 60 days. </a><br><br> </p>


The data in question are archived tapes, disk copies, source code and
documentation for the PDP8 mainframe. We also have these for the PDP11 and a few
VAX machines. The archives contain about 5 million files and is about 700 GB in
size. The company decided to phase out the on-line storage and place this data
on tapes and dvd's, since they're not accessed more than once or twice a month.

### Creating the hashes

The first archive contains PDP8 files located in the folder `pdp8`. This command
creates the `MD5SUMS` file, which we place in the same folder:

    
    
    rhash --recursive --md5 --output=pdp8/MD5SUMS pdp8/
    

The archive is later on created with a simple `tar -czf pdp8.tar.gz pdp8`.

### Verifying the hashes

Extract the archive to a folder and use the following command to verify all
files:

    
    
    rhash --skip-ok --check pdp8/MD5SUMS 
    

If all files match the output looks like this:

    
    
    --( Verifying pdp8/MD5SUMS )----------------------------------------------------
    --------------------------------------------------------------------------------
    Everything OK
    

If a file does not match the hash, the output will include it:

    
    
    --( Verifying pdp8/MD5SUMS )----------------------------------------------------
    pdp8/pdp8/readme.txt                                ERR
    --------------------------------------------------------------------------------
    Errors Occurred: Errors:1   Miss:0   Success:3323 Total:3324
    

If you leave out the `--skip-ok` option all files checked will be shown which
might result in long output.

To manually verify one file, first get the checksum:

    
    
    grep 'pdp8/readme.txt' pdp8/MD5SUMS 
    53a1aca1631d55de3feece9e1c4d900a  pdp8/pdp8/readme.txt
    

Then manually execute the correct checksum command to verify the match:

    
    
    $ md5sum pdp8/pdp8/readme.txt 
    53a1aca1631d55de3feece9e1c4d900a  pdp8/pdp8/readme.txt
    

   [1]: https://www.digitalocean.com/?refcode=7435ae6b8212

---

License:
All the text on this website is free as in freedom unless stated otherwise. 
This means you can use it in any way you want, you can copy it, change it 
the way you like and republish it, as long as you release the (modified) 
content under the same license to give others the same freedoms you've got 
and place my name and a link to this site with the article as source.

This site uses Google Analytics for statistics and Google Adwords for 
advertisements. You are tracked and Google knows everything about you. 
Use an adblocker like ublock-origin if you don't want it.

All the code on this website is licensed under the GNU GPL v3 license 
unless already licensed under a license which does not allows this form 
of licensing or if another license is stated on that page / in that software:

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

Just to be clear, the information on this website is for meant for educational 
purposes and you use it at your own risk. I do not take responsibility if you 
screw something up. Use common sense, do not 'rm -rf /' as root for example. 
If you have any questions then do not hesitate to contact me.

See https://raymii.org/s/static/About.html for details.