Migrating to Git’s Large File Storage (LFS)

Who is This For?

I work in Git a lot, including for this blog. This blog has lots of images, which take up a lot of space and push some of my Git clients (particularly on iOS) to their limits. Git was not designed to store large binary blobs, so I use Git LFS to slim down my repository, download only the resources a checkout needs, and keep the ‘version control’ side of the large files much lighter.

Before We Start

This will irreversibly rewrite your entire git history, and it requires a force push to your master branch. Everyone who uses your repository will have no choice but to clone (download) it again, from scratch. Any of their work that is not already committed in the local repository where you run the commands below will be lost. If you are not comfortable with that, do not follow the steps below. Also, though these commands worked for me and my situation, and I earnestly hope they’re useful to you too, I provide no warranty, guarantee or support.

Setup

# Get set up - replace the 'brew' command with something else
# if you're not on Mac OS X. You need to have homebrew installed
# first for this to work.
$ brew install git-lfs
$ git lfs install

# These next two commands are for if you'd like to push to a
# test repository first
$ git remote rename origin old-origin
$ git remote add origin ssh://git@gitlab-host.com/user/repo-name.git

# The main event
$ git lfs migrate import --everything --include="*.jpg,*.jpeg,*.mp4,*.mov,*.png"

The --everything flag ensures all necessary branches and commits are rewritten. If you want to try this out on a specific branch first, you can use --include-ref=name-of-branch in place of --everything. For more options, $ git lfs migrate has a man page that you should read through.
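
Before running the import for real, it can help to see which file types actually dominate the history. The following is a minimal sketch, assuming a git-lfs version that provides the migrate info subcommand; the function name preview_lfs_candidates is made up for illustration. It only reports and does not rewrite anything.

```shell
# Hypothetical helper: report the file types that take up the most space
# across all refs, without changing the repository.
# $1 = how many file types to show (defaults to 10).
preview_lfs_candidates() {
  git lfs migrate info --everything --top="${1:-10}"
}
```

Run preview_lfs_candidates 5 from inside the repository; the extensions it lists are candidates for the --include pattern of the import command.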

Double-checking

It’s useful to do a sense-check, in case you’ve missed some files. There are a few ways to go about it, described below. If you find additional files, you can go back and re-run the $ git lfs migrate import command, adjusting the --include pattern to incorporate what you’ve missed.

All files in LFS

You can use LFS’s ls-files option to list the files stored under LFS:

$ git lfs ls-files
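
On a repository with many files that listing gets long, so a per-extension summary can be easier to scan. Here is a rough sketch; count_by_extension is a hypothetical helper name, and it simply reads paths on standard input, so it makes no assumptions about LFS itself.

```shell
# Hypothetical helper: read file paths on stdin (one per line) and print
# "<count> <extension>" per file extension, most common first.
# Paths without an extension are ignored.
count_by_extension() {
  awk '{
    f = $0
    sub(/.*\//, "", f)                      # strip any directory part
    if (match(f, /\.[^.]+$/))               # text after the last dot
      n[tolower(substr(f, RSTART + 1))]++
  }
  END { for (e in n) print n[e], e }' | sort -k1,1nr -k2,2
}
```

One way to use it is $ git lfs ls-files -n | count_by_extension, where -n asks ls-files to print file names only.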

Files not in LFS

From https://stackoverflow.com/questions/42963854/list-files-not-tracked-by-git-lfs, it is possible to see the files not in LFS — I found this most helpful as I’d momentarily forgotten about the existence of my PNG-formatted screenshots.

$ { git ls-files && git lfs ls-files | cut -d' ' -f3-; } | sort | uniq -u
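
A different angle on the same question is to ask git which tracked files are not matched by an LFS rule in .gitattributes. This is only a sketch under a few assumptions: the helper name is invented, file names contain no newlines or characters git would quote, and it reflects the current attribute rules rather than what is actually stored in history.

```shell
# Hypothetical helper: list tracked files whose "filter" attribute is not
# "lfs" according to the current .gitattributes rules.
# Assumes file names contain no newlines or other characters git would quote.
list_files_without_lfs_filter() {
  git ls-files |
    git check-attr --stdin filter |
    grep -v ': filter: lfs$' |
    sed 's/: filter: [^:]*$//'
}
```

Run it from the top of the working tree. Anything large that shows up in the output is a hint that the --include pattern is missing an extension.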

Biggest files in your git commits

This does not take into account what’s in LFS and what’s not. However, I found it useful as part of my sense-checking, as I have a lot of large, dispersed files. To find your five largest files: $ git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -5

To see which file an entry refers to, take the object hash in the first column of the last (largest) line and run: $ git rev-list --objects --all | grep <hash>. Repeat, working your way up the list, until you’re satisfied. You can always use git lfs ls-files and grep to check for LFS coverage of specific files.
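
The two steps above (finding the largest objects, then mapping each hash back to a path) can also be combined into one pipeline. This is a sketch rather than the method I originally used; largest_blobs is a made-up name, and the sizes it reports are uncompressed object sizes.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref as
# "blob <object hash> <size in bytes> <path>", largest last.
# $1 = how many to show (defaults to 5).
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    grep '^blob ' |
    sort -k3,3n |
    tail -n "${1:-5}"
}
```

For example, largest_blobs 10 prints the ten biggest files ever committed, along with their paths, so there is no need to grep for each hash separately.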

Cleaning up commits

Deleting files

Git LFS allows git checkouts to be ‘lazy’: resources are downloaded only as they are required. So if you remove some files by executing $ git rm unwanted.jpg, they won’t inflate the size of a checkout anymore, unless someone goes back to a prior commit. However, as we are rewriting the commit history anyway, I used the opportunity to obliterate some files that were annoying me, and I wouldn’t blame you if you felt the same urge. Here’s how I achieved a cleaner repository. In constructing this section, I found this StackOverflow answer helpful.

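Determine the commits where undesirable files were once committed by running $ git log -- <filename> (the -- lets this work even for files that no longer exist in the working tree). The oldest commit shown, at the bottom of the log, is the first commit of that file. More information can be shown with $ git show <hash>.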
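
If the log for a file is long, git can be asked directly for the commit that introduced it. A small sketch, with first_commit_adding as an invented helper name; it looks only at the history of the current branch and expects a path relative to the directory you run it from.

```shell
# Hypothetical helper: print the hash of the earliest commit on the
# current branch that added the given path.
# $1 = path of the file, relative to the current directory.
first_commit_adding() {
  git log --diff-filter=A --format=%H -- "$1" | tail -n 1
}
```

For instance, first_commit_adding unwanted.jpg prints a single hash, which is the commit to mark in the rebase step that follows.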

I had commits of undesired files in my very first commit, so in my case I had to go back to the first commit with $ git rebase -i --root. This will list your entire commit history. If you only have to go back a few commits, do so. This will save you time and make finding the commits in the list more straightforward.

When your text editor is launched, change ‘pick’ to ‘edit’ on the commits where undesired files were once committed. Then, as git steps you through each commit, remove the corresponding files with:

$ git rm name-of-file
$ git commit --amend -S
$ git rebase --continue

You will do this for each entry you changed to ‘edit’ in your text editor.

Re-signing Commits

If you don’t have files to remove, you may have skipped the rebasing step above. If your commits are cryptographically signed, you will notice that their signatures are now invalid. You can quickly fix this up by running:

$ git rebase -i --root

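You can leave every option as ‘pick’. Continue, and git should automatically re-sign each commit, assuming commit signing is enabled in your configuration (for example, commit.gpgsign set to true). You can verify the signatures are correct with $ git log --show-signature.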
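
Reading the full --show-signature output for every commit gets tedious on a long history. As a rough sketch, the log format placeholder %G? prints a one-letter signature status per commit (G for good, N for none, and so on), which makes the problem commits easy to filter out. The helper name below is invented for illustration.

```shell
# Hypothetical helper: list commits on the current branch whose signature
# status is anything other than "G" (good), as
# "<status> <short hash> <subject>".
commits_without_good_signature() {
  git log --format='%G? %h %s' | grep -v '^G '
}
```

An empty result means every commit reported a good signature. Note that a commit with status U (a good signature of unknown validity) is also listed, so whether that counts as a problem depends on how your keys are trusted.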

Ready to Commit

You will need to make sure that you can force push to your master branch. For example, in GitLab you will need to unprotect the master branch on the repository settings page. Then, you may force push your updated repository.

$ git remote remove origin
$ git remote rename old-origin origin
$ git push --set-upstream origin master --force