# Git for Multimedia
by Seth Kenlon

Git is very specifically designed for source code, meaning that it's not
generally embraced by projects and industries that don't tend to work in
plain text. The problem is that Git, as part of its very purpose for
existing, remembers nearly everything. That's the deal, explicitly: when
you work in Git, you don't lose data history.

However, the advantages of an asynchronous workflow are appealing no
matter what industry you're in. Git also happens to be implicit in the
ever-growing number of industries that manage to combine serious
computing with seriously artistic ventures, whether it's web design,
visual effects, video games, publishing, currency design (yes, that's a
real industry), or education. The list goes on and on. The advantages
are so persuasive that many in these industries do use Git, because
there's software out there to make the combination of Git and
multimedia possible.

The problem
===========

It seems to be common knowledge that Git doesn't do well with non-text
files, but it never hurts to challenge assumptions. Here's an example of
working with photos with Git:

$ du -hs
108K .
$ cp ~/photos/dandelion.tif .
$ git add dandelion.tif
$ git commit -m 'added a photo'
[master (root-commit) fa6caa7] added a photo
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 dandelion.tif
$ du -hs
1.8M .

Nothing unusual so far; adding a 1.8 MB photo to a directory results in
a directory 1.8 MB in size.

$ git rm dandelion.tif
$ git commit -m 'deleted a photo'
$ du -hs
828K .

It seems that even after a large file is removed, having committed it
leaves the repository roughly 8 times the size of its original, barren
state (828K versus 108K). You can
perform tests to get a better average than one simple demonstration, but
this does reflect my experience in the past; initially, the cost of
committing files that aren't text-based is minimal, but the longer a
project stays active, the more changes people make to static content,
and those fractions start to add up. When a Git repository becomes very
large, the consequence people usually complain about is speed. The time
to perform pulls and pushes goes from being the time it takes to take a
sip of coffee to the time it takes to wonder if your computer somehow
got kicked off the network.

The reason static content causes Git to grow in size is that formats
based on text allow Git to pull out just the parts that have changed.
Raster images and music files make as much sense to Git as they would to
you, if you were to look at the binary data contained in a `.png` or
`.wav` file, so Git just takes all of the data and makes a new copy of
it, even if only one pixel has changed from one photo to the next.
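
If you want to see this for yourself, here's a minimal sketch. It assumes Git is installed, uses a throwaway repository with a placeholder identity, and stands in 1 MiB of random bytes for a photo (the file name `photo.bin` is made up). It commits the file, changes it by one byte, commits again, and compares how much space the loose objects take up each time.

```shell
# Throwaway repository; the identity below is a placeholder for the demo.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# 1 MiB of random bytes stands in for a photo: like most media formats,
# it does not compress any further.
head -c 1048576 /dev/urandom > photo.bin
git add photo.bin
git commit -q -m 'add photo'
after_first=$(git count-objects -v | awk '/^size:/ {print $2}')

# Change the file by a single byte and commit again.
printf 'X' >> photo.bin
git add photo.bin
git commit -q -m 'tweak photo'
after_second=$(git count-objects -v | awk '/^size:/ {print $2}')

# Both figures are KiB of loose objects; the second is roughly double.
echo "after first commit:  ${after_first} KiB"
echo "after second commit: ${after_second} KiB"
```

Packing (`git gc`) can sometimes claw some of that space back, but already-compressed media formats tend not to shrink much, so don't count on it.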

Git-portal
==========

In practice, many projects that deal with any kind of media don't
actually need or want to track the history of the media. The media part
of a project tends to have a different life cycle than the text or code
part of a project. Media assets generally progress in one direction; a
picture that starts out as a pencil sketch proceeds toward its
destination as a digital painting, and even if the text gets rolled back
to an earlier version, the art is expected to continue its forward
progress. It's rare for media to actually be bound to a specific version
of a project. There are exceptions, but in my experience those
exceptions are more often with graphics that reflect data sets, and
those are usually tables or graphs or charts, which can be done in
text-based formats such as SVG.

So on many projects that involve both media and text, whether it's
narrative prose or code, Git is an acceptable solution as long as
there's a playground outside of the version control cycle for artists to
play in.

![](git-velocity.png)

A simple way of enabling that is
[Git-portal](http://gitlab.com/slackermedia/git-portal.git), a Bash
script armed with Git hooks that moves your asset files to a directory
outside of Git's purview, and replaces them with symlinks. Git commits
the symlinks, which are trivially small, so all you commit are your text
files and whatever symlinks represent your media assets. The
replacement files are symlinks (sometimes called an `alias` or
`shortcut`), so your project continues to function as expected: your
local machine follows the symlinks to their "real" counterparts.
Git-portal maintains the directory structure of your project when it
swaps out a file with a symlink, so it's easy to reverse the process,
should you either decide that Git-portal isn't right for your project,
or you need to build a version of your project without symlinks (for
distribution, for instance).
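
To make the mechanics concrete, here's a minimal sketch of the swap in plain shell. This is not Git-portal's actual code: the function names `to_portal` and `from_portal`, and the file `art/cover.tif`, are invented for illustration, and the sketch assumes plain relative paths inside the project.

```shell
# Scratch project; all file names here are made up for the demo.
project=$(mktemp -d)
cd "$project" || exit 1
mkdir -p art _portal
printf 'pretend this is a large image\n' > art/cover.tif

# Move a file into _portal (keeping its relative path) and leave a
# relative symlink behind in its place.
to_portal() {
    file=$1
    dir=$(dirname "$file")
    mkdir -p "_portal/$dir"
    mv "$file" "_portal/$file"
    # Work out how to climb from the file's directory to the project root.
    case $dir in
        .) up=. ;;
        *) up=$(printf '%s\n' "$dir" | sed -e 's|[^/][^/]*|..|g') ;;
    esac
    ln -s "$up/_portal/$file" "$file"
}

# Reverse the swap: replace the symlink with the real file again.
from_portal() {
    file=$1
    rm "$file"
    mv "_portal/$file" "$file"
}

to_portal art/cover.tif
ls -l art/cover.tif        # now a symlink into ../_portal/art/
cat art/cover.tif          # still readable through the link
```

Because `_portal` mirrors the project's layout, `from_portal art/cover.tif` is all it takes to put the real file back.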

In addition, Git-portal allows for remote synchronization of assets over
`rsync`, so you can set up a remote storage location as a centralized
source of authority.

Git-portal is ideal for multimedia projects, including video games,
tabletop design, VR projects with big 3D model renders and textures,
[books](https://www.apress.com/gp/book/9781484241691) with graphics and
`.odt` exports, collaborative [blog websites](http://mixedsignals.ml),
music projects, and much more. It's not uncommon for any versioning
required by artists to be performed in their application, in the form of
layers (in the graphic world) and tracks (in the music world), so Git
adds nothing to multimedia project files themselves. The power of Git is
leveraged for other parts of artistic projects (prose and narrative,
project management, subtitle files, credits, marketing copy,
documentation, and so on), and the power of structured remote backups is
leveraged by the artists.

Installing Git-portal
---------------------

Git-portal is currently a manual install from its home on GitLab. It's
just a Bash script and some Git hooks (which are themselves, in this
case, Bash scripts), but it requires a quick build process nevertheless
so that it knows where to install itself.

$ git clone https://gitlab.com/slackermedia/git-portal.git git-portal.clone
$ cd git-portal.clone
$ ./configure
$ make
$ sudo make install

Using Git-portal
----------------

Git-portal is used alongside Git. This means there are some added steps
to remember, but you only need Git-portal when dealing with your media
assets, so it's pretty easy to remember unless you've managed to
acclimate yourself to treating large files the same as text files (which
is rare for Git users). There's also one setup step that you must do
when deciding to use Git-portal in a project.

$ mkdir bigproject.git
$ cd !$
$ git init
$ git-portal init

The `init` function of Git-portal creates a `_portal` directory in your
Git repository, and adds it to your `.gitignore` file.
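
If you're curious what that amounts to, the two effects can be reproduced by hand. This is only a sketch of what the text above describes, assuming `init` does nothing else that matters here; the file `big-render.exr` is a made-up example.

```shell
# Throwaway repository standing in for bigproject.git.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# The two effects attributed to `git-portal init`.
mkdir -p _portal
printf '%s\n' '_portal' >> .gitignore

# Anything placed in _portal is now invisible to Git.
touch _portal/big-render.exr
git status --porcelain --ignored    # _portal/ is listed with the !! (ignored) prefix
```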

Using Git-portal in a daily routine integrates smoothly with Git itself.
A good example is a mostly MIDI-based music project: the project files
produced by the music workstation are text-based, but the MIDI files are
binary data.

$ ls -1
_portal
song.1.qtr
song.qtr
song-Track_1-1.mid
song-Track_1-3.mid
song-Track_2-1.mid
$ git add song*qtr
$ git-portal add song-Track*mid
$ git add song-Track*mid

If you look into the `_portal` directory, you'll find the original MIDI
files. The files in their place are symlinks to `_portal`, which keeps
the music workstation working as expected.

$ ls -lG
[...] _portal/
[...] song.1.qtr
[...] song.qtr
[...] song-Track_1-1.mid -> _portal/song-Track_1-1.mid*
[...] song-Track_1-3.mid -> _portal/song-Track_1-3.mid*
[...] song-Track_2-1.mid -> _portal/song-Track_2-1.mid*

As with Git, you can also add a directory of files:

$ cp -r ~/synth-presets/yoshimi .
$ git-portal add yoshimi
Directories cannot go through the portal. Sending files instead.
$ ls -lG yoshimi
[...] yoshimi.stat -> ../_portal/yoshimi/yoshimi.stat*

Removal works as expected, but when removing something in `_portal`, you
should use `git-portal rm` instead of `git rm`. Using Git-portal
ensures that the file is removed from `_portal`.

$ ls
_portal/ song.qtr song-Track_1-3.mid@ yoshimi/
song.1.qtr song-Track_1-1.mid@ song-Track_2-1.mid@
$ git-portal rm song-Track_1-3.mid
rm 'song-Track_1-3.mid'
$ ls _portal/
song-Track_1-1.mid* song-Track_2-1.mid* yoshimi/
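
The reason a plain `git rm` isn't enough is that it only knows about the symlink. Here's a rough sketch of the extra bookkeeping, written as a hypothetical `portal_rm` function (not Git-portal's own code). It assumes relative link targets like the ones in the listings above, and the file `take-1.mid` is made up.

```shell
project=$(mktemp -d)
cd "$project" || exit 1
mkdir _portal
printf 'midi data\n' > _portal/take-1.mid
ln -s _portal/take-1.mid take-1.mid

# Remove a portal symlink together with the file it points to.
# (Inside a real repository the link itself would go through `git rm`.)
portal_rm() {
    link=$1
    [ -L "$link" ] || { echo "$link is not a symlink" >&2; return 1; }
    # readlink prints the stored relative target; resolve it against
    # the directory that holds the link.
    target=$(dirname "$link")/$(readlink "$link")
    rm -- "$target" "$link"
}

portal_rm take-1.mid
ls -A _portal    # prints nothing: the asset is gone too
```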

If you forget and use `git rm` instead, then you must remove the portal
file manually.

$ git rm song-Track_1-1.mid
rm 'song-Track_1-1.mid'
$ ls _portal/
song-Track_1-1.mid* song-Track_2-1.mid* yoshimi/
$ trash _portal/song-Track_1-1.mid

Git-portal's only other function is to list all current symlinks and to
find any that may have become broken, which can sometimes happen if
files move around in a project directory.

$ mkdir foo
$ mv yoshimi foo
$ git-portal status
bigproject.git/song-Track_2-1.mid: symbolic link to _portal/song-Track_2-1.mid
bigproject.git/foo/yoshimi/yoshimi.stat: broken symbolic link to ../_portal/yoshimi/yoshimi.stat
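
A similar report can be produced with nothing but `find`, which is handy on a machine without Git-portal. This is a sketch, not the real `status` implementation; the function name `portal_status` and the `.mid` files are invented for the demo.

```shell
project=$(mktemp -d)
cd "$project" || exit 1
mkdir _portal
printf 'data\n' > _portal/good.mid
ln -s _portal/good.mid good.mid        # resolves
ln -s _portal/missing.mid stale.mid    # target does not exist

# List every symlink outside _portal and say whether it still resolves.
portal_status() {
    find . -path ./_portal -prune -o -type l -print | sort |
    while IFS= read -r link; do
        if [ -e "$link" ]; then    # -e follows the link to its target
            printf '%s: symbolic link to %s\n' "$link" "$(readlink "$link")"
        else
            printf '%s: broken symbolic link to %s\n' "$link" "$(readlink "$link")"
        fi
    done
}

portal_status
```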

If you're using Git-portal for a personal project and you're maintaining
your own backups, then that's technically all you need to know about
Git-portal. If you want to add in collaborators, though, or you want
Git-portal to manage backups for you the way (more or less) Git does,
then you can add a remote.

Git-portal remotes
------------------

Adding a remote location for Git-portal is done through Git's existing
remote function. Git-portal implements Git hooks, scripts hidden away in
your repository's `.git` directory, to look at your remotes for any
remote with a name beginning with `_portal`. If it finds one, then it
attempts to `rsync` to the remote location and synchronize files.
Git-portal performs this action any time you do a Git push or a Git
merge (or pull, which is really just a fetch and an automatic merge).
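
The lookup itself is simple enough to sketch. The function below is hypothetical (it is not copied from Git-portal's hooks, and the `rsync` flags are my assumption), and it only prints the command a hook might run rather than contacting a server. The remote addresses are the examples used throughout this article.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git remote add origin git@gitdawg.com:seth/bigproject.git
git remote add _portal seth@example.com:/home/seth/git/bigproject_portal

# Print an rsync command for each remote whose name starts with _portal.
# A real hook would run rsync instead of echoing it.
portal_sync_plan() {
    git remote | while IFS= read -r name; do
        case $name in
            _portal*)
                url=$(git config --get "remote.$name.url")
                echo "rsync -av _portal/ $url/"
                ;;
        esac
    done
}

portal_sync_plan
```

Dropped into something like a `pre-push` or `post-merge` hook, logic along these lines would fire on the same events described above.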

If you've only cloned Git repositories, then you may never have added a
remote yourself. It's a standard Git procedure:

$ git remote add origin git@gitdawg.com:seth/bigproject.git
$ git remote -v
origin git@gitdawg.com:seth/bigproject.git (fetch)
origin git@gitdawg.com:seth/bigproject.git (push)

The name `origin` is a popular convention for your main Git repository,
so it makes sense to use it for your Git data. Your Git-portal data,
however, gets stored separately, so you must create a second remote to
tell Git-portal where to push and pull from. Depending on your Git host,
you may need a separate server entirely, because gigabytes of media
assets aren't likely to be accepted by a Git host with limited space, or
maybe you're on a server that only permits you to access your Git
repository and not external storage directories.

$ git remote add _portal seth@example.com:/home/seth/git/bigproject_portal
$ git remote -v
origin git@gitdawg.com:seth/bigproject.git (fetch)
origin git@gitdawg.com:seth/bigproject.git (push)
_portal seth@example.com:/home/seth/git/bigproject_portal (fetch)
_portal seth@example.com:/home/seth/git/bigproject_portal (push)

You may not want to give all your users individual accounts on your
server, and you don't have to. To provide access to the server hosting a
repository's large file assets, you can run a Git front-end like
[Gitolite](LINK TO MY SHARING WITH GIT ARTICLE), or you can use `rrsync`
(restricted `rsync`).
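
For the `rrsync` route, the usual approach is an `authorized_keys` entry that forces a key to run `rrsync` confined to one directory. The sketch below builds such an entry in a scratch file. Everything in it is an assumption to adapt: the `rrsync` path varies by distribution, the directory is borrowed from this article's examples, and the public key is a truncated placeholder.

```shell
# Hypothetical values: adjust the rrsync path, the asset directory, and
# the public key (the key below is a truncated placeholder).
rrsync_bin=/usr/bin/rrsync
portal_dir=/home/seth/git/bigproject_portal
pubkey='ssh-ed25519 AAAA...placeholder artist@laptop'

# Demo only: write to a scratch file rather than ~/.ssh/authorized_keys.
keys_file=$(mktemp)
printf 'command="%s %s",restrict %s\n' \
    "$rrsync_bin" "$portal_dir" "$pubkey" >> "$keys_file"
cat "$keys_file"
```

With an entry like this on the server, that key can be used for `rsync` into the asset directory and nothing else.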

Now you can push your Git data to your remote Git repository and your
Git-portal data to your remote portal:

$ git push origin HEAD
master destination detected
Syncing _portal content...
sending incremental file list
sent 9,305 bytes received 18 bytes 1,695.09 bytes/sec
total size is 60,358,015 speedup is 6,474.10
Syncing _portal content to example.com:/home/seth/git/bigproject_portal

Any user with Git-portal installed and with a `_portal` remote
configured has their `_portal` directory synchronized, getting new
content from the server and sending fresh content with every push. While
it's not required to do a Git commit and push to sync with the server (a
user could just use rsync directly), I find it useful to require commits
for artistic changes. It integrates artists and their digital assets
into the rest of the workflow, and it provides useful metadata about
project progress and velocity.

Other options
=============

There are other options for managing large files with Git, and if
Git-portal is too simple for you, then they're worth looking into. [Git
LFS](https://git-lfs.github.com/) is a fork of a defunct project called
Git-media and is maintained and supported by GitHub. It also requires
special commands (like `git lfs track` to protect large files from being
tracked by Git), and requires the user to manage a `.gitattributes` file
to update what files in the repository are tracked by LFS. It supports
*only* HTTP and HTTPS remotes for its large files, however, so your LFS
server must be configured so that users can authenticate over HTTP
rather than SSH or rsync.
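
To give a feel for the `.gitattributes` side of that, here's a small sketch that doesn't need Git LFS installed. It writes by hand the kind of line that `git lfs track '*.tif'` records, then asks Git which filter applies to a path. The file names are just examples.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# With Git LFS installed you would run:  git lfs track '*.tif'
# That command records a pattern like this one in .gitattributes:
printf '%s\n' '*.tif filter=lfs diff=lfs merge=lfs -text' >> .gitattributes

# Ask Git which filter applies to a given path (works without LFS).
git check-attr filter -- dandelion.tif
git check-attr filter -- notes.txt
```

The first query reports the `lfs` filter and the second reports `unspecified`, which is how Git decides which files get handed off to LFS.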

A third option is Git-annex, which I explain in detail in my article
about [managing binary blobs in
Git](https://opensource.com/life/16/8/how-manage-binary-blobs-git-part-7)
(ignore the parts about git-media, which is deprecated and whose former
flexibility doesn't carry over to its successor, Git LFS). Git-annex is a
flexible and elegant solution with a detailed system for adding,
removing, and moving large files within a repository. Because it's
flexible and powerful, there are lots of new commands and rules to
learn, so take a look at its
[documentation](https://git-annex.branchable.com/walkthrough/).

If, however, your needs are simple and you like a solution that utilizes
existing technology to do simple and obvious tasks, Git-portal might be
the tool for the job. Have fun!