# Git for Multimedia

by Seth Kenlon

Git is very specifically designed for source code, meaning that it's not generally embraced by projects and industries that don't tend to work in plain text. The problem is that Git, as part of its very purpose for existing, remembers nearly everything. That's the deal, explicitly: when you work in Git, you don't lose data history.

However, the advantages of an asynchronous workflow are appealing no matter what industry you're in. Git also happens to fit into the ever-growing number of industries that manage to combine serious computing with seriously artistic ventures, whether it's web design, visual effects, video games, publishing, currency design (yes, that's a real industry), or education; the list goes on and on. The advantages are so persuasive that many in these industries do use Git, because there's software out there to make the combination of Git and multimedia possible.

The problem
===========

It seems to be common knowledge that Git doesn't do well with non-text files, but it never hurts to challenge assumptions. Here's an example of working with photos in Git:

```
$ du -hs
108K    .
$ cp ~/photos/dandelion.tif .
$ git add dandelion.tif
$ git commit -m 'added a photo'
[master (root-commit) fa6caa7] added a photo
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 dandelion.tif
$ du -hs
1.8M    .
```

Nothing unusual so far; adding a 1.8 MB photo to a directory results in a directory 1.8 MB in size.

```
$ git rm dandelion.tif
$ git commit -m 'deleted a photo'
$ du -hs
828K    .
```

It seems that removing a large file after it has been committed still leaves the repository roughly eight times the size of its original, empty state, because the committed file remains in Git's history.
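To see why the deleted photo still costs space, here is a small sketch you can run in a scratch directory. It assumes `git` is installed, and it uses 1 MB of random bytes in a hypothetical `photo.bin` as a stand-in for a photo (random data compresses about as poorly as already-compressed media). After the file is removed and the deletion is committed, `git cat-file -e` confirms that the file's object is still stored in the repository.

```shell
#!/usr/bin/env bash
# Sketch: show that a committed file's data survives `git rm`.
# Assumes git is installed; photo.bin is a hypothetical stand-in for a photo.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q

# Throwaway identity so commits work on an unconfigured machine.
gitc() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

head -c 1048576 /dev/urandom > photo.bin   # 1 MB of incompressible data
gitc add photo.bin
gitc commit -q -m 'added a photo'
blob=$(git rev-parse HEAD:photo.bin)       # object ID of the file's contents

gitc rm -q photo.bin
gitc commit -q -m 'deleted a photo'

# The working tree no longer has the file, but its object is still in .git.
if git cat-file -e "$blob"; then
  echo "object $blob is still stored in the repository"
fi
du -sh .git
```

Only rewriting history (and then garbage-collecting) would actually drop that object, which is exactly what Git is designed to avoid.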
You can perform tests to get a better average than one simple demonstration, but this does reflect my past experience: initially, the cost of committing files that aren't text-based is minimal, but the longer a project stays active, the more changes people make to static content, and those fractions start to add up.

When a Git repository becomes very large, the consequence people usually complain about is speed. The time to perform pulls and pushes goes from being the time it takes to take a sip of coffee to the time it takes to wonder whether your computer somehow got kicked off the network.

Static content causes a Git repository to grow because, while text-based formats allow Git to pull out just the parts that have changed, binary formats don't. Raster images and music files make as much sense to Git as they would to you if you were to look at the binary data contained in a `.png` or `.wav` file, so Git just takes all of the data and makes a new copy of it, even if only one pixel has changed from one photo to the next.

Git-portal
==========

In practice, many projects that deal with any kind of media don't actually need or want to track the history of the media. The media part of a project tends to have a different life cycle than the text or code part. Media assets generally progress in one direction: a picture that starts out as a pencil sketch proceeds toward its destination as a digital painting, and even if the text gets rolled back to an earlier version, the art is expected to continue its forward progress. It's rare for media to actually be bound to a specific version of a project. There are exceptions, but in my experience, those exceptions usually involve graphics that reflect data sets, such as tables, graphs, or charts, which can be created in text-based formats such as SVG.
So on many projects that involve both media and text, whether the text is narrative prose or code, Git is an acceptable solution as long as there's a playground outside of the version control cycle for artists to play in.

![](git-velocity.png)

A simple way of enabling that is [Git-portal](http://gitlab.com/slackermedia/git-portal.git), a Bash script armed with Git hooks that moves your asset files to a directory outside of Git's purview and replaces them with symlinks. Git commits the symlinks, which are trivially small, so all you commit are your text files and the symlinks that represent your media assets. Because the replacement files are symlinks (sometimes called an `alias` or `shortcut`), your project continues to function as expected: your local machine follows the symlinks to their "real" counterparts.

Git-portal maintains the directory structure of your project when it swaps out a file for a symlink, so it's easy to reverse the process, should you either decide that Git-portal isn't right for your project or need to build a version of your project without symlinks (for distribution, for instance).

In addition, Git-portal allows for remote synchronization of assets over `rsync`, so you can set up a remote storage location as a centralized source of authority.

Git-portal is ideal for multimedia projects: video games, tabletop design, VR projects with big 3D model renders and textures, [books](https://www.apress.com/gp/book/9781484241691) with graphics and `.odt` exports, collaborative [blog websites](http://mixedsignals.ml), music projects, and much more. Artists often handle any versioning they need within their application itself, in the form of layers (in the graphic world) and tracks (in the music world), so Git adds nothing to the multimedia project files themselves.
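The core idea can be illustrated with plain shell commands. What follows is a minimal sketch, not Git-portal's actual implementation: `portal_send` is a hypothetical helper that moves a file into `_portal`, mirroring its directory, and leaves a relative symlink in its place. It assumes paths are given relative to the project root without a leading `./`. The last step shows one way to build a symlink-free copy for distribution, using `cp -RL` to copy whatever each link points to.

```shell
#!/usr/bin/env bash
# Minimal sketch of the swap Git-portal performs: move a file into _portal/,
# mirroring its directory, and leave a relative symlink in its place.
# portal_send is a hypothetical helper, not Git-portal's real code.
# Assumes paths are relative to the project root, with no leading "./".
set -euo pipefail

portal_send() {
  local file=$1 dir base dest link
  dir=$(dirname "$file")
  base=$(basename "$file")
  if [ "$dir" = "." ]; then
    dest="_portal/$base"
    link="$dest"
  else
    dest="_portal/$dir/$base"
    # One ".." per directory level: yoshimi -> .., a/b -> ../..
    link="$(sed 's|[^/][^/]*|..|g' <<< "$dir")/$dest"
  fi
  mkdir -p "$(dirname "$dest")"
  mv "$file" "$dest"
  ln -s "$link" "$file"
}

# Try it in a scratch project directory.
project=$(mktemp -d)
cd "$project"
mkdir yoshimi
echo "midi data" > song.mid
echo "preset data" > yoshimi/yoshimi.stat

portal_send song.mid
portal_send yoshimi/yoshimi.stat

readlink song.mid              # _portal/song.mid
readlink yoshimi/yoshimi.stat  # ../_portal/yoshimi/yoshimi.stat
cat yoshimi/yoshimi.stat       # preset data

# Symlink-free copy for distribution: -L copies what each link points to.
dist=$(mktemp -d)
cp -RL song.mid yoshimi "$dist/"
```

Relative links are what keep the project portable: the whole directory, `_portal` included, can move elsewhere on disk without breaking anything.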
The power of Git is leveraged for the other parts of artistic projects (prose and narrative, project management, subtitle files, credits, marketing copy, documentation, and so on), and the power of structured remote backups is leveraged by the artists.

Installing Git-portal
---------------------

Git-portal is currently a manual install from its home on GitLab. It's just a Bash script and some Git hooks (which are themselves, in this case, Bash scripts), but it nevertheless requires a quick build process so that it knows where to install itself:

```
$ git clone https://gitlab.com/slackermedia/git-portal.git git-portal.clone
$ cd git-portal.clone
$ ./configure
$ make
$ sudo make install
```

Using Git-portal
----------------

Git-portal is used alongside Git. This means there are some added steps to remember, but you only need Git-portal when dealing with your media assets, so it's pretty easy to remember unless you've acclimated yourself to treating large files the same as text files (which is rare for Git users).

There's also one setup step that you must do when deciding to use Git-portal in a project:

```
$ mkdir bigproject.git
$ cd !$
$ git init
$ git-portal init
```

The `init` function of Git-portal creates a `_portal` directory in your Git repository and adds it to your `.gitignore` file.

Using Git-portal in a daily routine integrates smoothly with Git itself. A good example is a mostly MIDI-based music project: the project files produced by the music workstation are text-based, but the MIDI files are binary data.

```
$ ls -1
_portal
song.1.qtr
song.qtr
song-Track_1-1.mid
song-Track_1-3.mid
song-Track_2-1.mid
$ git add song*qtr
$ git-portal add song-Track*mid
$ git add song-Track*mid
```

If you look into the `_portal` directory, you'll find the original MIDI files. The files in their place are symlinks to `_portal`, which keeps the music workstation working as expected.

```
$ ls -lG
[...] _portal/
[...] song.1.qtr
[...] song.qtr
[...] song-Track_1-1.mid -> _portal/song-Track_1-1.mid*
[...] song-Track_1-3.mid -> _portal/song-Track_1-3.mid*
[...] song-Track_2-1.mid -> _portal/song-Track_2-1.mid*
```

As with Git, you can also add a directory of files:

```
$ cp -r ~/synth-presets/yoshimi .
$ git-portal add yoshimi
Directories cannot go through the portal. Sending files instead.
$ ls -lG yoshimi
[...] yoshimi.stat -> ../_portal/yoshimi/yoshimi.stat*
```

Removal works as expected, but when removing something in `_portal`, you should use `git-portal rm` instead of `git rm`. Using Git-portal ensures that the file is also removed from `_portal`:

```
$ ls
_portal/    song.qtr             song-Track_1-3.mid@  yoshimi/
song.1.qtr  song-Track_1-1.mid@  song-Track_2-1.mid@
$ git-portal rm song-Track_1-3.mid
rm 'song-Track_1-3.mid'
$ ls _portal/
song-Track_1-1.mid*  song-Track_2-1.mid*  yoshimi/
```

If you forget to use Git-portal, then you must remove the portal file manually:

```
$ git rm song-Track_1-1.mid
rm 'song-Track_1-1.mid'
$ ls _portal/
song-Track_1-1.mid*  song-Track_2-1.mid*  yoshimi/
$ trash _portal/song-Track_1-1.mid
```

Git-portal's only other function is to list all current symlinks and to find any that may have become broken, which can sometimes happen if files move around in a project directory:

```
$ mkdir foo
$ mv yoshimi foo
$ git-portal status
bigproject.git/song-Track_2-1.mid: symbolic link to _portal/song-Track_2-1.mid
bigproject.git/foo/yoshimi/yoshimi.stat: broken symbolic link to ../_portal/yoshimi/yoshimi.stat
```

If you're using Git-portal for a personal project and you're maintaining your own backups, then that's technically all you need to know about Git-portal. If you want to add collaborators, though, or you want Git-portal to manage backups for you (more or less) the way Git does, then you can add a remote.

Git-portal remotes
------------------

Adding a remote location for Git-portal is done through Git's existing remote function.
Git-portal implements Git hooks (scripts hidden away in your repository's `.git` directory) that look at your remotes for any remote with a name beginning with `_portal`. If it finds one, it attempts to `rsync` to the remote location and synchronize files. Git-portal performs this action any time you do a Git push or a Git merge (or a pull, which is really just a fetch and an automatic merge).

If you've only ever cloned Git repositories, then you may never have added a remote yourself. It's a standard Git procedure:

```
$ git remote add origin git@gitdawg.com:seth/bigproject.git
$ git remote -v
origin  git@gitdawg.com:seth/bigproject.git (fetch)
origin  git@gitdawg.com:seth/bigproject.git (push)
```

The name `origin` is a popular convention for your main Git repository, so it makes sense to use it for your Git data. Your Git-portal data, however, gets stored separately, so you must create a second remote to tell Git-portal where to push and pull from. Depending on your Git host, you may need a separate server entirely, because a Git host with limited space isn't likely to accept gigabytes of media assets, or your host may only permit you to access your Git repository and not external storage directories.

```
$ git remote add _portal seth@example.com:/home/seth/git/bigproject_portal
$ git remote -v
origin  git@gitdawg.com:seth/bigproject.git (fetch)
origin  git@gitdawg.com:seth/bigproject.git (push)
_portal seth@example.com:/home/seth/git/bigproject_portal (fetch)
_portal seth@example.com:/home/seth/git/bigproject_portal (push)
```

You may not want to give all your users individual accounts on your server, and you don't have to. To provide access to the server hosting a repository's large file assets, you can run a Git front end like [Gitolite](LINK TO MY SHARING WITH GIT ARTICLE), or you can use `rrsync` (restricted `rsync`).
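Conceptually, the hook logic described in this section (find remotes whose names begin with `_portal`, then `rsync` to them) can be sketched as follows. `sync_portals` and the `DRY_RUN` variable are hypothetical names for illustration, not part of Git-portal. The sketch only sends files, whereas Git-portal also fetches new content, and by default it just prints the `rsync` command it would run. It assumes Git 2.7 or later for `git remote get-url`.

```shell
#!/usr/bin/env bash
# Sketch of the hook behavior described above: find every remote whose name
# begins with "_portal" and rsync the local _portal/ directory to it.
# sync_portals and DRY_RUN are hypothetical, not Git-portal's actual hook.
# This only pushes files; with DRY_RUN=1 (the default) it only prints.
set -euo pipefail

sync_portals() {
  local name url
  for name in $(git remote); do
    case $name in
      _portal*)
        url=$(git remote get-url "$name")
        # Trailing slash on the source copies the contents of _portal/,
        # not the _portal directory itself.
        if [ "${DRY_RUN:-1}" = 1 ]; then
          echo "rsync -av _portal/ $url/"
        else
          rsync -av _portal/ "$url/"
        fi
        ;;
    esac
  done
}

# Demonstrate in a scratch repository; `git remote add` makes no network calls.
cd "$(mktemp -d)"
git init -q
git remote add origin git@gitdawg.com:seth/bigproject.git
git remote add _portal seth@example.com:/home/seth/git/bigproject_portal
sync_portals
# prints: rsync -av _portal/ seth@example.com:/home/seth/git/bigproject_portal/
```

Keying on a name prefix means a project can have several portal remotes (for example, a hypothetical `_portal-backup`) and each one gets synchronized, while ordinary remotes like `origin` are ignored.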
Now you can push your Git data to your remote Git repository and your Git-portal data to your remote portal:

```
$ git push origin HEAD
master destination detected
Syncing _portal content...
sending incremental file list
sent 9,305 bytes  received 18 bytes  1,695.09 bytes/sec
total size is 60,358,015  speedup is 6,474.10
Syncing _portal content to example.com:/home/seth/git/bigproject_portal
```

Any user with Git-portal installed and a `_portal` remote configured has their `_portal` directory synchronized, getting new content from the server and sending fresh content with every push. While it's not required to do a Git commit and push to sync with the server (a user could just use `rsync` directly), I find it useful to require commits for artistic changes. It integrates artists and their digital assets into the rest of the workflow, and it provides useful metadata about project progress and velocity.

Other options
=============

There are other options for managing large files with Git, and if Git-portal is too simple for you, then they're worth looking into.

[Git LFS](https://git-lfs.github.com/) is the successor to a defunct project called Git-media, and it is maintained and supported by GitHub. It also requires special commands (like `git lfs track` to protect large files from being tracked by Git), and it requires the user to manage a `.gitattributes` file that determines which files in the repository are tracked by LFS. It supports *only* HTTP and HTTPS remotes for its large files, however, so your LFS server must be configured so that users can authenticate over HTTP rather than SSH or rsync.

A third option is Git-annex, which I explain in detail in my article about [managing binary blobs in Git](https://opensource.com/life/16/8/how-manage-binary-blobs-git-part-7) (ignore the parts about git-media, which is deprecated; its former flexibility doesn't carry over to its successor, Git LFS).
Git-annex is a flexible and elegant solution with a detailed system for adding, removing, and moving large files within a repository. Because it's flexible and powerful, there are lots of new commands and rules to learn, so take a look at its [documentation](https://git-annex.branchable.com/walkthrough/). If, however, your needs are simple and you like a solution that utilizes existing technology to do simple and obvious tasks, Git-portal might be the tool for the job. Have fun!