TITLE: Writing a Makefile for LaTeX and R
DATE: 2019-08-08
AUTHOR: John L. Godlee
====================================================================


I've experimented in the past with automating the building of 
scientific reports, mostly using shell scripts which run every 
analysis script in order of dependencies and then use pdflatex to 
compile the final document. After I finished writing my most recent 
manuscript, though, I vowed to learn how to use make.

make is a good way to handle dependencies during software 
compilation, but I figured it can probably be used to compile 
scientific research as well. A nice feature is that each time it is 
called it only re-runs the parts of the compilation whose 
dependencies have changed, as long as the Makefile is set up 
properly to recognise those dependencies.

Unfortunately a lot of the online tutorials for using make rely on 
examples using C code, which isn't something I'm familiar with, and 
besides, my use case is slightly different. I did find a couple of 
useful resources, namely this blog post by Rob J Hyndman and a more 
philosophical blog post by Zachary M. Jones. I also found this blog 
post, which offered some inspiration on compiling large documents 
like a whole PhD thesis.

  [This blog post by Rob J Hyndman]: https://robjhyndman.com/hyndsight/makefiles/
  [Zachary M. Jones]: http://zmjones.com/make/
  [this blog post]: https://scaron.info/blog/makefiles-for-latex.html

I found that the best way to make an efficient and fool-proof 
Makefile was to modularise the pieces of the puzzle as much as 
possible. This meant splitting up R scripts so that each script 
only creates a single plot or table of the same name, and putting 
those scripts in directories grouped by how they will be parsed by 
the Makefile, e.g. all tables in a directory called tab/. Although 
I didn't need to do it in my example instance, modularising TeX 
code might be useful as well if I'm working on a big document.

For my example, as a reminder to myself of best practices, I made a 
directory of example files, laid out in a tree like this:

    .
    ├── Makefile
    ├── agsmnourl.bst
    ├── analysis
    │   ├── fig
    │   │   ├── fig_1.R
    │   │   └── fig_2.R
    │   └── tab
    │       └── tab_1.R
    ├── img
    ├── tab
    ├── test.bib
    └── test.tex

At the top level I have the Makefile, plus test.tex and test.bib 
which hold the text and references for my report, respectively. 
Below that I have a directory called analysis containing R scripts, 
with subdirectories grouped by the output type of each script. I 
also have currently empty directories which will eventually hold 
figures (img) and LaTeX formatted tables (tab) after the Makefile 
has run.

This is the Makefile I came up with:

    # LaTeX Makefile

    # Basic TeX file prefix
    PROJ = test

    # R input paths for figures and tables
    RIPATH = analysis/fig
    RTPATH = analysis/tab

    # Output paths for generated figures and tables
    IPATH = img
    TPATH = tab

    # Gather files from input paths
    RIFILES = $(wildcard $(RIPATH:=/*.R))
    RTFILES = $(wildcard $(RTPATH:=/*.R))

    # Create paths of output .pdf files by changing the suffix from
    # .R to .pdf and the prefix from `analysis/fig` (RIPATH) to `img`
    # (IPATH). These files don't exist yet, but the list of files in
    # FIGS is needed as a dependency for $(PROJ).pdf
    FIGS = $(subst $(RIPATH), $(IPATH), $(RIFILES:.R=.pdf))

    # Create paths of output table files by changing suffixes and
    # prefixes, same as above
    TABS = $(subst $(RTPATH), $(TPATH), $(RTFILES:.R=.tex))

    # Main 
    all: $(PROJ).pdf 

    # Create pdf
    $(PROJ).pdf: $(PROJ).tex $(FIGS) $(TABS)
        latexmk -pdf -quiet -bibtex $(PROJ).tex

    # Create figures
    $(IPATH)/%.pdf: $(RIPATH)/%.R
        Rscript $<

    # Create tables
    $(TPATH)/%.tex: $(RTPATH)/%.R
        Rscript $<

    # Remove generated latex files and generated figures and tables
    clean:
        latexmk -C 
        rm -f $(IPATH)/*.pdf
        rm -f $(TPATH)/*.tex

PROJ = test defines the prefix for the TeX document which will be 
compiled.

RIPATH = analysis/fig and RTPATH = analysis/tab define paths where 
R scripts are located. The reason these two are split up into 
different variables is that later on I will use suffix replacement 
to create .pdf and .tex file names matching the script names, and 
figures and tables are each built by their own rule.

IPATH = img and TPATH = tab define output paths where the generated 
images and tables will be stored.

RIFILES = $(wildcard $(RIPATH:=/*.R)) and RTFILES = $(wildcard 
$(RTPATH:=/*.R)) create variables holding the paths of the R 
scripts. $(RIPATH:=/*.R) appends /*.R to the path, and wildcard 
expands that pattern to the files which match it. These lists are 
transformed later into paths to output files which don't exist yet, 
but which need to be given as dependencies to the .pdf target.
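
To see what these two lines do, here is a minimal sketch that can be 
saved as a separate file and run from the project directory with 
make -f expand.mk (the file name expand.mk is only an assumption for 
illustration). It uses make's info function to print each step of 
the expansion:

    # expand.mk: print the glob pattern and the files it matches
    RIPATH = analysis/fig

    # $(RIPATH:=/*.R) appends /*.R to each word in RIPATH
    $(info pattern: $(RIPATH:=/*.R))

    # wildcard expands the pattern to the files that exist on disk
    $(info scripts: $(wildcard $(RIPATH:=/*.R)))

    # Empty rule so that make has a target to build
    all: ;

With the directory tree above, the first line printed should be 
pattern: analysis/fig/*.R, and the second should list the paths to 
fig_1.R and fig_2.R.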

FIGS = $(subst $(RIPATH), $(IPATH), $(RIFILES:.R=.pdf)) and TABS = 
$(subst $(RTPATH), $(TPATH), $(RTFILES:.R=.tex)) transform the 
script paths gathered in the previous lines (RIFILES and RTFILES) 
into img/*.pdf and tab/*.tex paths, respectively. The inner part 
swaps the suffix and subst swaps the directory prefix.
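
As an alternative sketch (not what the Makefile above uses), make's 
patsubst function can change the prefix and the suffix in one step. 
The file list is hard-coded here only so that the snippet runs on 
its own:

    # Hard-coded stand-in for the wildcard result
    RIFILES = analysis/fig/fig_1.R analysis/fig/fig_2.R

    # % matches the script's base name and is reused in the output path
    FIGS = $(patsubst analysis/fig/%.R,img/%.pdf,$(RIFILES))

    # Prints: img/fig_1.pdf img/fig_2.pdf
    $(info $(FIGS))

    # Empty rule so that make has a target to build
    all: ;

Note that there are no spaces after the commas. As far as I 
understand it, make keeps leading whitespace in every function 
argument after the first, so $(subst a, b, text) carries extra 
spaces into the result. That is harmless in a list of dependencies, 
but worth knowing about.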

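all: $(PROJ).pdf is the top level target for the Makefile. As it is 
the first target in the file, it is what make builds when called 
with no arguments, and its only dependency is the final document, 
so generating that file is the goal of the make process. Note that 
$(PROJ).pdf will be expanded to test.pdf.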

$(PROJ).pdf: $(PROJ).tex $(FIGS) $(TABS) defines the dependencies 
for test.pdf, which are test.tex and all the files in $(FIGS) and 
$(TABS). The command for that target is latexmk -pdf -quiet -bibtex 
$(PROJ).tex, which uses latexmk to run pdflatex and bibtex enough 
times to resolve all cross-references and create a full test.pdf.
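
One dependency that this rule doesn't list is the bibliography. As a 
hedged sketch, assuming the references always live in $(PROJ).bib 
next to the TeX file, adding it as a dependency means that editing a 
reference also marks the pdf as out of date. This is a fragment 
which relies on the variables defined earlier in the Makefile:

    # Rebuild the pdf when the text, bibliography, figures or tables
    # change (the recipe line must start with a tab character)
    $(PROJ).pdf: $(PROJ).tex $(PROJ).bib $(FIGS) $(TABS)
    	latexmk -pdf -quiet -bibtex $(PROJ).tex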

One thing that took me a long time to get my head around is that 
make doesn't check that a command actually writes the file named in 
the target. The command never sees the target name unless the 
Makefile passes it explicitly, so its output doesn't have to be 
equal to the file given in the target. Matching the two is only a 
convention, but it makes for coherent Makefiles and makes sure that 
missing files trigger the right commands. If the command writes 
somewhere else, make will find the target still missing and re-run 
the command every time.
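
The automatic variable $@ expands to the target of a rule, so one 
way to keep the target and the real output in step is to hand $@ to 
the script. This is only a sketch: it assumes each R script reads 
its output path from the command line, for example with 
commandArgs(trailingOnly = TRUE), which the scripts in my example 
don't do. It is a fragment which relies on the variables defined in 
the Makefile above:

    # Pass the target path to the script as its first argument
    # (the recipe line must start with a tab character)
    $(IPATH)/%.pdf: $(RIPATH)/%.R
    	Rscript $< $@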

$(IPATH)/%.pdf: $(RIPATH)/%.R and the similar line to create tables 
are pattern rules. For each file listed in $(FIGS) or $(TABS), make 
finds the R script with the same base name and uses Rscript $< to 
run it. $< is an automatic variable which expands to the first file 
given in the dependencies section of the rule. Note the use of % to 
match the base name shared by a script and its output, so that 
analysis/tab/tab_1.R is the script which builds tab/tab_1.tex. This 
is why it was important to create two lists of files, one for pdf 
images and one for tex tables.
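
To check what a pattern rule matches without running any R code, a 
small stand-alone sketch like the one below can help. It only echoes 
the automatic variables, and the file name pattern.mk is an 
assumption for illustration:

    # pattern.mk: show how % ties a target to its script
    # (the recipe line must start with a tab character)
    img/%.pdf: analysis/fig/%.R
    	@echo "target: $@  script: $<  stem: $*"

Running make -f pattern.mk img/fig_1.pdf in the example directory, 
before img/fig_1.pdf has been created, should report img/fig_1.pdf 
as the target, analysis/fig/fig_1.R as the script and fig_1 as the 
stem.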

clean: and its commands wipe the slate clean when make clean is 
called from the terminal. They remove all files generated by 
latexmk, as well as all the .pdf images and .tex tables.
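
Neither all nor clean is the name of a real file, so a common 
addition (not in the Makefile above) is to declare them as phony 
targets. Without this, a file that happened to be called clean in 
the project directory would lead make to treat the target as already 
up to date and skip its commands:

    # all and clean are commands, not files to be built
    .PHONY: all clean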