TITLE: Writing ggplot2 grobs in a loop to maintain data values
DATE: 2019-10-15
AUTHOR: John L. Godlee
====================================================================


Sometimes I encounter the need to make multiple plots in a loop and
then arrange them in a grid.arrange() panel. A recent example was
when I had a bunch of ANCOVA linear models in a list, each with a
different predictor variable, and I wanted to create ggplot()
objects for each linear model in a for loop, showing the
distribution of points for the predictor and response variable and
slopes for a categorical grouping variable, then I wanted to put
each plot into a single grid image and export it, like the image
below:

  {IMAGE}


The code to generate this involves creating a list for the plot
objects, then filling the list with each ggplot() called within a
for loop, where each iteration of the loop is a different linear
model with a different predictor variable, then calling the list of
plots with do.call("grid.arrange", c(plot_list)) to build the grid
image ready to be exported. To access the data needed for the x axis
of the plot, which changes with each plot/linear model, I took the
variable name from the linear model it is based on with:

    x_var <- rownames(summary(model_list[[i]])$coefficients)[2]

This means the ggplot() for the scatter plots can then be called
like this:

    ggplot() + 
        geom_point(aes(x = df[,x_var], y = df[,bchave_log])

You would think this works fine, but when the ggplot() object is
called again outside the loop, x_var is nowhere to be found, as it
only exists inside the loop. So instead I had to build each ggplot()
into a ggplot grob inside the loop where x_var still exists before
plotting it outside the loop. Building a grob from a ggplot() object
simply requires wrapping it in ggplot_gtable(ggplot_build(...)).
Below is an example with inbuilt data to illustrate the point:

    # Example data
    df <- mtcars

    # Run a for loop to plot each column aginst column 1
    plot_list <- list()

    for(i in 1:length(df)){
      plot_list[[i]] <- ggplot() + geom_point(aes(df[,i], df[,1]))
    }

    do.call("grid.arrange", c(plot_list, ncol = 2, nrow = 6))

    ##' All the plots are the same because `df[,i]` takes the 
    ##' value from the final loop iteration.

    # Fix the ggplot objects in the loop so they maintain variable values
    for(i in 1:length(df)){
      plot_list[[i]] <- ggplot_gtable(ggplot_build(ggplot() + 
          geom_point(aes(df[,i], df[,1]))))
    }

    do.call("grid.arrange", c(plot_list, ncol = 2, nrow = 6))

First, ggplot_build() takes the plot object and produces an object
that can be rendered as a standalone, with a list of dataframes for
each plot layer (points, lines) and a panel object containing
metadata for the plot like axis limits and themes.

Then, ggplot_gtable() builds a grob which can display the image and
stores it as a gtable. This bit is necessary so that grid.arrange()
can draw the plot.