TITLE: Querying the SoilGrids REST API
DATE: 2020-11-08
AUTHOR: John L. Godlee
====================================================================


This week I have been working on the SEOSAW database, which holds 
tree measurement data from around 7000 survey plots in southern 
Africa. The database is always difficult to deal with, mostly 
because the component datasets were all collected with different 
initial purposes, meaning there is a significant amount of data 
bashing to get each one into the SEOSAW format. I chose to use an R 
package to hold functions and workflows for cleaning the datasets 
and compiling the main database. In previous versions of the 
package I had a number of reduced size spatial data layers held as 
data objects in the package, which I could then query for 
information at each plot such as estimated herbivore biomass, 
elevation, and administrative region. I've since removed all of 
these functions, because they were too temperamental and the 
trade-off in spatial resolution that came from storing reduced size 
versions of each data layer in the package was too great. Now the 
data layers are queried at full resolution in a separate process 
right at the end of dataset compilation.

  [SEOSAW database]:

One of the functions which I was actually quite proud of was one 
that queried the SoilGrids REST API to get soil information. The 
function also provided functionality for averaging over soil 
depths. The SoilGrids REST API is still experimental, and since I 
first wrote the function they upgraded to v2.0. This broke the 
function and so it didn't work for a long time.

  [SoilGrids]: https://www.isric.org/
  [REST API]: https://rest.soilgrids.org/soilgrids/v2.0/docs

I re-wrote the function yesterday and thought I would post it here, 
as it probably won't get any use elsewhere. I also wrote a 
companion function which sends an empty query to get back all 
possible combinations of soil attributes, depths, and values (mean, 
Q0.05 etc.).

    #' Query SoilGrids via the REST API 
    #' 
    #' See https://rest.soilgrids.org/soilgrids/v2.0/docs for more 
information
    #' 
    #' @param x dataframe of plot level data
    #' @param plot_id column name string of plot IDs
    #' @param longitude_of_centre column name string of plot 
longitude
    #' @param latitude_of_centre column name string of plot latitude
    #' @param attrib vector of soil attributes to extract
    #' @param depth vector of soil depths over which to extract the 
attributes, e.g. 0-5, 30-60
    #' @param average optional list of vectors, each of length 2, 
describing ranges 
    #'     of depths over which to average (mean) the values of 
attrib
    #' @param value vector of soil values to extract for each 
attribute, e.g. mean, Q0.05 
    #' 
    #' @details See \code{soilQueryOptions()} for all possibly 
combinations of 
    #'     attrib, depth, and value
    #'
    #' @return dataframe of soil values by depth
    #' 
    #' @importFrom httr GET content
    #' @export
    #' 
    soilQuery <- function(x, plot_id = "plot_id", 
      longitude_of_centre = "longitude_of_centre", 
      latitude_of_centre = "latitude_of_centre", 
      attrib = c("cec", "cfvo", "clay", "nitrogen", "ocd", "ocs", 
        "phh20", "sand", "silt", "soc"),
      depth = c("0-5", "5-15", "15-30", "30-60", "60-100", 
"100-200"), 
      average = list(c(0,30)), 
      value = "mean") {

      default_attrib <- c("bdod", "cec", "cfvo", "clay", 
"nitrogen", "ocd", "ocs", 
        "phh20", "sand", "silt", "soc")
      default_depth <- c("0-5", "5-15", "15-30", "30-60", "60-100", 
"100-200")
      depth_values <- unique(unlist(strsplit(default_depth, "-")))
      default_value <- c("Q0.05", "Q0.5", "Q0.95", "mean", 
"uncertainty")

      # Check input
      if (any(!attrib %in% default_attrib)) {
        stop("Illegal soil attribute: ",
          paste(attrib[!attrib %in% default_attrib], collapse = ", 
"),
          "\n\tAllowed values: ", 
            paste(default_attrib, collapse = ", "),
          "\n\tRun `soilQueryOptions()` to find valid attributes")
      }
      if (any(!depth %in% default_depth)) {
        stop("Illegal depth: ",
          paste(depth[!depth %in% default_depth], collapse = ", "),
          "\n\tallowed values: ", 
          paste(default_depth, collapse = ", "),
          "\n\tRun `soilQueryOptions()` to find valid depths")
      }
      if (!is.null(average)) {
        if (any(!unlist(average) %in% depth_values)) {
        stop("Illegal averaging depth: ",
            paste(average[!unlist(average) %in% depth_values], 
collapse = ", "),
            "\n\tallowed values: ", 
            paste(depth_values, collapse = ", "))
        }
        if (any(unlist(average) > max(depth_values) | 
unlist(average) < min(depth_values))) {
          stop("average range values outside sampled depths: ",
            paste(unlist(average)[unlist(average) > 
max(depth_values) | unlist(average) < min(depth_values)], 
              collapse = ", "))
        }
        if (any(sapply(average, function(y) {
              length(y) != 2
            }))) {
          stop("All average ranges must be vectors of length 2")
        }
      }
      if (any(!value %in% default_value)) {
        stop("Illegal value: ",
            paste(value[!value %in% default_value], collapse = ", 
"),
            "\n\tallowed values: ", 
            paste(default_value, collapse = ", "),
          "\n\tRun `soilQueryOptions()` to find valid values")
      }

      # Construct query
      attrib_string <- paste0(paste0("&property=", attrib), 
collapse = "")
      depth_string <- paste0(paste0("&depth=", depth, "cm"), 
collapse = "")
      value_string <- paste0(paste0("&value=", value), collapse = 
"")

      queries <- lapply(1:nrow(x), function(y) {
        call <- 
paste0("https://rest.soilgrids.org/soilgrids/v2.0/properties/query?"
, 
          "lon=", x[y, longitude_of_centre], 
          "&lat=", x[y, latitude_of_centre], 
          attrib_string, 
          depth_string,
          value_string)
      })
      
      # Run GET query
      query_get <- lapply(queries, httr::GET)

      # If any queries fail, end function
      if (any(unlist(lapply(query_get, `[[`, "status_code")) != 
200)) {
        stop("Some queries failed")
      }

      # Flatten to list
      query_list <- lapply(query_get, httr::content, as = "parsed")

      # For each query, for each attrib, for each depth, for each 
value, extract values
      query_extract <- lapply(query_list, function(y) {
        lapply(y$properties$layers, function(z) {
          lapply(z$depths, function(v) {
            lapply(v$values, function(w) {
              w 
            })
          })
        })
      })

      # Make tidy dataframe of values
      extract_df <- data.frame(
        plot_id = rep(x[[plot_id]], each = length(attrib) * 
length(depth) * length(value)),
        attrib =  rep(rep(attrib, each = length(depth) * 
length(value)), times = length(x[[plot_id]])),
        depth =   rep(rep(rep(depth, each = length(value)), times = 
length(x[[plot_id]])), times = length(attrib)),
        value =   rep(rep(rep(value, times = length(x[[plot_id]])), 
times = length(attrib)), times = length(depth)),
        extract = unlist(query_extract))

      # Optionally average each value and attrib over depths
      if (!is.null(average)) {
        extract_split <- split(extract_df, 
          list(extract_df$plot_id, extract_df$attrib, 
extract_df$value))

        extract_df <- do.call(rbind, lapply(extract_split, 
function(y) {
          do.call(rbind, lapply(average, function(z) {
            y$min_depth <- sapply(strsplit(y$depth, "-"), `[`, 1)
            y$max_depth <- sapply(strsplit(y$depth, "-"), `[`, 2)
            out_df <- data.frame(plot_id = y[1,plot_id],
              attrib = y[1, "attrib"],
              depth = paste(y$min_depth[which(y$min_depth == 
z[1])], y$max_depth[which(y$max_depth == z[2])], sep = "-"),
              value = y[1, "value"],
              extract = mean(y$extract[which(y$min_depth == z[1]) : 
which(y$max_depth == z[2])],
              na.rm = TRUE))
          }))
        }))
        row.names(extract_df) <- NULL
      }

      return(extract_df)
    }

The function to get all possible input options:

    #' Query SoilGrids via the REST API 
    #' 
    #' See https://rest.soilgrids.org/soilgrids/v2.0/docs for more 
information
    #'
    #' @return Dataframe of all soil attributes, depths and values
    #' 
    #' @importFrom httr GET content
    #' @export
    #' 
    soilQueryOptions <- function() {

      query_get <- 
httr::GET("https://rest.soilgrids.org/soilgrids/v2.0/properties/laye
rs")
      query_list <- httr::content(query_get, as = "parsed")

      out <- do.call(rbind, lapply(query_list$layers, function(x) {
        attrib <- x$property
        attrib_df <- do.call(rbind, lapply(x$layer_structure, 
function(y) {
          depth <- y$range
          value <- unlist(y$values)
          data.frame(depth, value)
        }))
        cbind(attrib = attrib, attrib_df)
      }))
      
      return(out)
    }

A test of soilQuery() for three locations in Africa:

    x <- data.frame(plot_id = c("A", "B", "C", "D"), 
      longitude_of_centre = c(22.674 , 15.555, 32.122, -1.319738),
      latitude_of_centre = c(7.140, -14.736, -6.231, 53.065368))

    attrib = c("cec", "clay", "nitrogen") 
    depth = c("0-5", "5-15", "15-30", "60-100")
    average = list(c(0,30), c(0,15), c(60,100))
    value = c("mean", "Q0.5")

    soilQuery(x, attrib = attrib, depth = depth, average = average, 
value = value)

  plot_id   attrib     depth    value   extract
  --------- ---------- -------- ------- ---------
  A         cec        0-30     mean    102.00
  A         cec        0-15     mean    106.00
  A         cec        60-100   mean    100.00
  B         cec        0-30     mean    83.00
  B         cec        0-15     mean    84.50
  B         cec        60-100   mean    72.00
  C         cec        0-30     mean    53.00
  C         cec        0-15     mean    56.50
  C         cec        60-100   mean    44.00
  D         cec        0-30     mean    152.33
  D         cec        0-15     mean    173.00
  D         cec        60-100   mean    94.00
  A         clay       0-30     mean    187.00
  A         clay       0-15     mean    161.50
  A         clay       60-100   mean    335.00
  B         clay       0-30     mean    85.67
  B         clay       0-15     mean    73.50
  B         clay       60-100   mean    321.00
  C         clay       0-30     mean    108.00
  C         clay       0-15     mean    110.50
  C         clay       60-100   mean    265.00
  D         clay       0-30     mean    195.33
  D         clay       0-15     mean    211.00
  D         clay       60-100   mean    111.00
  A         nitrogen   0-30     mean    97.00
  A         nitrogen   0-15     mean    110.50
  A         nitrogen   60-100   mean    45.00
  B         nitrogen   0-30     mean    48.00
  .         ...        ...      ...     ...