TITLE: Calculating NDVI from Sentinel 2 images for Bicuar National Park
DATE: 2018-11-20
AUTHOR: John L. Godlee
====================================================================


I recently ventured into trying to make sense of sentinel 2 data,
multispectral remote sensing imagery. I wanted to calculate NDVI for
Bicuar National Park, so I could see whether it’s possible to
identify areas of miombo woodland within the park using variation in
the NDVI, which you would expect is higher in woodland and lower in
grassland.

I got some cloud free images for the area covering Bicuar and wrote
a Python script which calculates NDVI, from the red band and near
infra-red band:

    # Import libraries
    import glob
    import gdal 
    import os
    import fnmatch
    import re
    import cv2

    # Define a function to find files given a pattern
    def find(pattern, path):
        result = []
        for root, dirs, files in os.walk(path):
            for name in files:
                if fnmatch.fnmatch(name, pattern):
                    result.append(os.path.join(root, name))
        return result

    # Set working directory for images
    rootdir = '/sentinel_bicuar'
    os.chdir(rootdir)

    # Create a list of folders 
    folders = next(os.walk(os.getcwd()))[1]

    # Loop through each folder in turn
    for i in folders:
      # Set input directory
      in_dir =  rootdir +  '/' + i

      # Search directory for desired bands
      red_file = find('*B04.jp2', in_dir)[0]
      print("Processing: " + red_file)
      nir_file = find('*B08.jp2', in_dir)[0]

      # Open each band using gdal
      red_link = gdal.Open(red_file)
      nir_link = gdal.Open(nir_file)

      # Store as an array
      red_array = red_link.GetRasterBand(1).ReadAsArray() * 0.0001
      nir_array = nir_link.GetRasterBand(1).ReadAsArray() * 0.0001
     
      # Create a mask filled with zeroes
      mask = red_array == 0.

      # Calculate NDVI 
      ndvi2 = (nir_array - red_array) / (nir_array + red_array)

      # Set mask values back to 0
      ndvi2[mask] = 0.

      # Create output filename based on input name 
      out_string_a = re.search('A004323_(.*)/IMG_DATA', red_file).group(1)
      out_string_b = re.search('IMG_DATA/(.*)_B04.', red_file).group(1)
      out_file = rootdir + '/' + out_string_a + '_' + out_string_b + '_NDVI.tif'
      print('Creating file: ' + out_file)

      # Get dimensions
      x_pixels = ndvi2.shape[0] # number of pixels in x
      y_pixels = ndvi2.shape[1] # number of pixels in y

      # Set up output GeoTIFF
      driver = gdal.GetDriverByName('GTiff')

      # Create driver using output filename, x and y pixels, # of bands, and datatype
      ndvi_data = driver.Create(out_file,x_pixels, y_pixels, 1, gdal.GDT_Float32)

      # Set nodata value
      ndvi_data.GetRasterBand(1).SetNoDataValue(0.)

      # Set NDVI array as the 1 output raster band
      ndvi_data.GetRasterBand(1).WriteArray(ndvi2)

      # Setting up the coordinate reference system of the output GeoTIFF
      geotrans=red_link.GetGeoTransform() # Grab input GeoTranform information
      print(geotrans)
      proj=red_link.GetProjection() # Grab projection information from input file

      # now set GeoTransform parameters and projection on the output file
      ndvi_data.SetGeoTransform(geotrans)
      ndvi_data.SetProjection(proj)
      ndvi_data.FlushCache()
      ndvi_data=None

      print("DONE")

Then I use gdal to merge each of the resultant .tif files with an
NDVI band into a single file, then clip that file with the outline
of [Bicuar National Park].

  [Bicuar National Park]: https://www.protectedplanet.net/#thematic-areas

    #!/bin/bash

    echo "Merging tif files"

    gdal_merge.py -n 0 -a_nodata 0 *_NDVI.tif -o ndvi_merge_o.tif

    gdalwrap -t_srs '+proj=longlat +datum=WGS84' ndvi_merge_0.tif ndvi_merge_0_longlat.tif

    gdalwarp -cutline  'bicuar_shp/WDPA_Mar2018_protected_area_350-shapefile-polygons.shp' -crop_to_cutline -dstalpha ndvi_merge_0_longlat.tif ndvi_merge_0_longlat_bicuar.tif

  {IMAGE}


Then I can use an R script to look at the distribution of NDVI
across the park

    # Packages
    library(raster)
    library(rgdal)


    # Import data ----
    ndvi_tif_bicuar <- raster("ndvi_merge_0_longlat_bicuar.tif")

    ndvi_vec <- getValues(ndvi_tif_bicuar)

    hist(ndvi_vec, breaks = 100)

  {IMAGE}


I can also experiment with plotting areas of the park within a
certain threshold of NDVI

    ndvi_thresh <- ndvi_tif_bicuar[ndvi_tif_bicuar < 0.6] <- NA

    plot(ndvi_thresh)

  {IMAGE}