Geospatial interpolation

Interpolation

Interpolation predicts values for cells in a raster from a limited number of sample data points. It can be used to predict unknown values for any geographic point data, such as elevation, rainfall, chemical concentrations, and noise levels. The assumption that validates the effectiveness of interpolation is that objects distributed in space are spatially correlated, meaning, things that are close in space tend to have similar characteristics. For example, if it is raining on one side of the street, you can predict with great confidence that it is also raining on the other side. You would be less certain if it was raining on the other side of town and even less certain about the weather in the neighboring country. The further away things are, the less similar they are. This is the basis for interpolation.

In ecology, geospatial interpolation is used to estimate the values of environmental variables, such as temperature, precipitation, soil moisture … at locations where direct measurements are not available. This process is especially important in ecological modeling, where large datasets of environmental variables are used as inputs for simulations and predictions. There are several methods of geospatial interpolation, including inverse distance weighting (IDW), kriging, and spline interpolation. Each of these methods has its own strengths and weaknesses, and the choice of method will depend on the specific requirements of the modeling project, including the type of data being interpolated, the accuracy required, and the computational resources available.

A pratical example

We will interpolate N2O emission values from 36 sites in Senegal on a 0.05 x 0.05 degree grid. For this we will use the fake dataset (df) that I built. This database contains N2O emission values in KgN ha-1 yr-1 with their geographical coordinates.

#load needed packages
for(pkg in c("raster", # processing spatial raster data. !!! overwrites dplyr::select!!
             "sp",    # another vector data package necessary for continuity 
             "sf",    # processing spatial vector data
             "tidyverse", # wrangling tabular data and plotting
             # Different packages to test their interpolation functions 
             "gstat", # inverse distance weighted, kriging 
             "fields", # thin plate spline 
             "interp", # triangulation 
             "mgcv", # Spatial GAM : Generalized Additive Model 
             "automap", # automatic approach to kriging
             #Finally, some packages to make pretty plots 
             "viridis","ggplot2","rasterVis"
             )){
  library(pkg, character.only = TRUE)
  }

# Load the dataset df
df= read.csv("3-spatialisation/N2O_emissions_kgN_ha_yr.csv")
head(df)

# Plot data
point_plot = ggplot(
  data = df,
  mapping = aes(x = Long, y = Lat, color = N2O_emissions_kgN_ha_yr)) +
  geom_point(size = 3)+
  scale_color_gradientn(colors = c("red", "yellow","blue")
  )
point_plot

# Create a grid template

# First let's define a bounding box, a rectangle that contains all our data points. 

bbox = c (
  "xmin" = min(df$Long),
  "ymin" = min(df$Lat),
  "xmax" = max(df$Long),
  "ymax" = max(df$Lat)
  )                       # define a bounding box 

grd_template = expand.grid(
  Long = seq(from = bbox["xmin"]-1, to = bbox["xmax"]+2, by = 0.05), # 0.05 degree resolution
  Lat = seq(from = bbox["ymin"]-0.5, to = bbox["ymax"]+1, by = 0.05)
  )

grid_plot = ggplot()+
  geom_point(data = grd_template, aes(x = Long, y = Lat), size = 0.01)+
  geom_point(data = df,
             mapping = aes(x = Long, y = Lat, color =N2O_emissions_kgN_ha_yr), size = 3) + 
  scale_color_gradientn(colors = c ("red", "yellow","blue"))+
  coord_cartesian(xlim = c(-17.7, -12), ylim = c(14.5, 17))
grid_plot

# Reproject data 
crs_raster_format ="+proj=longlat +datum=WGS84 +no_defs"
sf_df = st_as_sf(df, coords = c("Long", "Lat"), crs = "+proj=longlat")
head(sf_df)

# Rasterizing the grid template (created grd_template) by creating a copy of our grid template in a different structure 
# raster expects a PROJ.4 string, see https://epsg.io/25833

grd_template_raster = grd_template %>%
  dplyr::mutate(Z=0)%>%
  raster::rasterFromXYZ(crs = crs_raster_format)
plot(grd_template_raster)

Fit interpolation models using gstat function of the gstat package

1- Nearest Neighbor interpolation: is the simplest approach to interpolation. Rather than calculate an average value by some weighting criteria or generate an intermediate value based on complicated rules, this method simply determines the “nearest” neighbouring pixel, and assumes the intensity value of it.

# Fit model
fit_NN = gstat::gstat( # using package (gstat)
  formula = N2O_emissions_kgN_ha_yr  ~ 1, 
  data = as(sf_df, "Spatial"), # using (sf) and converting to (sp), which is expected 
  nmax = 10, nmin = 3 # Number of neighboring observations used for the fit
  )

# Interpolate
interp_NN = interpolate(grd_template_raster,fit_NN)
plt = levelplot(interp_NN, cuts=6,
                main="Nearest neighbor: N2O emissions KgN/ha/yr",
                xlab="Longitude",ylab="Latitude")
plt

2- Inverse Distance Weighting: Inverse Distance Weighted Interpolation (IDW) explicitly assumes that features that are close together are more similar than those that are farther apart. To predict a value for an unmeasured location, IDW uses the measured values surrounding the prediction location. The measured values closer to the prediction location have more influence on the predicted value than those further away. IDW assumes that each measured point has a local influence that decreases with distance. It gives greater weight to points closer to the prediction location, and the weights decrease with distance, hence the name inverse distance weighting.

# Fit model
fit_IDW = gstat::gstat( 
  formula = N2O_emissions_kgN_ha_yr ~ 1,
  data = as(sf_df, "Spatial"),
  nmax = 10, nmin = 3,
  set = list(idp = 0.5) # inverse distance power
  )

# Interpolate
interp_IDW = interpolate(grd_template_raster, fit_IDW)
plt = levelplot(interp_NN, cuts=6,
                main="Nearest neighbor: N2O emissions KgN/ha/yr",
                xlab="Longitude",ylab="Latitude")
plt

3- Triangular Irregular surface: Triangular Irregular Networks (TINs) are constructed by triangulating a set of vertices (points). The vertices are connected by a series of edges to form a network of triangles. TINs can have a higher resolution in areas where a surface is highly variable or where more detail is desired and a lower resolution in less variable areas.

# Fit model
fit_TIN = interp::interp(# using (interp)
  x = df$Long, # the function actually accepts coordinate vectors
  y = df$Lat,
  z = df$N2O_emissions_kgN_ha_yr,
  xo = grd_template$Long, # here we already define the target grid
  yo = grd_template$Lat,
  output = "points"
) %>% bind_cols()

# Interpolate
inter_TIN = raster::rasterFromXYZ(fit_TIN, crs = crs_raster_format)
plt = levelplot(inter_TIN, cuts=6,
                main="TIN: N2O emissions KgN/ha/yr",
                xlab="Longitude",ylab="Latitude")
plt

4-Thin plate spline regression interpolation: Thin plate splines (TPS) are a type of smoothing spline used for visualizing complex relationships between continuous predictors and response variables. The method uses mathematical functions to model the relationship between the values of a variable at different locations. This method is often used when the data has a known pattern or structure, such as a smooth curve or surface. Spline interpolation can produce more accurate results than IDW, but it can also be more sensitive to errors in the data and may not perform well when the data is highly variable.

# Fit model
fit_TPS = fields::Tps( # Using (fields)
  x = as.matrix(df[, c("Long", "Lat")]), # accepts points but expects them as matrix 
  Y = df$N2O_emissions_kgN_ha_yr, # the dependent variable
  miles = FALSE  # EPSG  4326 is based in degree
)

# Interpolate
interp_TPS = interpolate(grd_template_raster,fit_TPS)
plt = levelplot(interp_TPS, cuts=6,
                main="TPS: N2O emissions KgN/ha/yr",
                xlab="Longitude",ylab="Latitude")
plt

5- Kriging is a more advanced method of geospatial interpolation that takes into account the spatial autocorrelation of the data, or the relationship between the values of the variable at different locations. This method uses statistical models to make predictions about the value of a variable at a location based on its values at surrounding locations. Kriging is generally considered to be more accurate than IDW, but it can also be more computationally intensive and complex to implement.

Regardless of the method chosen, geospatial interpolation is a critical component of many ecological models and plays an important role in understanding and predicting the behavior of ecosystems and the impact of environmental change.

Yélognissè Frédi Agbohessou
Yélognissè Frédi Agbohessou
Post-doctoral researcher

My research interests include Climate Change, Agroforestry, Food security in West Africa, Ecological modeling and Remote sensing.

Related