How to handle a NetCDF file?

NetCDF file

NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. NetCDF files are commonly used in climatology, meteorology and oceanography applications (e.g., weather forecasting, climate change) and GIS applications. They are self-describing because they contain metadata that describes what is contained in a file, such as the latitude and longitude layout of the grid, the names and units of variables in the data set, and “attributes” that describe things like missing value codes, or offsets and scale factors that may have been used to compress the data. They are machine-independent because NetCDF files can be transferred among servers and computers running different operating systems without having to be converted.
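To make the role of scale factors, offsets and missing-value codes concrete, here is a minimal base R sketch of how packed values are turned back into physical values. The numbers, the attribute values and the `unpack` helper are hypothetical and only serve as an illustration; real files define their own attributes, and readers such as ncdf4 normally apply this conversion for you when reading a variable.

```r
# Hypothetical packed values, as they might be stored on disk as integers
packed <- c(1200L, 1350L, -32767L, 1410L)

# Hypothetical attribute values; a real file defines its own
scale_factor <- 0.01
add_offset   <- 273.15
missing_code <- -32767L

# Replace the missing-value code with NA, then apply the linear unpacking rule
# value = packed * scale_factor + add_offset
unpack <- function(x, scale, offset, missing) {
  x[x == missing] <- NA
  x * scale + offset
}

temperature_k <- unpack(packed, scale_factor, add_offset, missing_code)
print(temperature_k)
```

Storing small integers plus two attributes instead of full floating-point values is what lets a file trade a little precision for a smaller size.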

Use of NetCDF files in ecological modeling

In ecological modeling, NetCDF files are often used to store large datasets of environmental variables, such as temperature, precipitation, and soil moisture, that are used as inputs for modeling simulations.

The primary benefit of using NetCDF files in ecological modeling is their ability to store and manage large and complex datasets. This format provides a way to store multidimensional data, including metadata, in a self-describing file that can be read by a variety of software tools. This makes it easier to share data between different models and teams, reducing the risk of data loss or corruption during the transfer process.

In addition to storage and management, NetCDF files also provide a standardized way to represent data, making it easier to compare and analyze data from different sources. This standardized representation can also help ensure data consistency and reduce the risk of errors in the modeling process.

Another benefit of using NetCDF files in ecological modeling is compression: the NetCDF-4 format supports built-in compression of variables, which reduces file size and makes data faster to transfer. This can be especially important when working with large datasets that take up a significant amount of storage space.

In terms of accessibility, NetCDF files can be read and manipulated using a variety of programming languages, including R, Python, and MATLAB. This makes it possible for researchers and practitioners to choose the tools and programming languages that are best suited for their needs, without having to worry about compatibility issues.

NetCDF files also provide a way to store metadata, including information about the data and its source, that can be used to validate and interpret the results of ecological modeling simulations. This metadata is important because it helps to ensure that the data is accurate and reliable, and provides context for interpreting the results of the models.

In summary, NetCDF files offer a number of benefits for ecological modeling: storage and management of large and complex datasets, standardized data representation, data compression, accessibility from several programming languages, and the ability to store metadata. These benefits make NetCDF a popular and widely used format for storing and exchanging data in ecological modeling and other scientific disciplines.

I started dealing with NetCDF files when I was looking for meteorological input data to run a 2D simulation over the Sahel. I found the multidimensional structure of NetCDF files very helpful for preparing input data for 2D models. Several R packages are already available to perform a range of operations on NetCDF files. For example, the ncdf4 package (the successor to the older ncdf package) can be used to read, analyse and write NetCDF files. The ncdf4 package is available on Windows, Mac OS X and Linux, and supports both the older NetCDF3 format and NetCDF4. See http://cirrus.ucsd.edu/~pierce/ncdf/index.html.

Download global data from Copernicus AgERA5 dataset

The AgERA5 dataset is based on the hourly ECMWF ERA5 data at surface level and provides the daily inputs needed by most agricultural and agro-ecological models. The dataset is available at https://cds.climate.copernicus.eu/cdsapp#!/dataset/10.24381/cds.6c68c9bb?tab=overview. You can get the AgERA5 dataset directly from R using the ag5Tools R package. To download the data, you need a valid CDS account and a CDS API key. To create a file for storing your API key, refer to the instructions provided at: https://cds.climate.copernicus.eu/api-how-to. You won’t have to install Python or the cdsapi yourself, as ag5Tools will handle it if necessary.

# Download the global daily average 2m temperature from 01-01-2021 to 31-01-2021.

# install ag5Tools pkg
# devtools::install_github("agrdatasci/ag5Tools", build_vignettes = TRUE)   

# load the package
require(ag5Tools)

# download the dataset
ag5_download(variable = "2m_temperature", # the variable of interest
             statistic = "24_hour_mean",  # the statistic to perform on the data
             day = "all",                 # select days of the month
             month = 1,                   # select month of the year
             year = 2021,                 # select year
             path = ""                    # where to store the files 
             ) 
# the data will be stored in a folder named 2021 in your path
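Before going further, it can be useful to check that one file per day was actually downloaded. The sketch below rebuilds the expected file names in base R from the naming pattern of the file read later in this post. The `expected_files` helper is hypothetical, and the `final-v1.0` suffix is an assumption that depends on the dataset version you downloaded.

```r
# Build the file names expected for each day of a month, following the
# naming pattern of the AgERA5 file used later in this post.
expected_files <- function(var, year, month, folder = as.character(year)) {
  first <- as.Date(sprintf("%04d-%02d-01", year, month))
  # the day before the first day of the next month is the last day of this month
  last  <- seq(first, by = "month", length.out = 2)[2] - 1
  days  <- format(seq(first, last, by = "day"), "%Y%m%d")
  file.path(folder, paste0(var, "_C3S-glob-agric_AgERA5_", days, "_final-v1.0.nc"))
}

files <- expected_files("Temperature-Air-2m-Mean-24h", 2021, 1)
print(length(files))               # number of daily files expected
print(files[!file.exists(files)])  # files that are missing on disk, if any
```

An empty result on the last line means every expected file is present in the folder.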

Subsetting a NetCDF file

From the global dataset, let’s extract the daily average 2m temperature data for our region of interest (the Sahel). For this, we will use a shapefile (sahel_cassecs.shp) as a mask that provides the boundaries of the region of interest.

# import pkgs
require(rgdal)
require(ncdf4)
require(raster)
library(lattice)
library(RColorBrewer)

# import sahel shp mask
shp = rgdal::readOGR("3-spatialisation/save/sahel_cassecs.shp", verbose = FALSE)
proj4string(shp) <- CRS("+proj=longlat +datum=WGS84")
plot(shp)

[Figure: outline of the Sahel region drawn from the shapefile]

var = c("Temperature-Air-2m-Mean-24h")
yr = 2021

# read the first file downloaded in the folder 2021 (Temperature-Air-2m-Mean-24h_C3S-glob-agric_AgERA5_20210101_final-v1.0.nc)
filepath = paste0(yr,"/",var,"_C3S-glob-agric_AgERA5_",yr,"0101_final-v1.0.nc")

pre0.brick = brick(filepath) # read the NetCDF file as a RasterBrick
ncin <- nc_open(filepath)    # open the same file with ncdf4 to inspect its metadata
print(ncin)                  # list dimensions, variables and attributes

plot(pre0.brick) # plot the data

[Figure: global map of the 2m daily mean temperature for 2021-01-01]

# Get coordinate variables
lon <- ncvar_get(ncin,"lon")  # get longitude
nlon <- dim(lon)              # get number of longitude values
head(lon)

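lat <- ncvar_get(ncin,"lat")  # get latitude
nlat <- dim(lat)              # get number of latitude values
head(lat)
print(c(nlon,nlat))
nc_close(ncin)                # close the file once the coordinates have been read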
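The coordinate vectors can also be used to locate a rectangular window in the grid, which is a lighter alternative to masking when a bounding box is enough. The sketch below uses synthetic coordinates on a 0.1 degree grid as stand-ins for the vectors read from the file, and a bounding box chosen purely for illustration; neither is taken from the actual dataset or shapefile.

```r
# Synthetic coordinate vectors on a 0.1 degree grid (stand-ins for the
# lon and lat vectors read with ncvar_get)
lon <- seq(-179.95, 179.95, by = 0.1)
lat <- seq(89.95, -89.95, by = -0.1)

# Hypothetical bounding box, for illustration only
box <- list(lon_min = -18, lon_max = 40, lat_min = 10, lat_max = 20)

# Indices of the grid cells whose centres fall inside the box
lon_idx <- which(lon >= box$lon_min & lon <= box$lon_max)
lat_idx <- which(lat >= box$lat_min & lat <= box$lat_max)

# First index and number of cells of the contiguous block along each dimension
start <- c(min(lon_idx), min(lat_idx))
count <- c(length(lon_idx), length(lat_idx))
print(start)
print(count)
```

Vectors like `start` and `count` have the shape expected by the `start` and `count` arguments of `ncvar_get`, so that only the window is read from disk. A variable with a time dimension would need a third entry in each vector.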

# extract the Sahel using the shapefile sahel_cassecs.shp
pre0.mask = mask(pre0.brick, shp)
plot(pre0.mask)

# convert the RasterBrick to a data frame
pre0.df = as.data.frame(pre0.mask, xy=TRUE)

# drop missing values
data = pre0.df[complete.cases(pre0.df), ]

# perform the same operation on the other files in the folder and combine them all into one data frame

# dates of the remaining files (2021-01-02 to 2021-01-31), formatted as in the file names
Date <- seq.Date(from = as.Date(paste0(yr,"-01-02")), to = as.Date(paste0(yr,"-01-31")), by = "day")
ymd <- format(Date, "%Y%m%d")

for(i in ymd){
  pre1.brick = brick(paste0(yr,"/",var,"_C3S-glob-agric_AgERA5_",i,"_final-v1.0.nc"))
  pre1.mask = mask(pre1.brick, shp)
  pre1.df = as.data.frame(pre1.mask, xy=TRUE)
  pre1.df = pre1.df[complete.cases(pre1.df), ] # drop missing values
  # append the day's values as a new column; this assumes that every file
  # has the same non-missing cells, in the same order
  data = cbind(data, pre1.df[3])
}
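Binding columns side by side relies on every daily data frame having the same cells in the same order. If that cannot be guaranteed, joining on the coordinates is a safer option. The sketch below shows the idea in base R on two small made-up data frames; the values and coordinates are illustrative only.

```r
# Two small stand-in data frames, shaped like the output of
# as.data.frame(..., xy = TRUE), with rows in a different order
day1 <- data.frame(x = c(0.05, 0.15, 0.25), y = c(14.95, 14.95, 14.95),
                   X2021.01.01 = c(300.1, 301.2, 299.8))
day2 <- data.frame(x = c(0.25, 0.05, 0.15), y = c(14.95, 14.95, 14.95),
                   X2021.01.02 = c(300.0, 300.5, 301.0))

# Join on the coordinates instead of relying on identical row order
combined <- Reduce(function(a, b) merge(a, b, by = c("x", "y")),
                   list(day1, day2))
print(combined)
```

The join is slower than `cbind` on large grids, so it is mostly worth using when the set of non-missing cells may differ between days.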

head(data,5)

# plot a single slice of the data (average daily temperature 2021-01-01)
levelplot(X2021.01.01 ~ x * y, data=data, cuts=5, pretty=TRUE,
          col.regions=rev(brewer.pal(10,"RdBu")),
          main="2m daily average temperature 2021-01-01 (K)")

[Figure: level plot of the 2m daily average temperature over the Sahel, 2021-01-01 (K)]
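The temperatures are expressed in kelvin. If a model expects degrees Celsius, the daily columns can be converted in one step while leaving the coordinates untouched. This is a minimal base R sketch on a made-up data frame with the same layout (x, y, then one column per day); the values are illustrative only.

```r
# Stand-in for the combined data frame: coordinates plus one column per day (K)
temps <- data.frame(x = c(0.05, 0.15), y = c(14.95, 14.95),
                    X2021.01.01 = c(300.15, 298.15),
                    X2021.01.02 = c(301.15, 299.15))

# Convert every daily column from kelvin to degrees Celsius,
# leaving the coordinate columns untouched
day_cols <- setdiff(names(temps), c("x", "y"))
temps_c <- temps
temps_c[day_cols] <- temps[day_cols] - 273.15
print(temps_c)
```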

# Write out the dataframe as csv
write.csv(data,paste0("2021/Sahel_",var,"_",yr,".csv"))
Yélognissè Frédi Agbohessou
Post-doctoral researcher

My research interests include Climate Change, Agroforestry, Food security in West Africa, Ecological modeling and Remote sensing.
