Benchmarking nearest-neighbor regridding
In this notebook we compare the results of three nearest-neighbor regridding methods:
The CDO (Climate Data Operators) command line program.
The xESMF xarray extension (based on the Earth System Modelling Framework).
xarray’s
interpmethod, wrapped as an accessor byxarray-regrid.
The data to resample is the ERA5 monthly dataset, for the dewpoint temperature.
We start with importing the required modules, starting up Dask and loading in the data:
from time import time
import dask.distributed
import xarray_regrid # Importing this will make Dataset.regrid accessible.
import xarray as xr
from xarray_regrid import Grid
import xesmf as xe
client = dask.distributed.Client()
ds = xr.open_dataset(
"data/era5_2m_dewpoint_temperature_2000_monthly.nc",
)
Both xarray-regrid and xESMF require a dataset to resample to. For this you can also use an already existing dataset with the desired grid.
Here we use the Grid dataclass from xarray-regrid to create the dataset.
new_grid = Grid(
north=90,
east=180,
south=45,
west=90,
resolution_lat=0.17,
resolution_lon=0.17,
)
target_dataset = new_grid.create_regridding_dataset()
target_dataset
<xarray.Dataset> Size: 6kB
Dimensions: (latitude: 265, longitude: 530)
Coordinates:
* latitude (latitude) float64 2kB 45.0 45.17 45.34 ... 89.54 89.71 89.88
* longitude (longitude) float64 4kB 90.0 90.17 90.34 ... 179.6 179.8 179.9
Data variables:
*empty*xarray-regrid
With xarray-regrid you can access the regridding utilities by using Dataset.regrid.
To make the comparison fair the lazy dataset is computed, to actually have the regridding performed.
t0 = time()
data_regrid = ds.regrid.nearest(target_dataset)
data_regrid = data_regrid.compute()
print(f"Elapsed time: {time() - t0:.3f} seconds")
Elapsed time: 0.208 seconds
CDO
The CDO data was regridded using the terminal.
To ensure that the regridding is most accurate, the dataset was first converted to 64-bit floats:
cdo -b 64 -copy era5_2m_dewpoint_temperature_2000_monthly.nc era5_2m_dewpoint_temperature_2000_monthly_64b.nccdo -b 64 -copy new_grid.nc new_grid_64b.nc
Next, the data can be regridded:
cdo remapnn,new_grid_64b.nc era5_2m_dewpoint_temperature_2000_monthly_64b.nc cdo_nearest_64b.nc
data_cdo = xr.open_dataset("data/cdo_nearest_64b.nc")
xESMF
For xESMF the regridding weights are first computed (xe.Regridder).
After that, the regridder can be used to regrid the dataset:
t0 = time()
regridder = xe.Regridder(ds, target_dataset, "nearest_s2d")
data_esmf: xr.Dataset = regridder(ds, keep_attrs=True)
print(f"Elapsed time: {time() - t0:.3f} seconds")
Elapsed time: 5.300 seconds
Comparison
Now the data has been regridded using all three available methods, we can compare the results.
The following code block computes the relative error (a-b)/a between the three datasets. All of them are identical.
from benchmark_utils import plot_comparison
plot_comparison(
data_regrid, data_esmf, data_cdo, vmin=-0.5e-10, vmax=0.5e-10, varname="d2m"
)