Benchmarking bilinear regridding
In this notebook we compare the results of three bilinear regridding methods:
The CDO (Climate Data Operators) command line program.
The xESMF xarray extension (based on the Earth System Modelling Framework).
xarray’s
interpmethod, wrapped as an accessor byxarray-regrid.
The data to resample is the ERA5 monthly dataset, for the dewpoint temperature.
We start with importing the required modules, starting up Dask and loading in the data:
from time import time
import dask.distributed
import xarray_regrid # Importing this will make Dataset.regrid accessible.
import xarray as xr
from xarray_regrid import Grid
import xesmf as xe
client = dask.distributed.Client()
ds = xr.open_dataset(
"data/era5_2m_dewpoint_temperature_2000_monthly.nc",
)
Both xarray-regrid and xESMF require a dataset to resample to. For this you can also use an already existing dataset with the desired grid.
Here we use the Grid dataclass and create_regridding_dataset function from xarray-regrid to create the dataset.
new_grid = Grid(
north=90,
east=180,
south=45,
west=90,
resolution_lat=0.17,
resolution_lon=0.17,
)
target_dataset = new_grid.create_regridding_dataset()
target_dataset
<xarray.Dataset>
Dimensions: (latitude: 265, longitude: 530)
Coordinates:
* latitude (latitude) float64 45.0 45.17 45.34 45.51 ... 89.54 89.71 89.88
* longitude (longitude) float64 90.0 90.17 90.34 90.51 ... 179.6 179.8 179.9
Data variables:
*empty*xarray-regrid
With xarray-regrid you can access the regridding utilities by using Dataset.regrid.
To make the comparison fair the lazy dataset is computed, to actually have the regridding performed.
t0 = time()
data_regrid = ds.regrid.linear(target_dataset)
data_regrid = data_regrid.compute()
print(f"Elapsed time: {time() - t0:.3f} seconds")
Elapsed time: 0.070 seconds
data_regrid
<xarray.Dataset>
Dimensions: (time: 12, latitude: 265, longitude: 530)
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-02-01 ... 2000-12-01
* latitude (latitude) float64 45.0 45.17 45.34 45.51 ... 89.54 89.71 89.88
* longitude (longitude) float64 90.0 90.17 90.34 90.51 ... 179.6 179.8 179.9
Data variables:
d2m (time, latitude, longitude) float64 253.4 253.3 ... 249.1 249.1
Attributes:
Conventions: CF-1.6
history: 2023-06-19 14:51:03 UTC by era5cli 1.4.0: reanalysis-era5-s...CDO
The CDO data was regridded using the terminal.
To ensure that the regridding is most accurate, the dataset was first converted to 64-bit floats:
cdo -b 64 -copy era5_2m_dewpoint_temperature_2000_monthly.nc era5_2m_dewpoint_temperature_2000_monthly_64b.nccdo -b 64 -copy new_grid.nc new_grid_64b.nc
Next, the data can be regridded:
cdo remapbil,new_grid_64b.nc era5_2m_dewpoint_temperature_2000_monthly_64b.nc cdo_bilinear_64b.nc
data_cdo = xr.open_dataset("data/cdo_bilinear_64b.nc")
xESMF
For xESMF the regridding weights are first computed (xe.Regridder).
After that, the regridder can be used to regrid the dataset:
t0 = time()
regridder = xe.Regridder(ds, target_dataset, "bilinear")
data_esmf: xr.Dataset = regridder(ds, keep_attrs=True)
print(f"Elapsed time: {time() - t0:.3f} seconds")
hwloc/linux: Ignoring PCI device with non-16bit domain.
Pass --enable-32bits-pci-domain to configure to support such devices
(warning: it would break the library ABI, don't enable unless really needed).
Elapsed time: 9.683 seconds
Comparison
Now the data has been regridded using all three available methods, we can compare the results.
The following code block computes the relative error (a-b)/a between the three datasets. As you see, the method xarray-regrid uses has (near) identical results compared to the CDO regridding (only floating point errors of ~1e-8). xESMF seems to (subtly) deviate from the CDO and xarray-regrid results.
from benchmark_utils import plot_comparison
plot_comparison(
data_regrid, data_esmf, data_cdo, vmin=-0.5e-5, vmax=0.5e-5, varname="d2m"
)