Performance of xesmf vs xarray-regrid
Compare the conservative regridding methods of the two libraries on a moderately sized synthetic dask dataset of about 4 GB, regridding from a 0.25° global grid to a 1° global grid.
```python
import dask.array as da
import xarray as xr
import xesmf
import xarray_regrid  # also registers the .regrid accessor used below

bounds = dict(south=-90, north=90, west=-180, east=180)
source = xarray_regrid.Grid(
    resolution_lat=0.25,
    resolution_lon=0.25,
    **bounds,
).create_regridding_dataset()
target = xarray_regrid.Grid(
    resolution_lat=1,
    resolution_lon=1,
    **bounds,
).create_regridding_dataset()

def source_data(source, chunks, n_times=1000):
    # Random float32 data on the source grid; 1000 time steps come to about 4 GB
    data = da.random.random(
        size=(n_times, source.latitude.size, source.longitude.size),
        chunks=chunks,
    ).astype("float32")
    data = xr.DataArray(
        data,
        dims=["time", "latitude", "longitude"],
        coords={
            "time": xr.date_range("2000-01-01", periods=n_times, freq="D"),
            "latitude": source.latitude,
            "longitude": source.longitude,
        },
    )
    return data
```
Chunking
Test “pancake” chunks (chunked along time, each chunk spanning the full spatial grid) and “churro” chunks (chunked in space, each chunk spanning the full time series), each in two sizes. The “small” chunks are about 4 MB and the “large” chunks about 100 MB.
```python
chunk_schemes = {
    "pancake_small": (1, -1, -1),
    "pancake_large": (25, -1, -1),
    "churro_small": (-1, 32, 32),
    "churro_large": (-1, 160, 160),
}
```
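The quoted chunk sizes can be checked by hand. A chunk's size is the product of its extents times 4 bytes for float32, where -1 means the chunk spans the whole dimension (dask's convention). The sketch below is pure Python and assumes the source is about 1000 × 720 × 1440. The exact number of grid points depends on whether the grid includes both endpoints, but the sizes barely change either way. `describe` is a hypothetical helper, not part of either library.

```python
import math

# Assumed source shape: 1000 daily steps on a global 0.25 degree grid of
# roughly 720 x 1440 points (approximate; see the note above).
SHAPE = (1000, 720, 1440)
ITEMSIZE = 4  # bytes per float32 value

chunk_schemes = {
    "pancake_small": (1, -1, -1),
    "pancake_large": (25, -1, -1),
    "churro_small": (-1, 32, 32),
    "churro_large": (-1, 160, 160),
}


def describe(chunks, shape=SHAPE, itemsize=ITEMSIZE):
    """Return (MB per full chunk, number of chunks) for a dask-style chunk spec."""
    # -1 means one chunk spanning the whole dimension, as in dask
    full = [n if c == -1 else c for c, n in zip(chunks, shape)]
    chunk_mb = math.prod(full) * itemsize / 1e6
    # Edge chunks may be partial, hence the ceiling division
    n_chunks = math.prod(math.ceil(n / c) for c, n in zip(full, shape))
    return chunk_mb, n_chunks


print(f"total: {math.prod(SHAPE) * ITEMSIZE / 1e9:.1f} GB")
for name, chunks in chunk_schemes.items():
    mb, n = describe(chunks)
    print(f"{name}: {mb:.1f} MB per chunk, {n} chunks")
```

Under this assumption, the per-chunk sizes come out near 4 MB and 100 MB. The chunk counts differ widely: churro_small yields over a thousand chunks, compared with 40 for pancake_large.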
```python
# Build the xesmf regridder once, outside the timings: generating weights is
# expensive for larger grids
xesmf_regridder = xesmf.Regridder(source, target, "conservative")
```
```
/home/slevang/miniconda3/envs/xarray-regrid/lib/python3.12/site-packages/xesmf/backend.py:56: UserWarning: Latitude is outside of [-90, 90]
  warnings.warn('Latitude is outside of [-90, 90]')
/home/slevang/miniconda3/envs/xarray-regrid/lib/python3.12/site-packages/xesmf/backend.py:56: UserWarning: Latitude is outside of [-90, 90]
  warnings.warn('Latitude is outside of [-90, 90]')
```
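The two warnings above would be consistent with both grids placing cell centers exactly at ±90°. That is an assumption about how `Grid` builds its coordinates, not something this notebook confirms. Conservative regridding needs cell edges, and when only centers are available the edges have to be inferred. Suppose edges are placed at the midpoints between neighbouring centers and extrapolated half a cell at each end. Then a center at 90° yields an edge at 90.125° on the 0.25° grid. The numpy sketch below illustrates that inference. `infer_bounds` is a hypothetical helper, not xesmf's code path.

```python
import numpy as np


def infer_bounds(centers):
    """Return len(centers) + 1 cell edges: midpoints between neighbouring
    centers, extrapolated by half a cell at both ends (illustrative only)."""
    centers = np.asarray(centers, dtype=float)
    mid = 0.5 * (centers[:-1] + centers[1:])
    first = centers[0] - (mid[0] - centers[0])
    last = centers[-1] + (centers[-1] - mid[-1])
    return np.concatenate([[first], mid, [last]])


# Hypothetical 0.25 degree latitudes with centers on the poles ...
lat_on_poles = np.arange(-90, 90 + 0.25, 0.25)
# ... versus centers offset by half a cell from the poles
lat_offset = np.arange(-90 + 0.125, 90, 0.25)

for label, lat in [("centers on poles", lat_on_poles), ("offset centers", lat_offset)]:
    edges = infer_bounds(lat)
    print(f"{label}: edges span [{edges.min()}, {edges.max()}]")
```

Only the first layout pushes the inferred edges past ±90°, which is the condition the warning text describes.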
Timings
Run timings for the different chunking schemes, with NaN skipping both enabled and disabled, across both libraries. The ratio of xesmf time to xarray-regrid time gives the speedup factor of this library: values above 1 mean xarray-regrid is faster.
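Conservative regridding can be viewed as applying a sparse matrix of area-overlap weights to the data, and `skipna` changes how missing values are handled. Without it, a NaN anywhere in a target cell's footprint makes that cell NaN. With it, roughly speaking, the missing source cells are dropped and the remaining weights are renormalized. In the toy sketch below, that renormalization costs an extra masked reduction. The names `W`, `x` and `toy_conservative` are hypothetical, and this 1-D numpy example is not how either library implements regridding.

```python
import numpy as np

# Toy 1-D case: 4 fine cells mapped onto 2 coarse cells. Row i of W holds the
# fraction of coarse cell i covered by each fine cell (equal widths here).
W = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
])
x = np.array([1.0, np.nan, 3.0, 5.0])  # one missing fine cell


def toy_conservative(W, x, skipna):
    overlap = W > 0
    # Only overlapping cells contribute, mimicking a sparse weight matrix
    contrib = np.where(overlap, W * x, 0.0)
    if not skipna:
        # A NaN in any overlapping fine cell makes the coarse cell NaN
        return contrib.sum(axis=1)
    valid = overlap & ~np.isnan(x)
    num = np.where(valid, contrib, 0.0).sum(axis=1)
    den = np.where(valid, W, 0.0).sum(axis=1)
    # Renormalize by the weight of the valid cells; coarse cells with no
    # valid input stay NaN
    out = np.full(W.shape[0], np.nan)
    np.divide(num, den, out=out, where=den > 0)
    return out


print(toy_conservative(W, x, skipna=False))
print(toy_conservative(W, x, skipna=True))
```

With `skipna=False`, the first coarse cell is NaN because one of its two inputs is missing. With `skipna=True`, it becomes the mean of the one valid input.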
```python
import time

import pandas as pd

pd.options.display.precision = 1

def do_regrid(data, target, skipna):
    data.regrid.conservative(target, skipna=skipna).compute()

def do_xesmf(data, target, skipna):
    # Reuses the regridder (and its weights) built above
    xesmf_regridder(data, skipna=skipna).compute()

def timing_grid(func, repeats=2):
    times = pd.DataFrame(
        index=chunk_schemes.keys(),
        columns=["skipna=False", "skipna=True"],
        dtype=float,
    )
    for name, chunks in chunk_schemes.items():
        data = source_data(source, chunks)
        for skipna in [False, True]:
            execution_times = []
            for _ in range(repeats):
                start = time.perf_counter()
                func(data, target, skipna)
                end = time.perf_counter()
                execution_times.append(end - start)
            # Sometimes the first execution is a little slower, so keep the fastest run
            times.loc[name, f"skipna={skipna}"] = min(execution_times)
    return times

regrid_times = timing_grid(do_regrid)
xesmf_times = timing_grid(do_xesmf)
# A ratio above 1 means xarray-regrid was faster
ratio = xesmf_times / regrid_times
```
```
/home/slevang/miniconda3/envs/xarray-regrid/lib/python3.12/site-packages/xarray/core/computation.py:320: PerformanceWarning: Regridding is increasing the number of chunks by a factor of 72.0, you might want to specify sizes in `output_chunks` in the regridder call. Default behaviour is to preserve the chunk sizes from the input (32, 32).
  result_var = func(*data_vars)
```
```
/home/slevang/miniconda3/envs/xarray-regrid/lib/python3.12/site-packages/xarray/core/computation.py:320: PerformanceWarning: Regridding is increasing the number of chunks by a factor of 6.0, you might want to specify sizes in `output_chunks` in the regridder call. Default behaviour is to preserve the chunk sizes from the input (160, 160).
  result_var = func(*data_vars)
```
Results
With the current implementations, xesmf is faster for large pancake-style chunks when skipna=False (ratio 0.6) and roughly on par when skipna=True (ratio 1.1). xarray-regrid is much faster for small chunks, especially churro-style ones (ratios of 14 to 17).
These tests were run on an 8-core Intel i7 Ubuntu desktop:
```python
ratio
```
| chunk scheme | skipna=False | skipna=True |
|---|---|---|
| pancake_small | 3.7 | 7.2 |
| pancake_large | 0.6 | 1.1 |
| churro_small | 14.2 | 16.9 |
| churro_large | 1.8 | 2.4 |
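Each entry is xesmf time divided by xarray-regrid time. A ratio r above 1 therefore means xarray-regrid is r times faster. A ratio below 1 means xesmf is 1/r times faster, so 0.6 corresponds to roughly 1.7×, allowing for the table's rounding to one decimal. The sketch below turns the table into that reading. The tie band `tol` is an arbitrary choice, not part of the benchmark, and `RATIOS` and `verdict` are hypothetical names.

```python
# Ratios copied from the results table (xesmf time / xarray-regrid time)
RATIOS = {
    "pancake_small": {"skipna=False": 3.7, "skipna=True": 7.2},
    "pancake_large": {"skipna=False": 0.6, "skipna=True": 1.1},
    "churro_small": {"skipna=False": 14.2, "skipna=True": 16.9},
    "churro_large": {"skipna=False": 1.8, "skipna=True": 2.4},
}


def verdict(ratio, tol=0.15):
    """Turn a time ratio into a readable speedup; tol is an arbitrary band
    around 1 that is treated as a tie."""
    if ratio > 1 + tol:
        return f"xarray-regrid {ratio:.1f}x faster"
    if ratio < 1 - tol:
        return f"xesmf {1 / ratio:.1f}x faster"
    return "roughly equal"


for scheme, row in RATIOS.items():
    summary = ", ".join(f"{col}: {verdict(r)}" for col, r in row.items())
    print(f"{scheme}: {summary}")
```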