Additional area statistics

Aside from the separate “most_common” regridder, more generic statistical reductions are also available.

A demo of this is shown below, based on the Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST) dataset.

For optimal memory management we want to make use of Dask’s distributed client:

from dask import distributed

c = distributed.Client()
c


The original dataset is of a very high resolution. We will focus on a smaller slice of the globe, and display the original data for reference:

import xarray as xr
import xarray_regrid

sst = xr.open_zarr("https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1")["analysed_sst"]

# Reduce size of array by only selecting a slice
sst = sst.sel(lat=slice(30, 45), lon=slice(125, 150)).isel(time=0)

sst.plot()
[Figure: plot of the original sea surface temperature for the selected region]

To regrid we define a new target grid, with a lower resolution.

target = xarray_regrid.Grid(
    north=45,
    south=30,
    west=125,
    east=150,
    resolution_lat=1,
    resolution_lon=1,
).create_regridding_dataset(lat_name="lat", lon_name="lon")
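To get a feel for the size of the target grid, the sketch below builds 1-degree coordinate arrays for the same bounds with plain NumPy. It assumes that both bounds are included as grid points; that is an assumption made for illustration, not a statement about what create_regridding_dataset returns, so inspect target.lat and target.lon to see the actual coordinates.

```python
import numpy as np

# Bounds and resolution copied from the Grid definition above.
south, north, west, east = 30, 45, 125, 150
resolution_lat = 1
resolution_lon = 1

# Assumption: both bounds are grid points, so a 15 degree span
# at 1 degree spacing gives 16 latitude points.
n_lat = round((north - south) / resolution_lat) + 1
n_lon = round((east - west) / resolution_lon) + 1

lat = np.linspace(south, north, n_lat)
lon = np.linspace(west, east, n_lon)

print(lat.size, lon.size)
```

Under this assumption the target grid has a few hundred points in total, far fewer than the source slice.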

We will take the variance of the data within each target grid cell. Note that this operation is lazy when the data consists of Dask arrays.

sst_var = sst.regrid.stat(target, method="var", time_dim="time", skipna=False)
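As a rough illustration of what a per-cell variance means, the sketch below reduces 2x2 blocks of a small NumPy array. It is not how xarray_regrid implements the reduction (real source and target grids need not align this neatly), and the helper block_reduce is a hypothetical name used only here. It also shows the effect of a missing value: np.var propagates NaN, comparable to skipna=False, while np.nanvar ignores it, comparable to skipna=True.

```python
import numpy as np


def block_reduce(data, factor, func):
    """Reduce each (factor x factor) block of a 2D array with `func`.

    Hypothetical helper for illustration only.
    """
    ny, nx = data.shape
    assert ny % factor == 0 and nx % factor == 0, "shape must divide evenly"
    # Split each axis into (number of blocks, block size).
    blocks = data.reshape(ny // factor, factor, nx // factor, factor)
    # Reduce over the two block-size axes.
    return func(blocks, axis=(1, 3))


fine = np.arange(16, dtype=float).reshape(4, 4)
fine[0, 0] = np.nan  # one missing value in the top-left block

var_strict = block_reduce(fine, 2, np.var)     # NaN propagates
var_skipna = block_reduce(fine, 2, np.nanvar)  # NaN is ignored

print(var_strict)
print(var_skipna)
```

Only the block containing the missing value differs between the two results; all other blocks have the same variance in both.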

Other methods are available, such as “sum”, “mean”, “std”, “median”, “min”, and “max”.

When we plot the DataArray, the data is retrieved and the result computed.

sst_var.plot()
[Figure: plot of the sea surface temperature variance on the target grid]