Additional area statistics

Aside from the separate “most_common” regridder, a more generic statistical reductions are also available.

A demo of this is shown below, based on the Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST) dataset.

For optimal memory management we want to make use of Dask’s distributed client:

from dask import distributed

c = distributed.Client()
c

Client

Client-b62f5bfe-7b1f-11ef-9929-2c6dc1920356

Connection method: Cluster object	Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster Info

LocalCluster

726745b8

Dashboard: http://127.0.0.1:8787/status	Workers: 4
Total threads: 8	Total memory: 15.33 GiB
Status: running	Using processes: True

Scheduler Info

Scheduler

Scheduler-107d9031-e3cf-4099-a501-efd4988639b9

Comm: tcp://127.0.0.1:44181	Workers: 4
Dashboard: http://127.0.0.1:8787/status	Total threads: 8
Started: Just now	Total memory: 15.33 GiB

Workers

Worker: 0

Comm: tcp://127.0.0.1:38217	Total threads: 2
Dashboard: http://127.0.0.1:46731/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:38577
Local directory: /tmp/dask-scratch-space/worker-4hf6p93c

Worker: 1

Comm: tcp://127.0.0.1:45173	Total threads: 2
Dashboard: http://127.0.0.1:43307/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:34411
Local directory: /tmp/dask-scratch-space/worker-jf1bw94t

Worker: 2

Comm: tcp://127.0.0.1:42501	Total threads: 2
Dashboard: http://127.0.0.1:42529/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:45997
Local directory: /tmp/dask-scratch-space/worker-i6afahfk

Worker: 3

Comm: tcp://127.0.0.1:46337	Total threads: 2
Dashboard: http://127.0.0.1:35121/status	Memory: 3.83 GiB
Nanny: tcp://127.0.0.1:43541
Local directory: /tmp/dask-scratch-space/worker-yqipft_l

The original dataset is of a very high resolution. We will focus on a smaller slice of the globe, and display the original data for reference:

import xarray as xr
import xarray_regrid

sst = xr.open_zarr("https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1")["analysed_sst"]

# Reduce size of array by only selecting a slice
sst = sst.sel(lat=slice(30, 45), lon=slice(125, 150)).isel(time=0)

sst.plot()

<matplotlib.collections.QuadMesh at 0x7fdc44319b80>

../../_images/d5742076eec90cde6a9cb31abaa2a3aa58ad82ed5928f1eaa75febf4f98e2b79.png

To regrid we define a new target grid, with a lower resolution.

target = xarray_regrid.Grid(
    north=45,
    south=30,
    west=125,
    east=150,
    resolution_lat=1,
    resolution_lon=1,
).create_regridding_dataset(lat_name="lat", lon_name="lon")

We will take the variance of the data. Note that this operation is lazy when the data consists of dask arrays.

sst_var = sst.regrid.stat(target, method="var", time_dim="time", skipna=False)

When we plot the DataArray, the data is retrieved and the result computed.

Other methods are available, such as “sum”, “mean”, “std”, “median”, “min”, and “max”.

sst_var.plot()

/home/bart/micromamba/envs/xarray_regrid_3.12/lib/python3.12/site-packages/distributed/client.py:3358: UserWarning: Sending large graph of size 28.65 MiB.
This may cause some slowdown.
Consider loading the data with Dask directly
 or using futures or delayed objects to embed the data into the graph without repetition.
See also https://docs.dask.org/en/stable/best-practices.html#load-data-with-dask for more information.
  warnings.warn(

<matplotlib.collections.QuadMesh at 0x7fdc32f05850>

../../_images/a9e239c0d17d34dd9299f73a778559fe33c726adc02adc9ab77cc16c8ce3b7e6.png