Skip to content

resample: preserve scalar non-dim coords and refresh stale nodata attr (#2542)#2549

Merged
brendancol merged 2 commits into
mainfrom
deep-sweep-metadata-resample-2026-05-27
May 28, 2026
Merged

resample: preserve scalar non-dim coords and refresh stale nodata attr (#2542)#2549
brendancol merged 2 commits into
mainfrom
deep-sweep-metadata-resample-2026-05-27

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • The 2D non-identity path in resample() rebuilt its output xr.DataArray with only the spatial dim-coords, silently dropping any scalar non-dim coord on the input (e.g. rioxarray's spatial_ref, or a squeezed time / band selector). The identity path (scale_factor=1.0) and the 3D per-band path both kept these coords, so the 2D path was the inconsistent outlier (Cat 5).
  • _resolve_nodata() reads attrs['nodata'] as a sentinel fallback, but the output post-processing only refreshed _FillValue and nodatavals. Inputs declaring attrs['nodata'] = -9999 came back with that same finite value alongside data that was now NaN (Cat 4).
  • Fixed both gaps in one place at the final xr.DataArray constructor. Spatially-shaped non-dim coords intentionally do not carry across; their length is tied to the input resolution.

Backend coverage

numpy / cupy / dask+numpy / dask+cupy all route through the same constructor. Verified end-to-end for the spatial_ref carry across all four backends. Nodata-attr refresh verified on numpy / cupy / dask+numpy; dask+cupy with a non-NaN sentinel hits a pre-existing xr.where + cupy.astype quirk unrelated to this audit.

Test plan

  • 175 tests pass in xrspatial/tests/test_resample.py (168 existing + 7 new)
  • New TestMetadataPropagation cases: nodata-attr refresh, spatial_ref carry, scalar band/time carry, identity-vs-downsample coord parity, intentional drop of spatially-shaped extras
  • 4-backend spatial_ref parity confirmed manually
  • Discovered via deep-sweep metadata audit (sweep-metadata, 2026-05-27)

Closes #2542

#2542)

The 2D non-identity path in `resample()` rebuilt the output `xr.DataArray`
with only the spatial dim-coords, silently dropping any scalar non-dim
coord on the input. rioxarray stores CRS metadata on a `spatial_ref`
scalar coord, so chained `out.rio.crs` calls lost their CRS even though
`attrs['crs']` round-tripped fine. The identity path (`scale_factor=1.0`,
`agg.copy()`) and the 3D path (per-band `xr.concat`) both happened to
preserve the coord, leaving the 2D path as the inconsistent outlier
(Cat 5 backend / path-inconsistent metadata).

Separately, `_resolve_nodata()` reads `attrs['nodata']` as a sentinel
fallback after `_FillValue`, but the output post-processing only
refreshed `_FillValue` and `nodatavals`. Inputs declaring
`attrs['nodata'] = -9999` came back with that same finite value
alongside data that was now NaN, mismatching any downstream consumer
that trusts the attr (Cat 4 dtype/nodata semantics).

Fixes both gaps in one place at the final `xr.DataArray` constructor:
refresh `attrs['nodata']` to NaN when present, and fold any zero-dim
non-dim coord into the output coords dict. Spatially-shaped non-dim
coords (whose length is tied to the input resolution) intentionally
do not carry; the user would need to resample them separately.

Backend coverage: numpy / cupy / dask+numpy / dask+cupy all route
through the same constructor and were verified end-to-end for the
spatial_ref carry. Nodata-attr refresh was verified on numpy / cupy /
dask+numpy; dask+cupy with a non-NaN nodata sentinel hits a pre-existing
xr.where + cupy.astype quirk unrelated to this audit.

7 new tests in TestMetadataPropagation cover nodata-attr refresh,
spatial_ref / scalar selector coord carry, identity-vs-downsample
coord parity, and the intentional drop of spatially-shaped extras.

Closes #2542
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 28, 2026
@brendancol
Copy link
Copy Markdown
Contributor Author

PR Review: preserve scalar non-dim coords and refresh stale nodata attr

Read source (resample.py:1300-1424) and tests in full from the worktree at the PR head commit. The fix lives at the single 2D output constructor, so all 4 backends inherit it without per-backend changes.

Blockers

None.

Suggestions

  • xrspatial/tests/test_resample.py:203-308 — the PR body claims 4-backend parity for spatial_ref carry was verified manually, but no automated test pins this. Since the fix lives at the shared xr.DataArray constructor, a single @cuda_and_cupy_available test asserting spatial_ref survives a cupy resample would catch a backend-divergent regression. The companion PR (resample: close test-coverage gaps surfaced by deep-sweep audit #2548) already has the cross-backend scaffolding to copy from.
  • xrspatial/resample.py:1404-1412 — the 3D path (lines 1290-1322) hands each band to the 2D resample() call, so this fix transitively propagates spatial_ref through xr.concat. Worth one explicit test confirming the 3D spatial_ref case so the contract is pinned, not just inferred.

Nits

  • xrspatial/resample.py:1418coords={ydim: new_y, xdim: new_x, **extra_coords} relies on the coord_name in (ydim, xdim) skip at line 1409 to keep extra_coords from shadowing the rebuilt spatial coords. The current filter prevents collision; the safety is one-sided though (if the filter ever loosens, **extra_coords would silently override).
  • xrspatial/tests/test_resample.py:228,247int(out.coords['spatial_ref'].values) == 0 and int(out.coords['band'].values) == 1 cast the scalar coord to int. Works for the values used; == 0 / == 1 against the np scalar is one less coercion to read.

What looks good

  • Fix is at the single shared 2D output constructor, so all four backends pick it up without per-backend code.
  • Comments on lines 1387-1393 and 1396-1402 explain the why: which input attrs feed _resolve_nodata, why the identity path was the consistent outlier, why spatially-shaped coords are intentionally dropped.
  • Test names mirror the audit categories (test_nodata_attr_refreshed_to_nan, test_identity_and_downsample_coords_consistent), so a failure maps straight back to the issue.
  • test_nodata_attr_refreshed_to_nan pins both the attr refresh AND that the corresponding data pixel is NaN — catches the exact "metadata says NaN but data still has -9999" failure mode that motivated the fix.
  • test_2d_spatial_extra_coord_not_carried explicitly documents the chosen scope: scalar coords carry, spatially-shaped coords don't.

Checklist

- Add test_spatial_ref_preserved_3d: pin the 3D per-band -> xr.concat
  path, which inherits the carry-across transitively via the per-band
  2D resample() call.
- Add test_spatial_ref_preserved_cupy / _dask: pin the 4-backend
  parity claim from the PR body so a backend-divergent regression
  would surface (the fix lives at the shared 2D xr.DataArray
  constructor, but the contract is now explicitly tested).
- Drop redundant int() casts on scalar coord assertions; np scalar
  equality is one less coercion to read.

178 passed (was 175 + 3 new).
@brendancol brendancol merged commit 786b96e into main May 28, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

resample: drops non-dim coords (spatial_ref) and leaves stale attrs['nodata'] on 2D path

1 participant