Skip to content

Validate return_type at entry to stats() (#2558)#2578

Merged
brendancol merged 4 commits into
mainfrom
issue-2558
May 28, 2026
Merged

Validate return_type at entry to stats() (#2558)#2578
brendancol merged 4 commits into
mainfrom
issue-2558

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2558.

Summary

  • stats(return_type=...) now validates against ('pandas.DataFrame', 'xarray.DataArray') at entry and raises ValueError listing the allowed values for anything else. Before this change, an unrecognised string fell through to the numpy buffer branch and silently returned a raw numpy.ndarray.
  • return_type='xarray.DataArray' on dask-backed input now raises a clear ValueError (the docstring already said it was unsupported, but the code path silently produced the wrong shape).
  • Docstring expanded to enumerate what each allowed value returns.

Backend coverage

The validator runs before backend dispatch, so the new error is consistent across numpy, cupy, dask+numpy, and dask+cupy. The dask-specific guard rejects the 'xarray.DataArray' return type on any dask-backed input (numpy or cupy chunks).

Test plan

  • Default return_type still returns pandas.DataFrame.
  • Explicit 'pandas.DataFrame' returns pandas.DataFrame.
  • Explicit 'xarray.DataArray' returns xarray.DataArray with the right dims and shape.
  • Invalid values ('bogus', 'numpy.ndarray', '', 'DataFrame', 'xarray.dataarray') raise ValueError.
  • The error message names both allowed values.
  • Dask input rejects 'xarray.DataArray'.
  • Dataset input with an invalid return_type raises ValueError.
  • Full existing zonal test suite still passes (147 tests).

`stats(return_type=...)` previously fell through to a numpy-buffer
branch for any value other than 'pandas.DataFrame'. So a typo like
return_type='bogus' silently returned a raw numpy.ndarray instead
of raising or returning a documented type.

Validate up front against ('pandas.DataFrame', 'xarray.DataArray')
and raise ValueError listing the allowed values otherwise. Also
reject return_type='xarray.DataArray' on dask-backed input (which
the docstring already said was unsupported but the code silently
mishandled). Expand the docstring to enumerate what each value
returns. Add tests for the default, both valid values, an invalid
value, the error message contents, the dask-rejection case, and
the Dataset-input case.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 28, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Validate return_type at entry to stats() (#2558)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/zonal.py around the new dask guard: return_type='xarray.DataArray' is also unsupported for the cupy backend. _stats_cupy returns a DataFrame and the downstream xr.DataArray(result, coords=..., dims=...) wrapper would crash or produce garbage. The current code only guards the dask case. Consider adding a parallel guard for cupy, or restructure as "only the pure-numpy path supports xarray.DataArray". This is not a regression introduced by this PR (the bug existed before), but the PR is already in the right area.

Nits (optional improvements)

  • xrspatial/zonal.py line ~833: the comment ends with "produce a numpy array instead of one of the documented types". Minor word choice, but "raw numpy buffer" or "internal stats buffer" might be clearer about why this is bad (it is not just the wrong type, it is an undocumented intermediate).
  • xrspatial/tests/test_zonal.py _small_zones_values_2558: the helper is private to this test group and only used by tests in this file. Consider a @pytest.fixture for symmetry with the other fixtures in this module. Pure style.

What looks good

  • Early validation against an explicit allowed-set is the right shape for this fix.
  • Error message includes the bad value (return_type={return_type!r}) and the allowed list, so callers can diagnose typos quickly.
  • Test matrix covers the default, both valid values, a parameterized set of invalid strings, the error message contents, the dask path, and the Dataset path.
  • Docstring rewrite spells out what each value returns and notes which backends each is valid for.
  • CHANGELOG entry under Unreleased / Fixed is in the right section and references the issue.

Checklist

  • Algorithm matches reference/paper: N/A, validation-only change
  • All implemented backends produce consistent results: the validator runs before dispatch
  • NaN handling is correct: not touched
  • Edge cases are covered by tests: empty string, wrong case, near-match
  • Dask chunk boundaries handled correctly: not touched
  • No premature materialization or unnecessary copies
  • Benchmark exists or is not needed: not needed for a validation fix
  • README feature matrix updated: N/A, no new function
  • Docstrings present and accurate

Review feedback on PR #2578:

- Suggestion: the cupy backend also cannot satisfy
  return_type='xarray.DataArray' (_stats_cupy returns a DataFrame so
  wrapping it in xr.DataArray downstream would crash). Generalise the
  guard to reject both cupy and dask backends with a clear message.
- Nit: clarify the validator's comment to say the old behaviour
  leaked an undocumented intermediate buffer, not just the wrong type.
- Nit: convert _small_zones_values_2558 helper into a pytest fixture
  for consistency with the rest of the test module. Add a cupy
  rejection test that runs when CUDA + cupy are available.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review

All findings from the previous review pass have been addressed.

Disposition

  • Suggestion (cupy guard): fixed in 0a068d6. The non-numpy guard now covers both dask and cupy backends and reports which one it tripped on.
  • Nit (validator comment wording): fixed. Rephrased to "leak that undocumented intermediate to the caller" so it is clear the old behaviour exposed an internal buffer, not just a wrong type.
  • Nit (helper to fixture): fixed. _small_zones_values_2558 is now a @pytest.fixture named small_zones_values_2558. A cupy rejection test was added in the same section, gated on has_cuda_and_cupy().

Verification

  • All 12 issue-2558 tests pass locally (cupy test also runs and passes on this machine).
  • Full xrspatial/tests/test_zonal.py suite: 148 passed.

No remaining findings from my side.

@brendancol brendancol merged commit 4580bba into main May 28, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

stats(return_type=...) is not validated; bad values silently return a raw numpy array

1 participant