Validate return_type at entry to stats() (#2558)#2578
Merged
Conversation
`stats(return_type=...)` previously fell through to a numpy-buffer
branch for any value other than 'pandas.DataFrame'. So a typo like
return_type='bogus' silently returned a raw numpy.ndarray instead
of raising or returning a documented type.
Validate up front against ('pandas.DataFrame', 'xarray.DataArray')
and raise ValueError listing the allowed values otherwise. Also
reject return_type='xarray.DataArray' on dask-backed input (which
the docstring already said was unsupported but the code silently
mishandled). Expand the docstring to enumerate what each value
returns. Add tests for the default, both valid values, an invalid
value, the error message contents, the dask-rejection case, and
the Dataset-input case.
brendancol
commented
May 28, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Validate return_type at entry to stats() (#2558)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
xrspatial/zonal.pyaround the new dask guard:return_type='xarray.DataArray'is also unsupported for the cupy backend._stats_cupyreturns a DataFrame and the downstreamxr.DataArray(result, coords=..., dims=...)wrapper would crash or produce garbage. The current code only guards the dask case. Consider adding a parallel guard for cupy, or restructure as "only the pure-numpy path supports xarray.DataArray". This is not a regression introduced by this PR (the bug existed before), but the PR is already in the right area.
Nits (optional improvements)
xrspatial/zonal.pyline ~833: the comment ends with "produce a numpy array instead of one of the documented types". Minor word choice, but "raw numpy buffer" or "internal stats buffer" might be clearer about why this is bad (it is not just the wrong type, it is an undocumented intermediate).xrspatial/tests/test_zonal.py_small_zones_values_2558: the helper is private to this test group and only used by tests in this file. Consider a@pytest.fixturefor symmetry with the other fixtures in this module. Pure style.
What looks good
- Early validation against an explicit allowed-set is the right shape for this fix.
- Error message includes the bad value (
return_type={return_type!r}) and the allowed list, so callers can diagnose typos quickly. - Test matrix covers the default, both valid values, a parameterized set of invalid strings, the error message contents, the dask path, and the Dataset path.
- Docstring rewrite spells out what each value returns and notes which backends each is valid for.
- CHANGELOG entry under Unreleased / Fixed is in the right section and references the issue.
Checklist
- Algorithm matches reference/paper: N/A, validation-only change
- All implemented backends produce consistent results: the validator runs before dispatch
- NaN handling is correct: not touched
- Edge cases are covered by tests: empty string, wrong case, near-match
- Dask chunk boundaries handled correctly: not touched
- No premature materialization or unnecessary copies
- Benchmark exists or is not needed: not needed for a validation fix
- README feature matrix updated: N/A, no new function
- Docstrings present and accurate
Review feedback on PR #2578: - Suggestion: the cupy backend also cannot satisfy return_type='xarray.DataArray' (_stats_cupy returns a DataFrame so wrapping it in xr.DataArray downstream would crash). Generalise the guard to reject both cupy and dask backends with a clear message. - Nit: clarify the validator's comment to say the old behaviour leaked an undocumented intermediate buffer, not just the wrong type. - Nit: convert _small_zones_values_2558 helper into a pytest fixture for consistency with the rest of the test module. Add a cupy rejection test that runs when CUDA + cupy are available.
brendancol
commented
May 28, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review
All findings from the previous review pass have been addressed.
Disposition
- Suggestion (cupy guard): fixed in 0a068d6. The non-numpy guard now covers both dask and cupy backends and reports which one it tripped on.
- Nit (validator comment wording): fixed. Rephrased to "leak that undocumented intermediate to the caller" so it is clear the old behaviour exposed an internal buffer, not just a wrong type.
- Nit (helper to fixture): fixed.
_small_zones_values_2558is now a@pytest.fixturenamedsmall_zones_values_2558. A cupy rejection test was added in the same section, gated onhas_cuda_and_cupy().
Verification
- All 12 issue-2558 tests pass locally (cupy test also runs and passes on this machine).
- Full
xrspatial/tests/test_zonal.pysuite: 148 passed.
No remaining findings from my side.
# Conflicts: # CHANGELOG.md
# Conflicts: # CHANGELOG.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2558.
Summary
stats(return_type=...)now validates against('pandas.DataFrame', 'xarray.DataArray')at entry and raisesValueErrorlisting the allowed values for anything else. Before this change, an unrecognised string fell through to the numpy buffer branch and silently returned a rawnumpy.ndarray.return_type='xarray.DataArray'on dask-backed input now raises a clearValueError(the docstring already said it was unsupported, but the code path silently produced the wrong shape).Backend coverage
The validator runs before backend dispatch, so the new error is consistent across numpy, cupy, dask+numpy, and dask+cupy. The dask-specific guard rejects the
'xarray.DataArray'return type on any dask-backed input (numpy or cupy chunks).Test plan
return_typestill returnspandas.DataFrame.'pandas.DataFrame'returnspandas.DataFrame.'xarray.DataArray'returnsxarray.DataArraywith the right dims and shape.'bogus','numpy.ndarray','','DataFrame','xarray.dataarray') raiseValueError.'xarray.DataArray'.return_typeraisesValueError.