reproject: reject explicit out-of-range nodata for integer dtypes#2591
reproject: reject explicit out-of-range nodata for integer dtypes#2591brendancol wants to merge 3 commits into
Conversation
) Previously, _detect_nodata let any finite explicit nodata pass through unchanged. When the value did not fit the source/output integer dtype, the worker's cast-back step silently wrapped the sentinel (e.g. -9999 into uint8 became 0), so out-of-range output pixels were the same as valid zero pixels while attrs['nodata'] still advertised -9999. Now: - Explicit out-of-range nodata raises ValueError naming the dtype and the offending value, matching the precedent in xrspatial.rasterize for fill=NaN + integer dtype. - attrs/rioxarray-derived out-of-range nodata (legacy files such as uint16 + nodata=-9999) emits a UserWarning and falls back to the dtype-appropriate sentinel (signed -> dtype.min, unsigned -> dtype.max), so existing reads still load. Tests cover the documented matrix: uint8+(-9999), uint16+70000, uint8+255 boundary, int16+(-32768) boundary, and the attrs-derived warn-and-fall-back path. End-to-end coverage through reproject() also confirms the happy path now actually writes the sentinel into edge pixels (not 0).
brendancol
left a comment
There was a problem hiding this comment.
PR Review: reproject: reject explicit out-of-range nodata for integer dtypes
Blockers (must fix before merge)
None. The fix is correct: the range check is properly placed after the NaN swap, the error message preserves the user's original value (not the floatified one), and the validation runs at the API boundary so all four backends share the gate.
Suggestions (should fix, not blocking)
-
xrspatial/reproject/_crs_utils.py:182--stacklevel=2on thewarnings.warncall points at the_detect_nodatacall site itself, not at the user'sreproject()call. The chain is user ->reproject()->_detect_nodata()->warnings.warn, sostacklevel=3would surface the warning on the user'sreproject(...)line, which is the actionable location. The whole point of the warning is to tell the user their attrs are inconsistent with the dtype, so pointing at the user code matters here. -
xrspatial/reproject/__init__.py:651(not modified by this PR) -- the publicreproject()docstring still says only "Nodata value. Auto-detected if None." with no mention of the new dtype-range validation. Adding a sentence ("RaisesValueErrorif the value cannot be represented in the source dtype.") would give users discoverable docs for the new contract. Optional, but matches what other functions in the project do for similar guards.
Nits (optional improvements)
xrspatial/tests/test_reproject.py:200-- thetest_reproject_uint8_in_range_nodata_writes_correct_pixelstest assertsint(out.values[0, 0]) == 255to check the edge pixel. That works, but a tighter assertion would be to count the nodata pixels and confirm at least the border ring is filled. Not worth a rev on its own.
What looks good
- The strict-vs-permissive split (explicit raises, attrs warns) matches the precedent set in
xrspatial.geotiff._decode._resolve_masked_fillfor legacy files with out-of-range sentinels andxrspatial.rasterize'sValueErrorforfill=NaN + integer dtype. - Error message is specific and actionable: it names the dtype, the offending value, the valid range, and the concrete failure mode (
-9999 into uint8 becomes 0). - The existing test
test_detect_nodata_explicit(no dtype arg) still passes, confirming the new code only activates when dtype is provided. - CHANGELOG entry is in the right section and references the issue.
Checklist
- Algorithm matches reference/paper -- this is a validation gate, no algorithm to match
- All implemented backends produce consistent results -- gate runs at API boundary upstream of dispatch
- NaN handling is correct -- NaN swap precedes the range check, so NaN never reaches the new branch
- Edge cases are covered by tests -- boundary cases (dtype.max for uint8, dtype.min for int16) included
- Dask chunk boundaries handled correctly -- gate runs on a scalar, no chunk concern
- No premature materialization or unnecessary copies
- Benchmark exists or is not needed -- input validation, no benchmark needed
- README feature matrix updated (if applicable) -- N/A for a bug fix
- Docstrings present and accurate --
_detect_nodatadocstring updated; publicreproject()docstring not updated (see Suggestion 2)
- Bump stacklevel to 3 on the out-of-range attrs warning so the warning points at the user's reproject() call rather than the internal _detect_nodata frame. - Document the new dtype-range validation in the public reproject() docstring under the nodata parameter so the new contract is discoverable. - Tighten the uint8 in-range happy-path test to also assert that the full top row of the output is the sentinel, not a mix of valid and wrapped values.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review after 53f33ee
All three previous-review items applied:
- Suggestion 1 (stacklevel=3): fixed in
_crs_utils.py:184. The warning now surfaces at the user'sreproject()call. - Suggestion 2 (docstring): fixed in
__init__.py:683-688. Thenodataparameter doc now mentions the new validation behavior with both the raise and the fallback paths. - Nit 1 (tighter test assertion): fixed in
test_reproject.py:205-211. The happy-path test now asserts the full top row is the sentinel, not just one pixel.
Tests still pass: 9/9 new + 354/354 pre-existing.
No new findings.
# Conflicts: # CHANGELOG.md
Summary
reproject(..., nodata=X)used to let any finiteXthrough, even when it could not fit the source/output integer dtype. The worker then cast-back wrapped the sentinel (e.g.-9999intouint8became0), so out-of-bounds pixels were the same as valid zero pixels whileattrs["nodata"]still advertised-9999.ValueErrorwith a message naming the dtype and the offending value. Matches the precedent inxrspatial.rasterizeforfill=NaN + integer dtype.uint16 + nodata=-9999) emit aUserWarningand fall back to the dtype-appropriate sentinel (signeddtype.min, unsigneddtype.max). This keeps existing reads working.Closes #2572.
Backend coverage
The fix is in
_detect_nodata, which runs once at the API boundary on the host and is backend-agnostic. The end-to-end test covers the numpy path. Cupy/dask paths go through the same_detect_nodatacall, so the validation gate fires before any backend dispatch.Test plan
TestDetectNodataDtypeRangecovers the five cases the issue asked for: uint8 + -9999 raises, uint16 + 70000 raises, uint8 + 255 boundary passes, int16 + -32768 boundary passes, attrs-derived out-of-range warns and falls back.TestReprojectIntegerNodataDtypeRangeruns the same matrix end-to-end throughreproject()and confirms the happy path actually writes the sentinel into the edge pixels (not 0).test_reproject.py+test_reproject_coverage_2026_05_27.pysuite still passes (373 tests).