Skip to content

reproject: reject explicit out-of-range nodata for integer dtypes#2591

Open
brendancol wants to merge 3 commits into
mainfrom
issue-2572
Open

reproject: reject explicit out-of-range nodata for integer dtypes#2591
brendancol wants to merge 3 commits into
mainfrom
issue-2572

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • reproject(..., nodata=X) used to let any finite X through, even when it could not fit the source/output integer dtype. The worker then cast-back wrapped the sentinel (e.g. -9999 into uint8 became 0), so out-of-bounds pixels were the same as valid zero pixels while attrs["nodata"] still advertised -9999.
  • Explicit out-of-range values now raise ValueError with a message naming the dtype and the offending value. Matches the precedent in xrspatial.rasterize for fill=NaN + integer dtype.
  • Attrs/rioxarray-derived out-of-range values (legacy files like uint16 + nodata=-9999) emit a UserWarning and fall back to the dtype-appropriate sentinel (signed dtype.min, unsigned dtype.max). This keeps existing reads working.

Closes #2572.

Backend coverage

The fix is in _detect_nodata, which runs once at the API boundary on the host and is backend-agnostic. The end-to-end test covers the numpy path. Cupy/dask paths go through the same _detect_nodata call, so the validation gate fires before any backend dispatch.

Test plan

  • New TestDetectNodataDtypeRange covers the five cases the issue asked for: uint8 + -9999 raises, uint16 + 70000 raises, uint8 + 255 boundary passes, int16 + -32768 boundary passes, attrs-derived out-of-range warns and falls back.
  • New TestReprojectIntegerNodataDtypeRange runs the same matrix end-to-end through reproject() and confirms the happy path actually writes the sentinel into the edge pixels (not 0).
  • Full test_reproject.py + test_reproject_coverage_2026_05_27.py suite still passes (373 tests).

)

Previously, _detect_nodata let any finite explicit nodata pass through
unchanged. When the value did not fit the source/output integer dtype,
the worker's cast-back step silently wrapped the sentinel (e.g. -9999
into uint8 became 0), so out-of-range output pixels were the same as
valid zero pixels while attrs['nodata'] still advertised -9999.

Now:
- Explicit out-of-range nodata raises ValueError naming the dtype and
  the offending value, matching the precedent in xrspatial.rasterize
  for fill=NaN + integer dtype.
- attrs/rioxarray-derived out-of-range nodata (legacy files such as
  uint16 + nodata=-9999) emits a UserWarning and falls back to the
  dtype-appropriate sentinel (signed -> dtype.min, unsigned -> dtype.max),
  so existing reads still load.

Tests cover the documented matrix: uint8+(-9999), uint16+70000,
uint8+255 boundary, int16+(-32768) boundary, and the attrs-derived
warn-and-fall-back path. End-to-end coverage through reproject()
also confirms the happy path now actually writes the sentinel into
edge pixels (not 0).
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 28, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: reproject: reject explicit out-of-range nodata for integer dtypes

Blockers (must fix before merge)

None. The fix is correct: the range check is properly placed after the NaN swap, the error message preserves the user's original value (not the floatified one), and the validation runs at the API boundary so all four backends share the gate.

Suggestions (should fix, not blocking)

  • xrspatial/reproject/_crs_utils.py:182 -- stacklevel=2 on the warnings.warn call points at the _detect_nodata call site itself, not at the user's reproject() call. The chain is user -> reproject() -> _detect_nodata() -> warnings.warn, so stacklevel=3 would surface the warning on the user's reproject(...) line, which is the actionable location. The whole point of the warning is to tell the user their attrs are inconsistent with the dtype, so pointing at the user code matters here.

  • xrspatial/reproject/__init__.py:651 (not modified by this PR) -- the public reproject() docstring still says only "Nodata value. Auto-detected if None." with no mention of the new dtype-range validation. Adding a sentence ("Raises ValueError if the value cannot be represented in the source dtype.") would give users discoverable docs for the new contract. Optional, but matches what other functions in the project do for similar guards.

Nits (optional improvements)

  • xrspatial/tests/test_reproject.py:200 -- the test_reproject_uint8_in_range_nodata_writes_correct_pixels test asserts int(out.values[0, 0]) == 255 to check the edge pixel. That works, but a tighter assertion would be to count the nodata pixels and confirm at least the border ring is filled. Not worth a rev on its own.

What looks good

  • The strict-vs-permissive split (explicit raises, attrs warns) matches the precedent set in xrspatial.geotiff._decode._resolve_masked_fill for legacy files with out-of-range sentinels and xrspatial.rasterize's ValueError for fill=NaN + integer dtype.
  • Error message is specific and actionable: it names the dtype, the offending value, the valid range, and the concrete failure mode (-9999 into uint8 becomes 0).
  • The existing test test_detect_nodata_explicit (no dtype arg) still passes, confirming the new code only activates when dtype is provided.
  • CHANGELOG entry is in the right section and references the issue.

Checklist

  • Algorithm matches reference/paper -- this is a validation gate, no algorithm to match
  • All implemented backends produce consistent results -- gate runs at API boundary upstream of dispatch
  • NaN handling is correct -- NaN swap precedes the range check, so NaN never reaches the new branch
  • Edge cases are covered by tests -- boundary cases (dtype.max for uint8, dtype.min for int16) included
  • Dask chunk boundaries handled correctly -- gate runs on a scalar, no chunk concern
  • No premature materialization or unnecessary copies
  • Benchmark exists or is not needed -- input validation, no benchmark needed
  • README feature matrix updated (if applicable) -- N/A for a bug fix
  • Docstrings present and accurate -- _detect_nodata docstring updated; public reproject() docstring not updated (see Suggestion 2)

- Bump stacklevel to 3 on the out-of-range attrs warning so the
  warning points at the user's reproject() call rather than the
  internal _detect_nodata frame.
- Document the new dtype-range validation in the public reproject()
  docstring under the nodata parameter so the new contract is
  discoverable.
- Tighten the uint8 in-range happy-path test to also assert that the
  full top row of the output is the sentinel, not a mix of valid and
  wrapped values.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after 53f33ee

All three previous-review items applied:

  • Suggestion 1 (stacklevel=3): fixed in _crs_utils.py:184. The warning now surfaces at the user's reproject() call.
  • Suggestion 2 (docstring): fixed in __init__.py:683-688. The nodata parameter doc now mentions the new validation behavior with both the raise and the fallback paths.
  • Nit 1 (tighter test assertion): fixed in test_reproject.py:205-211. The happy-path test now asserts the full top row is the sentinel, not just one pixel.

Tests still pass: 9/9 new + 354/354 pre-existing.

No new findings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reproject: out-of-range explicit nodata corrupts integer output and contradicts attrs

1 participant