resample: preserve integer precision in nodata mask comparison (#2570) by brendancol · Pull Request #2592 · xarray-contrib/xarray-spatial

brendancol · 2026-05-28T15:28:44Z

Closes #2570.

Summary

_resolve_nodata no longer routes integer sentinels through float(); it casts to the input dtype instead so the equality test in _apply_nodata_mask runs in integer space.
For int64 sentinels above 2**53 (e.g. nodata = 2**60 + 1), valid cells that share the rounded float representation (e.g. 2**60) are no longer masked out.
Float inputs still route through float() so the existing NaN short-circuit keeps working.

Backend coverage

The fix lives in the shared masking helper that every backend goes through (numpy / cupy / dask+numpy / dask+cupy). Verified on numpy and dask+numpy locally.

Test plan

New TestNodataIntegerPrecision class in xrspatial/tests/test_resample.py covering:
- int64 sentinel above 2**53 via explicit kwarg
- int64 sentinel via _FillValue attr
- int32 sentinel (negative min)
- uint16 sentinel (max)
- float NaN sentinel short-circuit still works
- float finite sentinel still works
Pre-existing TestNodata cases still pass
Full test_resample.py suite (184 tests) green
test_resample_coverage_2026_05_27.py and test_resample_signature_annot_2544.py green

`_resolve_nodata` cast the sentinel through `float()` and `_apply_nodata_mask` compared the data array against that float. For int64 values above 2**53 the cast is lossy, so distinct integers collided after rounding and valid cells were masked to NaN (e.g. nodata=2**60+1 also masked a valid 2**60 cell). Fix: keep the sentinel in the input's native dtype when the input is integer/bool, and compare in integer space before promoting the output to float32. Float inputs still route through float() so NaN semantics keep working. The masking helper feeds every backend (numpy / cupy / dask+numpy / dask+cupy), so the change lives in one place.

brendancol

PR Review: resample integer nodata precision fix

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

xrspatial/resample.py:1162-1165 -- the try/except TypeError around the NaN check is dead code. isinstance(nodata, float) already gates the path, and np.isnan on a Python float will never raise TypeError. Drop the try wrapper and keep the if isinstance(...) and np.isnan(...) check on its own line.

Nits (optional improvements)

xrspatial/resample.py:1166 -- np.asarray(nodata).astype(agg.dtype).item() will silently truncate a non-integer float sentinel (e.g. nodata=-9999.5 on int32 data becomes -9999). Realistically nobody passes a fractional sentinel, but a one-line check like if isinstance(nodata, float) and not nodata.is_integer(): raise ValueError(...) would catch the obvious user error. Skip if you'd rather keep the helper minimal.
xrspatial/tests/test_resample.py:1456 -- test_int64_sentinel_above_2pow53_preserves_distinct_values is the headline regression test. Consider asserting out.values[1, 1] == np.float32(1 << 60) so a future regression that masks the cell to NaN AND a regression that swaps the surviving value both get caught. The current np.isfinite check is correct but a bit loose.

What looks good

The split between float-input and integer-input paths in _resolve_nodata is the right call. It keeps the float NaN short-circuit working while routing integer comparisons through the input dtype.
_apply_nodata_mask computes the mask before promoting to float32, which is exactly the surgical change the bug needs. The mask carries over correctly to agg.where(mask) after the cast.
Backend coverage is right: the masking helper sits upstream of every per-backend dispatch path, so numpy, cupy, dask+numpy, and dask+cupy all inherit the fix automatically. The PR body confirms numpy and dask+numpy were exercised locally.
Test coverage hits the exact failure mode (int64 sentinel above 2**53), the explicit-kwarg and attr paths, and the unaffected float branches.

Checklist

Algorithm matches reference / intent (integer equality in input dtype)
All implemented backends produce consistent results (single shared helper)
NaN handling is correct (float NaN short-circuit preserved; NaN sentinel on int data is a no-op)
Edge cases are covered by tests (int64 > 2**53, int32, uint16, float NaN, float finite)
No dask chunk boundary concerns (no neighborhood op in this change)
No premature materialization or unnecessary copies
Benchmark: not applicable for a correctness fix in a helper
README feature matrix: not applicable, no new function or backend
Docstrings updated to reflect the new dispatch on input dtype

…en test (#2570) - Remove the `try/except TypeError` wrapper around the NaN check; the `isinstance(nodata, float)` gate makes the except branch unreachable. - Raise ValueError when a fractional float sentinel (e.g. -9999.5) is passed for an integer input -- silent truncation to int would mask cells the caller never asked to mask. - Tighten the int64-above-2**53 regression test to assert the surviving cell holds the right value, not just that it's finite.

brendancol

Follow-up review

Re-ran the review after the address-nits commit (1adea6f). All previous findings are resolved:

Suggestion (dead try/except): FIXED -- the wrapper is gone; the isinstance + np.isnan check stands alone.
Nit (silent fractional truncation): FIXED -- _resolve_nodata now raises ValueError("nodata=... is not representable in integer dtype ...") when a non-integer float sentinel is passed for an integer input. Added test_fractional_float_sentinel_on_int_input_raises to lock it in.
Nit (loose isfinite assertion): FIXED -- the int64-above-2**53 regression test now asserts the surviving cell equals np.float32(1 << 60), so masking-to-NaN and value-swap regressions both get caught.

Full test_resample.py suite passes locally (185 tests). No remaining blockers, suggestions, or nits.

github-actions Bot added the performance PR touches performance-sensitive code label May 28, 2026

brendancol commented May 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

resample: preserve integer precision in nodata mask comparison (#2570)#2592

resample: preserve integer precision in nodata mask comparison (#2570)#2592
brendancol wants to merge 2 commits into
mainfrom
issue-2570

brendancol commented May 28, 2026

Uh oh!

brendancol left a comment

Uh oh!

brendancol left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented May 28, 2026

Summary

Backend coverage

Test plan

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

PR Review: resample integer nodata precision fix

Blockers (must fix before merge)

Suggestions (should fix, not blocking)

Nits (optional improvements)

What looks good

Checklist

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

Follow-up review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant