Skip to content

resample: preserve integer precision in nodata mask comparison (#2570)#2592

Open
brendancol wants to merge 2 commits into
mainfrom
issue-2570
Open

resample: preserve integer precision in nodata mask comparison (#2570)#2592
brendancol wants to merge 2 commits into
mainfrom
issue-2570

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2570.

Summary

  • _resolve_nodata no longer routes integer sentinels through float(); it casts to the input dtype instead so the equality test in _apply_nodata_mask runs in integer space.
  • For int64 sentinels above 2**53 (e.g. nodata = 2**60 + 1), valid cells that share the rounded float representation (e.g. 2**60) are no longer masked out.
  • Float inputs still route through float() so the existing NaN short-circuit keeps working.

Backend coverage

The fix lives in the shared masking helper that every backend goes through (numpy / cupy / dask+numpy / dask+cupy). Verified on numpy and dask+numpy locally.

Test plan

  • New TestNodataIntegerPrecision class in xrspatial/tests/test_resample.py covering:
    • int64 sentinel above 2**53 via explicit kwarg
    • int64 sentinel via _FillValue attr
    • int32 sentinel (negative min)
    • uint16 sentinel (max)
    • float NaN sentinel short-circuit still works
    • float finite sentinel still works
  • Pre-existing TestNodata cases still pass
  • Full test_resample.py suite (184 tests) green
  • test_resample_coverage_2026_05_27.py and test_resample_signature_annot_2544.py green

`_resolve_nodata` cast the sentinel through `float()` and
`_apply_nodata_mask` compared the data array against that float. For
int64 values above 2**53 the cast is lossy, so distinct integers
collided after rounding and valid cells were masked to NaN (e.g.
nodata=2**60+1 also masked a valid 2**60 cell).

Fix: keep the sentinel in the input's native dtype when the input is
integer/bool, and compare in integer space before promoting the
output to float32. Float inputs still route through float() so NaN
semantics keep working.

The masking helper feeds every backend (numpy / cupy / dask+numpy /
dask+cupy), so the change lives in one place.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 28, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: resample integer nodata precision fix

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/resample.py:1162-1165 -- the try/except TypeError around the NaN check is dead code. isinstance(nodata, float) already gates the path, and np.isnan on a Python float will never raise TypeError. Drop the try wrapper and keep the if isinstance(...) and np.isnan(...) check on its own line.

Nits (optional improvements)

  • xrspatial/resample.py:1166 -- np.asarray(nodata).astype(agg.dtype).item() will silently truncate a non-integer float sentinel (e.g. nodata=-9999.5 on int32 data becomes -9999). Realistically nobody passes a fractional sentinel, but a one-line check like if isinstance(nodata, float) and not nodata.is_integer(): raise ValueError(...) would catch the obvious user error. Skip if you'd rather keep the helper minimal.
  • xrspatial/tests/test_resample.py:1456 -- test_int64_sentinel_above_2pow53_preserves_distinct_values is the headline regression test. Consider asserting out.values[1, 1] == np.float32(1 << 60) so a future regression that masks the cell to NaN AND a regression that swaps the surviving value both get caught. The current np.isfinite check is correct but a bit loose.

What looks good

  • The split between float-input and integer-input paths in _resolve_nodata is the right call. It keeps the float NaN short-circuit working while routing integer comparisons through the input dtype.
  • _apply_nodata_mask computes the mask before promoting to float32, which is exactly the surgical change the bug needs. The mask carries over correctly to agg.where(mask) after the cast.
  • Backend coverage is right: the masking helper sits upstream of every per-backend dispatch path, so numpy, cupy, dask+numpy, and dask+cupy all inherit the fix automatically. The PR body confirms numpy and dask+numpy were exercised locally.
  • Test coverage hits the exact failure mode (int64 sentinel above 2**53), the explicit-kwarg and attr paths, and the unaffected float branches.

Checklist

  • Algorithm matches reference / intent (integer equality in input dtype)
  • All implemented backends produce consistent results (single shared helper)
  • NaN handling is correct (float NaN short-circuit preserved; NaN sentinel on int data is a no-op)
  • Edge cases are covered by tests (int64 > 2**53, int32, uint16, float NaN, float finite)
  • No dask chunk boundary concerns (no neighborhood op in this change)
  • No premature materialization or unnecessary copies
  • Benchmark: not applicable for a correctness fix in a helper
  • README feature matrix: not applicable, no new function or backend
  • Docstrings updated to reflect the new dispatch on input dtype

…en test (#2570)

- Remove the `try/except TypeError` wrapper around the NaN check; the
  `isinstance(nodata, float)` gate makes the except branch unreachable.
- Raise ValueError when a fractional float sentinel (e.g. -9999.5) is
  passed for an integer input -- silent truncation to int would mask
  cells the caller never asked to mask.
- Tighten the int64-above-2**53 regression test to assert the surviving
  cell holds the right value, not just that it's finite.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review

Re-ran the review after the address-nits commit (1adea6f). All previous findings are resolved:

  • Suggestion (dead try/except): FIXED -- the wrapper is gone; the isinstance + np.isnan check stands alone.
  • Nit (silent fractional truncation): FIXED -- _resolve_nodata now raises ValueError("nodata=... is not representable in integer dtype ...") when a non-integer float sentinel is passed for an integer input. Added test_fractional_float_sentinel_on_int_input_raises to lock it in.
  • Nit (loose isfinite assertion): FIXED -- the int64-above-2**53 regression test now asserts the surviving cell equals np.float32(1 << 60), so masking-to-NaN and value-swap regressions both get caught.

Full test_resample.py suite passes locally (185 tests). No remaining blockers, suggestions, or nits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

resample: int64 nodata mask loses precision via float cast

1 participant