Skip to content

reproject: fix vertical-datum shift crash on integer DEMs#2596

Open
brendancol wants to merge 2 commits into
mainfrom
issue-2565
Open

reproject: fix vertical-datum shift crash on integer DEMs#2596
brendancol wants to merge 2 commits into
mainfrom
issue-2565

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2565.

Summary

  • The vertical-datum shift path used to result = data.copy() at the input dtype and then do strip_data[is_valid] += N_total[is_valid]. For int16/uint16 DEMs (SRTM, Copernicus DEM, ASTER GDEM) this raised UFuncTypeError because a float offset cannot be cast back into an integer array.
  • _apply_vertical_shift now picks an out_dtype once (float32 for integer or float32 inputs, float64 only when the source is float64) and forwards it through the numpy / cupy / dask backends. The numpy path casts up front and rewrites the integer nodata sentinel to NaN so the rest of the loop uses uniform NaN semantics.
  • The caller picks up the promoted nodata and writes it back to attrs['nodata'], attrs['_FillValue'], and attrs['nodatavals'] so the metadata stays consistent with the array.

Backend coverage

  • numpy: handled directly in _apply_vertical_shift_numpy.
  • cupy: round-trips via the numpy path (existing behaviour), inherits the promotion.
  • dask+numpy: map_blocks now advertises the promoted dtype in meta and forwards out_dtype into every per-block call.
  • dask+cupy: collapses to numpy-backed dask upstream (pre-existing), so the same path applies.

Test plan

  • pytest xrspatial/tests/test_reproject.py::TestVerticalShiftIntegerDtype -- 7 new tests covering int16 promotion, uint8 promotion, float32 / float64 no-extra-promotion, nodata to NaN propagation, _FillValue / nodatavals attr refresh, and the dask path.
  • pytest xrspatial/tests/test_reproject.py::TestVerticalShift -- 12 existing vertical-shift tests still pass.
  • Broader sweep: pytest -k "vertical or geoid or Vertical" -- 54 tests pass.

…2565)

The geoid offset is a float, so an in-place += into an int16 / uint16
array raised UFuncTypeError. Promote the array to float32 (float64 if
the source was already float64) inside `_apply_vertical_shift`, rewrite
the integer nodata sentinel to NaN, and propagate the new dtype +
nodata back through `attrs['nodata']`, `attrs['_FillValue']`, and
`attrs['nodatavals']`. Covers numpy, cupy, and dask+numpy backends.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: reproject vertical-datum shift on integer DEMs (#2596)

Blockers

None.

Suggestions

  • xrspatial/reproject/__init__.py:1064-1083 -- _promoted_vertical_dtype has two consecutive checks (if src_dtype == np.float64: return None then if np.issubdtype(src_dtype, np.floating): return None) but the second check already covers float64. Collapse to a single if np.issubdtype(src_dtype, np.floating): return None. Functionally identical, one fewer line, and the intent reads cleaner.

  • xrspatial/reproject/__init__.py:1080 -- the comment says "small-float inputs" promote to float32, but the function actually returns None for every float dtype (including float16 and complex floats, both via the np.floating check). Either tighten the comment to "integer inputs" only, or add a branch that explicitly promotes float16 to float32 if that case matters. Recommend the wording fix unless float16 DEM support is on the roadmap.

Nits

  • xrspatial/reproject/__init__.py:1234-1255 -- the new conditional block in _apply_vertical_shift_numpy reads a bit denser than the original two-line version. Consider extracting the dtype-promotion + nodata remap to a small helper (e.g. _promote_for_shift(data, nodata, out_dtype) returning (result, is_nan_nodata)) so the strip loop below it stays the focus. Optional.

  • The CHANGELOG entry is solid but slightly long for a "Fixed" bullet. Could tighten to one paragraph if you want -- not blocking.

What looks good

  • The fix is at the right level: _apply_vertical_shift decides the output dtype once and threads it through every backend, so numpy / cupy / dask all converge on the same promoted dtype and the caller's out_attrs stays consistent.
  • The integer-nodata-to-NaN remap is the right design choice; it avoids carrying an "integer sentinel in a float array" footgun forward.
  • Test coverage hits all the cases the issue called out: int16 -> float32 (numerical check against geoid_height), uint8 -> float32, float32 / float64 no-extra-promotion, NaN propagation, _FillValue + nodatavals refresh, and a dask-backed int16 case.
  • The dask path correctly updates the map_blocks(meta=...) template so downstream consumers see the promoted dtype before computing.
  • _apply_vertical_shift returning (data, nodata) instead of just data is a clean way to thread the new sentinel back to the caller without adding mutable state.

Checklist

  • Algorithm matches reference (geoid offset formula H + N = h)
  • All implemented backends produce consistent results (single dtype decision shared across them)
  • NaN handling is correct (integer nodata -> NaN before shift, NaN passes through is_valid mask)
  • Edge cases are covered by tests (nodata pixels, multiple dtypes)
  • Dask chunk boundaries handled correctly (meta dtype matches block return)
  • No premature materialization or unnecessary copies (one extra mask pass per call; acceptable)
  • Benchmark exists or is not needed (pure correctness fix, no benchmark needed)
  • README feature matrix updated (no new function, no backend coverage change)
  • Docstrings present and accurate

Collapse the redundant float64 check into the single np.floating
guard, and rewrite the comment so it matches what the function
actually does (any non-float input promotes; every float stays put).
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review on commit 361af8f

Dispositions for the prior review

  • Suggestion: collapse the duplicate float64 check in _promoted_vertical_dtype -- fixed in 361af8f. The function now reads if np.issubdtype(src_dtype, np.floating): return None.
  • Suggestion: misleading "small-float" wording (and float16 handling) -- fixed in 361af8f by rewriting the docstring to match the actual contract (any non-float promotes; floats are left alone). Not chasing float16 promotion since float16 DEMs are not a real-world case for this library.
  • Nit: extract the dtype-promotion block to a helper -- dismissed. The inline block is ~10 lines and lives next to the one place that uses it; extracting it would split the logic across two definitions without making the strip loop meaningfully clearer. Per CLAUDE.md "no abstractions for single-use code."
  • Nit: shorten the CHANGELOG entry -- dismissed. The entry documents a user-visible behaviour change (dtype and nodata both shift), and the extra lines pay for themselves when someone is reading the changelog to decide whether the fix affects them.

State after the follow-up

  • All 19 existing + new vertical-shift tests still pass locally.
  • No new findings on the follow-up commit.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reproject: vertical datum shift crashes on integer DEMs

1 participant