reproject: fix vertical-datum shift crash on integer DEMs#2596
reproject: fix vertical-datum shift crash on integer DEMs#2596brendancol wants to merge 2 commits into
Conversation
…2565) The geoid offset is a float, so an in-place += into an int16 / uint16 array raised UFuncTypeError. Promote the array to float32 (float64 if the source was already float64) inside `_apply_vertical_shift`, rewrite the integer nodata sentinel to NaN, and propagate the new dtype + nodata back through `attrs['nodata']`, `attrs['_FillValue']`, and `attrs['nodatavals']`. Covers numpy, cupy, and dask+numpy backends.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: reproject vertical-datum shift on integer DEMs (#2596)
Blockers
None.
Suggestions
-
xrspatial/reproject/__init__.py:1064-1083--_promoted_vertical_dtypehas two consecutive checks (if src_dtype == np.float64: return Nonethenif np.issubdtype(src_dtype, np.floating): return None) but the second check already coversfloat64. Collapse to a singleif np.issubdtype(src_dtype, np.floating): return None. Functionally identical, one fewer line, and the intent reads cleaner. -
xrspatial/reproject/__init__.py:1080-- the comment says "small-float inputs" promote to float32, but the function actually returnsNonefor every float dtype (including float16 and complex floats, both via thenp.floatingcheck). Either tighten the comment to "integer inputs" only, or add a branch that explicitly promotesfloat16tofloat32if that case matters. Recommend the wording fix unless float16 DEM support is on the roadmap.
Nits
-
xrspatial/reproject/__init__.py:1234-1255-- the new conditional block in_apply_vertical_shift_numpyreads a bit denser than the original two-line version. Consider extracting the dtype-promotion + nodata remap to a small helper (e.g._promote_for_shift(data, nodata, out_dtype)returning(result, is_nan_nodata)) so the strip loop below it stays the focus. Optional. -
The CHANGELOG entry is solid but slightly long for a "Fixed" bullet. Could tighten to one paragraph if you want -- not blocking.
What looks good
- The fix is at the right level:
_apply_vertical_shiftdecides the output dtype once and threads it through every backend, so numpy / cupy / dask all converge on the same promoted dtype and the caller'sout_attrsstays consistent. - The integer-nodata-to-NaN remap is the right design choice; it avoids carrying an "integer sentinel in a float array" footgun forward.
- Test coverage hits all the cases the issue called out: int16 -> float32 (numerical check against
geoid_height), uint8 -> float32, float32 / float64 no-extra-promotion, NaN propagation,_FillValue+nodatavalsrefresh, and a dask-backed int16 case. - The dask path correctly updates the
map_blocks(meta=...)template so downstream consumers see the promoted dtype before computing. _apply_vertical_shiftreturning(data, nodata)instead of justdatais a clean way to thread the new sentinel back to the caller without adding mutable state.
Checklist
- Algorithm matches reference (geoid offset formula H + N = h)
- All implemented backends produce consistent results (single dtype decision shared across them)
- NaN handling is correct (integer nodata -> NaN before shift, NaN passes through is_valid mask)
- Edge cases are covered by tests (nodata pixels, multiple dtypes)
- Dask chunk boundaries handled correctly (meta dtype matches block return)
- No premature materialization or unnecessary copies (one extra mask pass per call; acceptable)
- Benchmark exists or is not needed (pure correctness fix, no benchmark needed)
- README feature matrix updated (no new function, no backend coverage change)
- Docstrings present and accurate
Collapse the redundant float64 check into the single np.floating guard, and rewrite the comment so it matches what the function actually does (any non-float input promotes; every float stays put).
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review on commit 361af8f
Dispositions for the prior review
- Suggestion: collapse the duplicate
float64check in_promoted_vertical_dtype-- fixed in 361af8f. The function now readsif np.issubdtype(src_dtype, np.floating): return None. - Suggestion: misleading "small-float" wording (and float16 handling) -- fixed in 361af8f by rewriting the docstring to match the actual contract (any non-float promotes; floats are left alone). Not chasing float16 promotion since float16 DEMs are not a real-world case for this library.
- Nit: extract the dtype-promotion block to a helper -- dismissed. The inline block is ~10 lines and lives next to the one place that uses it; extracting it would split the logic across two definitions without making the strip loop meaningfully clearer. Per CLAUDE.md "no abstractions for single-use code."
- Nit: shorten the CHANGELOG entry -- dismissed. The entry documents a user-visible behaviour change (dtype and nodata both shift), and the extra lines pay for themselves when someone is reading the changelog to decide whether the fix affects them.
State after the follow-up
- All 19 existing + new vertical-shift tests still pass locally.
- No new findings on the follow-up commit.
Closes #2565.
Summary
result = data.copy()at the input dtype and then dostrip_data[is_valid] += N_total[is_valid]. For int16/uint16 DEMs (SRTM, Copernicus DEM, ASTER GDEM) this raisedUFuncTypeErrorbecause a float offset cannot be cast back into an integer array._apply_vertical_shiftnow picks anout_dtypeonce (float32 for integer or float32 inputs, float64 only when the source is float64) and forwards it through the numpy / cupy / dask backends. The numpy path casts up front and rewrites the integer nodata sentinel to NaN so the rest of the loop uses uniform NaN semantics.attrs['nodata'],attrs['_FillValue'], andattrs['nodatavals']so the metadata stays consistent with the array.Backend coverage
_apply_vertical_shift_numpy.map_blocksnow advertises the promoted dtype inmetaand forwardsout_dtypeinto every per-block call.Test plan
pytest xrspatial/tests/test_reproject.py::TestVerticalShiftIntegerDtype-- 7 new tests covering int16 promotion, uint8 promotion, float32 / float64 no-extra-promotion, nodata to NaN propagation,_FillValue/nodatavalsattr refresh, and the dask path.pytest xrspatial/tests/test_reproject.py::TestVerticalShift-- 12 existing vertical-shift tests still pass.pytest -k "vertical or geoid or Vertical"-- 54 tests pass.