polygonize: auto-detect attrs['transform'] for output georeferencing (#2536)#2541
Conversation
…2536) `polygonize(raster, return_type='geopandas')` already auto-detects `attrs['crs']` and stamps it on the returned `GeoDataFrame`. It never read `attrs['transform']`, so the geometries stayed in raw pixel space even when the raster carried a valid affine transform. The result was a `GeoDataFrame` whose `.crs` claimed projected space while the geometries were not in that space -- silent misalignment by the raster's origin offset, worse than the missing-CRS bug (#2149) because downstream callers cannot detect the inconsistency from the `GeoDataFrame` alone. Add `_detect_raster_transform()` parallel to `_detect_raster_crs()` with resolution order: 1. `raster.attrs['transform']` (xrspatial.geotiff convention, a rasterio-ordered 6-tuple) 2. `raster.rio.transform()` (rioxarray, if installed) 3. `None` Skip auto-detection when `attrs['_xrspatial_no_georef']=True` (the xrspatial.geotiff no-georef marker) and when `rio.transform()` returns the identity (rioxarray's default when no transform info is available -- otherwise the output would silently shift by zero and look identical to "no transform"). Malformed `attrs['transform']` (wrong length, non-numeric) is silently ignored, matching the existing unparseable-CRS-attr policy. An explicit `transform=` argument still wins over the attr. The geometries themselves are transformed inside `_scan`, so this applies to all return types -- the "numpy", "awkward", "spatialpandas", and "geojson" outputs all benefit. The `simplify_tolerance` path is unaffected (simplification runs after transform application).
brendancol
left a comment
There was a problem hiding this comment.
PR Review: polygonize: auto-detect attrs['transform'] for output georeferencing (#2536)
Blockers
None.
Suggestions
-
Behavioural change deserves a CHANGELOG entry. Existing user code that calls
polygonize(raster)without an explicittransform=will now return CRS-space coordinates rather than pixel coordinates whenever the raster carriesattrs['transform']or a non-identityrio.transform(). The fix matches the precedent set by #2149 for CRS auto-detection, but this is still observable from outside the package. A short note inCHANGELOG.mdunder the next release would help users who currently chainpolygonize -> external transform application. (xrspatial/polygonize.py, top-level behaviour) -
Coverage gap:
mask=interaction. None of the new tests pass bothmask=and a raster carryingattrs['transform']. The existing mask tests assert on polygon geometry without an auto-detected transform, and the new transform tests use no mask. A single combined assertion (mask set + attrs transform set + check bounds land in CRS space) would close the obvious chained-feature case. (xrspatial/tests/test_polygonize.py: TestPolygonizeTransformPropagation) -
Coverage gap:
connectivity=8. The transform is applied inside_scanregardless of connectivity, so 4-connectivity and 8-connectivity should produce identically-transformed geometries with the same auto-detected transform. A parametrized test overconnectivity in (4, 8)would catch any future change to_scanthat decouples the transform path from one of the connectivities. (xrspatial/tests/test_polygonize.py: TestPolygonizeTransformPropagation)
Nits
-
test_no_georef_marker_suppresses_auto_detectexercises a state the geotiff writer never produces.xrspatial/geotiff/_attrs.py:782-785only emits_xrspatial_no_georef=Truewhenhas_georef=False, and in that branch it does NOT writeattrs['transform']. The two attrs are mutually exclusive in practice, so the test is defensive against a malformed dict rather than a real read path. A one-line comment in the test acknowledging that would help future readers. (xrspatial/tests/test_polygonize.py:1432-1440) -
Identity-affine short-circuit.
_detect_raster_transformreturnsNonewhenrio.transform()reports the identity. Applying the identity transform is a no-op, so the short-circuit is harmless but technically suppresses a legitimate "user really did stamp identity on this raster" case. The current comment explains the rationale; leaving it as-is is fine. (xrspatial/polygonize.py:644-650)
What looks good
- The helper is structurally parallel to
_detect_raster_crs, which keeps the file easy to scan. - Malformed
attrs['transform']falls back rather than raising, matching the existing unparseable-CRS policy. The newtest_transform_attr_invalid_length_falls_backandtest_explicit_transform_invalid_length_still_raisescodify both sides of that split. - The dask path was already routing the user's
transformthrough_merge_from_separatedrather than through the per-chunk call, so the auto-detected transform reaches the merge step unchanged.test_transform_attr_auto_detected_daskexercises this end-to-end. - The docstring update is specific about the resolution order and notes that auto-detection applies to all
return_typeoptions, not onlygeopandas.
Checklist
- Algorithm matches reference: transform application is unchanged from existing code, only the input path is widened.
- All implemented backends produce consistent results (numpy and dask covered by tests; cupy / dask+cupy share the same orchestrators).
- NaN handling is unchanged.
- Edge cases covered: no transform, malformed attr (multiple shapes), explicit override, rioxarray-only source.
- Dask chunk boundaries: transform application is at the merge step, not per-chunk, so chunk boundaries do not interact with the new path.
- No premature materialization or unnecessary copies.
- [N/A] Benchmark not needed for a metadata-detection helper.
- [N/A] README feature matrix unchanged (no new function, no backend change).
- Docstrings present and accurate.
- CHANGELOG.md: add an Unreleased "Fixed" entry calling out the behavioural change (auto-detected transform now applies to callers who previously got pixel coords from a georef'd raster), with the opt-out recipe. - test_polygonize.py: add `test_transform_attr_with_mask` to verify the auto-detected transform composes with the `mask=` argument. - test_polygonize.py: parametrize `test_transform_attr_works_with_ both_connectivities` over `connectivity in (4, 8)` so a future change that decouples `_scan`'s transform call site from one of the connectivities cannot regress silently. - test_polygonize.py: add a comment on `test_no_georef_marker_suppresses_auto_detect` explaining that the state it exercises is not produced by xrspatial.geotiff in practice (mutually exclusive attrs), so the test guards against malformed third-party dicts rather than a known read path. The identity-affine short-circuit nit was left as-is per the review write-up: applying identity is a no-op, the existing comment already documents the rationale.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review pass
Addressed in a2835e92:
- Suggestion 1 (CHANGELOG entry): Fixed. Added an
Unreleased > Fixedentry toCHANGELOG.mddocumenting the behavioural change and the opt-out (transform=np.array([1,0,0,0,1,0])). - Suggestion 2 (mask + transform coverage): Fixed. Added
test_transform_attr_with_maskwhich masks out the right column and asserts the surviving polygon emerges in CRS space. - Suggestion 3 (connectivity=8 coverage): Fixed. Replaced the single-connectivity test with
test_transform_attr_works_with_both_connectivitiesparametrized overconnectivity in (4, 8). - Nit 1 (no_georef test comment): Fixed. Added an explanatory comment on
test_no_georef_marker_suppresses_auto_detectnoting that the geotiff writer never emits the marker and a transform attr together (mutually exclusive at_attrs.py:782-785), so the test guards against malformed third-party dicts rather than a real read path. - Nit 2 (identity-affine short-circuit): Dismissed. The existing comment in
_detect_raster_transformalready documents the rationale, and applying the identity affine is a no-op, so the short-circuit causes no observable difference. The original review write-up agreed leaving as-is is fine.
Test count: 15 tests in TestPolygonizeTransformPropagation, all passing. Full file: 156 passed, 13 skipped, no regressions.
Post-merge notes on #2541Re-ran the tests against the merge commit (b3c55f2): 15/15 in One real findingCHANGELOG entry undersells the scope. It only mentions Concrete fix: drop the Smaller noteThe identity-affine short-circuit in Confirming the parity rationaleCupy and dask+cupy aren't separately tested. The PR body's reasoning checks out: Things that look good
Can spin up a one-line CHANGELOG follow-up PR if that's useful. |
Summary
polygonize(return_type='geopandas')already auto-detectedattrs['crs']and stamped it on the outputGeoDataFrame, but never readattrs['transform'], so geometries stayed in raw pixel space. TheGeoDataFrameended up with a.crsclaiming projected space (e.g. EPSG:3857) while the geometries were not in that space. Silent misalignment, worse than the missing-CRS bug fixed in #2149: the downstream caller could not detect the inconsistency from theGeoDataFramealone.This PR adds
_detect_raster_transform()parallel to_detect_raster_crs(). Resolution order:raster.attrs['transform'](xrspatial.geotiff convention, a rasterio-ordered 6-tuple)raster.rio.transform()(rioxarray, if installed)NoneSkipped when
attrs['_xrspatial_no_georef']=True(the xrspatial.geotiff no-georef marker), and whenrio.transform()returns the identity affine (rioxarray's default when no transform info is available; otherwise the output would silently shift by zero and look identical to "no transform"). Malformedattrs['transform'](wrong length, non-numeric) is silently ignored, matching the existing unparseable-CRS-attr policy. An explicittransform=argument still wins over the attr.The geometries are transformed inside
_scan, so this applies to allreturn_typeoptions. The "numpy", "awkward", "spatialpandas", and "geojson" outputs all benefit. Thesimplify_tolerancepath is unaffected -- simplification runs after transform application.Closes #2536.
Backend coverage
test_transform_attr_auto_detected_numpyandtest_transform_attr_auto_detected_geopandastest_transform_attr_auto_detected_dask_polygonize_cupy, which transfers to CPU before calling_scan, and the dask+cupy path uses the same_polygonize_daskorchestrator as dask+numpy. Both honour thetransformargument that this PR now populates from the raster attrs.Test plan
pytest xrspatial/tests/test_polygonize.py::TestPolygonizeTransformPropagation(12 new tests pass)pytest xrspatial/tests/test_polygonize.py(full file: 153 passed, 13 skipped, no regressions)pytest xrspatial/tests/test_polygonize_coverage_2026_05_19.py xrspatial/tests/test_polygonize_issue_2172.py(96 passed)