This PR is large because it spans expression APIs, Arrow/Parquet conversion, metrics generation, and documentation. Recommended strategy: review in focused passes by concern, not file order.
-
Core geospatial utility correctness
pyiceberg/utils/geospatial.pytests/utils/test_geospatial.py- Focus on envelope extraction, antimeridian behavior, and bound encoding formats.
-
Metrics integration and write/import paths
pyiceberg/io/pyarrow.pytests/io/test_pyarrow_stats.py- Focus on:
- geospatial bounds derived from row WKB values
- skipping Parquet binary min/max for geospatial columns
- partition inference not using geospatial envelope bounds
-
GeoArrow compatibility and ambiguity boundary
pyiceberg/schema.pypyiceberg/io/pyarrow.pytests/io/test_pyarrow.py- Confirm:
- planar equivalence enabled only at Arrow/Parquet boundary
- spherical mismatch still fails
- fallback warning when GeoArrow dependency is absent
-
Spatial expression surface area
pyiceberg/expressions/__init__.pypyiceberg/expressions/visitors.pytests/expressions/test_spatial_predicates.py- Confirm:
- bind-time type checks (geometry/geography only)
- visitor plumbing is complete
- conservative evaluator behavior is explicit and documented
-
User-facing docs
mkdocs/docs/geospatial.md- Check limitations and behavior notes match implementation.
-
Boundary scope leakage
- Ensure planar-equivalence relaxation is not enabled globally.
-
Envelope semantics
- Geography antimeridian cases (
xmin > xmax) are expected and intentional.
- Geography antimeridian cases (
-
Metrics correctness
- Geospatial bounds are serialized envelopes, not raw value min/max.
-
Conservative evaluator behavior
- Spatial predicates should not accidentally become strict in metrics/manifest evaluators.
uv run --extra hive --extra bigquery python -m pytest tests/utils/test_geospatial.py -q
uv run --extra hive --extra bigquery python -m pytest tests/io/test_pyarrow_stats.py -k "geospatial or planar_geography_schema or partition_inference_skips_geospatial_bounds" -q
uv run --extra hive --extra bigquery python -m pytest tests/io/test_pyarrow.py -k "geoarrow or planar_geography_geometry_equivalence or spherical_geography_geometry_equivalence or logs_warning_once" -q
uv run --extra hive --extra bigquery python -m pytest tests/expressions/test_spatial_predicates.py tests/expressions/test_visitors.py -k "spatial or translate_column_names" -q- Geometry/geography bounds are present and correctly encoded for write/import paths.
geometryvsgeography(planar)is only equivalent at Arrow/Parquet compatibility boundary with CRS equality.geography(spherical)remains incompatible withgeometry.- Spatial predicates are correctly modeled/bound; execution and pushdown remain intentionally unimplemented.
- Missing GeoArrow dependency degrades gracefully with explicit warning.
- Docs match implemented behavior and limitations.