Skip to content

Raise on 'majority' in zonal_stats dask path instead of silent drop (#2528)#2531

Merged
brendancol merged 2 commits into
mainfrom
deep-sweep-accuracy-zonal-2528-01
May 28, 2026
Merged

Raise on 'majority' in zonal_stats dask path instead of silent drop (#2528)#2531
brendancol merged 2 commits into
mainfrom
deep-sweep-accuracy-zonal-2528-01

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • zonal_stats() on dask-backed input silently dropped 'majority' from the requested stats list because the default included it but the dask block-by-block path could not compute it.
  • Replaced the mutable list default with stats_funcs=None and resolved the default per backend; numpy/cupy keep the full eight-stat default, dask resolves to the seven-stat subset.
  • Explicit 'majority' on a dask input now raises ValueError instead of being silently filtered.

Closes #2528.

Test Plan

  • pytest xrspatial/tests/test_zonal.py -- 129 pass (125 pre-existing + 4 new regression tests).
  • pytest xrspatial/tests/test_dasymetric.py -- 61 pass (dasymetric uses zonal.stats internally).
  • Cross-backend smoke test confirms numpy/cupy keep 'majority', dask backends omit it, explicit majority on dask raises with a clear message.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 27, 2026
@brendancol
Copy link
Copy Markdown
Contributor Author

Review pass (worktree, fresh test run):

  • (a) ValueError vs deprecation: ValueError is appropriate. There is no documented contract that majority worked on dask; the prior behaviour silently dropped it. Callers who explicitly passed 'majority' on dask were already broken in a worse way (silent data loss). No deprecation needed.
  • (b) Empty list stats_funcs=[]: behaves consistently across backends; returns a frame with only the 'zone' column. The new validation does not flag it (empty unsupported list). Not adding a dedicated test, the existing subset tests cover the surrounding code path.
  • (c) Docstring stats table: the per-backend default lists are now documented and the dask majority rejection is called out.
  • (d) Other callers: dasymetric.py references zonal.stats only in docstrings; accessor.zonal_stats passes kwargs through; the Dataset branch recurses with the same stats_funcs (None resolves per-backend on each per-variable call). No other module passes stats_funcs explicitly.

Applied one suggestion: the dict-on-dask error message hard-coded the supported list. Sourced it from _DEFAULT_STATS_DASK to prevent drift (1 file, 2 lines).

Tests: 129/129 test_zonal.py + 150/150 dasymetric/dataset_support/validation pass.

…2528)

zonal_stats() on dask-backed inputs silently dropped 'majority' from the
requested stats list.  The default stats_funcs included 'majority' but the
dask block-by-block path filtered it out without warning, so callers got
fewer columns than the docstring promised.

Replace the mutable list default with stats_funcs=None and resolve the
default per backend inside the function.  Numpy and cupy keep the full
eight-stat default; dask resolves to the seven-stat subset without
majority.  Explicit 'majority' on a dask input now raises ValueError
with the supported-stats list.

Fixes #2528.
The error raised when a dict is passed for stats_funcs on a dask-backed
input hard-coded the supported stats list as a string literal.  Source
the list from _DEFAULT_STATS_DASK so the message can't drift if the
constant changes.
@brendancol brendancol force-pushed the deep-sweep-accuracy-zonal-2528-01 branch from 9be320f to e0a7b7b Compare May 28, 2026 13:05
@brendancol brendancol merged commit ae6bdad into main May 28, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

zonal_stats dask backend silently drops 'majority' from default stats

1 participant