Is your feature request related to a problem or challenge?
DataFusion's default execution creates separate HashSet accumulators for each distinct count per group. With high-cardinality data (eg sellers with 4,000+ distinct cities in an ecommerce dataset), this causes memory explosion.
Describe the solution you'd like
DataFusion already optimizes the single shared distinct field case via SingleDistinctToGroupBy. This PR adds a conservative logical rewrite for multiple distinct COUNT(DISTINCT …) arguments by splitting work into per-distinct branches joined on the group keys, which reduces peak memory for eligible plans.
COUNT(DISTINCT x) must ignore NULL x; the rewrite applies x IS NOT NULL on each distinct branch before inner grouping so semantics stay aligned with count_distinct behavior.
Describe alternatives you've considered
No response
Additional context
No response
Is your feature request related to a problem or challenge?
DataFusion's default execution creates separate HashSet accumulators for each distinct count per group. With high-cardinality data (eg sellers with 4,000+ distinct cities in an ecommerce dataset), this causes memory explosion.
Describe the solution you'd like
DataFusion already optimizes the single shared distinct field case via SingleDistinctToGroupBy. This PR adds a conservative logical rewrite for multiple distinct COUNT(DISTINCT …) arguments by splitting work into per-distinct branches joined on the group keys, which reduces peak memory for eligible plans.
COUNT(DISTINCT x) must ignore NULL x; the rewrite applies x IS NOT NULL on each distinct branch before inner grouping so semantics stay aligned with count_distinct behavior.
Describe alternatives you've considered
No response
Additional context
No response