[Experiment] Adaptive filter pushdown#22144
Draft
adriangb wants to merge 58 commits into
Replaces PR #9's morsel-per-row-group split with an in-decoder strategy swap: one `ParquetPushDecoder` per file, one `BoxStream` per file, with filter placement re-evaluated at every row-group boundary using the shared `SelectivityTracker`.

# Removed
- The chunk loop (`ParquetAccessPlan::split_into_chunks`, `Vec<BoxStream>` returns from `build_stream`).
- Per-chunk `AsyncFileReader::create_reader` minting and the per-chunk `RowFilter` rebuild.
- The `EarlyStoppingStream`-on-chunk-0-only special case for the non-`Clone` `FilePruner`.
- `LazyMorselShared` per-morsel Arc churn — the source of the ~10% aggregate ClickBench regression you flagged in the PR #9 review.

# How it works
`AdaptiveParquetStream` (new in `opener.rs`) drives one row group at a time via `try_next_reader`:
1. Pull a `ParquetRecordBatchReader` for the next row group.
2. Iterate it synchronously; each batch goes through any post-scan filters (which feed per-filter stats into the tracker) and then through the projector.
3. When the reader is exhausted, ask the tracker to re-partition filters based on accumulated stats. If the row-filter set changed, build a new `RowFilter` and call the new arrow-rs `ParquetPushDecoder::swap_strategy` before requesting the next reader. Post-scan filters update in lockstep.

`PushBuffers` carries through the swap, so already-fetched bytes are preserved, and the optional-filter mid-stream skip mechanism (the existing `OptionalFilterPhysicalExpr` + `tracker.is_filter_skipped`) keeps working unchanged inside `apply_post_scan_filters_with_stats`. A minimal sketch of the loop follows this description.

# Carried over from PR #9
- `selectivity.rs` — `SelectivityTracker`, `PartitionedFilters`, `FilterId`, Welford CI bounds. Verbatim.
- `row_filter.rs` — new `build_row_filter` signature returning `(Option<RowFilter>, UnbuildableFilters)` plus `total_compressed_bytes`, plus `DatafusionArrowPredicate` stat hooks.
- `physical_expr.rs` — `OptionalFilterPhysicalExpr`, `snapshot_generation` helpers. `Display` is **pass-through** here (PR #9 used `Optional(...)`), keeping every existing sqllogictest expected output intact.
- `config.rs` — adds `filter_pushdown_min_bytes_per_sec` / `filter_collecting_byte_ratio_threshold` / `filter_confidence_z`. **`reorder_filters` is preserved as a deprecated no-op** (per request) — the adaptive tracker subsumes it.
- `selectivity_tracker.rs` bench — verbatim.
- Per-file plumbing in `source.rs`: `predicate_conjuncts: Vec<(FilterId, Arc<PhysicalExpr>)>` instead of a single AND-ed predicate, so per-conjunct stats accumulate across files.

# Dependencies
Depends on `pydantic/arrow-rs:adaptive-strategy-swap`, which adds `ParquetPushDecoder::can_swap_strategy()` / `swap_strategy(StrategySwap)` and the `StrategySwap` builder. The `Cargo.toml` `[patch.crates-io]` block points at it.

# Deferred / known gaps
- Sub-row-group adaptation (would need a `ParquetRecordBatchReader::pause` primitive in arrow-rs to yield a residual `RowSelection`); useful for TPC-DS-style single-huge-row-group files. Defer.
- The three new config knobs aren't in the proto schema yet; `from_proto` fills them with config defaults so a roundtrip preserves behavior.

# Testing
- `cargo test -p datafusion-datasource-parquet --lib` — 143 passed
- `cargo test -p datafusion --lib` — 410 passed
- `cargo test -p datafusion --test core_integration` — 935 passed
- `cargo test -p datafusion-sqllogictest --test sqllogictests` — all pass except `encrypted_parquet.slt` (pre-existing on upstream/main, not related to this change)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
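A minimal sketch of the per-row-group loop, using std-only stand-ins. `Tracker`, `drive_row_groups`, and the closures are illustrative, not the PR's real types; only the control flow (drain a reader, re-partition at the row-group boundary, swap strategies when the row-level set changes) mirrors the description above.

```rust
// Illustrative stand-ins: the real code uses ParquetRecordBatchReader,
// SelectivityTracker, and ParquetPushDecoder::swap_strategy (per the text).
struct Tracker;

impl Tracker {
    // Re-partition filters from accumulated stats; return true when the
    // row-level filter set changed and the decode strategy must be swapped.
    fn repartition(&mut self) -> bool {
        false
    }
}

fn drive_row_groups<R, B>(
    mut next_reader: impl FnMut() -> Option<R>, // stand-in for try_next_reader
    tracker: &mut Tracker,
    mut swap_strategy: impl FnMut(), // stand-in for swap_strategy
) where
    R: Iterator<Item = B>,
{
    while let Some(reader) = next_reader() {
        for batch in reader {
            // real code: apply post-scan filters (feeding per-filter stats
            // into the tracker), then project away filter-only columns
            let _ = batch;
        }
        // row-group boundary: re-evaluate filter placement
        if tracker.repartition() {
            // real code: rebuild the RowFilter; PushBuffers carries through
            // the swap so already-fetched bytes are preserved
            swap_strategy();
        }
    }
}

fn main() {
    let mut groups = vec![vec![1_u32, 2], vec![3]].into_iter();
    drive_row_groups(|| groups.next().map(|g| g.into_iter()), &mut Tracker, || {});
}
```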
- Fix 6 broken intra-doc links in `opener.rs`: `RowFilter`, `PushBuffers`, `AsyncFileReader::create_reader`, and `SelectivityTracker` weren't visible from the doc-comment scope. Reword to plain backticks for the names that don't have a stable in-scope path; route `SelectivityTracker` through `crate::selectivity::SelectivityTracker`.
- Regenerate `docs/source/user-guide/configs.md` via `dev/update_config_docs.sh` to surface the three new `filter_pushdown_min_bytes_per_sec` / `filter_collecting_byte_ratio_threshold` / `filter_confidence_z` rows the CI doc check expects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…33dd62 Picks up the rustdoc fix from the arrow-rs companion branch so the DataFusion CI doc job resolves cleanly too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The example asserts `pushdown_rows_pruned=1` to demonstrate that the row-filter path actually evicts rows. Under the adaptive scheduler's default `filter_pushdown_min_bytes_per_sec = 100 MB/s`, a small example file's filter starts on the post-scan path (where `pushdown_rows_pruned` stays 0) and the assertion fires. Set `filter_pushdown_min_bytes_per_sec = 0` to disable the throughput check and force every filter to row-level — the same lever the `physical_plan/parquet.rs` test harness uses (a sketch follows).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
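A hedged sketch of that configuration. `pushdown_filters` is the existing DataFusion option; the exact path of `filter_pushdown_min_bytes_per_sec` is an assumption about where the knob this PR adds in `config.rs` lands under `execution.parquet`.

```rust
use datafusion::prelude::{SessionConfig, SessionContext};

fn main() {
    let mut config = SessionConfig::new();
    let parquet = &mut config.options_mut().execution.parquet;
    parquet.pushdown_filters = true;
    // 0.0 disables the throughput gate, so every filter starts (and stays)
    // on the row-level path and pushdown_rows_pruned can advance.
    // NOTE: this field is added by this PR; the path is an assumption.
    parquet.filter_pushdown_min_bytes_per_sec = 0.0;
    let _ctx = SessionContext::new_with_config(config);
}
```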
Two fixes for benchmark regressions and crashes on hits_partitioned ClickBench queries.

# Hard failures (Q36, Q38, Q41, Q42)
`build_stream` was building the wide ProjectionMask from `user projection ∪ post_scan_conjuncts` only, but a row-level conjunct can get demoted to post-scan mid-stream by `maybe_swap_strategy`. When that happened, the demoted filter's column wasn't in the `stream_schema`, and the post-scan rebase via `reassign_expr_columns` fired a `Schema error: Unable to get field named "..."` against the narrow batch.

Fix: include **every** predicate conjunct's columns in the wide projection regardless of current placement (see the sketch after this message). Filter-only columns are still stripped after post-scan filtering by the projector, so the user-visible schema is unchanged.

# Initial-placement regressions (Q10, Q11, Q13, Q14, Q26)
Queries shaped like `SELECT col, ... FROM t WHERE col <> '' GROUP BY col` have the filter column already in the user projection. The byte-ratio heuristic was counting filter bytes against projection bytes naively, so `MobilePhoneModel_bytes / (MobilePhoneModel_bytes + UserID_bytes) ≈ 0.5` exceeded the 0.20 threshold and pushed the filter to post-scan — even though row-level was strictly better (zero extra I/O, and late materialization saves the UserID decode for pruned rows).

Fix: change the heuristic numerator from `filter_bytes` to **extra** bytes — bytes for filter columns *not* already in the user projection. A filter that only references projection columns now gets `byte_ratio = 0` and starts at row-level.

Threading required: add `projection_columns: &HashSet<usize>` to `SelectivityTracker::partition_filters` (and the inner impl); the opener's `AdaptiveParquetStream` carries it for mid-stream re-evals.

# Test plan
- All 4 hard-failure queries (Q36/Q38/Q41/Q42) now run to completion locally on hits_partitioned.
- 143 datasource-parquet unit tests pass (38 partition_filters call-sites in the test module updated to the new signature).
- Benchmark expectations: the Q23/Q22/Q6 wins should hold; the Q10/Q11/Q13/Q14 regressions should resolve via the better initial placement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
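A hedged sketch of the wide-projection fix. The name `predicate_conjuncts` follows the PR description; the column-set representation and the helper itself are illustrative assumptions.

```rust
use std::collections::{BTreeSet, HashSet};

// Stand-in for the PR's FilterId; each conjunct carries the column indices
// it references, regardless of current row-level/post-scan placement.
type FilterId = usize;

fn wide_projection(
    user_projection: &[usize],
    predicate_conjuncts: &[(FilterId, HashSet<usize>)],
) -> Vec<usize> {
    let mut cols: BTreeSet<usize> = user_projection.iter().copied().collect();
    for (_, conjunct_cols) in predicate_conjuncts {
        // include every conjunct's columns so a mid-stream demotion to
        // post-scan always finds its column in stream_schema
        cols.extend(conjunct_cols.iter().copied());
    }
    cols.into_iter().collect()
}

fn main() {
    let conjuncts = vec![(0, HashSet::from([2_usize])), (1, HashSet::from([0_usize]))];
    assert_eq!(wide_projection(&[0, 1], &conjuncts), vec![0, 1, 2]);
}
```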
…ement

Bench showed Q10/Q11/Q13/Q14/Q26 still regressing 1.20-1.47x even after the overlap-aware heuristic. These queries are shaped like `SELECT col, ... FROM t WHERE col <> '' GROUP BY col` — the filter column is entirely in the projection, so `extra_bytes = 0` and `byte_ratio = 0`. The previous heuristic placed them at row-level since `0 <= threshold`, but row-level *isn't* free even at zero extra I/O: predicate-cache eviction on heavy string columns means the filter column gets decoded twice (once for the predicate eval, once for the projection), and the late-materialization payoff depends on a selectivity we don't know yet.

Local timings on hits_partitioned (release mode):

| Query | main + no-pushdown (baseline) | branch (old heuristic) | branch (new heuristic) |
|-------|------------------------------:|-----------------------:|-----------------------:|
| Q23 | 3708 ms | 219 ms* | 219 ms |
| Q22 | 1344 ms | 902 ms* | 902 ms |
| Q26 | 41 ms | 60 ms | 48 ms |
| Q10 | 82 ms | 109 ms | 88 ms |

Q23/Q22 wins are preserved (Q23 ~17x faster vs baseline, Q22 ~1.5x). The Q10/Q26 regressions go from 1.32-1.45x to 1.07-1.17x — the residual is the general cost of pushdown_filters=true vs false, not our adaptive layer.

Why Q23 isn't hurt: its huge speedup comes from row-group statistics pruning via the TopK dynamic filter on EventTime, not from row-level filter evaluation. Pruning is independent of row-level vs post-scan placement; the dynamic filter still reaches the source and the PruningPredicate still applies. (A local repro confirms — Q23 actually gets slightly faster on the new heuristic because we skip the double-decode of the heavy URL string column.)

Implementation: change the new-filter row-level condition from `byte_ratio <= threshold` to `extra_bytes > 0 && byte_ratio <= threshold` (sketched below). Pure-overlap filters (`extra_bytes == 0`) start at post-scan; the tracker promotes them later if measured bytes-saved-per-sec justifies it. Filters with a non-zero extra cost that fits within `byte_ratio_threshold` (a small int predicate against a heavy string projection) still start at row-level — that's the case where the heuristic is genuinely useful.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
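A minimal sketch of that condition; the enum and function names are illustrative, not the PR's API.

```rust
#[derive(Debug, PartialEq)]
enum Placement {
    RowLevel,
    PostScan,
}

// extra_bytes: compressed bytes for filter columns NOT in the user projection;
// byte_ratio: extra_bytes / (extra_bytes + projection_bytes).
fn initial_placement(extra_bytes: u64, byte_ratio: f64, threshold: f64) -> Placement {
    if extra_bytes > 0 && byte_ratio <= threshold {
        // genuinely-extra but cheap filter column: row-level pays for itself
        Placement::RowLevel
    } else {
        // pure-overlap (extra_bytes == 0) or expensive: start post-scan and
        // let the tracker promote on measured evidence
        Placement::PostScan
    }
}

fn main() {
    // pure-overlap filter (ClickBench Q10 shape) starts post-scan
    assert_eq!(initial_placement(0, 0.0, 0.20), Placement::PostScan);
    // small extra int column against a heavy projection starts row-level
    assert_eq!(initial_placement(4_096, 0.05, 0.20), Placement::RowLevel);
}
```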
Two changes that work together to make Q10/Q11/Q13/Q14/Q26 stop regressing without giving up the Q23/Q22 wins.

# 1. Prune-rate gate on PostScan → RowFilter promotion
Adds a second gate on top of the existing `filter_pushdown_min_bytes_per_sec` CI bound: a filter only gets promoted from post-scan to row-level if it actually prunes >= 99% of the rows it sees.

Why: the bytes-saved-per-sec metric is "potential savings if at row-level" (rows_pruned × non-filter-projection-bytes-per-row ÷ eval_time). For ClickBench Q10 (`MobilePhoneModel <> ''`) the selectivity is ~94% and the projection is heavy, so bytes-saved-per-sec clears the 100 MB/s threshold easily. But row-level *actually loses* to post-scan there because survivors are uniformly scattered: at 8K rows per page, p^N for p=0.94 is ~10^-220 — effectively zero pages can be skipped, so RowSelection-driven decode is just as expensive as a contiguous post-scan read, but with extra predicate-cache eviction on the heavy string column. (The arithmetic is spelled out after this message.)

The 0.99 gate captures the scatter problem structurally:
- Clustered survivors (TopK dynamic filter, hash-join build): prune_rate trivially ≥ 0.99 once K shrinks. Page-skip works. Promote.
- Uniform survivors at moderate selectivity (Q10/Q11/Q13/Q14/Q26): prune_rate stays at 0.5–0.95. Page-skip can't work no matter how big bytes-saved-per-sec is. Stay at post-scan.

Q22's `Title LIKE '%Google%'` (prune_rate ~1.0) and Q23's `URL LIKE '%google%'` (similar) trivially clear the gate, so their big wins are preserved.

# 2. Drop STATS_SAMPLE_INTERVAL (1/32 → every batch)
I added the 1/32 sampling earlier when the per-batch `Instant + tracker.update` was clearly hot — but at the time the heuristic was over-promoting these queries to row-level, making the per-batch path matter much more. Now that the prune-rate gate keeps them at post-scan, sampling actively *hurts*: with 1/32 the Welford accumulator converges 32× slower, so the tracker takes longer to realize "this filter is bad at row-level" and the in-flight filter flips state more often. Updating every batch is faster on every query I measured (Q23, Q22, Q26, Q10).

`SKIP_FLAG_CHECK_INTERVAL = 4` stays — it gates the OptionalFilter skip-flag check, not the Welford update, and removing *it* added ~200ms to Q22 (the post-update lock-juggle isn't free).

# Local timings (warm, hits_partitioned, 12 partitions)

| Query | main+nopush | branch | Δ |
|-------|------------:|-------:|---|
| Q23 | 3271ms | 168ms | **+19.5x** |
| Q22 | 1069ms | 901ms | +1.19x |
| Q26 | 39ms | 41ms | matches (+2ms) |
| Q10 | 68ms | 59ms | **+1.15x** |

All four ≥ baseline. Q26 is essentially break-even; the residual 2ms is below run-to-run noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
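The page-skip arithmetic from the scatter argument above, as a tiny check. This is plain math on the numbers quoted in the text, not project code.

```rust
fn main() {
    // If each row is pruned independently with probability p, a whole page
    // of n rows can be skipped only when ALL its rows are pruned: p^n.
    let p: f64 = 0.94; // ClickBench Q10-style prune probability
    let n: f64 = 8192.0; // rows per page
    let log10_skip = n * p.log10();
    println!("log10 P(page fully pruned) ≈ {log10_skip:.0}"); // ≈ -220
    // A clustered filter (prune_rate ≥ 0.99 with page-aligned runs) skips
    // pages structurally, which is what the 0.99 gate detects.
}
```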
Earlier I had two sampling/gate constants protecting the hot per-batch
update path:
- `STATS_SAMPLE_INTERVAL = 32` in opener.rs: skip the
  `Instant::now` + `tracker.update` work on 31 of every 32 batches.
- `SKIP_FLAG_CHECK_INTERVAL = 4` in selectivity.rs: inside
  tracker.update, skip the post-stats CI-bound + lock-juggle path on
  3 of every 4 calls.
Both were "right" given the prior over-promotion problem (filters
landing at row-level when they shouldn't, making the per-batch path
hot and the CI calc wasted). With the new `prune_rate >= 0.99` gate
those filters stay at post-scan and the measurements no longer
support sampling:
- Removing `STATS_SAMPLE_INTERVAL` (every batch updates) is
  *faster* than 1/32 across Q23/Q22/Q26/Q10. Slower convergence on
  1/32 made the tracker take longer to settle, so the in-flight
  filter chain flipped state more often.
- `SKIP_FLAG_CHECK_INTERVAL = 4` was protecting *non-optional*
  filters from a wasted-work path (post-stats CI calc + lock release
  + is_optional HashMap read + lock reacquire) that they didn't need
  at all. The right fix is to early-return for non-optional filters
  *before* that path, not to amortize it across 4 calls.
This refactor:
1. Caches `is_optional: bool` inline on `SelectivityStats`.
   Non-optional filters early-return after the Welford update with
   a single field load on the already-held stats lock — no extra
   HashMap, no `RwLock::read()`, no `drop` + reacquire (sketched
   after this message).
2. For optional filters (hash-join build / TopK dynamic), the
   skip-flag CI check now runs every batch. That's what we want:
   when a filter's selectivity collapses, the skip flag should fire
   ASAP. Q26's TopK dynamic filter benefits visibly from this.
3. Drops the now-redundant `SelectivityTracker::is_optional`
   HashMap and `PartitionResult::new_filter_ids` (which duplicated
   `new_optional_flags`). The is_optional bit moves to where it's
   read.
4. Drops the sampling in `apply_post_scan_filters_with_stats`.
   `tracker.update` is now cheap enough on the fast path that
   sampling actively hurts (slower convergence > saved work).
Local timings (warm, hits_partitioned, 12 partitions):
| Query | main+nopush | branch | Δ |
|-------|------------:|-------:|---|
| Q23 | 3271ms | 139ms | **+23.5x** |
| Q22 | 1069ms | 898ms | +1.19x |
| Q26 | 39ms | 39ms | matches |
| Q10 | 68ms | 59ms | **+1.15x** |
143 lib tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
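A hedged sketch of item 1 above. The struct and field names are illustrative; only `is_optional`, the Welford update, and the early return follow the commit text.

```rust
use std::sync::Mutex;

struct SelectivityStats {
    is_optional: bool, // cached inline: no separate HashMap or RwLock
    count: u64,
    mean: f64,
    m2: f64, // Welford running sum of squared deviations
}

fn update(stats: &Mutex<SelectivityStats>, sample: f64) {
    let mut s = stats.lock().unwrap();
    // Welford update on the already-held per-filter lock
    s.count += 1;
    let delta = sample - s.mean;
    s.mean += delta / s.count as f64;
    s.m2 += delta * (sample - s.mean);
    if !s.is_optional {
        // non-optional filters: a single field load and we're done —
        // no CI calc, no lock drop + reacquire
        return;
    }
    // optional filters (hash-join build / TopK dynamic): run the
    // skip-flag CI check every batch so a selectivity collapse fires ASAP
}

fn main() {
    let stats = Mutex::new(SelectivityStats {
        is_optional: false,
        count: 0,
        mean: 0.0,
        m2: 0.0,
    });
    update(&stats, 0.5);
    assert_eq!(stats.lock().unwrap().count, 1);
}
```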
Replaces the all-or-nothing batch-level "if matched == 0, all skippable;
otherwise 0" computation with a sub-batch windowed analysis fed by a
new `count_skippable_bytes` helper. The metric is now:

    for each batch:
      skippable_bytes_for_batch = total_other_projection_bytes_for_batch
        × (windows-with-zero-survivors / total-windows)

with W = 8192 rows (short-circuited so total_windows = 1 ⇒ binary
"is the whole batch all-pruned" — equivalent to the old behavior on
typical 8K batch sizes, but with the structure in place for finer W
on larger pages or different writers).
Why: `filter_pushdown_min_bytes_per_sec` is the right *unit*, but the
metric feeding it overestimated savings whenever the filter pruned
rows that the row-level decoder couldn't actually drop a page on. A
50% filter on uniform data still costs full I/O at row-level (every
page has survivors); a 50% filter on contiguous data lets the
decoder skip half the pages. The windowed analysis discriminates
between these — same formula at post-scan (predicting what row-level
would save) and at row-level (measuring what the decoder did skip,
modulo within-window RowSelection narrowing, which is an uncounted
bonus).
Same metric on both sides means `min_bytes_per_sec` is the only knob;
no separate prune-rate gate. The 0.99 gate is now redundant — if the
prune rate is high enough that page-skip works, the metric already
clears the threshold; if the prune rate is high but the scatter is
uniform (case C, ClickBench Q10/Q11/Q13/Q14/Q26), the metric stays
low and the filter stays at post-scan.
The helper short-circuits when:
- the batch is fully pruned (`true_count == 0`) → all skippable,
- the batch has no zeros (`true_count == n`) → 0 skippable,
- there's only one window (`n ≤ W`) and the answer is determined.
This avoids the ~2× per-batch `true_count` work that was visible as a
regression when I first wired the helper through. (A sketch of the
helper follows this message.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
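A hedged, std-only sketch of the windowed estimate. The real `count_skippable_bytes` operates on arrow data and returns bytes; this illustrative version returns the zero-survivor window fraction, with the same short-circuits the commit lists.

```rust
/// Fraction of W-row windows with zero survivors; multiply by the batch's
/// other-projection bytes to get skippable_bytes_for_batch.
fn skippable_fraction(keep: &[bool], w: usize) -> f64 {
    let survivors = keep.iter().filter(|&&k| k).count();
    if survivors == 0 {
        return 1.0; // fully pruned batch: everything skippable
    }
    if survivors == keep.len() || keep.len() <= w {
        return 0.0; // no zeros, or a single window that has survivors
    }
    let zero_windows = keep.chunks(w).filter(|win| !win.iter().any(|&k| k)).count();
    zero_windows as f64 / keep.len().div_ceil(w) as f64
}

fn main() {
    // 50% filter on uniform data: every window has survivors → nothing skippable
    let uniform: Vec<bool> = (0..32_768).map(|i| i % 2 == 0).collect();
    assert_eq!(skippable_fraction(&uniform, 8192), 0.0);
    // 50% filter on contiguous data: half the windows are empty → half skippable
    let contiguous: Vec<bool> = (0..32_768).map(|i| i < 16_384).collect();
    assert_eq!(skippable_fraction(&contiguous, 8192), 0.5);
}
```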
The last bench (97c62a6) added 5 new individual regressions vs the prior best-bench commit (05590a2): Q4 +39ms, Q31 +64ms, Q35 +48ms, Q40 1.47x, plus several smaller ones. Total query time ticked back up. Two changes between those commits did the damage:

1. Removing `STATS_SAMPLE_INTERVAL=32`. Locally the un-sampled version was faster, but on the 12-vCPU GKE bench every partition contends on the same per-filter `Mutex<SelectivityStats>` and the lock contention dominates. Restoring the 1-in-32 sampling cuts hot-path lock pressure to ~3% of what it was while still giving the Welford accumulator hundreds of samples per query.
2. Removing the `prune_rate >= 0.99` gate. The scatter-aware metric alone is too lenient on ClickBench data: columns like `MobilePhoneModel` and `SearchPhrase` have natural runs of empty values that occasionally cluster into batch-level "all pruned" events even when the filter's overall selectivity isn't high enough for row-level to actually win once arrow-rs's predicate-cache double-decode of heavy string columns kicks in. The prune_rate floor is a belt-and-braces guard; it's compatible with the scatter metric (both must pass) and prunes the cases where the metric over-promotes.

Keeping the scatter helper structure in place — the `count_skippable_bytes` framework stays, so that when arrow-rs exposes pages-skipped-via-RowSelection (option 1 from the earlier plan), the row-level path can swap from the windowed estimate to the true measurement with no formula change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reverts the `1-in-32` sampling gate. The earlier rationale was lock contention on `Mutex<SelectivityStats>`, but the empirical effect on ClickBench is that promotion happens 32× later, which dominates the contention savings for short-running selective queries (Q22/Q23/Q24). Q23 went from 169ms (every-batch) to 443ms (1/32 sampled) on the 12-vCPU bench while regressing many small queries. Sample every batch so the Welford accumulator hits the CI threshold inside the first row group. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The gate was added on the theory that arrow-rs's row-level path double-decoded heavy string columns when filter and projection overlapped, costing more than the ~60-95% selectivity could recover. EXPLAIN ANALYZE on ClickBench refutes that theory: Q23 (URL LIKE '%google%') shows predicate_cache_inner_records=8.76M and predicate_cache_records=83.67K — the cache works correctly; heavy strings are decoded once and reused for both the predicate and the projection.

The residual ClickBench regressions we attributed to "double-decode" (Q26 / Q31) trace to a different cause: post-scan filtering inside the opener shifts batch-arrival order at the downstream TopK, which changes the convergence point of TopK's dynamic filter and slightly weakens file-stats pruning. Forced row-level promotion of Q26 makes it slower (59ms) than post-scan (41ms), confirming the gate isn't preventing a real regression.

Single promotion gate now: CI lower bound on scatter-aware bytes-saved-per-sec ≥ filter_pushdown_min_bytes_per_sec. This lets strongly-selective contiguous filters (90% prune rate, page-aligned runs) get promoted, which the 0.99 cutoff was incorrectly blocking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a `LimitedBatchCoalescer` to `AdaptiveParquetStream`'s post-scan filter path, mirroring `FilterExec`'s behavior. Without this, inline post-scan filtering yields tiny batches (1-100 rows each on selective predicates) directly to TopK, which delays the dynamic filter from tightening: TopK only progressively improves its threshold one small batch at a time, whereas `FilterExec`'s coalescer ensures the first batch to TopK already contains thousands of survivors, letting TopK pick a near-optimal top-K threshold in one shot.

Symptom this fixes: on Q26 (`SELECT SearchPhrase FROM hits WHERE SearchPhrase <> '' ORDER BY EventTime LIMIT 10`) at 12 partitions, the branch matches 33-34 file ranges vs main+pushdown=false's 28. With the coalescer, the branch matches 30-32 — closing roughly a third of the gap. The remaining ~2-file-range pruning difference is unexplained but small.

Coalescer params match `FilterExec`: target_batch_size from the session, biggest_coalesce_batch_size = target/2 (set inside `LimitedBatchCoalescer::new`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit d146ebe.
When a filter is first observed, consult the per-conjunct row-group statistics pruning rate as a selectivity prior. If the pruning rate >= prior_promote_threshold (default 0.5), place at row-level; if <= prior_demote_threshold (default 0.05) and stats are present, place post-scan; else fall back to the existing byte-ratio heuristic. Skips the prior entirely when no row-group statistics are available for the filter's columns, since "no stats" would otherwise look identical to "genuinely non-selective". A sketch follows. Refs report.md §7.2.b
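A hedged sketch of the prior; the names are illustrative. `pruning_rate` is `Some` only when row-group statistics exist for the filter's columns, and the 0.726 test value echoes the TPC-DS Q26 rate quoted later in this thread.

```rust
#[derive(Debug, PartialEq)]
enum Placement {
    RowLevel,
    PostScan,
}

fn place_with_prior(
    pruning_rate: Option<f64>,
    promote: f64, // prior_promote_threshold, default 0.5
    demote: f64,  // prior_demote_threshold, default 0.05
    byte_ratio_fallback: Placement,
) -> Placement {
    match pruning_rate {
        Some(r) if r >= promote => Placement::RowLevel,
        Some(r) if r <= demote => Placement::PostScan,
        // inconclusive rate, or no stats at all ("no stats" must not look
        // like "genuinely non-selective"): use the byte-ratio heuristic
        _ => byte_ratio_fallback,
    }
}

fn main() {
    assert_eq!(
        place_with_prior(Some(0.726), 0.5, 0.05, Placement::PostScan),
        Placement::RowLevel
    );
    assert_eq!(
        place_with_prior(None, 0.5, 0.05, Placement::PostScan),
        Placement::PostScan
    );
}
```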
Merge of exp/page-pruning-prior and exp/latency-aware-z. The two levers should compose well: the prior settles initial placement from row-group statistics on the first row group, and the latency-aware z then drives evidence-based moves only when the runtime measurements disagree with the prior. Goal: keep exp1's wins on regression queries (TPC-DS Q26) while avoiding exp2's borderline-flip outliers (ClickBench Q37 under latency).
Documented four hypotheses tried to fix Q64 inside the morselization+adaptive merge: placeholder→postscan, all-postscan via min_bytes=INF, a wider stream schema with all demoted filters applied stream-side, and a sanity check confirming the regression is structural to apache#21766+pushdown=true regardless of adaptive logic. None of the four moved Q64 below ~22 s. The fix has to come from within apache#21766 (or sub-RG adaptation in arrow-rs), not the adaptive scheduler.
Implemented the per-conjunct page-pruning prior on exp/page-pruning-prior-v2 (page-first, with the row-group fallback gated on "page index NOT loaded"). Smoke-benched at +18% vs exp3. Two failure modes: page-pruning eval is itself an "extra pruning run" in the cost sense (it walks the page index per-conjunct per-file), and removing the row-group fallback lost exp3's Q26-style demote. A proper architecture (extract per-conjunct rates from the existing opener prunings as a side effect) needs ~200-300 LOC of PruningPredicate API additions; documented as a follow-up. exp/pp-plus-laz remains the recommended landing target.
…r pruning
Refactors the prior so it consumes per-FilterId page-pruning rates
extracted from the page-index pruning the opener already runs, with
NO extra pruning passes:

PagePruningAccessPlanFilter:
- new optional `tags: Option<Vec<usize>>` field
- `new_tagged()` constructor that accepts pre-split conjuncts, each
  tagged with a caller id (typically FilterId)
- `prune_plan_with_per_conjunct_stats()` variant that runs the same
  pruning iteration as `prune_plan_with_page_index` but also
  surfaces a `Vec<PerConjunctPageStats>` with rows-seen / rows-skipped
  per conjunct (see the sketch after this message).

Opener (build_stream):
- When `predicate_conjuncts` is set, build the page filter via
  `new_tagged` so per-conjunct stats can survive the split.
- Reorder: `prune_by_limit` + page-index pruning now run BEFORE the
  initial `partition_filters` call, so per-FilterId rates are
  available as the prior on the very first placement decision.
- Capture per-conjunct rates into `HashMap<FilterId, f64>`, thread
  them into `AdaptiveParquetStream` as `page_pruning_rates`, and pass
  them on every `partition_filters` call (initial + mid-stream swap).

SelectivityTracker::partition_filters:
- New `page_pruning_rates` parameter.
- The initial-placement prior now reads from this map; it falls back
  to byte-ratio when no rate is available (page index disabled,
  multi-column predicate, schema mismatch).
- The old per-conjunct re-evaluation `pruning_rate_for_filter` is
  no longer called on the production path.

The per-conjunct page-pruning rates path is now the production path.
The old per-conjunct row-group re-evaluation helpers (`pruning_rate_for_filter`,
`build_per_conjunct_pruning_predicate`) were never reached after the
`partition_filters` refactor — removed.

Also drops the temporary `debug!` traces from page_filter.rs and opener.rs
that confirmed the architecture works (TPC-DS Q26 fires the page prior with
pruned_rate=0.726 ≥ 0.5 → row-level; cd_gender/marital_status fire with
pruned_rate=0.000 ≤ 0.05 → post-scan). ClickBench's hits_partitioned
files lack page indexes, so the prior never fires there — it falls back
to byte-ratio per the design ("if the user disables page pruning we don't
get this data → only seed based on the bytes heuristic").
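A hedged sketch of the rate capture. `PerConjunctPageStats` and its fields are stand-ins shaped after the description above; the real type lives in the PR's page_filter.rs.

```rust
use std::collections::HashMap;

type FilterId = usize;

// Stand-in for the PR's PerConjunctPageStats: rows seen/skipped by one
// tagged conjunct during the page-index pruning pass the opener already runs.
struct PerConjunctPageStats {
    tag: FilterId,
    rows_seen: u64,
    rows_skipped: u64,
}

fn page_pruning_rates(stats: &[PerConjunctPageStats]) -> HashMap<FilterId, f64> {
    stats
        .iter()
        // no data for a conjunct ⇒ no prior (placement falls back to byte-ratio)
        .filter(|s| s.rows_seen > 0)
        .map(|s| (s.tag, s.rows_skipped as f64 / s.rows_seen as f64))
        .collect()
}

fn main() {
    let stats = [
        PerConjunctPageStats { tag: 0, rows_seen: 1000, rows_skipped: 726 },
        PerConjunctPageStats { tag: 1, rows_seen: 0, rows_skipped: 0 },
    ];
    let rates = page_pruning_rates(&stats);
    assert_eq!(rates.get(&0), Some(&0.726)); // prior fires: row-level
    assert_eq!(rates.get(&1), None); // no stats: byte-ratio fallback
}
```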
After the r5+r6 consolidation, the top-level `use` is unused; the helper imports it locally to keep the module surface clean.
Three findings from this round:
1. r11 attempted to use per-conjunct page-pruning rates as a tertiary key in the post-scan/row-filter ordering closure, after Welford effectiveness and before filter_scan_size. The change compiled cleanly but produced no measurable change in the TPC-DS-lat smoke (76985 vs r10 ctrl 76785 — within noise). Stash dropped.
2. The headline finding: re-running r10 and exp3 back-to-back in the same machine state showed they are at parity on the TPC-DS-lat smoke (r10 76785, exp3 77237 — 0.6% apart). The previously recorded "7% gap to exp3" was cross-session machine-state variance, not a real perf delta.
3. Therefore round 6's architecturally-correct stack (r6+r7+r8, plus the round-9 partial-AND attempt kept neutral) is the take-it-now branch. No further pure-prior iterations needed.
Pre-existing lints from the round-6 work that were committed without clippy passing. None of these are behavioural; they're presentation fixes:
- opener.rs: replace the banned std::time::Instant with the WASM-safe datafusion_common::instant::Instant
- page_filter.rs: PagePruningAccessPlanFilter::new_tagged takes the schema by &SchemaRef instead of by value (it never consumed it)
- pruning_predicate.rs: inline the format args in a debug! call
- row_group_filter.rs: fold prune_by_bloom_filters into its single remaining caller (a test helper) so the unused legacy variant goes away; drop the redundant .into_iter() on a zip target
- selectivity.rs: #[expect(too_many_arguments)] on partition_filters (8 args) and the inner mutating partition_filters (11 args) — the shape is intentional. Remove the cfg(test) gate on partition_filters_for_test so the bench target can call it
- benches/selectivity_tracker.rs: switch the bench to the partition_filters_for_test helper, which forwards to the public signature using metadata-derived schemas

`cargo clippy -p datafusion-datasource-parquet -p datafusion-pruning --all-targets --all-features -- -D warnings` now passes.
ClickBench-lat smoke (3 iters, same-state side-by-side):
- r10: 89434 ms
- exp3: 88654 ms
- diff: +780 ms (0.9%, within noise)

Combined with the earlier TPC-DS-lat finding (within 0.6%), the round-6 architecturally-correct stack is at parity with exp3 on both major latency-pushdown smokes. Round 6 done.
Commit 97c62a6 ("feat(parquet): scatter-aware bytes-saved metric") reformulated SelectivityStats::update so that callers pre-compute the skippable_bytes argument (= rows_pruned × bytes_per_row in the simple case) instead of having update derive it internally from matched/total/total_bytes. The tests in this file weren't updated at the time and have been failing since. Apply the new caller-side semantics:
- "all rows matched, no pruning" calls now pass 0 for skippable_bytes (no late-mat payoff for batches the filter doesn't shrink)
- "high effectiveness" calls scale the supplied bytes by the rows-pruned ratio
- test_effectiveness_zero_bytes_seen now asserts Some(0.0), since a zero-payoff batch is a legitimate Welford sample (whose comment in update() explicitly justifies recording it)

`cargo test -p datafusion-datasource-parquet --lib`: 143 passed, 0 failed (up from 133 passed, 10 failed).
Adds the third workload to the same-state comparison table: TPC-H-lat smoke: r10 23373 ms, exp3 23721 ms (-348 ms / 1.5%). Across TPC-DS-lat, ClickBench-lat, and TPC-H-lat smokes the r10 architecturally-correct stack is at parity with exp3 (within 0.6-1.5%, with directional wins on two of three workloads). The round-6 architecture work is done.
Confirms the memory's success criterion on both pushdown-relevant no-latency workloads, same machine state, solo sequential runs: TPC-DS no-lat: r10 16683 vs main 16971 (1.7% faster) ClickBench no-lat: r10 18304 vs main 22811 (19.7% faster)
- Add design.md as an upstream-ready proposal-style spec for the six-commit pr/round6-stack (problem, goals/non-goals, mechanism, alternatives, validation, migration, open questions).
- slides/datafusion-meetup-05-2026/make_plots.py now reads the R6-STACK-pushdown[-lat] result dirs (the clean-stack branch's bench output) and labels the bars "main / main + pushdown / change" for clarity.
- Regenerate the four chart PNGs with the new framing and numbers. The TPC-H SSD chart in particular flips visually: the change column now sits below "main" instead of above "main + pushdown".
- Rewrite the four content slides to match: ClickBench / TPC-DS / TPC-H SSD all show the change beating both "main" and "main + pushdown"; TPC-H S3 now reads "parity with main, 0.46× of main + pushdown"; the closing slide replaces the deferred "latency-aware z" bullet (which is now in the stack) with "pushdown=on by default" as the next milestone.
- Regenerate presentation.html via marp-cli.
- Extend report.md §10 with the clean-stack listing, the new three-column "main / main + pushdown / change" bench tables, and a §10.3 explaining the literal_columns() bugfix the workspace test suite uncovered.
…ommunity Strip PR #11 / round-6 / stacked-branch chrome from design.md and report.md; frame everything as a proposed change on top of apache/datafusion@main. Refresh report.md per-query tables with current data from MAIN-{no,}pushdown vs R6-STACK-pushdown so the drill-downs match the headline numbers (TPC-H flips from regression to win). Tighten the slide speaker notes accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-balance

The previous framing claimed the TPC-H win came from demoting filters to post-scan to let a `FilterExec`-above-`RepartitionExec` shuffle re-balance partition skew. That's wrong on two counts:
1. The post-scan filter is applied inside the parquet opener (`apply_post_scan_filters_with_stats`), not as a separate `FilterExec` operator. The two paths are per-partition and equivalent in cost.
2. There is no shuffle between the filter and the scan; cross-partition skew on single-row-group files is a different problem, addressed by apache#21766.

Reframe TPC-H's 0.89× as coming entirely from correct row-level placement on filters that benefit (the Q18 dynamic filter is ~46 of the 89 ms total; Q1/Q3/Q19 contribute smaller page-skipping wins).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slide 6's chart was pointed at Q9 (the old worst-loss query) but the takeaway leads with Q18 as the headline dynamic-filter win. Swap the TPC-H no-lat chart to Q18 (111 → 65 ms = 0.59× of main) and trim the takeaway prose so it fits within the slide bounds. TPC-H S3 chart still uses Q9 — there it tells the 2.58× pushdown-on- main regression story that the change neutralises. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a PruningPredicate is built via try_new_tagged_conjuncts the
wrapper's predicate_expr is a literal-true placeholder, so the
wrapper's literal_guarantees is empty. literal_columns() was reading
literal_guarantees only — it returned an empty vec for tagged
predicates, and downstream consumers (notably
ParquetOpener::open which uses literal_columns to decide which
bloom filters to fetch) saw 'no columns of interest' and skipped
the bloom-filter pruning altogether.
Fix is to union each leaf sub-predicate's literal_columns into the
wrapper's result, deduplicating, then merge with whatever the
wrapper itself reports. Plain non-tagged predicates are unchanged.
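A hedged, self-contained sketch of that union. The `TaggedPredicate` shape is illustrative; the real code works on PruningPredicate and its per-conjunct sub-predicates.

```rust
use std::collections::BTreeSet;

struct TaggedPredicate {
    // columns the wrapper itself reports (empty for the literal-true
    // placeholder built by try_new_tagged_conjuncts)
    own_literal_columns: Vec<String>,
    sub_predicates: Vec<TaggedPredicate>, // per-conjunct leaf predicates
}

fn literal_columns(p: &TaggedPredicate) -> Vec<String> {
    let mut cols: BTreeSet<String> = p.own_literal_columns.iter().cloned().collect();
    for sub in &p.sub_predicates {
        cols.extend(literal_columns(sub)); // union leaf columns, deduplicated
    }
    cols.into_iter().collect()
}

fn main() {
    let tagged = TaggedPredicate {
        own_literal_columns: vec![], // placeholder wrapper reports nothing
        sub_predicates: vec![
            TaggedPredicate { own_literal_columns: vec!["a".into()], sub_predicates: vec![] },
            TaggedPredicate { own_literal_columns: vec!["a".into(), "b".into()], sub_predicates: vec![] },
        ],
    };
    // pre-fix this returned [], so bloom-filter fetching was skipped entirely
    assert_eq!(literal_columns(&tagged), vec!["a".to_string(), "b".to_string()]);
}
```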
This makes parquet bloom-filter pruning fire correctly for
adaptive-scheduler files (per-conjunct rates require tagged
predicates, so before this fix every adaptive scan effectively had
bloom filters disabled). Visible in:
cargo test -p datafusion --test parquet_integration:
pre-fix: 186 passed, 14 failed
post-fix: 200 passed, 0 failed
cargo test --workspace --no-fail-fast:
pre-fix: 9236 passed, 19 failed
post-fix: 9240 passed, 0 failed
The remaining 5 of the 19 failures (predicate_cache_* and
single_file*) are addressed in the next commit.
…pushdown=on' contract
Several legacy tests assumed 'pushdown_filters=true' meant 'every
filter runs at row-level'. Under the adaptive selectivity tracker
that's only true once enough bytes-saved-per-sec evidence
accumulates; the tracker's default (min_bytes_per_sec = INFINITY)
keeps every filter at post-scan in short, deterministic test
queries so the row-level path / predicate cache / row-group
pruning these tests assert on never runs.
Set filter_pushdown_min_bytes_per_sec = 0.0 alongside
pushdown_filters = true in the affected setup paths so the tracker
promotes every filter immediately and the legacy contract holds:
- core/src/test_util/parquet.rs::ParquetScanOptions::config —
central test helper used by single_file* and friends.
- core/tests/parquet/mod.rs ContextWithParquet::new RowGroup branch
— covers all row_group_pruning::prune_* and
test_bloom_filter_* tests.
- core/tests/parquet/filter_pushdown.rs predicate_cache_* tests
that build SessionConfig directly.
The predicate_cache_pushdown_disable test is left alone — it
asserts 0 records *because* the cache is disabled, not because of
the placement default.
cargo test --workspace --no-fail-fast: 9240 passed, 0 failed.
…eholder)

The slide 5 Q64 takeaway and several design/report passages claimed that `main + pushdown` regresses because it "evaluates dynamic filters at row-level before the build side finishes." This is wrong: `HashJoinExec` blocks the probe-side stream on `collect_build_side` (stream.rs:505) until the build is fully done, and the dynamic filter is updated in shared_bounds.rs:581/687 *before* the finalizer notifies waiters. By the time any probe-side `ParquetScan` opens a file, the dynamic filter is always populated — there is no placeholder-eval window.

What's actually happening on Q64: the populated dynamic filters (`key BETWEEN min AND max` from build-side bounds) just aren't selective enough on this data to recoup row-level cost (per-batch ArrowPredicate, extra I/O for filter cols not in the projection, repeated re-evaluation across the many self-joins of `store_sales`). `main + pushdown` runs them row-level regardless; the change calls `fresh_rate_for_dynamic_conjunct` to re-evaluate each populated dynamic filter's pruning rate against the file's row-group stats and keeps the unselective ones post-scan.

Also reframe §3.3/§4.4 in design.md and §3.3 in report.md: the reason `fresh_rate_for_dynamic_conjunct` exists is that the static `try_new_tagged_conjuncts` path can't introspect *through* the `DynamicFilterPhysicalExpr` wrapper without an explicit snapshot — not that the side-effect rates are "stale because they were taken when the filter was a placeholder".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…the hash_lookup case
Prior framing said the static `try_new_tagged_conjuncts` path can't
introspect through `DynamicFilterPhysicalExpr` / `OptionalFilterPhysicalExpr`
without an explicit snapshot. Looking at the code, this is wrong:
- `PruningPredicate::try_new` calls `snapshot_physical_expr_opt`
unconditionally (pruning_predicate.rs:512).
- `snapshot_physical_expr_opt` is a `transform_up` that calls
`.snapshot()` on every node — no-op for static expressions, and
both wrappers implement `snapshot()` to unwrap themselves.
So for the common bounds shape `col >= lo AND col <= hi`, the static
side-effect path captures a useful per-conjunct rate without any
help. The real reason `fresh_rate_for_dynamic_conjunct` exists is
narrower: it handles the `col >= lo AND col <= hi AND hash_lookup(...)`
shape that some hash-join dynamic filters publish — `hash_lookup` is
unhandled by `build_predicate_expression`, which flattens the whole
AND toward always-true, so the static path skips the conjunct and the
per-conjunct rate map has no entry for it. The refresh's partial-AND
fallback snapshots the inner expression, splits it, and evaluates each
prunable sub-part separately, returning the max as a promote-only
signal (see the sketch after this message).
The whole-conjunct first try in `fresh_rate_for_dynamic_conjunct` is
essentially redundant with what the static path already does; kept as
a defensive cheap check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
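A hedged sketch of the promote-only combination step in that partial-AND fallback. The splitting and per-conjunct pruning evaluation are elided; the function name and `Option<f64>` shape are illustrative.

```rust
/// Combine per-sub-conjunct pruning rates from a split AND: unprunable parts
/// (e.g. a hash_lookup the static path flattens toward always-true) contribute
/// None and are skipped; the max surviving rate is a promote-only signal.
fn partial_and_rate(sub_rates: &[Option<f64>]) -> Option<f64> {
    sub_rates
        .iter()
        .flatten()
        .copied()
        .fold(None, |acc: Option<f64>, r| Some(acc.map_or(r, |a| a.max(r))))
}

fn main() {
    // col >= lo AND col <= hi AND hash_lookup(...): the range parts produce
    // rates, the hash_lookup part doesn't
    assert_eq!(partial_and_rate(&[Some(0.4), Some(0.9), None]), Some(0.9));
    assert_eq!(partial_and_rate(&[None]), None); // nothing prunable: no signal
}
```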
…proposal

Concrete examples make the lottery framing land harder than the mode-comparison table alone:
- ClickBench Q23 (URL LIKE '%google%'): main 3612 ms → main+pd 121 ms — a 30× speedup
- ClickBench Q11 (filter col overlaps projection): essentially a tie
- ClickBench Q21 (mandatory unselective filter): main 859 ms → main+pd 1669 ms — a 1.9× slowdown
- TPC-DS Q64 (chained hash-join dynamic filters): main 471 ms → main+pd 20010 ms — a 42× slowdown

Same workload (ClickBench has both Q23 and Q21), opposite outcomes — the point being that the user can't reason about the filter / projection / plan-shape interaction per query.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ loss)
Replace the 'lottery in numbers' table with two slides, each showing
the actual SQL of one ClickBench query:
- Slide 3 'When pushdown wins big': Q23 (SELECT * ... WHERE URL LIKE
  '%google%' ... LIMIT 10), main 3612 ms → +pd 121 ms, 30× faster.
- Slide 4 'When pushdown loses big': Q30 (... WHERE SearchPhrase <> ''
  GROUP BY ...), main 276 ms → +pd 547 ms, 1.98× slower.
Both queries hit the same ClickBench dataset — same flag, opposite
outcomes. Speaker notes give the mechanism: row-level eval + sparse
RowSelection → page-skipping (win) vs mandatory unselective filter
with no row-skipping payoff and an extra column read (loss).
Renumber slide-comment headers to keep them sequential (1..10).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Perhaps if we had an API such as described here, this would be easier to implement.
The decoder's projection mask is now built from (user projection ∪ initial post-scan filter columns) and rebuilt at any row-group boundary where the optimal mask columns change — e.g. a filter promoting out of post-scan, or a dynamic placeholder waking up and being placed post-scan. arrow-rs's `StrategySwap::with_projection` installs the new mask before the next row group is read; we rebuild `stream_schema`, `projector`, and the post-scan filter rebase to match. The file-open and mid-stream code paths now share a single `build_decoder_projection_state` helper so the (read_plan, stream_schema, projector, rebased post-scan) chain stays in sync.

Smoke bench (TPC-DS/TPC-H/ClickBench, `--simulate-latency`): sum-of-medians 1.6% faster vs the r6 baseline. Notable per-query wins on filter-only-heavy workloads under latency: ClickBench Q42 -27.5%, Q23 -10.4%; TPC-H Q8/Q9 -8%. Two queries (TPC-DS Q26, TPC-H Q18) looked like regressions in a 3-round bench but cleared as noise on a 5-round rerun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).