Initial impl for sort pushdown in DataFusion FileSource implementation #8235
Force-pushed from b730f27 to ed243f0.
Benchmarks: PolarSignals Profiling
Vortex (geomean): 0.972x ➖
datafusion / vortex-file-compressed (0.972x ➖, 0↑ 0↓)
No file size changes detected.
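For reading the "(geomean)" figures in these reports, here is a minimal Rust sketch, assuming each figure is the geometric mean of per-query HEAD/BASE time ratios. The bot output does not spell out its formula, so treat that as an assumption; under it, a value below 1.0 would mean HEAD is faster. `geomean` is a hypothetical helper, not part of any benchmark tool.

```rust
/// Geometric mean of per-query time ratios (HEAD / BASE).
/// Assumption: this is how the "(geomean)" figure is aggregated;
/// the report does not state its exact formula.
fn geomean(ratios: &[f64]) -> Option<f64> {
    // Undefined for empty input or non-positive ratios.
    if ratios.is_empty() || ratios.iter().any(|&r| r <= 0.0) {
        return None;
    }
    // Average in log space, then exponentiate back.
    let mean_ln = ratios.iter().map(|r| r.ln()).sum::<f64>() / ratios.len() as f64;
    Some(mean_ln.exp())
}

fn main() {
    // Hypothetical per-query ratios: one query 2x faster, one 2x slower.
    let ratios = [0.5, 2.0];
    println!("{:.3}", geomean(&ratios).unwrap());
}
```

A geometric mean treats a 2x speedup and a 2x slowdown symmetrically: the two ratios above combine to 1.0, whereas their arithmetic mean would be 1.25.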
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.995x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.003x ➖, 0↑ 0↓)
datafusion / parquet (1.007x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.024x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.022x ➖, 0↑ 0↓)
duckdb / parquet (1.019x ➖, 0↑ 0↓)
File Size Changes (1 file changed, -0.0% overall, 0↑ 1↓)
Benchmarks: TPC-H SF=1 on NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.853x ✅, 17↑ 0↓)
datafusion / vortex-compact (0.824x ✅, 21↑ 0↓)
datafusion / parquet (0.917x ➖, 11↑ 0↓)
datafusion / arrow (0.807x ✅, 16↑ 0↓)
duckdb / vortex-file-compressed (0.921x ➖, 7↑ 0↓)
duckdb / vortex-compact (0.939x ➖, 5↑ 0↓)
duckdb / parquet (0.962x ➖, 5↑ 0↓)
duckdb / duckdb (0.987x ➖, 2↑ 0↓)
File Size Changes (10 files changed, -0.2% overall, 4↑ 6↓)
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | compare[56] | 224.4 µs | 327.1 µs | -31.39% |
| ❌ | Simulation | compare[62] | 249.6 µs | 363.4 µs | -31.33% |
| ❌ | Simulation | compare[63] | 255 µs | 370.8 µs | -31.22% |
| ❌ | Simulation | compare[60] | 243.1 µs | 353.3 µs | -31.19% |
| ❌ | Simulation | compare[61] | 250.3 µs | 362.3 µs | -30.92% |
| ❌ | Simulation | compare[58] | 240.5 µs | 346.8 µs | -30.67% |
| ❌ | Simulation | compare[59] | 245.7 µs | 353.9 µs | -30.58% |
| ❌ | Simulation | compare[57] | 240.8 µs | 345.4 µs | -30.29% |
| ❌ | Simulation | compare[54] | 231 µs | 329.9 µs | -29.99% |
| ❌ | Simulation | compare[55] | 236 µs | 336.7 µs | -29.93% |
| ❌ | Simulation | compare[48] | 207.1 µs | 294.9 µs | -29.77% |
| ❌ | Simulation | compare[52] | 224.7 µs | 320 µs | -29.76% |
| ❌ | Simulation | compare[53] | 231.8 µs | 328.9 µs | -29.53% |
| ❌ | Simulation | compare[50] | 221.8 µs | 313.2 µs | -29.19% |
| ❌ | Simulation | compare[51] | 226.8 µs | 320.1 µs | -29.16% |
| ❌ | Simulation | compare[49] | 222.5 µs | 312.1 µs | -28.72% |
| ❌ | Simulation | compare[47] | 217.6 µs | 303.4 µs | -28.29% |
| ❌ | Simulation | compare[46] | 212.9 µs | 296.8 µs | -28.28% |
| ❌ | Simulation | compare[44] | 206.3 µs | 286.6 µs | -28.01% |
| ❌ | Simulation | compare[45] | 213 µs | 295.2 µs | -27.85% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
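The Efficiency column appears to be consistent with BASE / HEAD - 1, expressed as a percentage (negative when HEAD is slower). This is inferred from the rows in the table, not taken from CodSpeed's documentation, and because the displayed µs values are rounded, recomputing a row can differ from the reported figure by about 0.01 percentage points. `efficiency_pct` is a hypothetical helper.

```rust
/// Relative change as shown in the "Efficiency" column, assuming it is
/// BASE / HEAD - 1 (negative means HEAD is slower). Inferred from the
/// table's numbers, not from CodSpeed documentation.
fn efficiency_pct(base_us: f64, head_us: f64) -> f64 {
    (base_us / head_us - 1.0) * 100.0
}

fn main() {
    // compare[48] row: 207.1 µs on BASE vs 294.9 µs on HEAD.
    println!("{:.2}%", efficiency_pct(207.1, 294.9));
}
```

For the compare[48] row this gives -29.77%, matching the table.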
Comparing adamg/pushdown-sort-df (5b3f04a) with develop (21a2dbc)
Benchmarks: TPC-DS SF=1 on NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.003x ➖, 2↑ 4↓)
datafusion / vortex-compact (1.019x ➖, 0↑ 7↓)
datafusion / parquet (1.027x ➖, 0↑ 4↓)
duckdb / vortex-file-compressed (1.014x ➖, 0↑ 2↓)
duckdb / vortex-compact (1.018x ➖, 0↑ 2↓)
duckdb / parquet (1.017x ➖, 0↑ 1↓)
duckdb / duckdb (1.017x ➖, 1↑ 2↓)
File Size Changes (7 files changed, +0.0% overall, 3↑ 4↓)
Benchmarks: FineWeb S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (1.076x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.895x ➖, 2↑ 0↓)
datafusion / parquet (1.097x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.001x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.991x ➖, 0↑ 0↓)
duckdb / parquet (0.981x ➖, 0↑ 0↓)
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (0.971x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.995x ➖, 0↑ 0↓)
duckdb / parquet (1.000x ➖, 0↑ 0↓)
File Size Changes (1 file changed, +0.0% overall, 1↑ 0↓)
Benchmarks: TPC-H SF=10 on NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.920x ➖, 3↑ 0↓)
datafusion / vortex-compact (0.916x ➖, 6↑ 0↓)
datafusion / parquet (0.936x ➖, 0↑ 0↓)
datafusion / arrow (0.922x ➖, 2↑ 0↓)
duckdb / vortex-file-compressed (0.937x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.945x ➖, 0↑ 0↓)
duckdb / parquet (0.958x ➖, 1↑ 0↓)
duckdb / duckdb (0.967x ➖, 0↑ 0↓)
File Size Changes (26 files changed, +0.0% overall, 13↑ 13↓)
Benchmarks: Appian on NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.957x ➖, 0↑ 0↓)
datafusion / parquet (1.001x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.915x ➖, 3↑ 0↓)
duckdb / parquet (0.906x ➖, 3↑ 0↓)
duckdb / duckdb (0.995x ➖, 0↑ 0↓)
File Size Changes (4 files changed, -0.0% overall, 0↑ 4↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (1.065x ➖, 0↑ 4↓)
datafusion / vortex-compact (1.125x ➖, 0↑ 6↓)
datafusion / parquet (1.070x ➖, 2↑ 3↓)
duckdb / vortex-file-compressed (0.992x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.021x ➖, 0↑ 0↓)
duckdb / parquet (1.064x ➖, 0↑ 1↓)
Benchmarks: Clickbench on NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.982x ➖, 4↑ 0↓)
datafusion / parquet (1.006x ➖, 2↑ 0↓)
duckdb / vortex-file-compressed (0.985x ➖, 4↑ 2↓)
duckdb / parquet (1.005x ➖, 0↑ 0↓)
duckdb / duckdb (1.009x ➖, 0↑ 0↓)
File Size Changes (103 files changed, -0.0% overall, 46↑ 57↓)
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (1.201x ➖, 0↑ 2↓)
datafusion / vortex-compact (1.143x ➖, 0↑ 4↓)
datafusion / parquet (1.104x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.988x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.974x ➖, 0↑ 0↓)
duckdb / parquet (1.026x ➖, 0↑ 1↓)
Force-pushed from ed243f0 to babfac8.
Going to add tests later today.

Do we have any benchmarks that care about sortedness?

    if !is_descending {
        let mut this = self.clone();
        this.ordered = true;
        return Ok(SortOrderPushdownResult::Inexact {
Is this a bit optimistic? I don't know how DataFusion's sort operators treat this, but would we always want to fall back to a nearly-sorted-optimised sort strategy whenever we have an ascending sort on a column that exists in the file?
`ordered` here just means "the order of the file", as opposed to returning batches in whichever order we get them.
Ah, I mean returning `Inexact` instead of `Unknown` here: at this point we only know that the column exists, not much about how it is sorted.
You're right, I've simplified it for now.
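To make the concern in this thread concrete, here is a minimal Rust sketch of a conservative pushdown decision. `PushdownOutcome`, `ColumnInfo`, and `pushdown_sort` are hypothetical stand-ins named after the variants mentioned in the thread (`Inexact`, `Unknown`); they are not DataFusion's or Vortex's API, and this is not what the PR implements. The point it illustrates is that knowing a column exists is not, by itself, evidence of order, so the sketch only reports `Inexact` when there is separate evidence of ascending order.

```rust
/// Hypothetical outcome type; variant names follow the review thread,
/// not DataFusion's actual `SortOrderPushdownResult`.
#[derive(Debug, PartialEq)]
enum PushdownOutcome {
    /// Output is roughly in the requested order; a sort is still required.
    Inexact,
    /// Nothing useful is known about the order.
    Unknown,
}

/// Hypothetical summary of what the scan knows about the sort column.
struct ColumnInfo {
    exists_in_file: bool,
    /// Evidence (e.g. from metadata) that rows are stored in ascending order.
    known_ascending: bool,
}

fn pushdown_sort(col: &ColumnInfo, descending: bool) -> PushdownOutcome {
    // Reading in file order cannot help a descending request,
    // or a column the file does not contain.
    if descending || !col.exists_in_file {
        return PushdownOutcome::Unknown;
    }
    if col.known_ascending {
        PushdownOutcome::Inexact
    } else {
        // The reviewer's point: existence alone says nothing about sortedness.
        PushdownOutcome::Unknown
    }
}

fn main() {
    let only_exists = ColumnInfo { exists_in_file: true, known_ascending: false };
    println!("{:?}", pushdown_sort(&only_exists, false));
}
```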
@joseph-isaacs we do (some of the top-k style queries), but I'm not sure we have any benchmarks with a table sorted in a way DF can understand.
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Force-pushed from 0b5e60a to 5b3f04a.
Summary
Naive implementation of `try_pushdown_sort`. It's mostly just parts of the Parquet implementation, plus making sure we propagate the ordering information into the Vortex scan.
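A minimal sketch of what the `ordered` flag means for the scan, per the review thread ("the order of the file" rather than whichever order batches arrive in). It assumes a hypothetical scan whose splits finish out of order and are identified by indices in file order; `emit_batches` is illustrative, not Vortex's scan API, and it works on a finished `Vec` rather than a stream for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: `completed` holds (split_index, batch) pairs in the
/// order concurrent reads finished. Split indices are assumed to follow file order.
fn emit_batches<T>(completed: Vec<(usize, T)>, ordered: bool) -> Vec<T> {
    if !ordered {
        // Unordered: emit each batch as soon as it arrives.
        return completed.into_iter().map(|(_, batch)| batch).collect();
    }
    // Ordered: buffer early arrivals until all preceding splits are emitted.
    let mut pending = BTreeMap::new();
    let mut next: usize = 0;
    let mut out = Vec::new();
    for (idx, batch) in completed {
        pending.insert(idx, batch);
        while let Some(ready) = pending.remove(&next) {
            out.push(ready);
            next += 1;
        }
    }
    // If indices had gaps, flush what is left in index order.
    out.extend(pending.into_values());
    out
}

fn main() {
    let completed = vec![(2, "c"), (0, "a"), (1, "b")];
    println!("{:?}", emit_batches(completed, true));
}
```

Splits completing as 2, 0, 1 come out as 0, 1, 2 when `ordered` is true and pass through unchanged otherwise. A streaming scan would apply the same buffering incrementally, at the cost of holding early-arriving batches in memory until the preceding ones are ready.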