
@Dandandan
Contributor

This commit optimizes the RepartitionExec operator by reusing the Vec allocations for indices in the BatchPartitioner. In hash partitioning mode, the BatchPartitioner previously allocated new indices vectors for every RecordBatch it processed. With this change, it keeps those vectors and reuses their allocations across batches.

This is achieved by:

  • Storing the indices vectors in the BatchPartitionerState.
  • Clearing the vectors for each new batch.
  • Using std::mem::take to move the data out for processing.
  • Reclaiming the underlying Vec from the Arrow array using into_parts after processing, clearing it, and placing it back into the state for the next iteration.
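The steps above can be illustrated with a minimal, self-contained sketch that uses only `std`. `PartitionerState`, `IndicesArray`, and `partition` are hypothetical, simplified stand-ins, not DataFusion's actual `BatchPartitionerState` or Arrow's `UInt32Array`. The sketch routes rows with a plain `hash % num_partitions` and does not model the real conversion to and from an Arrow buffer via `into_parts`.

```rust
use std::mem;

/// Hypothetical stand-in for an Arrow `UInt32Array` built from a `Vec<u32>`.
struct IndicesArray {
    values: Vec<u32>,
}

impl IndicesArray {
    /// Wraps the vector without copying (ownership moves in).
    fn from_vec(values: Vec<u32>) -> Self {
        IndicesArray { values }
    }

    /// Hands the owned vector back so its allocation can be reused
    /// (plays the role of `into_parts` in the description above).
    fn into_vec(self) -> Vec<u32> {
        self.values
    }
}

/// Hypothetical simplified state: one reusable indices buffer per output partition.
struct PartitionerState {
    indices: Vec<Vec<u32>>,
}

impl PartitionerState {
    fn new(num_partitions: usize) -> Self {
        PartitionerState {
            indices: vec![Vec::new(); num_partitions],
        }
    }

    /// Assigns each row to `hash % num_partitions` and calls `emit` once per
    /// non-empty partition with that partition's row indices.
    fn partition(&mut self, hashes: &[u64], mut emit: impl FnMut(usize, &IndicesArray)) {
        let num_partitions = self.indices.len() as u64;

        // Start each batch with empty buffers; `clear` keeps the capacity.
        for buf in self.indices.iter_mut() {
            buf.clear();
        }
        for (row, hash) in hashes.iter().enumerate() {
            self.indices[(hash % num_partitions) as usize].push(row as u32);
        }

        for p in 0..self.indices.len() {
            if self.indices[p].is_empty() {
                continue;
            }
            // Move the buffer out without copying; an empty, unallocated Vec is left behind.
            let taken = mem::take(&mut self.indices[p]);
            let array = IndicesArray::from_vec(taken);
            emit(p, &array);
            // Reclaim the allocation, clear it, and store it for the next batch.
            let mut reclaimed = array.into_vec();
            reclaimed.clear();
            self.indices[p] = reclaimed;
        }
    }
}
```

In this sketch, `clear` keeps each buffer's capacity and `std::mem::take` leaves behind an empty `Vec` that owns no allocation, so a buffer is only reallocated while it is still growing. Later batches that fit in the existing capacity reuse the same memory.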

This avoids repeated per-batch allocations and improves performance, especially when processing many small batches, where per-batch overhead makes up a larger share of the total work.
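As a rough back-of-the-envelope model of why small batches benefit most, assume the previous code allocated one fresh indices `Vec` per output partition for every input batch. For a single partitioner processing $N$ rows in batches of $s$ rows with $P$ output partitions, the allocation counts are approximately:

```math
A_{\text{before}} \approx \frac{N}{s}\cdot P,
\qquad
A_{\text{after}} \approx P + (\text{growth reallocations until each buffer reaches its steady-state capacity})
```

With hypothetical values $N = 10^8$, $s = 8192$ (DataFusion's default batch size), and $P = 16$, this gives $N/s \approx 12{,}207$ batches and $A_{\text{before}} \approx 1.95 \times 10^5$ allocations, while $A_{\text{after}}$ stays on the order of $P$. Halving $s$ doubles $A_{\text{before}}$ but leaves $A_{\text{after}}$ essentially unchanged.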

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@Dandandan
Contributor Author

Run benchmarks

@github-actions bot added the physical-plan label (Changes to the physical-plan crate) Jan 12, 2026
@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing bolt-reuse-indices-allocations-11049574746024928120 (06569d0) to 418f62a diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed


Comparing HEAD and bolt-reuse-indices-allocations-11049574746024928120
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ bolt-reuse-indices-allocations-11049574746024928120 ┃    Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 0 │  2458.33 ms │                                          2464.54 ms │ no change │
│ QQuery 1 │   968.00 ms │                                           924.32 ms │ no change │
│ QQuery 2 │  1906.38 ms │                                          1869.89 ms │ no change │
│ QQuery 3 │  1182.82 ms │                                          1174.03 ms │ no change │
│ QQuery 4 │  2292.13 ms │                                          2258.40 ms │ no change │
│ QQuery 5 │ 28526.33 ms │                                         28648.57 ms │ no change │
│ QQuery 6 │  3833.92 ms │                                          3905.47 ms │ no change │
│ QQuery 7 │  3549.86 ms │                                          3589.89 ms │ no change │
└──────────┴─────────────┴─────────────────────────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                                  │ 44717.78ms │
│ Total Time (bolt-reuse-indices-allocations-11049574746024928120)   │ 44835.11ms │
│ Average Time (HEAD)                                                │  5589.72ms │
│ Average Time (bolt-reuse-indices-allocations-11049574746024928120) │  5604.39ms │
│ Queries Faster                                                     │          0 │
│ Queries Slower                                                     │          0 │
│ Queries with No Change                                             │          8 │
│ Queries with Failure                                               │          0 │
└────────────────────────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ bolt-reuse-indices-allocations-11049574746024928120 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     1.46 ms │                                             1.42 ms │     no change │
│ QQuery 1  │    49.94 ms │                                            49.32 ms │     no change │
│ QQuery 2  │   137.03 ms │                                           133.20 ms │     no change │
│ QQuery 3  │   155.11 ms │                                           154.24 ms │     no change │
│ QQuery 4  │  1080.93 ms │                                          1083.02 ms │     no change │
│ QQuery 5  │  1367.35 ms │                                          1368.82 ms │     no change │
│ QQuery 6  │     1.49 ms │                                             1.42 ms │     no change │
│ QQuery 7  │    55.13 ms │                                            52.29 ms │ +1.05x faster │
│ QQuery 8  │  1472.98 ms │                                          1449.19 ms │     no change │
│ QQuery 9  │  1762.83 ms │                                          1813.42 ms │     no change │
│ QQuery 10 │   373.48 ms │                                           342.04 ms │ +1.09x faster │
│ QQuery 11 │   408.87 ms │                                           402.56 ms │     no change │
│ QQuery 12 │  1300.99 ms │                                          1283.68 ms │     no change │
│ QQuery 13 │  1928.66 ms │                                          2006.78 ms │     no change │
│ QQuery 14 │  1251.13 ms │                                          1258.94 ms │     no change │
│ QQuery 15 │  1241.67 ms │                                          1241.66 ms │     no change │
│ QQuery 16 │  2505.03 ms │                                          2518.58 ms │     no change │
│ QQuery 17 │  2517.61 ms │                                          2513.47 ms │     no change │
│ QQuery 18 │  5820.61 ms │                                          4802.85 ms │ +1.21x faster │
│ QQuery 19 │   123.35 ms │                                           120.26 ms │     no change │
│ QQuery 20 │  1950.59 ms │                                          1903.63 ms │     no change │
│ QQuery 21 │  2209.47 ms │                                          2215.67 ms │     no change │
│ QQuery 22 │  4218.27 ms │                                          3777.61 ms │ +1.12x faster │
│ QQuery 23 │ 15960.38 ms │                                         12188.95 ms │ +1.31x faster │
│ QQuery 24 │   199.14 ms │                                           204.28 ms │     no change │
│ QQuery 25 │   466.59 ms │                                           474.38 ms │     no change │
│ QQuery 26 │   219.29 ms │                                           212.47 ms │     no change │
│ QQuery 27 │  2800.06 ms │                                          2709.69 ms │     no change │
│ QQuery 28 │ 24997.24 ms │                                         23474.51 ms │ +1.06x faster │
│ QQuery 29 │   969.46 ms │                                           968.91 ms │     no change │
│ QQuery 30 │  1284.19 ms │                                          1301.05 ms │     no change │
│ QQuery 31 │  1358.18 ms │                                          1332.13 ms │     no change │
│ QQuery 32 │  5101.61 ms │                                          4475.31 ms │ +1.14x faster │
│ QQuery 33 │  5802.86 ms │                                          5271.37 ms │ +1.10x faster │
│ QQuery 34 │  5672.41 ms │                                          5548.88 ms │     no change │
│ QQuery 35 │  1924.34 ms │                                          1893.48 ms │     no change │
│ QQuery 36 │    65.39 ms │                                            66.04 ms │     no change │
│ QQuery 37 │    42.91 ms │                                            43.76 ms │     no change │
│ QQuery 38 │    63.71 ms │                                            65.65 ms │     no change │
│ QQuery 39 │   102.81 ms │                                           101.70 ms │     no change │
│ QQuery 40 │    25.99 ms │                                            26.55 ms │     no change │
│ QQuery 41 │    22.15 ms │                                            22.00 ms │     no change │
│ QQuery 42 │    18.48 ms │                                            18.58 ms │     no change │
└───────────┴─────────────┴─────────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                                  │ 99031.15ms │
│ Total Time (bolt-reuse-indices-allocations-11049574746024928120)   │ 90893.80ms │
│ Average Time (HEAD)                                                │  2303.05ms │
│ Average Time (bolt-reuse-indices-allocations-11049574746024928120) │  2113.81ms │
│ Queries Faster                                                     │          8 │
│ Queries Slower                                                     │          0 │
│ Queries with No Change                                             │         35 │
│ Queries with Failure                                               │          0 │
└────────────────────────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ bolt-reuse-indices-allocations-11049574746024928120 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 117.16 ms │                                           111.92 ms │     no change │
│ QQuery 2  │  30.25 ms │                                            28.51 ms │ +1.06x faster │
│ QQuery 3  │  36.14 ms │                                            34.68 ms │     no change │
│ QQuery 4  │  29.98 ms │                                            28.35 ms │ +1.06x faster │
│ QQuery 5  │  89.64 ms │                                            83.00 ms │ +1.08x faster │
│ QQuery 6  │  19.98 ms │                                            19.89 ms │     no change │
│ QQuery 7  │ 235.69 ms │                                           230.52 ms │     no change │
│ QQuery 8  │  35.66 ms │                                            34.30 ms │     no change │
│ QQuery 9  │ 102.97 ms │                                           105.52 ms │     no change │
│ QQuery 10 │  62.10 ms │                                            61.62 ms │     no change │
│ QQuery 11 │  17.20 ms │                                            16.38 ms │     no change │
│ QQuery 12 │  49.65 ms │                                            50.28 ms │     no change │
│ QQuery 13 │  47.45 ms │                                            48.12 ms │     no change │
│ QQuery 14 │  13.88 ms │                                            13.45 ms │     no change │
│ QQuery 15 │  24.02 ms │                                            24.93 ms │     no change │
│ QQuery 16 │  24.03 ms │                                            23.51 ms │     no change │
│ QQuery 17 │ 151.65 ms │                                           149.23 ms │     no change │
│ QQuery 18 │ 281.23 ms │                                           277.38 ms │     no change │
│ QQuery 19 │  37.56 ms │                                            36.79 ms │     no change │
│ QQuery 20 │  50.90 ms │                                            50.78 ms │     no change │
│ QQuery 21 │ 317.96 ms │                                           321.98 ms │     no change │
│ QQuery 22 │  18.46 ms │                                            18.02 ms │     no change │
└───────────┴───────────┴─────────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                                  │ 1793.54ms │
│ Total Time (bolt-reuse-indices-allocations-11049574746024928120)   │ 1769.15ms │
│ Average Time (HEAD)                                                │   81.52ms │
│ Average Time (bolt-reuse-indices-allocations-11049574746024928120) │   80.42ms │
│ Queries Faster                                                     │         3 │
│ Queries Slower                                                     │         0 │
│ Queries with No Change                                             │        19 │
│ Queries with Failure                                               │         0 │
└────────────────────────────────────────────────────────────────────┴───────────┘

@Dandandan changed the title [Test] Reuse indices buffer in RepartitionExec Jan 14, 2026
@Dandandan
Contributor Author

Run benchmarks

@Dandandan
Contributor Author

run benchmark tpch tpcds

@Dandandan
Contributor Author

run benchmark tpch tpcds

@Dandandan
Contributor Author

run benchmark tpch tpcds

@Dandandan changed the title from Reuse indices buffer in RepartitionExec to [Minor] Reuse indices buffer in RepartitionExec Jan 14, 2026
@Dandandan marked this pull request as ready for review January 14, 2026 12:42

Labels

physical-plan Changes to the physical-plan crate

2 participants