Implement compare/between for bitpacked arrays - improve performance by up to 2X #7279
Merging this PR will degrade performance by 10.6%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | new_alp_prim_test_between[f64, 32768] | 250.6 µs | 117.6 µs | ×2.1 |
| ⚡ | Simulation | new_bp_prim_test_between[i16, 16384] | 89.1 µs | 63.4 µs | +40.56% |
| ⚡ | Simulation | new_alp_prim_test_between[f64, 2048] | 71.6 µs | 54.8 µs | +30.57% |
| ⚡ | Simulation | new_bp_prim_test_between[i16, 32768] | 134.1 µs | 91.6 µs | +46.46% |
| ⚡ | Simulation | patched_take_10k_adversarial | 258.4 µs | 228.4 µs | +13.14% |
| ⚡ | Simulation | new_bp_prim_test_between[i16, 2048] | 51.9 µs | 39.8 µs | +30.53% |
| ⚡ | Simulation | new_bp_prim_test_between[i64, 2048] | 62.7 µs | 45 µs | +39.35% |
| ⚡ | Simulation | new_bp_prim_test_between[i32, 32768] | 168.6 µs | 97.1 µs | +73.67% |
| ⚡ | Simulation | new_bp_prim_test_between[i64, 16384] | 143.2 µs | 72.6 µs | +97.23% |
| ⚡ | Simulation | new_bp_prim_test_between[i64, 32768] | 236 µs | 101.8 µs | ×2.3 |
| ⚡ | Simulation | new_bp_prim_test_between[i32, 2048] | 53.5 µs | 40 µs | +33.95% |
| ⚡ | Simulation | take_10k_dispersed | 284.4 µs | 239.6 µs | +18.67% |
| ⚡ | Simulation | take_10k_first_chunk_only | 270.4 µs | 225.8 µs | +19.75% |
| ⚡ | Simulation | new_bp_prim_test_between[i32, 16384] | 108.1 µs | 68 µs | +59.11% |
| ❌ | Simulation | patched_take_10k_contiguous_patches | 258.3 µs | 287.1 µs | -10.02% |
| ⚡ | Simulation | patched_take_10k_dispersed | 315.5 µs | 285.3 µs | +10.59% |
| ⚡ | Simulation | patched_take_10k_first_chunk_only | 301.9 µs | 271.8 µs | +11.07% |
| ⚡ | Simulation | new_alp_prim_test_between[f64, 16384] | 155.1 µs | 84.7 µs | +83.11% |
| ⚡ | Simulation | new_alp_prim_test_between[f32, 2048] | 60.8 µs | 50.3 µs | +20.89% |
| ⚡ | Simulation | new_alp_prim_test_between[f32, 32768] | 168.8 µs | 111.7 µs | +51.11% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed.
Comparing adamg/bitpack-compare (dbfb7c2) with develop (ac8c751)
Footnotes
- 33 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨
    array.len(),
    array.offset(),
    |bit_width, packed_chunk, chunk_matches| unsafe {
        U::unchecked_unpack_cmp(bit_width, packed_chunk, chunk_matches, compare, value);
Is the crux of this that we fuse unpacking with the comparison op and avoid the second allocation/scan?

Yes, you save a lot on memory bandwidth, especially for wider types.

TIL that we had fused cmp in the fastlanes crate this whole time.

We lacked a good way to manage fused execution; I think we now have a somewhat better mechanism.
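
To make the fusion point in this thread concrete, here is a minimal sketch contrasting the two approaches. It is not the fastlanes implementation: it assumes a plain sequential bit layout rather than the FastLanes transposed layout, returns a `Vec<bool>` instead of a packed bitmask, and every function name (`pack`, `unpack_one`, `unpack_then_between`, `fused_between`) is hypothetical.

```rust
/// Pack `values` at `bit_width` bits each into u64 words.
/// Assumption: simple sequential layout, NOT the FastLanes transposed layout.
fn pack(values: &[u32], bit_width: usize) -> Vec<u64> {
    assert!((1..=32).contains(&bit_width));
    let mut words = vec![0u64; (values.len() * bit_width + 63) / 64];
    for (i, &v) in values.iter().enumerate() {
        let bit = i * bit_width;
        let (word, shift) = (bit / 64, bit % 64);
        words[word] |= (v as u64) << shift;
        // The value straddles a word boundary: write the high bits into the next word.
        if shift + bit_width > 64 {
            words[word + 1] |= (v as u64) >> (64 - shift);
        }
    }
    words
}

/// Extract the `i`-th `bit_width`-bit value from the packed words.
#[inline]
fn unpack_one(words: &[u64], bit_width: usize, i: usize) -> u32 {
    let bit = i * bit_width;
    let (word, shift) = (bit / 64, bit % 64);
    let mut v = words[word] >> shift;
    if shift + bit_width > 64 {
        v |= words[word + 1] << (64 - shift);
    }
    (v & ((1u64 << bit_width) - 1)) as u32
}

/// Two passes: materialize every unpacked value, then scan the buffer again.
fn unpack_then_between(words: &[u64], bit_width: usize, len: usize, lo: u32, hi: u32) -> Vec<bool> {
    let unpacked: Vec<u32> = (0..len).map(|i| unpack_one(words, bit_width, i)).collect();
    unpacked.iter().map(|&v| lo <= v && v <= hi).collect()
}

/// Fused: compare each value as soon as it is unpacked, so the full-width
/// intermediate buffer is never allocated or re-read.
fn fused_between(words: &[u64], bit_width: usize, len: usize, lo: u32, hi: u32) -> Vec<bool> {
    (0..len)
        .map(|i| {
            let v = unpack_one(words, bit_width, i);
            lo <= v && v <= hi
        })
        .collect()
}
```

Both functions return the same matches. The fused version only ever reads the packed words and writes the result, which is where the memory-bandwidth saving for wider types would come from in this simplified model.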
@joseph-isaacs do we have an accepted way of measuring code size?

Main difference in build time is that While this isn't great for our local experience, I think it'll still parallelize fine with other heavy dependencies, and it's pretty similar to what users will experience if they pull something like DF (which has multiple crates that take 50-60 seconds to build).
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Diff size of and bin using this with and without the compare+bitpack
After some iterations, this seems to improve performance in every case we actually measure. It does require merging and releasing spiraldb/fastlanes#125.