
feat: turboquant encoding for vectors #7167

Draft
lwwmanning wants to merge 23 commits into develop from claude/admiring-lichterman


@lwwmanning lwwmanning commented Mar 25, 2026

Summary

Lossy quantization for vector data (e.g., embeddings) based on TurboQuant

Closes: #000

Testing


codspeed-hq bot commented Mar 25, 2026

Merging this PR will not alter performance

✅ 1106 untouched benchmarks
🆕 14 new benchmarks
⏩ 1522 skipped benchmarks [1]

Performance Changes

|    | Mode       | Benchmark                          | BASE | HEAD     | Efficiency |
|----|------------|------------------------------------|------|---------:|------------|
| 🆕 | Simulation | turboquant_compress_dim1024_4bit   | N/A  | 154.3 ms | N/A        |
| 🆕 | Simulation | turboquant_decompress_dim1536_2bit | N/A  | 231.8 ms | N/A        |
| 🆕 | Simulation | turboquant_compress_dim1024_2bit   | N/A  | 147.9 ms | N/A        |
| 🆕 | Simulation | turboquant_compress_dim128_4bit    | N/A  |  19.4 ms | N/A        |
| 🆕 | Simulation | turboquant_compress_dim768_2bit    | N/A  | 146.5 ms | N/A        |
| 🆕 | Simulation | turboquant_decompress_dim1024_2bit | N/A  | 115.6 ms | N/A        |
| 🆕 | Simulation | turboquant_compress_dim128_2bit    | N/A  |  18.6 ms | N/A        |
| 🆕 | Simulation | turboquant_compress_dim1536_2bit   | N/A  | 297.3 ms | N/A        |
| 🆕 | Simulation | turboquant_decompress_dim768_2bit  | N/A  |   115 ms | N/A        |
| 🆕 | Simulation | turboquant_compress_dim1536_4bit   | N/A  |   310 ms | N/A        |
| 🆕 | Simulation | turboquant_decompress_dim128_2bit  | N/A  |  14.5 ms | N/A        |
| 🆕 | Simulation | turboquant_decompress_dim1024_4bit | N/A  | 115.7 ms | N/A        |
| 🆕 | Simulation | turboquant_decompress_dim1536_4bit | N/A  |   232 ms | N/A        |
| 🆕 | Simulation | turboquant_decompress_dim128_4bit  | N/A  |  14.5 ms | N/A        |

Comparing claude/admiring-lichterman (c51db31) with develop (6fade3c)


Footnotes

  1. 1522 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

lwwmanning added the changelog/feature (A new feature) label Mar 26, 2026
Comment on lines +67 to +99
fn array_hash<H: std::hash::Hasher>(
    array: &TurboQuantMSEArray,
    state: &mut H,
    precision: Precision,
) {
    array.dtype.hash(state);
    array.dimension.hash(state);
    array.bit_width.hash(state);
    array.padded_dim.hash(state);
    array.rotation_seed.hash(state);
    array.codes.array_hash(state, precision);
    array.norms.array_hash(state, precision);
    array.centroids.array_hash(state, precision);
    array.rotation_signs.array_hash(state, precision);
}

fn array_eq(
    array: &TurboQuantMSEArray,
    other: &TurboQuantMSEArray,
    precision: Precision,
) -> bool {
    array.dtype == other.dtype
        && array.dimension == other.dimension
        && array.bit_width == other.bit_width
        && array.padded_dim == other.padded_dim
        && array.rotation_seed == other.rotation_seed
        && array.codes.array_eq(&other.codes, precision)
        && array.norms.array_eq(&other.norms, precision)
        && array.centroids.array_eq(&other.centroids, precision)
        && array
            .rotation_signs
            .array_eq(&other.rotation_signs, precision)
}

we really need a derive macro for this stuff

/// Maximum iterations for Max-Lloyd algorithm.
const MAX_ITERATIONS: usize = 200;

type CentroidCache = Mutex<HashMap<(u32, u8), Vec<f32>>>;

should really be a dashmap

type CentroidCache = Mutex<HashMap<(u32, u8), Vec<f32>>>;

/// Global centroid cache keyed by (dimension, bit_width).
static CENTROID_CACHE: OnceLock<CentroidCache> = OnceLock::new();

and lazylock probably
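
A minimal sketch of what the `LazyLock` suggestion could look like, using only the standard library (`std::sync::LazyLock` is stable since Rust 1.80). The `cached_centroids` helper and its `compute` closure are hypothetical names for illustration, not part of this PR; the `Mutex<HashMap<..>>` is kept as in the diff rather than swapped for a dashmap.

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};

type CentroidCache = Mutex<HashMap<(u32, u8), Vec<f32>>>;

/// Global centroid cache keyed by (dimension, bit_width).
/// `LazyLock` runs the initializer on first access, so no `get_or_init` call is needed.
static CENTROID_CACHE: LazyLock<CentroidCache> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

/// Hypothetical lookup helper: returns the cached codebook, computing it on a miss.
/// Note that the lock is held while `compute` runs.
fn cached_centroids(
    dimension: u32,
    bit_width: u8,
    compute: impl FnOnce() -> Vec<f32>,
) -> Vec<f32> {
    let mut cache = CENTROID_CACHE.lock().expect("centroid cache poisoned");
    cache
        .entry((dimension, bit_width))
        .or_insert_with(compute)
        .clone()
}
```

Compared with `OnceLock`, the only change at the use site is that the static can be dereferenced directly.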

        return 0.0;
    }
    base.powf(exponent)
}

I think avoiding the branch with max(0.0) is faster here? Compiler Explorer says it's slightly more instructions.
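
To make the two options concrete, a small sketch. Only `return 0.0;` and `base.powf(exponent)` are visible in the diff context, so the `base <= 0.0` guard and both function names are assumptions.

```rust
/// Branching variant: early return for non-positive bases (guard condition assumed).
fn pow_branch(base: f32, exponent: f32) -> f32 {
    if base <= 0.0 {
        return 0.0;
    }
    base.powf(exponent)
}

/// Branch-free variant: clamp the base to zero, then exponentiate.
fn pow_clamped(base: f32, exponent: f32) -> f32 {
    base.max(0.0).powf(exponent)
}
```

The two agree for positive exponents, but not everywhere: `0.0_f32.powf(0.0)` is `1.0`, so `pow_clamped(0.0, 0.0)` returns `1.0` where `pow_branch` returns `0.0`. Whether that matters depends on the exponents this helper actually sees.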

/// Centroids must be sorted in ascending order. Uses binary search for efficiency.
#[inline]
pub fn find_nearest_centroid(value: f32, centroids: &[f32]) -> u8 {
debug_assert!(!centroids.is_empty());

debug assert that centroids are sorted?
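
A sketch of how that assertion might look. The signature matches the diff context; the body is an assumption written for illustration and may differ from the PR's implementation.

```rust
/// Centroids must be sorted in ascending order. Uses binary search for efficiency.
#[inline]
pub fn find_nearest_centroid(value: f32, centroids: &[f32]) -> u8 {
    debug_assert!(!centroids.is_empty());
    debug_assert!(centroids.len() <= 256, "index must fit in u8");
    // O(n) check, compiled out in release builds.
    debug_assert!(
        centroids.windows(2).all(|w| w[0] <= w[1]),
        "centroids must be sorted ascending"
    );

    // Index of the first centroid that is >= value.
    let idx = centroids.partition_point(|&c| c < value);
    if idx == 0 {
        return 0;
    }
    if idx == centroids.len() {
        return (idx - 1) as u8;
    }
    // Pick the closer neighbor; ties go to the lower index.
    if value - centroids[idx - 1] <= centroids[idx] - value {
        (idx - 1) as u8
    } else {
        idx as u8
    }
}
```

`windows(2).all(..)` is used instead of the slice `is_sorted` method so the sketch also builds on toolchains older than the release that stabilized it.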


    rotation.inverse_rotate(&dequantized_rotated, &mut dequantized);
    if norm > 0.0 {
        for val in &mut dequantized {

iter_mut()

static SESSION: LazyLock<VortexSession> =
LazyLock::new(|| VortexSession::empty().with::<ArraySession>());

/// Create a FixedSizeListArray of random f32 vectors (i.i.d. standard normal).

need to catch up on the papers - but is this what they test/benchmark on?

.into_iter()
.sorted_by_key(|s| s.code())
.collect_vec(),
turboquant_config: self.turboquant_config,

let's wait for @connortsui20's pluggable compressor

@connortsui20

Do we want to put this in vortex-tensor?

lwwmanning and others added 19 commits March 27, 2026 10:27
Implement the TurboQuant algorithm (arXiv:2504.19874) as a new lossy
encoding for high-dimensional vector data. This supports both the
MSE-optimal and inner-product-optimal (Prod) variants at 1-4 bits per
coordinate.

Key components:
- Max-Lloyd centroid computation on Beta(d/2,d/2) distribution
- Deterministic random rotation via nalgebra QR decomposition
- FastLanes BitPackedArray for index storage
- QJL residual correction for unbiased inner product estimation (Prod)

The encoding operates on FixedSizeList arrays of floats, which is the
storage format for Vector and FixedShapeTensor extension types.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
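
For readers unfamiliar with the centroid step named in the commit above, a minimal sketch of a one-dimensional Max-Lloyd iteration. This is not the PR's code: `max_lloyd`, the `density` closure, the grid size, and the f64 arithmetic are all assumptions, and the density is left as a parameter on [-1, 1] rather than hard-coding the Beta form.

```rust
/// Max-Lloyd on a density over [-1, 1], discretized on a midpoint grid.
/// Alternates between (a) cell boundaries at centroid midpoints and
/// (b) centroids at the conditional mean of each cell.
fn max_lloyd(density: impl Fn(f64) -> f64, levels: usize, max_iterations: usize) -> Vec<f64> {
    const GRID: usize = 4096;
    let xs: Vec<f64> = (0..GRID)
        .map(|i| -1.0 + 2.0 * (i as f64 + 0.5) / GRID as f64)
        .collect();
    let ws: Vec<f64> = xs.iter().map(|&x| density(x)).collect();

    // Start from evenly spaced centroids.
    let mut centroids: Vec<f64> = (0..levels)
        .map(|k| -1.0 + 2.0 * (k as f64 + 0.5) / levels as f64)
        .collect();

    for _ in 0..max_iterations {
        let boundaries: Vec<f64> = centroids.windows(2).map(|w| 0.5 * (w[0] + w[1])).collect();
        let mut mass = vec![0.0_f64; levels];
        let mut moment = vec![0.0_f64; levels];
        for (&x, &w) in xs.iter().zip(&ws) {
            let cell = boundaries.partition_point(|&b| b < x);
            mass[cell] += w;
            moment[cell] += w * x;
        }
        let mut shift = 0.0_f64;
        for k in 0..levels {
            if mass[k] > 0.0 {
                let updated = moment[k] / mass[k];
                shift = shift.max((updated - centroids[k]).abs());
                centroids[k] = updated;
            }
        }
        if shift < 1e-12 {
            break; // converged
        }
    }
    centroids
}
```

With a uniform density and two levels this settles at -0.5 and 0.5, which is a convenient sanity check before plugging in a more peaked density.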
…ntegration

Add a CompressorPlugin wrapper that intercepts Vector and FixedShapeTensor
extension columns, applies TurboQuant encoding, and recursively compresses
the resulting children (norms, codes) via the inner compressor.

Expose this via WriteStrategyBuilder::with_vector_quantization(config),
which composes with existing encoding modes (default, compact, cuda).

TODO: restructure into BtrBlocks canonical_compressor directly (like
DateTimeParts) rather than the wrapper CompressorPlugin approach.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Move TurboQuant compression logic from a standalone CompressorPlugin
wrapper into the BtrBlocks canonical compressor, following the same
pattern as DateTimeParts. This gives TurboQuant access to the full
BtrBlocks recursive compression pipeline for its children (norms,
codes, etc.).

Changes:
- Add `turboquant_config: Option<TurboQuantConfig>` to BtrBlocksCompressor
- Add `with_turboquant(config)` to BtrBlocksCompressorBuilder
- Add tensor extension detection + compress_turboquant() in the
  Canonical::Extension arm of canonical_compressor
- Update WriteStrategyBuilder::with_vector_quantization to configure
  BtrBlocks directly instead of wrapping
- Remove TurboQuantCompressor wrapper and vortex-layout dep from
  vortex-turboquant

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Add TurboQuant benchmarks to the single_encoding_throughput suite,
covering compress and decompress for dim=128 and dim=768 at 2-bit
and 4-bit widths. Uses 1000 random N(0,1) vectors per benchmark.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
…nsform

Replace the O(d²) dense matrix rotation (previously nalgebra, then faer)
with a Structured Random Hadamard Transform (SRHT) that runs in O(d log d).
The SRHT applies D₃·H·D₂·H·D₁ where H is the Walsh-Hadamard transform
and Dₖ are random diagonal ±1 sign matrices.

This eliminates both the nalgebra and faer dependencies — the SRHT is
fully self-contained with no external linear algebra library needed.

Benchmark results (1000 vectors, mean throughput):

  | Benchmark                  | Before (nalgebra) | After (SRHT)  |
  |----------------------------|---------:|----------:|
  | compress dim128 2-bit      | 222 MB/s |  242 MB/s |
  | compress dim768 2-bit      |  32 MB/s |  181 MB/s |
  | decompress dim128 2-bit    |  87 MB/s |  614 MB/s |
  | decompress dim768 2-bit    |   6 MB/s |  458 MB/s |

For non-power-of-2 dimensions (e.g., 768), input is zero-padded to the
next power of 2 (1024) and all padded coordinates are quantized.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
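
A self-contained sketch of the D₃·H·D₂·H·D₁ transform described in the commit above, with zero-padding to the next power of two. All names here (`fwht`, `srht_forward`, `srht_inverse`, `demo_signs`, `pad_to_pow2`) are illustrative assumptions, and the xorshift sign generator is a stand-in, not the PR's seeded RNG.

```rust
/// In-place, unnormalized fast Walsh-Hadamard transform in O(n log n).
fn fwht(buf: &mut [f32]) {
    let n = buf.len();
    assert!(n.is_power_of_two());
    let mut h = 1;
    while h < n {
        for block in buf.chunks_mut(2 * h) {
            let (lo, hi) = block.split_at_mut(h);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (x, y) = (*a, *b);
                *a = x + y;
                *b = x - y;
            }
        }
        h *= 2;
    }
}

/// Multiply element-wise by a diagonal of +1.0 / -1.0 signs.
fn apply_signs(buf: &mut [f32], signs: &[f32]) {
    for (v, s) in buf.iter_mut().zip(signs) {
        *v *= *s;
    }
}

/// Shared body: two unnormalized H passes are rescaled by 1/n at the end,
/// which equals two orthonormal passes of H / sqrt(n).
fn srht_with_order(buf: &mut [f32], signs: &[Vec<f32>; 3], order: [usize; 3]) {
    apply_signs(buf, &signs[order[0]]);
    fwht(buf);
    apply_signs(buf, &signs[order[1]]);
    fwht(buf);
    apply_signs(buf, &signs[order[2]]);
    let scale = 1.0 / buf.len() as f32;
    for v in buf.iter_mut() {
        *v *= scale;
    }
}

/// Forward rotation: D3 * H * D2 * H * D1 * x (D1 is applied first).
fn srht_forward(buf: &mut [f32], signs: &[Vec<f32>; 3]) {
    srht_with_order(buf, signs, [0, 1, 2]);
}

/// Inverse rotation: D1 * H * D2 * H * D3 * y, since each factor is its own inverse.
fn srht_inverse(buf: &mut [f32], signs: &[Vec<f32>; 3]) {
    srht_with_order(buf, signs, [2, 1, 0]);
}

/// Stand-in deterministic sign generator (xorshift64), for the demo only.
fn demo_signs(seed: u64, n: usize) -> Vec<f32> {
    let mut state = seed.wrapping_mul(0x9E37_79B9_7F4A_7C15) | 1;
    (0..n)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            if state >> 63 == 1 { 1.0 } else { -1.0 }
        })
        .collect()
}

/// Zero-pad to the next power of two (e.g. 768 -> 1024).
fn pad_to_pow2(x: &[f32]) -> Vec<f32> {
    let mut padded = x.to_vec();
    padded.resize(x.len().next_power_of_two(), 0.0);
    padded
}
```

Because the transform is orthonormal, the padded vector keeps its norm under `srht_forward`, and `srht_inverse` recovers the input up to floating-point error.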
…tests

Replace the loose "normalized MSE < 1.0" check with rigorous tests:

- mse_within_theoretical_bound: Verifies per-vector normalized MSE is
  within 10x the paper's Theorem 1 bound (sqrt(3)*pi/2 / 4^b). Tests
  across dim={128,256} x bits={1,2,3,4}.

- prod_inner_product_bias: Verifies the Prod variant produces
  approximately unbiased inner products by computing <query, x_hat> vs
  <query, x> over 500 random pairs and checking mean relative error < 0.3.

- mse_decreases_with_bits: Verifies MSE monotonically decreases with
  increasing bit-width for both Mse and Prod variants.

Total: 49 tests (up from 39).

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
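
For reference, the bound quoted in the commit above, sqrt(3)*pi/2 / 4^b, evaluated in code. The function name `mse_bound` is hypothetical; only the formula comes from the commit message.

```rust
/// Theoretical normalized-MSE bound at `bit_width` bits per coordinate:
/// (sqrt(3) * pi / 2) / 4^bit_width.
fn mse_bound(bit_width: u32) -> f64 {
    (3.0_f64.sqrt() * std::f64::consts::PI / 2.0) / 4.0_f64.powi(bit_width as i32)
}
```

Each extra bit divides the bound by four, so a test that allows 10x slack, as described above, would compare the measured per-vector normalized MSE against `10.0 * mse_bound(bits)`.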
- Hoist per-row allocations (residual, projected) out of encode_prod loop
- Use BufferMut<u8> directly for sign_buf instead of Vec + copy
- Remove unused num-traits dependency
- Remove dead unreachable!() branch (bit_width >= 2 validated at entry)
- Fix orphaned doc comment blank line
- Generate public-api.lock files for new/modified crates

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Address code review findings:

- Tighten SRHT roundtrip test tolerance from 1e-3 to 1e-5 (verified
  exact to ~4e-7 relative error across dim 32-1024). Consolidate into
  parameterized rstest covering power-of-2 and non-power-of-2 dims.
- Rename `pd` -> `padded_dim` throughout compress.rs and decompress.rs
  for clarity.
- Add early dimension validation (>= 2) in turboquant_encode with
  clear error message.
- Add edge case tests: single-row roundtrip (Mse + Prod), empty array
  Prod variant, dimension-below-2 rejection.
- Tighten norm preservation test to 1e-5 relative tolerance.

Total: 59 tests (up from 49).

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
…ror bounds

Add comprehensive crate documentation including:
- Theoretical MSE bounds per bit-width from the paper's Theorem 1
- Compression ratio table for common dimensions (256-1536), accounting
  for power-of-2 padding overhead on non-power-of-2 dims (768, 1536)
- Working doctest demonstrating encode usage and size verification

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
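
A rough sketch of how padding affects the per-row compression ratio. The accounting here is an assumption made for illustration: each row costs `padded_dim * bit_width` bits of codes plus one f32 norm, and any data shared across rows (codebook, rotation signs) is ignored. The crate's documented table may use a different accounting.

```rust
/// Dimension after zero-padding to the next power of two.
fn padded_dim(dim: usize) -> usize {
    dim.next_power_of_two()
}

/// Approximate per-row ratio of raw f32 storage to encoded storage
/// (assumed accounting: codes for every padded coordinate + one f32 norm).
fn compression_ratio(dim: usize, bit_width: u32) -> f64 {
    let raw_bits = (dim * 32) as f64;
    let encoded_bits = (padded_dim(dim) * bit_width as usize + 32) as f64;
    raw_bits / encoded_bits
}
```

Under these assumptions dim=1024 pays no padding cost, while dim=768 encodes 1024 coordinates and dim=1536 encodes 2048, which lowers their ratios at the same bit width.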
Extend bit_width range from 1-4 to 1-8. At 8 bits (256 centroids),
codes are stored as raw u8 instead of bit-packed since BitPackedArray
doesn't support width >= 8. This gives ~4x compression from f32 with
near-lossless quality (MSE bound 4.15e-05).

Changes:
- Update all validation sites (compress, array, centroids) to accept
  1-8 bits (MSE) and 2-8 bits (Prod)
- Skip bitpack_encode for 8-bit codes, store PrimitiveArray<u8> directly
- Extend crate docs with full 1-8 bit bound/ratio tables
- Add 6-bit and 8-bit test cases for roundtrip, MSE bounds, Prod bias,
  and monotonic MSE decrease. High bit-width tests verify MSE < 4-bit
  MSE and MSE < 1% (since the theoretical bound becomes unrealistically
  tight at 5+ bits due to SRHT finite-dimension effects)
- Regenerate public-api.lock

Total: 69 unit tests + 1 doctest.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Allow Prod variant bit_width up to 9, where the MSE component uses 8-bit
codes (raw u8) plus 1-bit QJL correction. The 8-bit MSE codes can be fed
directly into int8 GEMM kernels on tensor cores without unpacking.

- Update Prod validation to 2-9, MSE remains 1-8
- Restructure top-level validation into per-variant match
- Add 9-bit roundtrip, inner product bias, and monotonicity tests
- Document tensor core use case in crate docs

Total: 71 unit tests + 1 doctest.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Expand TurboQuant throughput benchmarks to cover common embedding
dimensions:
- dim=128 (2-bit, 4-bit) — small embeddings
- dim=768 (2-bit) — BERT / sentence-transformers
- dim=1024 (2-bit, 4-bit) — larger embedding models
- dim=1536 (2-bit, 4-bit) — OpenAI ada-002, exercises non-power-of-2
  padding overhead

All benchmarks use i.i.d. N(0,1) vectors with fixed seed — a
conservative worst-case for TurboQuant since real neural embeddings
have structure that the SRHT exploits for better quantization.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Add methods to persist and restore SRHT rotation signs as BoolArray,
eliminating the need to regenerate from seed during decompression:

- `export_inverse_signs_bool_array()`: Exports 3 × padded_dim sign bits
  as a single BoolArray in inverse-application order [D₃|D₂|D₁] so
  decompression iterates sequentially.
- `from_bool_array(signs, dim)`: Reconstructs RotationMatrix from stored
  signs without needing the seed.
- `apply_inverse_srht_from_bits(buf, signs_bytes, padded_dim, norm_factor)`:
  Hot-path free function that applies inverse SRHT directly from raw sign
  bytes, avoiding intermediate Vec<f32> reconstruction.

Convention: bit=1 means +1, bit=0 means -1 (negate).

Tests verify:
- Export→import roundtrip produces identical rotation (3 dims)
- Hot-path function matches struct-based inverse_rotate exactly

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
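
A sketch of the sign convention stated in the commit above (bit=1 means +1, bit=0 means -1). The helper names and the LSB-first bit order within each byte are assumptions; the PR stores the bits in a BoolArray, whose layout is not shown here.

```rust
/// Pack +1.0 / -1.0 signs into bits: 1 for +1, 0 for -1 (LSB-first, assumed).
fn pack_signs(signs: &[f32]) -> Vec<u8> {
    let mut bytes = vec![0u8; signs.len().div_ceil(8)];
    for (i, &s) in signs.iter().enumerate() {
        if s >= 0.0 {
            bytes[i / 8] |= 1 << (i % 8);
        }
    }
    bytes
}

/// Expand the first `len` packed bits back into +1.0 / -1.0 multipliers.
fn expand_signs(bytes: &[u8], len: usize) -> Vec<f32> {
    (0..len)
        .map(|i| if (bytes[i / 8] >> (i % 8)) & 1 == 1 { 1.0 } else { -1.0 })
        .collect()
}
```

Expanding once into f32 multipliers costs 4 bytes per sign instead of 1 bit, but it lets the per-row loop be a plain element-wise multiply.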
Add two new cascading array types that replace the monolithic
TurboQuantArray:

TurboQuantMSEArray (4 children):
  - codes (BitPackedArray or PrimitiveArray<u8>)
  - norms (PrimitiveArray<f32>)
  - centroids (PrimitiveArray<f32>, stored codebook)
  - rotation_signs (BoolArray, 3 * padded_dim bits, inverse order)

TurboQuantQJLArray (4 children):
  - mse_inner (TurboQuantMSEArray at bit_width - 1)
  - qjl_signs (BoolArray, num_rows * padded_dim)
  - residual_norms (PrimitiveArray<f32>)
  - rotation_signs (BoolArray, QJL rotation, inverse order)

Both store all precomputed data (centroids, rotation signs) as children
to eliminate recomputation during decompression. Validity is pushed down
to the codes child via ValidityVTableFromChild at each level.

Includes decompression implementations for both new types that use
stored centroids/signs and the hot-path apply_inverse_srht_from_bits.

The old TurboQuantArray and its decode paths are retained for now.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Add `turboquant_encode_mse()` and `turboquant_encode_qjl()` that produce
the new cascaded array types:

- turboquant_encode_mse: produces TurboQuantMSEArray with stored
  centroids (PrimitiveArray<f32>) and rotation signs (BoolArray)
- turboquant_encode_qjl: produces TurboQuantQJLArray wrapping an
  inner TurboQuantMSEArray at bit_width-1, with QJL signs (BoolArray)
  and QJL rotation signs (BoolArray)

Tests verify:
- Roundtrip encode/decode for both new types at various dims/bit_widths
- New MSE path matches legacy path exactly (bit-for-bit)
- Edge cases: empty arrays and single-row arrays for both types

Total: 90 unit tests + 1 doctest.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Update the BtrBlocks TurboQuant compressor to produce the new cascaded
TurboQuantQJLArray(TurboQuantMSEArray) structure. The compressor no
longer manually compresses each child — it produces the TurboQuant
array and lets the layout writer's recursive descent handle child
compression naturally.

This removes the explicit per-child compress_canonical calls and the
BtrBlocksCompressor self-reference, making the compressor stateless.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Adds public API entries for TurboQuantMSE, TurboQuantMSEArray,
TurboQuantQJL, TurboQuantQJLArray, and the new encode functions.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
…ead code

Restructure the turboquant crate to follow the fastlanes encoding
pattern where each encoding type gets its own subdirectory with
array/ and vtable/ subdirectories:

  mse/
    mod.rs        — marker struct + re-exports
    array/mod.rs  — TurboQuantMSEArray struct + accessors
    vtable/mod.rs — VTable + ValidityChild impls
  qjl/
    mod.rs        — marker struct + re-exports
    array/mod.rs  — TurboQuantQJLArray struct + accessors
    vtable/mod.rs — VTable + ValidityChild impls

Delete all dead code:
- Remove old monolithic array.rs (TurboQuantArray, TurboQuantVariant)
- Remove old mse_array.rs, qjl_array.rs flat files
- Remove old rules.rs
- Remove legacy decode functions from decompress.rs
- Remove TurboQuantVariant from TurboQuantConfig (now just bit_width + seed)

Update all consumers:
- BtrBlocks compressor (already using new API)
- Benchmarks: turboquant_encode → turboquant_encode_mse
- lib.rs: use glob re-exports (pub use mse::*, pub use qjl::*)
- Docstring example updated for new API

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Add 8 new tests addressing gaps identified in review:

Validation:
- qjl_rejects_dimension_below_2: QJL path also rejects dim < 2

Stored metadata verification:
- stored_centroids_match_computed: stored codebook == get_centroids()
- stored_rotation_signs_produce_correct_decode: stored signs match
  seed-derived signs bit-for-bit

QJL quality:
- qjl_mse_within_theoretical_bound: QJL MSE satisfies (b-1)-bit bound
  (3 parametrized cases: dim 128/256, bits 3-4)
- high_bitwidth_qjl_is_small: 8-9 bit QJL < 4-bit QJL and < 1% MSE

Also add explanatory comments for:
- QJL scale factor derivation (sqrt(π/2)/padded_dim) in decompress.rs
- Why QJL uses seed+1 for statistical independence in compress.rs

Total: 85 unit tests + 1 doctest.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
lwwmanning and others added 4 commits March 27, 2026 10:27
…ed signs

The bit-packed apply_inverse_srht_from_bits path introduced a ~20%
decode throughput regression vs the original f32 sign multiply path,
because per-element bit extraction + conditional negate is hard for
the compiler to autovectorize.

Fix: expand the stored BoolArray signs into f32 ±1.0 vectors once at
decode start via RotationMatrix::from_bool_array(), then use the
original inverse_rotate() with its SIMD-friendly apply_signs() inner
loop. The expansion costs 3 × padded_dim × 4 bytes of temporary
memory (12KB for dim=1024), amortized over all rows.

We still store signs as 1-bit BoolArray on disk (32x space savings),
but recover full autovectorized throughput at decode time.

The apply_inverse_srht_from_bits function is retained (with tests) for
potential future use with explicit SIMD bit-extraction intrinsics.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
- Reject nullable FixedSizeListArray input in both turboquant_encode_mse
  and turboquant_encode_qjl with a clear error message. TurboQuant is
  lossy and cannot preserve null positions.
- Fix with_vector_quantization composability: store TurboQuantConfig in
  the builder and apply at build() time, so it doesn't discard a
  previously-configured compressor. Document precedence rules.
- Export VECTOR_EXT_ID and FIXED_SHAPE_TENSOR_EXT_ID as public constants
  from vortex-turboquant; import in vortex-btrblocks instead of
  hardcoding duplicate string literals.
- Add QJL roundtrip and inner product bias tests for dim=768 (non-
  power-of-2 requiring padding to 1024).
- Move function-scoped imports to top of test module and benchmark file
  per CLAUDE.md conventions.
- Regenerate public-api.lock.

Total: 88 unit tests + 1 doctest.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
- Add TurboQuantMSE and TurboQuantQJL to ALLOWED_ENCODINGS in
  vortex-file so TurboQuant-encoded files can be deserialized
- Fix as_ptype() panic: use primitive.ptype() after to_canonical()
  instead of calling the panicking as_ptype() on the raw dtype
- Move rand_distr to dev-dependencies (only used in tests)
- Remove unused vortex-mask dependency
- Handle nullable storage in compress_turboquant: return None to fall
  through to default compression instead of failing
- Remove apply_inverse_srht_from_bits (dead code, only used in its own
  test) and apply_signs_from_bits helper
- Fix function-scoped import in gen_random_signs
- Add TODO for double f32 extraction in QJL encode
- Fix execute() signature after merge with develop (Arc<Array<Self>>)
- Collapse nested if-let per clippy

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
- Consolidate encode_decode_mse and encode_decode_qjl test helpers into
  a single closure-parameterized encode_decode function
- Replace 14 copy-pasted benchmark functions (~200 lines) with a
  turboquant_bench! macro (~40 lines)
- Extract QJL correction scale factor to a named function with doc
  comment explaining the derivation
- Precompute centroid decision boundaries (midpoints) once before the
  row loop, replacing per-coordinate distance comparisons with a single
  partition_point lookup. This removes two abs() calls and a branch
  from the innermost quantization loop.

Net: -150 lines.

Signed-off-by: Will Manning <will@spiraldb.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
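
The boundary precomputation described in the last commit can be sketched as follows. `decision_boundaries` and `quantize_row` are hypothetical names, and the tie-breaking at an exact midpoint (lower index wins) is a choice made for this sketch.

```rust
/// Midpoints between adjacent sorted centroids; computed once per codebook.
fn decision_boundaries(centroids: &[f32]) -> Vec<f32> {
    centroids.windows(2).map(|w| 0.5 * (w[0] + w[1])).collect()
}

/// Quantize one rotated row: the code is the number of boundaries below the value,
/// found with a single binary search per coordinate.
fn quantize_row(row: &[f32], boundaries: &[f32]) -> Vec<u8> {
    row.iter()
        .map(|&v| boundaries.partition_point(|&b| b < v) as u8)
        .collect()
}
```

For sorted centroids the nearest centroid to a value is determined by which midpoints it exceeds, so the inner loop needs no distance computation at all.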
lwwmanning force-pushed the claude/admiring-lichterman branch from bb55c24 to c51db31 March 27, 2026 14:36