More revisions to RFC 33 #42

Merged
lwwmanning merged 7 commits into develop from claude/stupefied-chaum
Apr 6, 2026

@lwwmanning

Follow-up to #33, #34, #35, #36, #37, and #38.

lwwmanning and others added 7 commits April 6, 2026 13:22
Stage 1a (merged in vortex-data/vortex#7269):
- MSE-only TurboQuant with 8-bit default (near-lossless, ~4e-5 MSE)
- Dimension >= 128 scheme selection, 3-round SORF
- Original QJL PR (#7167) closed
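
For readers unfamiliar with the "3-round SORF" mentioned above, the following is a minimal sketch, assuming SORF means repeated rounds of a random sign flip (D) followed by an orthonormal Walsh-Hadamard transform (H). The function names, the seed, and the pure-Python lists are illustrative assumptions, not the Vortex implementation.

```python
import math
import random


def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of 2."""
    n = len(v)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of 2"
    out = list(v)
    h = 1
    while h < n:
        # Butterfly stage: combine pairs of entries that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def random_signs(rounds, n, seed=0):
    """Hypothetical helper: one +/-1 sign vector per round."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(rounds)]


def sorf_rotate(v, signs):
    """Apply one H*D round per sign vector in `signs`."""
    out = list(v)
    for d in signs:
        out = fwht([s * x for s, x in zip(d, out)])
    return out


vector = [float(i) for i in range(8)]
rotated = sorf_rotate(vector, random_signs(rounds=3, n=8))
```

Each round is orthogonal, so the rotation preserves the vector's Euclidean norm regardless of the number of rounds.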

Stage 1b (next — array representation cleanup):
- Power-of-2 dimension requirement (remove internal padding)
- FixedSizeListArray rotation signs for variable SRHT rounds
- Dtype-matching norms, structured metadata (format TBD pending
  vtable refactor)
- Goal: wire format ready for backward-compat guarantees

Stage 2 reframed as general-purpose structural encoding:
- Block decomposition is a vertical split of FSL by dimension,
  analogous to ChunkedArray's horizontal split by rows
- Encoding-agnostic: each block is independently encoded (all TQ
  initially, but supports heterogeneous child encodings)
- Straggler blocks noted as future work for no-qualifying-B dims
- PDX (Stage 3) similarly structural, not TQ-specific
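
The vertical-versus-horizontal analogy above can be sketched on plain nested lists. This is only an illustration of the two split directions; `split_by_dimension` and `split_by_rows` are hypothetical names, and real FSL or ChunkedArray storage is not modeled.

```python
def split_by_dimension(rows, block_size):
    """Vertical split: every block keeps all rows but only a slice of dimensions."""
    dim = len(rows[0])
    assert dim % block_size == 0, "dimension must be a multiple of block_size"
    return [
        [row[start:start + block_size] for row in rows]
        for start in range(0, dim, block_size)
    ]


def split_by_rows(rows, chunk_len):
    """Horizontal split: every chunk keeps all dimensions but only a slice of rows."""
    return [rows[start:start + chunk_len] for start in range(0, len(rows), chunk_len)]


# Four vectors of dimension 8.
vectors = [[r * 8 + c for c in range(8)] for r in range(4)]
blocks = split_by_dimension(vectors, block_size=4)  # 2 blocks, each 4 rows x 4 dims
chunks = split_by_rows(vectors, chunk_len=2)        # 2 chunks, each 2 rows x 8 dims
```

Because each block holds a disjoint range of dimensions, each one can in principle be handed to a different encoder, which is the encoding-agnostic property the commit describes.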

Other changes:
- Codes/centroids remain separate slots; DictArray for canonicalize
- Updated compression ratio examples for 8-bit default
- Updated array layouts, migration table, references throughout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Fixes from critical review against cited sources:

- Fix SORF/SRHT terminology conflation: SORF (multi-round HD product
  from [5]) was incorrectly called "SRHT" (Tropp's single-round R·H·D
  from [3]) in ~15 places. Now consistent throughout.
- PDX speedup claims: cite precise Table 4 figures (2x avg, 1.5x at
  D>32) instead of ambiguous "about 40%". Clarify int8 layout and
  ADSampling are from the open-source impl, not the paper.
- Strengthen SORF disclaimer: [5] does not prove distributional
  closeness to Haar measure; butterfly-stage counting has no
  theoretical backing in [5].
- Fix d=2 "singularity" language: the arcsine distribution exists at
  d=2; the real issue is it's U-shaped and unsuitable for Max-Lloyd.
- Note GPU distance table at b=8 is 256KB (exceeds shared memory).
- Note Eviox [7] URL may require account access.
- Clarify Stage 1b gap: scheme still pads non-power-of-2 externally
  between Stage 1b and Stage 2.
- Clarify Stage 2 tension: block decomposition is TQ-internal in
  initial implementation; extraction to general-purpose type is future.
- Fix stale "k×3×B" in QJL strategy table (now k×R×B).
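
The 256KB figure for the GPU distance table can be reproduced with simple arithmetic, under the assumption (not stated in this commit message) that the table holds one 4-byte float per pair of centroids. The function name and the 48 KiB budget below are illustrative assumptions; actual shared-memory limits vary by GPU.

```python
def distance_table_bytes(bits, bytes_per_entry=4):
    """Size of a centroid-by-centroid lookup table for `bits`-bit codes.

    Assumes a full pairwise table with fixed-width entries (hypothetical layout).
    """
    centroids = 1 << bits
    return centroids * centroids * bytes_per_entry


# Hypothetical shared-memory budget; check the target GPU's actual limit.
ASSUMED_SHARED_MEMORY_BYTES = 48 * 1024

table = distance_table_bytes(bits=8)  # 256 * 256 * 4 bytes = 262144 bytes = 256 KiB
fits = table <= ASSUMED_SHARED_MEMORY_BYTES
```

Under these assumptions the b=8 table is over five times the assumed budget, while a b=4 table (1 KiB) fits comfortably.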

Structural reorganization:

- Move reference implementation bugs + Theorem 1 constant to Appendix A
- Move community QJL findings to Appendix B
- Move "Why not DCT?" + shared rotation speculation to Appendix C
- Replace with brief summaries + appendix references in main text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Reframe Stage 1 as a forward-looking description of the target end
state rather than a point-in-time snapshot of PR 7269. This helps
the RFC age well — readers approaching it in months will care about
what Stage 1 delivers, not which pieces landed in which PR.

- Merge Stage 1a + 1b into single "Stage 1: MSE-only TurboQuant
  (in progress)" section focused on target properties
- PR 7269 mentioned as "initial implementation is merged" context
- "Remaining work" list captures what's left to complete Stage 1
- Single array layout diagram for Stage 1 (target state)
- Merged Phase 1a/1b into single Phase 1 in Phasing section
- Simplified migration section and shipping table
- Removed all 1a/1b references throughout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
…dimension)

Stage 2 needs dimension=768 (non-power-of-2) inside a single TQ array,
which contradicts the previous "dimension is always power-of-2" invariant.
The constraint actually applies to block_size: in Stage 1 block_size =
dimension (both power-of-2), but in Stage 2 dimension = num_blocks ×
block_size can be non-power-of-2. Fixed throughout: decoder invariant,
Stage 1 target properties, minimum dimension, current limitations.
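
The relationship dimension = num_blocks × block_size can be illustrated as follows. This sketch simply picks the largest power-of-2 divisor of the dimension; the RFC's actual rule for choosing a block size is not given in this commit message, so treat `largest_pow2_block` as a hypothetical helper.

```python
def largest_pow2_block(dimension):
    """Return (num_blocks, block_size) using the largest power-of-2 divisor."""
    assert dimension > 0
    # Isolating the lowest set bit gives the largest power of 2 dividing dimension.
    block_size = dimension & -dimension
    return dimension // block_size, block_size


num_blocks, block_size = largest_pow2_block(768)  # 768 = 3 x 256
```

For a power-of-2 dimension such as 1024 this yields a single block with block_size equal to the dimension, which matches the Stage 1 case described above.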

Also fix "Stage Stage" typo on line 254.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Revert the "move padding to scheme level" decision — the TQ array
keeps its existing internal zero-padding for non-power-of-2 dimensions.
The power-of-2 constraint applies to block_size (the SORF dimension),
not the input dimension.

- Stage 1: accepts any d >= 4, pads non-power-of-2 internally
  (block_size = padded_dim). codes.list_size may exceed dimension.
- Stage 2: block decomposition eliminates padding for dims with a
  qualifying B (each block is natively power-of-2). No-qualifying-B
  dims fall back to internal zero-padding (single padded block).
- Decoder invariant: block_size is always power-of-2;
  codes.list_size = num_blocks × block_size (may differ from dimension
  when internal padding applies in Stage 1).
- Remove "require power-of-2 dimensions" from Stage 1 remaining work.
- Replace all "scheme-level padding" references with "internal
  zero-padding".
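
A minimal sketch of the Stage 1 padding rule and the decoder invariant described above, assuming padded_dim is the next power of 2 at or above the dimension. The dictionary keys and function names are hypothetical and do not reflect the actual array metadata.

```python
def next_power_of_two(n):
    """Smallest power of 2 that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def stage1_layout(dimension):
    """Stage 1: a single block whose size is the zero-padded dimension."""
    if dimension < 4:
        raise ValueError("Stage 1 accepts d >= 4")
    padded_dim = next_power_of_two(dimension)
    return {
        "dimension": dimension,
        "num_blocks": 1,
        "block_size": padded_dim,
        "codes_list_size": padded_dim,  # may exceed dimension when padding applies
    }


def satisfies_decoder_invariant(layout):
    """block_size is a power of 2 and codes.list_size = num_blocks * block_size."""
    b = layout["block_size"]
    return b & (b - 1) == 0 and layout["codes_list_size"] == layout["num_blocks"] * b


layout = stage1_layout(96)  # padded to 128, so codes_list_size exceeds dimension
ok = satisfies_decoder_invariant(layout)
```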

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
…r intuition

- Fix Stage 2 comparison table: Stage 1 column now correctly uses
  padded_dim (not dim) for rotation signs, centroids, codes, and dot
  product — consistent with the Stage 1 array layout diagram.
- Remove stale "power-of-2 dimension requirement" from Phase 1 in
  Phasing section (was removed from Stage 1 remaining work earlier).
- Rewrite minimum dimension discussion: TQ is unlikely to be effective
  below d=64; exact threshold to be determined empirically. Modest
  padding (96→128) probably fine; large-fraction padding (32→64) not.
- Expand straggler blocks: for small stragglers (e.g., d=800 → 3×256
  + 32 remainder), SORF is ineffective; prefer uncompressed straggler
  or whole-vector padding. Note that full padding may beat block decomp
  with straggler for some dimensions.
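
The d=800 example above can be worked through numerically. The sketch below only counts dimensions for the two options (full blocks plus a straggler versus whole-vector padding); it does not model compression quality, and the function name and fixed block size of 256 are assumptions taken from the example.

```python
def compare_straggler_options(dimension, block_size):
    """Count dimensions for block decomposition with a straggler vs. full padding."""
    full_blocks, straggler_dims = divmod(dimension, block_size)
    padded_dim = 1 << (dimension - 1).bit_length()  # next power of 2 >= dimension
    return {
        "full_blocks": full_blocks,
        "straggler_dims": straggler_dims,
        "padded_dim": padded_dim,
        "padding_dims": padded_dim - dimension,
    }


options = compare_straggler_options(dimension=800, block_size=256)
# 3 full blocks of 256, a 32-dim straggler, or padding 800 up to 1024 (224 zero dims).
```

Which option wins depends on how the straggler is stored and how much the padding costs in practice, which the commit leaves as an open question.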

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
Signed-off-by: Will Manning <will@willmanning.io>
@lwwmanning lwwmanning force-pushed the claude/stupefied-chaum branch from 2f0d910 to 516f9c9 Compare April 6, 2026 17:22
@lwwmanning lwwmanning marked this pull request as ready for review April 6, 2026 17:24
@lwwmanning lwwmanning merged commit 3655eac into develop Apr 6, 2026
3 checks passed
@lwwmanning lwwmanning deleted the claude/stupefied-chaum branch April 6, 2026 17:24