tscore: optional simdutf path for ats_base64 encode/decode#13166
Draft
phongn wants to merge 2 commits into
Draft
tscore: optional simdutf path for ats_base64 encode/decode#13166phongn wants to merge 2 commits into
phongn wants to merge 2 commits into
Conversation
The hand-rolled base64 implementation in ink_base64.cc is a measurable hotspot in places that encode or decode larger payloads (OCSP DER requests, S3 auth HMACs, signed URL segments). simdutf provides SIMD-accelerated kernels that run roughly an order of magnitude faster on medium-and-larger inputs on AVX2/AVX-512 hardware. Wire simdutf in as an opt-in dependency through the existing auto_option machinery (ENABLE_SIMDUTF, default AUTO). When the package is available, the wrapper dispatches to simdutf for inputs above an empirically chosen threshold and keeps the scalar path for smaller inputs, where simdutf's per-call overhead would otherwise be a regression (notably the 8-byte SnowflakeID encode). Both paths preserve the existing public contract: standard '+/=' encode alphabet, accepts both '+/' and '-_' on decode in the same call, tolerates missing padding, truncates silently on invalid input, and always writes a trailing NUL. A new microbenchmark under tools/benchmark locks the InkAPITest SDK_API_ENCODING fixture as a regression test and provides the throughput numbers used to choose the thresholds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR adds an optional simdutf-backed fast path for ATS base64 encode/decode while keeping the existing scalar implementation as a fallback for small inputs or builds without simdutf.
Changes:
- Adds
ENABLE_SIMDUTF/TS_USE_SIMDUTFCMake wiring and links simdutf intotscorewhen enabled. - Refactors
ink_base64.ccinto scalar helpers plus hybrid simdutf dispatch. - Adds a benchmark target with base64 correctness checks and throughput benchmarks.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
CMakeLists.txt |
Adds simdutf auto-option detection. |
include/tscore/ink_config.h.cmake.in |
Exposes TS_USE_SIMDUTF in generated config. |
src/tscore/CMakeLists.txt |
Links simdutf into tscore when enabled. |
src/tscore/ink_base64.cc |
Implements hybrid scalar/simdutf base64 encode/decode. |
tools/benchmark/CMakeLists.txt |
Adds the new base64 benchmark executable. |
tools/benchmark/benchmark_ink_base64.cc |
Adds correctness checks and benchmarks for base64 and tolower paths. |
- CMakeLists.txt: require simdutf >= 7.0.0. ats_base64_decode uses base64_default_or_url and the decode_up_to_bad_char parameter, both of which landed in simdutf 7.0.0. Without this pin, an older simdutf passes find_package and then fails at compile time. ENABLE_SIMDUTF in AUTO mode silently falls back to the scalar path when the found simdutf is too old; ENABLE_SIMDUTF=ON hard-errors so the user knows their explicit request cannot be satisfied (Copilot). - ink_base64.cc: align the simdutf and scalar decode paths on whitespace. simdutf's forgiving mode silently skips ASCII whitespace and continues; the scalar treats whitespace as end-of-input. With the two paths gated by an input-size threshold, this made TSBase64Decode results depend on build configuration. Pre-scan the input with the same printableToSixBit table upfront and truncate inBufferSize at the first non-alphabet byte before either path runs, so both see the same prefix of alphabet bytes (Copilot). - ink_base64.cc: restructure the scalar decode tail. The previous code ran one extra loop iteration past the alphabet prefix when there were 1..3 trailing alphabet bytes (reading inBuffer[2..3] which was either OOB to the caller or past the prefix) and then read inBuffer[-2] in the trailing adjustment block when no iterations had advanced inBuffer. Process only complete 4-character groups in the main loop and decode any 2- or 3-byte tail explicitly; a 1-byte tail encodes nothing meaningful and is dropped. This was flagged as a known follow-up when the PR landed. - src/tscore/unit_tests/test_ink_base64.cc: new unit test under test_tscore so the scalar and simdutf paths are covered by ctest in every build. Bracketing sizes 0/1/8/23/24/25/47/48/49/4096 exercise both implementations and the threshold transitions. Adds focused cases for URL-safe alphabet decode, in-place decode (dst == src), invalid-byte truncation, whitespace truncation (validates the new alignment), the InkAPITest fixture, and the 1-/2-/3-char tail cases that the scalar restructure now handles cleanly (Copilot). - tools/benchmark/benchmark_ink_base64.cc: rewrite the file header to describe what the bench actually does (scalar-vs-simdutf throughput comparison) and drop the correctness TEST_CASEs that moved to the unit test. Add Catch::Benchmark::keep_memory barriers so the inlined buffer writes aren't DCEd past the first observed byte, and a config-print case that prints whether simdutf is wired in (Copilot). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Wire simdutf in as an opt-in SIMD backend for
ats_base64_encodeandats_base64_decode(also exposed via theTSBase64Encode/TSBase64Decodeplugin API). Roughly an order-of-magnitude speedup on medium and larger inputs on AVX2 hardware; behavior-preserving for every in-tree caller.How it's wired
auto_option(SIMDUTF FEATURE_VAR TS_USE_SIMDUTF PACKAGE_DEPENDS simdutf)— defaultAUTO, same shape asHWLOC/UNWIND. Builds without simdutf installed are unaffected and fall back to the scalar path.src/tscore/ink_base64.ccbecomes a thin hybrid wrapper: scalar helpers in an anonymous namespace (always compiled), simdutf used only wheninBufferSizeexceeds an empirically chosen per-direction threshold. Tiny-input cases (e.g. the 8-byteSnowflakeIDencode) stay on the scalar path to avoid simdutf's per-call dispatch overhead.include/tscore/ink_config.h.cmake.ingains#cmakedefine01 TS_USE_SIMDUTF.Performance (Xeon E5-2683 v4, AVX2)
Behavior
Both paths preserve the existing public contract:
+/=alphabet, no line breaks, trailing NUL written atoutBuffer[length].+/and-_in the same input, tolerates missing padding, truncates silently on invalid characters, trailing NUL written.plugins/experimental/magick) is preserved.One behavioral delta when the simdutf path is taken: simdutf silently skips ASCII whitespace (space, tab, CR, LF, FF) inside the input, whereas the scalar path stops at the first whitespace byte. None of the in-tree callers feed whitespace to these functions; flagged in the file's header comment.
Test plan
tools/benchmark/benchmark_ink_base64covers both correctness and performance. Locks the byte-exact fixture fromInkAPITest.cc::SDK_API_ENCODINGas a regression test.ENABLE_SIMDUTF=AUTO(hybrid) andENABLE_SIMDUTF=OFF(scalar-only).cmake --build build -t formatclean.traffic_serveragainst a workload exercising OCSP stapling and the S3origin_server_authplugin (encode hot paths).Notes for reviewers
BASE64_ENCODE_SIMD_THRESHOLD=24,BASE64_DECODE_SIMD_THRESHOLD=48) were chosen from the benchmark data and documented in the file. The crossover shifts on different cores but the thresholds are robust within an order of magnitude.inBufferSizeis 1 or 2 (the existinginBuffer[-2]access in the trailing-bytes adjustment). I preserved this rather than smuggle in a behavior change. Worth a follow-up issue but out of scope here.🤖 Generated with Claude Code