
tscore: optional simdutf path for ats_base64 encode/decode #13166

Draft
phongn wants to merge 2 commits into apache:master from phongn:simdutf-base64

Conversation

@phongn (Collaborator) commented May 14, 2026

Summary

Wire simdutf in as an opt-in SIMD backend for ats_base64_encode and ats_base64_decode (also exposed via the TSBase64Encode / TSBase64Decode plugin API). Roughly an order-of-magnitude speedup on medium and larger inputs on AVX2 hardware; behavior-preserving for every in-tree caller.

How it's wired

  • auto_option(SIMDUTF FEATURE_VAR TS_USE_SIMDUTF PACKAGE_DEPENDS simdutf) — default AUTO, same shape as HWLOC / UNWIND. Builds without simdutf installed are unaffected and fall back to the scalar path.
  • src/tscore/ink_base64.cc becomes a thin hybrid wrapper: scalar helpers in an anonymous namespace (always compiled), simdutf used only when inBufferSize exceeds an empirically chosen per-direction threshold. Tiny-input cases (e.g. the 8-byte SnowflakeID encode) stay on the scalar path to avoid simdutf's per-call dispatch overhead.
  • include/tscore/ink_config.h.cmake.in gains #cmakedefine01 TS_USE_SIMDUTF.
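
The size-gated dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `scalar_encode`, `hybrid_encode`, and `encode_to_string` are hypothetical names, `TS_USE_SIMDUTF` is defaulted to 0 so the sketch builds without simdutf, and the `>` comparison is an assumption based on the word "exceeds" in the description.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumption: in a real build TS_USE_SIMDUTF comes from the generated
// ink_config.h. It defaults to 0 here so the sketch compiles standalone.
#ifndef TS_USE_SIMDUTF
#define TS_USE_SIMDUTF 0
#endif

#if TS_USE_SIMDUTF
#include <simdutf.h>
#endif

namespace
{
// Threshold value quoted in the notes for reviewers.
constexpr std::size_t BASE64_ENCODE_SIMD_THRESHOLD = 24;

// Scalar encoder: standard "+/" alphabet, '=' padding, no line breaks.
// Writes a trailing NUL and returns the number of characters before it.
std::size_t
scalar_encode(const unsigned char *in, std::size_t n, char *out)
{
  static const char tbl[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  char       *p = out;
  std::size_t i = 0;
  for (; i + 3 <= n; i += 3) {
    std::uint32_t v = (std::uint32_t{in[i]} << 16) | (std::uint32_t{in[i + 1]} << 8) | in[i + 2];
    *p++            = tbl[(v >> 18) & 63];
    *p++            = tbl[(v >> 12) & 63];
    *p++            = tbl[(v >> 6) & 63];
    *p++            = tbl[v & 63];
  }
  if (n - i == 1) {
    std::uint32_t v = std::uint32_t{in[i]} << 16;
    *p++            = tbl[(v >> 18) & 63];
    *p++            = tbl[(v >> 12) & 63];
    *p++            = '=';
    *p++            = '=';
  } else if (n - i == 2) {
    std::uint32_t v = (std::uint32_t{in[i]} << 16) | (std::uint32_t{in[i + 1]} << 8);
    *p++            = tbl[(v >> 18) & 63];
    *p++            = tbl[(v >> 12) & 63];
    *p++            = tbl[(v >> 6) & 63];
    *p++            = '=';
  }
  *p = '\0';
  return static_cast<std::size_t>(p - out);
}
} // namespace

// Hypothetical wrapper: small inputs stay scalar, larger ones go to simdutf
// when it is compiled in. The caller provides room for the output plus a NUL.
std::size_t
hybrid_encode(const unsigned char *in, std::size_t n, char *out)
{
#if TS_USE_SIMDUTF
  if (n > BASE64_ENCODE_SIMD_THRESHOLD) {
    std::size_t len = simdutf::binary_to_base64(reinterpret_cast<const char *>(in), n, out);
    out[len]        = '\0';
    return len;
  }
#endif
  return scalar_encode(in, n, out);
}

// Convenience helper for trying the wrapper on std::string values.
std::string
encode_to_string(const std::string &s)
{
  std::string out((s.size() + 2) / 3 * 4 + 1, '\0');
  std::size_t len = hybrid_encode(reinterpret_cast<const unsigned char *>(s.data()), s.size(), &out[0]);
  out.resize(len);
  return out;
}
```

Keeping the scalar helper compiled unconditionally means the same source builds and behaves the same whether or not simdutf is found; only the branch taken for larger inputs changes.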

Performance (Xeon E5-2683 v4, AVX2)

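| Op | Input size | Scalar only (ns) | simdutf only (ns) | Hybrid, this PR (ns) |
|---|---|---|---|---|
| encode | 8 B | 15.7 | 25.5 | 16.8 |
| encode | 32 B | 45.8 | 29.5 | 30.7 |
| encode | 200 B | 256 | 47.9 | 50.2 |
| encode | 4096 B | 5128 | 525 | 534 |
| decode | 12 B (base64) | 21.8 | 66.5 | 22.5 |
| decode | 44 B (base64) | 70.8 | 84.3 | 68.4 |
| decode | 268 B (base64) | 385 | 94.1 | 113 |
| decode | 5464 B (base64) | 7295 | 583 | 572 |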

Behavior

Both paths preserve the existing public contract:

  • Encode: standard +/= alphabet, no line breaks, trailing NUL written at outBuffer[length].
  • Decode: accepts both +/ and -_ in the same input, tolerates missing padding, truncates silently on invalid characters, trailing NUL written.
  • In-place decode (used by plugins/experimental/magick) is preserved.

One behavioral delta when the simdutf path is taken: simdutf silently skips ASCII whitespace (space, tab, CR, LF, FF) inside the input, whereas the scalar path stops at the first whitespace byte. None of the in-tree callers feed whitespace to these functions; flagged in the file's header comment.
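
To make the whitespace difference concrete, here is a small model of the two behaviors. It only counts how many base64 symbols each style would consume; it is not how simdutf or the scalar decoder is implemented, and the function names are hypothetical. Treating `=` as a non-alphabet byte in both views is an assumption.

```cpp
#include <cstddef>
#include <string>

namespace
{
// True for bytes in either accepted alphabet ("+/" or URL-safe "-_").
bool
is_b64_alphabet(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '-' ||
         c == '_';
}

// The five whitespace bytes listed in the description above.
bool
is_skipped_whitespace(unsigned char c)
{
  return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f';
}
} // namespace

// Scalar-style view: input ends at the first byte outside the alphabet,
// and whitespace counts as such a byte.
std::size_t
symbols_stopping_at_whitespace(const std::string &in)
{
  std::size_t n = 0;
  while (n < in.size() && is_b64_alphabet(static_cast<unsigned char>(in[n]))) {
    ++n;
  }
  return n;
}

// Forgiving view: whitespace is skipped, and input ends at the first byte
// that is neither whitespace nor in the alphabet.
std::size_t
symbols_skipping_whitespace(const std::string &in)
{
  std::size_t count = 0;
  for (unsigned char c : in) {
    if (is_skipped_whitespace(c)) {
      continue;
    }
    if (!is_b64_alphabet(c)) {
      break;
    }
    ++count;
  }
  return count;
}
```

For an input such as `"QUJD\nREVG"`, the first function reports 4 symbols and the second reports 8, so the two styles would decode different amounts of data from the same bytes.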

Test plan

  • Catch2 microbench tools/benchmark/benchmark_ink_base64 covers both correctness and performance. Locks the byte-exact fixture from InkAPITest.cc::SDK_API_ENCODING as a regression test.
  • 46 correctness assertions pass with ENABLE_SIMDUTF=AUTO (hybrid) and ENABLE_SIMDUTF=OFF (scalar-only).
  • cmake --build build -t format clean.
  • Jenkins CI green.
  • Manual smoke of traffic_server against a workload exercising OCSP stapling and the S3 origin_server_auth plugin (encode hot paths).

Notes for reviewers

  • Thresholds (BASE64_ENCODE_SIMD_THRESHOLD=24, BASE64_DECODE_SIMD_THRESHOLD=48) were chosen from the benchmark data and documented in the file. The crossover shifts on different cores but the thresholds are robust within an order of magnitude.
  • The scalar decoder contains a latent out-of-bounds read when inBufferSize is 1 or 2 (the existing inBuffer[-2] access in the trailing-bytes adjustment). I preserved this rather than smuggle in a behavior change. Worth a follow-up issue but out of scope here.

🤖 Generated with Claude Code

The hand-rolled base64 implementation in ink_base64.cc is a measurable
hotspot in places that encode or decode larger payloads (OCSP DER
requests, S3 auth HMACs, signed URL segments). simdutf provides
SIMD-accelerated kernels that run roughly an order of magnitude faster
on medium-and-larger inputs on AVX2/AVX-512 hardware.

Wire simdutf in as an opt-in dependency through the existing
auto_option machinery (ENABLE_SIMDUTF, default AUTO). When the package
is available, the wrapper dispatches to simdutf for inputs above an
empirically chosen threshold and keeps the scalar path for smaller
inputs, where simdutf's per-call overhead would otherwise be a
regression (notably the 8-byte SnowflakeID encode).

Both paths preserve the existing public contract: standard '+/=' encode
alphabet, accepts both '+/' and '-_' on decode in the same call,
tolerates missing padding, truncates silently on invalid input, and
always writes a trailing NUL. A new microbenchmark under tools/benchmark
locks the InkAPITest SDK_API_ENCODING fixture as a regression test and
provides the throughput numbers used to choose the thresholds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds an optional simdutf-backed fast path for ATS base64 encode/decode while keeping the existing scalar implementation as a fallback for small inputs or builds without simdutf.

Changes:

  • Adds ENABLE_SIMDUTF/TS_USE_SIMDUTF CMake wiring and links simdutf into tscore when enabled.
  • Refactors ink_base64.cc into scalar helpers plus hybrid simdutf dispatch.
  • Adds a benchmark target with base64 correctness checks and throughput benchmarks.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| CMakeLists.txt | Adds simdutf auto-option detection. |
| include/tscore/ink_config.h.cmake.in | Exposes TS_USE_SIMDUTF in generated config. |
| src/tscore/CMakeLists.txt | Links simdutf into tscore when enabled. |
| src/tscore/ink_base64.cc | Implements hybrid scalar/simdutf base64 encode/decode. |
| tools/benchmark/CMakeLists.txt | Adds the new base64 benchmark executable. |
| tools/benchmark/benchmark_ink_base64.cc | Adds correctness checks and benchmarks for base64 and tolower paths. |

Comment thread CMakeLists.txt
Comment thread src/tscore/ink_base64.cc Outdated
Comment thread tools/benchmark/CMakeLists.txt
Comment thread tools/benchmark/benchmark_ink_base64.cc Outdated
- CMakeLists.txt: require simdutf >= 7.0.0. ats_base64_decode uses
  base64_default_or_url and the decode_up_to_bad_char parameter, both
  of which landed in simdutf 7.0.0. Without this pin, an older simdutf
  passes find_package and then fails at compile time. ENABLE_SIMDUTF
  in AUTO mode silently falls back to the scalar path when the found
  simdutf is too old; ENABLE_SIMDUTF=ON hard-errors so the user knows
  their explicit request cannot be satisfied (Copilot).

- ink_base64.cc: align the simdutf and scalar decode paths on
  whitespace. simdutf's forgiving mode silently skips ASCII whitespace
  and continues; the scalar treats whitespace as end-of-input. With
  the two paths gated by an input-size threshold, this made
  TSBase64Decode results depend on build configuration. Pre-scan the
  input with the same printableToSixBit table upfront and truncate
  inBufferSize at the first non-alphabet byte before either path runs,
  so both see the same prefix of alphabet bytes (Copilot).

- ink_base64.cc: restructure the scalar decode tail. The previous code
  ran one extra loop iteration past the alphabet prefix when there
  were 1..3 trailing alphabet bytes (reading inBuffer[2..3] which was
  either OOB to the caller or past the prefix) and then read
  inBuffer[-2] in the trailing adjustment block when no iterations had
  advanced inBuffer. Process only complete 4-character groups in the
  main loop and decode any 2- or 3-byte tail explicitly; a 1-byte tail
  encodes nothing meaningful and is dropped. This was flagged as a
  known follow-up when the PR was opened.

- src/tscore/unit_tests/test_ink_base64.cc: new unit test under
  test_tscore so the scalar and simdutf paths are covered by ctest in
  every build. Bracketing sizes 0/1/8/23/24/25/47/48/49/4096 exercise
  both implementations and the threshold transitions. Adds focused
  cases for URL-safe alphabet decode, in-place decode (dst == src),
  invalid-byte truncation, whitespace truncation (validates the new
  alignment), the InkAPITest fixture, and the 1-/2-/3-char tail cases
  that the scalar restructure now handles cleanly (Copilot).

- tools/benchmark/benchmark_ink_base64.cc: rewrite the file header to
  describe what the bench actually does (scalar-vs-simdutf throughput
  comparison) and drop the correctness TEST_CASEs that moved to the
  unit test. Add Catch::Benchmark::keep_memory barriers so the
  inlined buffer writes aren't DCEd past the first observed byte, and
  a config-print case that prints whether simdutf is wired in
  (Copilot).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
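
The pre-scan and tail handling described in this commit message can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the code in ink_base64.cc: `sixbit` stands in for the `printableToSixBit` table, `decode_sketch` is a hypothetical name, and `=` is assumed to fall outside the alphabet so that padding simply ends the prefix.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

namespace
{
// Stand-in for a lookup table: returns the 6-bit value of a symbol from
// either alphabet, or 64 for any other byte (including '=' and whitespace).
std::uint32_t
sixbit(char ch)
{
  unsigned char c = static_cast<unsigned char>(ch);
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+' || c == '-') return 62;
  if (c == '/' || c == '_') return 63;
  return 64;
}
} // namespace

std::string
decode_sketch(const std::string &in)
{
  // 1. Pre-scan: keep only the leading run of alphabet bytes, so every
  //    later index stays inside that prefix.
  std::size_t n = 0;
  while (n < in.size() && sixbit(in[n]) < 64) {
    ++n;
  }

  std::string out;
  out.reserve(n / 4 * 3 + 2);

  // 2. Main loop: complete 4-character groups only.
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    std::uint32_t v = (sixbit(in[i]) << 18) | (sixbit(in[i + 1]) << 12) | (sixbit(in[i + 2]) << 6) | sixbit(in[i + 3]);
    out.push_back(static_cast<char>((v >> 16) & 0xff));
    out.push_back(static_cast<char>((v >> 8) & 0xff));
    out.push_back(static_cast<char>(v & 0xff));
  }

  // 3. Explicit tail: 2 symbols give 1 byte, 3 symbols give 2 bytes.
  //    A 1-symbol tail carries fewer than 8 bits and is dropped.
  std::size_t tail = n - i;
  if (tail >= 2) {
    std::uint32_t v = (sixbit(in[i]) << 18) | (sixbit(in[i + 1]) << 12);
    if (tail == 3) {
      v |= sixbit(in[i + 2]) << 6;
    }
    out.push_back(static_cast<char>((v >> 16) & 0xff));
    if (tail == 3) {
      out.push_back(static_cast<char>((v >> 8) & 0xff));
    }
  }
  return out;
}
```

Because the loop bound and the tail length are both derived from the pre-scanned prefix length, no read in this sketch touches a byte before the start of the input or past the last alphabet byte.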

3 participants