Skip to content

esi: replace _findOpeningTag with memchr#13173

Open
pixitha wants to merge 3 commits into
apache:masterfrom
pixitha:esi-find-opening-tag-fast-path
Open

esi: replace _findOpeningTag with memchr#13173
pixitha wants to merge 3 commits into
apache:masterfrom
pixitha:esi-find-opening-tag-fast-path

Conversation

@pixitha
Copy link
Copy Markdown

@pixitha pixitha commented May 18, 2026

Problem

EsiParser::_findOpeningTag uses a hand-rolled two-state-machine byte loop
to scan response bodies for <esi: and <!--esi opening tags. On a CDN
edge server profiled under production load, this function appeared as a
top-5 CPU leaf, ahead of OpenSSL EC math and zlib.

The root cause is that the loop is unconditionally O(n) at ~2 ns/byte — it
checks every byte even when the response body contains no ESI tags at all,
which is the common case on a CDN edge.

There is also a correctness issue in the original: the KMP-failure comment
in the source notes that the state machine mishandles sequences like
<e<esi: — the first <e advances a partial-match counter but then
fails, and the recovery doesn't rewind to the second <. The new
implementation avoids this entirely.

Solution

Replace the loop with memchr + memcmp:

  1. memchr locates the next < — delegates to the platform's optimized
    libc implementation (e.g. __memchr_avx2 on glibc x86-64), which
    processes 128 bytes per iteration in an AVX2 unrolled loop.
  2. memcmp verifies the candidate anchor (<esi: or <!--esi).
  3. On no-match, advance past the < and repeat.

Partial-match (chunk boundary) paths are preserved: when the buffer ends
mid-prefix, the function returns PARTIAL_MATCH so the caller can
accumulate more data before deciding.

The only interface change is #include <cstring> — all call sites, return
types, and semantics are identical.

Performance

Note: synthetic microbenchmark. Benchmark runs each scenario over a
256 KiB body for 1.0 s. Real response bodies are shorter and vary in
tag density; treat these as directional numbers.

Measured on E5-2683 v4 Broadwell, 2.10 GHz, RHEL 8, glibc 2.28, gcc -O2:

Scenario Baseline GB/s memchr GB/s Speedup
text-only (no <) 0.503 36.871 73×
typical HTML (~1.5% <, no ESI) 0.475 19.241 40×
html-sparse 0.470 18.896 40×
html-dense 0.437 14.853 34×
<!--esi comment nodes 0.431 18.362 43×
pathological (5% bare <) 0.290 1.512

The pathological case is the worst case for this approach — every false <
triggers a memcmp. Real CDN response bodies fall in the typical HTML range.

Testing

  • Unit tests: 870 assertions pass in parser_test. Four new SECTION
    blocks cover boundary conditions specific to this implementation:
    exact-length prefix at chunk end for both <esi: and <!--esi,
    <!--esi without required trailing whitespace, and multiple false <
    anchors before a valid tag.
  • Autests: esi, esi_304, and esi_nested_include all pass (built
    inside ci.trafficserver.apache.org/ats/fedora:42, CMake preset
    ci-fedora-autest).
  • Differential correctness check: 10 million random inputs were run
    through both the old and new implementations — 0 mismatches across all
    MATCH_TYPE returns, positions, and is_html_comment_node flags.

Replace the hand-rolled two-state-machine loop with memchr + memcmp.
memchr delegates scanning to the platform's optimized implementation
(e.g. __memchr_avx2 on glibc x86-64) to skip non-'<' bytes, then
memcmp verifies each candidate anchor.

This eliminates the KMP-failure limitation noted in the original
comment: the old implementation could mishandle opening sequences
like '<e<esi'; the new one finds the correct anchor unconditionally.

Adds four parser_test sections covering boundary cases specific to
the new implementation: exact-length prefix at chunk end for both
<esi: and <!--esi, <!--esi without required trailing whitespace, and
multiple false '<' anchors before a valid tag.
@shukitchan shukitchan requested a review from Copilot May 18, 2026 22:11
@shukitchan shukitchan self-assigned this May 18, 2026
@shukitchan shukitchan added the esi esi plugin label May 18, 2026
@shukitchan shukitchan added this to the 11.0.0 milestone May 18, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Replaces the hand-rolled byte-by-byte state machine in EsiParser::_findOpeningTag with a memchr+memcmp based scan that delegates the common case (skipping non-< bytes) to libc's optimized implementation. This also fixes a known correctness limitation of the previous state machine (mishandling of repeated opening sequences like <e<esi).

Changes:

  • Rewrite _findOpeningTag to use memchr to locate < and memcmp to verify the <esi: / <!--esi anchors, preserving the COMPLETE_MATCH / PARTIAL_MATCH / NO_MATCH semantics and partial-match-at-chunk-boundary behavior.
  • Add <cstring> include for memchr/memcmp.
  • Add four new parser test SECTIONs covering exact-length prefixes at chunk boundaries, <!--esi without trailing whitespace, and multiple false < anchors before a valid tag.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
plugins/esi/lib/EsiParser.cc Replaces the two-state-machine scan loop with a memchr/memcmp-based implementation and updates the function's leading comment.
plugins/esi/test/parser_test.cc Adds boundary-condition test sections for the new implementation.

Comment thread plugins/esi/lib/EsiParser.cc Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@bryancall bryancall requested a review from shukitchan May 18, 2026 22:37
@bryancall bryancall assigned pixitha and unassigned shukitchan May 18, 2026
@bryancall bryancall requested a review from zwoop May 18, 2026 22:38
@apache apache deleted a comment from shukitchan May 18, 2026
zwoop
zwoop previously approved these changes May 19, 2026
Comment thread plugins/esi/lib/EsiParser.cc Outdated
if (++i_esi == ESI_TAG_PREFIX_LEN) {
is_html_comment_node = false;
opening_tag_pos = i_data - i_esi + 1;
const char *const buf = data.data();
Copy link
Copy Markdown
Contributor

@zwoop zwoop May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought here, and I asked Claude to make an example: Did you consider pulling in and using libswoc TextView instead? It excels at things like "string parsing". Not a show stopper, and could be a future consideration.

I asked Claude to write a suggested use with TextView (not tested ...)

swoc::TextView view{data.data() + start_pos, data.size() - start_pos};
const swoc::TextView esi_prefix{ESI_TAG_PREFIX, ESI_TAG_PREFIX_LEN};
const swoc::TextView html_prefix{HTML_COMMENT_NODE_INFO.tag_suffix,
                                 size_t(HTML_COMMENT_NODE_INFO.tag_suffix_len)};
while (!view.empty()) {
  size_t pos = view.find('<');
  if (pos == swoc::TextView::npos) return NO_MATCH;
  view.remove_prefix(pos);
  size_t abs_pos = data.size() - view.size();
  if (view.starts_with(esi_prefix)) { /* COMPLETE_MATCH esi */ }
  if (view.size() > html_prefix.size() && view.starts_with(html_prefix)) {
    char ch = view[html_prefix.size()];
    if (ch==' '||ch=='\t'||ch=='\r'||ch=='\n') { /* COMPLETE_MATCH html */ }
  }
  if (view.size() < esi_prefix.size() && esi_prefix.starts_with(view)) { /* PARTIAL */ }
  if (view.size() <= html_prefix.size() && html_prefix.starts_with(view)) { /* PARTIAL */ }
  view.remove_prefix(1);
}

Comment thread plugins/esi/lib/EsiParser.cc Outdated
const char *const buf = data.data();
const size_t total = data.size();
const size_t esi_len = ESI_TAG_PREFIX_LEN;
const size_t hlen = HTML_COMMENT_NODE_INFO.tag_suffix_len;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need this hlen local? The compiler will optimize it away, but I honestly would just like to see HTML_COMMENT_NODE_INFO.tag_suffix_len where you need this, specially since you memcmp from HTML_COMMENT_NODE_INFO.tag_suffix already (no shadow variable for the string).

Same with the esi_len I think, neither are blockers.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected in latest commit

Remove the hlen and esi_len local aliases so the length argument to each
memcmp call is read from the same named constant as the string argument,
consistent with how ESI_TAG_PREFIX and HTML_COMMENT_NODE_INFO.tag_suffix
were already written in full.

Addresses review feedback from zwoop on PR apache#13173.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bneradt
Copy link
Copy Markdown
Contributor

bneradt commented May 19, 2026

[approve ci]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

esi esi plugin

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants