@Naveed8951

Commit Message:
regex: bound RE2 compilation memory during matcher initialization

Additional Description:
Bound RE2 compilation memory up-front using explicit re2::RE2::Options derived
from existing Envoy max program size limits, preventing excessive resource usage
during regex compilation.

Risk Level:
Medium

Testing:

  • Build and compile targets including source/common/common/regex.cc
  • Manual verification by loading configs with large/complex regex patterns and
    observing bounded compilation behavior

Docs Changes:
None

Release Notes:
None

Platform Specific Features:
None

@repokitteh-read-only

Hi @Naveed8951, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #42924 was opened by Naveed8951.

@tyxia
Member

tyxia commented Jan 15, 2026

/retest

Contributor

Copilot AI left a comment

Pull request overview

This PR adds RE2 compilation memory bounds to prevent excessive resource usage during regex compilation. The implementation derives memory limits from existing Envoy max program size configurations, establishing explicit RE2::Options with bounded max_mem values before regex compilation.

Changes:

  • Introduces helper functions to retrieve runtime max program size settings with safe defaults
  • Adds clampCompilationMaxMemFromProgramSize() to convert program size limits to memory bounds
  • Modifies both CompiledGoogleReMatcher constructors to pass explicit RE2 options with a memory limit instead of the predefined RE2::Quiet options

Comments suppressed due to low confidence (1)

source/common/common/regex.cc:121

  • The program size validation logic (error level check and warn level check) is duplicated between the two constructors. Consider extracting this logic into a shared helper function to improve maintainability and reduce the risk of inconsistencies if this logic needs to be updated in the future.
  if (do_program_size_check) {
    const uint32_t regex_program_size = static_cast<uint32_t>(regex_.ProgramSize());

    const uint32_t max_program_size_error_level = runtimeMaxProgramSizeErrorLevelOrDefault();
    if (regex_program_size > max_program_size_error_level) {
      creation_status = absl::InvalidArgumentError(
          fmt::format("regex '{}' RE2 program size of {} > max program size of "
                      "{} set for the error level threshold. Increase "
                      "configured max program size if necessary.",
                      regex, regex_program_size, max_program_size_error_level));
    }

    const uint32_t max_program_size_warn_level = runtimeMaxProgramSizeWarnLevelOrDefault();
    if (regex_program_size > max_program_size_warn_level) {
      ENVOY_LOG_MISC(warn,
                     "regex '{}' RE2 program size of {} > max program size of {} set for the warn "
                     "level threshold. Increase configured max program size if necessary.",
                     regex, regex_program_size, max_program_size_warn_level);
    }
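
A minimal sketch of the shared helper suggested above, so both constructors could run the same error-level and warn-level checks. This is not the PR's code: `ProgramSizeCheck`, `thresholdMessage`, and `checkProgramSize` are hypothetical names, and plain strings stand in for the `absl::InvalidArgumentError` status and the `ENVOY_LOG_MISC` warning used in the snippet so that the example stays self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical result type. An empty string means the threshold was not exceeded.
// In Envoy the error would become an absl::Status and the warning a log line.
struct ProgramSizeCheck {
  std::string error;
  std::string warning;
};

// Builds the message used for either threshold ("error" or "warn").
std::string thresholdMessage(const std::string& regex, uint32_t program_size,
                             uint32_t limit, const char* level) {
  return "regex '" + regex + "' RE2 program size of " + std::to_string(program_size) +
         " > max program size of " + std::to_string(limit) + " set for the " + level +
         " level threshold. Increase configured max program size if necessary.";
}

// Single place for the check that is currently duplicated in both constructors.
ProgramSizeCheck checkProgramSize(const std::string& regex, uint32_t program_size,
                                  uint32_t error_level, uint32_t warn_level) {
  ProgramSizeCheck result;
  if (program_size > error_level) {
    result.error = thresholdMessage(regex, program_size, error_level, "error");
  }
  if (program_size > warn_level) {
    result.warning = thresholdMessage(regex, program_size, warn_level, "warn");
  }
  return result;
}
```

Each constructor would then call the helper once and translate the result into its own status and logging, which keeps the two thresholds from drifting apart.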


constexpr int64_t kRe2DefaultMaxMemBytes = 8ll << 20; // 8MiB (RE2 default)
constexpr int64_t kRe2HardMaxMemBytes = 256ll << 20; // hard safety cap for compilation
constexpr int64_t kBytesPerProgramInstEstimate = 1024; // conservative bound; avoids tight coupling to RE2 internals

Copilot AI Jan 16, 2026

The magic number kBytesPerProgramInstEstimate (1024 bytes per instruction) is described as a "conservative bound" but lacks justification or reference. Consider adding a comment explaining how this value was determined or providing a reference to RE2 internals documentation to help future maintainers understand whether this value remains appropriate as RE2 evolves.

Suggested change
constexpr int64_t kBytesPerProgramInstEstimate = 1024; // conservative bound; avoids tight coupling to RE2 internals
// Upper bound on bytes used per compiled RE2 program instruction when mapping
// a configured program-size limit to a max_mem limit. In practice, RE2's
// internal representation (see re2::Prog and related comments in RE2) uses
// significantly less than 1024 bytes per instruction, but we deliberately
// choose a large, conservative value here so that we:
// * avoid depending on specific RE2 struct layouts or allocator behavior, and
// * err on the side of overestimating memory requirements rather than
// underestimating them.
// If RE2's program representation changes substantially, this constant may
// need to be revisited, but it should remain a safety-oriented overestimate.
constexpr int64_t kBytesPerProgramInstEstimate = 1024;
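
To see where the 1024-byte estimate actually influences the result, the arithmetic can be worked out from the three constants in this PR. The sketch below only divides the floor and the cap by the per-instruction estimate; `kFloorProgramSize` and `kCapProgramSize` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Constants as they appear in the PR.
constexpr int64_t kRe2DefaultMaxMemBytes = 8ll << 20;  // 8 MiB floor
constexpr int64_t kRe2HardMaxMemBytes = 256ll << 20;   // 256 MiB cap
constexpr int64_t kBytesPerProgramInstEstimate = 1024;

// Largest program-size limit whose scaled value does not exceed the 8 MiB floor.
constexpr int64_t kFloorProgramSize =
    kRe2DefaultMaxMemBytes / kBytesPerProgramInstEstimate;

// Smallest program-size limit whose scaled value already reaches the 256 MiB cap.
constexpr int64_t kCapProgramSize =
    kRe2HardMaxMemBytes / kBytesPerProgramInstEstimate;

static_assert(kFloorProgramSize == 8192, "8 MiB / 1024 bytes per instruction");
static_assert(kCapProgramSize == 262144, "256 MiB / 1024 bytes per instruction");
```

Under these constants, any configured limit of 8192 instructions or fewer resolves to the 8 MiB floor and any limit of 262144 or more resolves to the 256 MiB cap, so the estimate only changes the outcome for limits between those two values.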

Copilot uses AI. Check for mistakes.
Comment on lines +38 to +58
int64_t clampCompilationMaxMemFromProgramSize(uint32_t max_program_size) {
  // If the caller has no meaningful ceiling, fall back to a safe hard cap.
  if (max_program_size == UINT32_MAX) {
    return kRe2HardMaxMemBytes;
  }

  // Bound compilation memory as a function of configured program-size ceilings.
  const int64_t scaled =
      static_cast<int64_t>(max_program_size) * kBytesPerProgramInstEstimate;

  // Ensure we never go below RE2's default, and never exceed the hard safety cap.
  return std::min(kRe2HardMaxMemBytes, std::max(kRe2DefaultMaxMemBytes, scaled));
}

re2::RE2::Options makeQuietRe2OptionsWithMaxMem(int64_t max_mem_bytes) {
  re2::RE2::Options options;
  // Match "Quiet" behavior (no stderr logging) while allowing us to set max_mem.
  options.set_log_errors(false);
  options.set_max_mem(max_mem_bytes);
  return options;
}

Copilot AI Jan 16, 2026

The new memory bounding functionality introduced by this PR (clampCompilationMaxMemFromProgramSize and makeQuietRe2OptionsWithMaxMem) lacks test coverage. Consider adding tests that verify the memory limits are correctly applied during regex compilation, especially edge cases like UINT32_MAX handling and the clamping behavior between min/max bounds.
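
A rough sketch of the edge cases such tests could cover, assuming the function and constants are copied as shown in this PR. Envoy's own tests would normally use GoogleTest; plain `assert` is used here only to keep the example self-contained, and `checkClampEdgeCases` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr int64_t kRe2DefaultMaxMemBytes = 8ll << 20;
constexpr int64_t kRe2HardMaxMemBytes = 256ll << 20;
constexpr int64_t kBytesPerProgramInstEstimate = 1024;

// Copied from the PR so the checks below exercise the same logic.
int64_t clampCompilationMaxMemFromProgramSize(uint32_t max_program_size) {
  if (max_program_size == UINT32_MAX) {
    return kRe2HardMaxMemBytes;
  }
  const int64_t scaled =
      static_cast<int64_t>(max_program_size) * kBytesPerProgramInstEstimate;
  return std::min(kRe2HardMaxMemBytes, std::max(kRe2DefaultMaxMemBytes, scaled));
}

void checkClampEdgeCases() {
  // Small limits are raised to the 8 MiB floor.
  assert(clampCompilationMaxMemFromProgramSize(0) == kRe2DefaultMaxMemBytes);
  assert(clampCompilationMaxMemFromProgramSize(100) == kRe2DefaultMaxMemBytes);
  assert(clampCompilationMaxMemFromProgramSize(8192) == kRe2DefaultMaxMemBytes);

  // Just above the floor, the scaled value is used unchanged.
  assert(clampCompilationMaxMemFromProgramSize(8193) == 8193 * 1024);

  // Large limits are held at the 256 MiB cap, with no overflow in the multiply.
  assert(clampCompilationMaxMemFromProgramSize(262144) == kRe2HardMaxMemBytes);
  assert(clampCompilationMaxMemFromProgramSize(UINT32_MAX - 1) == kRe2HardMaxMemBytes);

  // The "no ceiling" sentinel also maps to the cap.
  assert(clampCompilationMaxMemFromProgramSize(UINT32_MAX) == kRe2HardMaxMemBytes);
}
```

The last two checks return the same value, which suggests the explicit `UINT32_MAX` branch documents intent rather than changing the result, since the scaled value for that input would exceed the cap anyway.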

@tyxia self-assigned this on Jan 16, 2026
@tyxia
Member

tyxia commented Jan 16, 2026

/retest
