regex: bound RE2 compilation memory during matcher initialization #42924
base: main
Hi @Naveed8951, welcome and thank you for your contribution. We will try to review your pull request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already.

/retest
Pull request overview
This PR bounds the memory RE2 may use while compiling a regex, to prevent excessive resource usage. The limit is derived from Envoy's existing max program size configuration, and each regex is compiled with an explicit `RE2::Options` whose `max_mem` is set to that bound.
Changes:
- Introduces helper functions that read the runtime max program size settings, falling back to safe defaults.
- Adds `clampCompilationMaxMemFromProgramSize()` to convert a program size limit into a memory bound.
- Modifies both `CompiledGoogleReMatcher` constructors to pass RE2 options with a memory limit instead of the canned `RE2::Quiet` options.
Comments suppressed due to low confidence (1)
source/common/common/regex.cc:121
- The program size validation logic (error level check and warn level check) is duplicated between the two constructors. Consider extracting this logic into a shared helper function to improve maintainability and reduce the risk of inconsistencies if this logic needs to be updated in the future.
```cpp
if (do_program_size_check) {
  const uint32_t regex_program_size = static_cast<uint32_t>(regex_.ProgramSize());
  const uint32_t max_program_size_error_level = runtimeMaxProgramSizeErrorLevelOrDefault();
  if (regex_program_size > max_program_size_error_level) {
    creation_status = absl::InvalidArgumentError(
        fmt::format("regex '{}' RE2 program size of {} > max program size of "
                    "{} set for the error level threshold. Increase "
                    "configured max program size if necessary.",
                    regex, regex_program_size, max_program_size_error_level));
  }
  const uint32_t max_program_size_warn_level = runtimeMaxProgramSizeWarnLevelOrDefault();
  if (regex_program_size > max_program_size_warn_level) {
    ENVOY_LOG_MISC(warn,
                   "regex '{}' RE2 program size of {} > max program size of {} set for the warn "
                   "level threshold. Increase configured max program size if necessary.",
                   regex, regex_program_size, max_program_size_warn_level);
  }
}
```
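The extraction suggested in this comment could take roughly the following shape. This is a minimal sketch, not Envoy code: `ProgramSizeVerdict` and `checkProgramSize` are hypothetical names, and the thresholds are passed in as plain integers (the caller would still obtain them from `runtimeMaxProgramSizeErrorLevelOrDefault()` and `runtimeMaxProgramSizeWarnLevelOrDefault()`), so only the comparisons are shared.

```cpp
#include <cstdint>

// Hypothetical result type: which configured thresholds a compiled program exceeds.
struct ProgramSizeVerdict {
  bool exceeds_error_level;  // caller would set creation_status to InvalidArgumentError
  bool exceeds_warn_level;   // caller would emit the warn-level log line
};

// Hypothetical shared helper holding the comparisons that are currently
// duplicated across both CompiledGoogleReMatcher constructors. Both checks are
// evaluated independently, matching the original code, where a program above
// the error level can also trigger the warning.
constexpr ProgramSizeVerdict checkProgramSize(uint32_t program_size, uint32_t error_level,
                                              uint32_t warn_level) {
  return ProgramSizeVerdict{program_size > error_level, program_size > warn_level};
}
```

Each constructor would then call the helper once and map `exceeds_error_level` to `creation_status` and `exceeds_warn_level` to the `ENVOY_LOG_MISC(warn, ...)` call, keeping status creation and logging at the call site.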
```cpp
constexpr int64_t kRe2DefaultMaxMemBytes = 8ll << 20;   // 8MiB (RE2 default)
constexpr int64_t kRe2HardMaxMemBytes = 256ll << 20;    // hard safety cap for compilation
constexpr int64_t kBytesPerProgramInstEstimate = 1024;  // conservative bound; avoids tight coupling to RE2 internals
```
Copilot AI, Jan 16, 2026
The magic number kBytesPerProgramInstEstimate (1024 bytes per instruction) is described as a "conservative bound" but lacks justification or reference. Consider adding a comment explaining how this value was determined or providing a reference to RE2 internals documentation to help future maintainers understand whether this value remains appropriate as RE2 evolves.
Suggested change:

```diff
-constexpr int64_t kBytesPerProgramInstEstimate = 1024; // conservative bound; avoids tight coupling to RE2 internals
+// Upper bound on bytes used per compiled RE2 program instruction when mapping
+// a configured program-size limit to a max_mem limit. In practice, RE2's
+// internal representation (see re2::Prog and related comments in RE2) uses
+// significantly less than 1024 bytes per instruction, but we deliberately
+// choose a large, conservative value here so that we:
+//   * avoid depending on specific RE2 struct layouts or allocator behavior, and
+//   * err on the side of overestimating memory requirements rather than
+//     underestimating them.
+// If RE2's program representation changes substantially, this constant may
+// need to be revisited, but it should remain a safety-oriented overestimate.
+constexpr int64_t kBytesPerProgramInstEstimate = 1024;
```
```cpp
int64_t clampCompilationMaxMemFromProgramSize(uint32_t max_program_size) {
  // If the caller has no meaningful ceiling, fall back to a safe hard cap.
  if (max_program_size == UINT32_MAX) {
    return kRe2HardMaxMemBytes;
  }

  // Bound compilation memory as a function of configured program-size ceilings.
  const int64_t scaled =
      static_cast<int64_t>(max_program_size) * kBytesPerProgramInstEstimate;

  // Ensure we never go below RE2’s default, and never exceed the hard safety cap.
  return std::min(kRe2HardMaxMemBytes, std::max(kRe2DefaultMaxMemBytes, scaled));
}

re2::RE2::Options makeQuietRe2OptionsWithMaxMem(int64_t max_mem_bytes) {
  re2::RE2::Options options;
  // Match "Quiet" behavior (no stderr logging) while allowing us to set max_mem.
  options.set_log_errors(false);
  options.set_max_mem(max_mem_bytes);
  return options;
}
```
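Working through the clamp arithmetic shows where each bound takes over. In the notation below (introduced here, not in the PR), $P$ is a finite program-size limit, $c$ is `kBytesPerProgramInstEstimate`, and $m_{\min}$, $m_{\max}$ are `kRe2DefaultMaxMemBytes` and `kRe2HardMaxMemBytes`:

$$
M(P) = \min\bigl(m_{\max},\ \max(m_{\min},\ c\,P)\bigr), \qquad c = 2^{10},\quad m_{\min} = 2^{23},\quad m_{\max} = 2^{28}
$$

$$
c\,P \le m_{\min} \iff P \le 2^{13} = 8192, \qquad c\,P \ge m_{\max} \iff P \ge 2^{18} = 262144
$$

So the floor applies to every limit up to 8192 instructions, the cap applies from 262144 upward, and `max_mem` scales linearly in between. The `UINT32_MAX` special case returns the same value the formula would give, since $c\,(2^{32} - 1) > 2^{28}$; it makes the "no ceiling" intent explicit rather than changing the result.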
Copilot AI, Jan 16, 2026
The new memory bounding functionality introduced by this PR (clampCompilationMaxMemFromProgramSize and makeQuietRe2OptionsWithMaxMem) lacks test coverage. Consider adding tests that verify the memory limits are correctly applied during regex compilation, especially edge cases like UINT32_MAX handling and the clamping behavior between min/max bounds.
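The edge cases named in this comment could be pinned down along these lines. This is a sketch under stated assumptions: it re-declares the PR's constants and `clampCompilationMaxMemFromProgramSize()` locally as `constexpr` (the PR's function is not `constexpr`) and checks them with `static_assert`, so it needs only the standard library and C++17. Envoy's own tests use GoogleTest, where the same cases would become `EXPECT_EQ` checks.

```cpp
#include <algorithm>
#include <cstdint>

// Local constexpr copy of the PR's constants and clamp helper, so the checks
// below compile without Envoy or RE2.
constexpr int64_t kRe2DefaultMaxMemBytes = 8ll << 20;   // 8 MiB floor
constexpr int64_t kRe2HardMaxMemBytes = 256ll << 20;    // 256 MiB cap
constexpr int64_t kBytesPerProgramInstEstimate = 1024;

constexpr int64_t clampCompilationMaxMemFromProgramSize(uint32_t max_program_size) {
  if (max_program_size == UINT32_MAX) {
    return kRe2HardMaxMemBytes;
  }
  const int64_t scaled = static_cast<int64_t>(max_program_size) * kBytesPerProgramInstEstimate;
  return std::min(kRe2HardMaxMemBytes, std::max(kRe2DefaultMaxMemBytes, scaled));
}

// UINT32_MAX ("no meaningful ceiling") maps straight to the hard cap.
static_assert(clampCompilationMaxMemFromProgramSize(UINT32_MAX) == kRe2HardMaxMemBytes);
// Small limits never go below the floor.
static_assert(clampCompilationMaxMemFromProgramSize(0) == kRe2DefaultMaxMemBytes);
static_assert(clampCompilationMaxMemFromProgramSize(100) == kRe2DefaultMaxMemBytes);
// Between the bounds, the result scales linearly with the limit.
static_assert(clampCompilationMaxMemFromProgramSize(10000) == 10000ll * 1024);
// Large limits are capped.
static_assert(clampCompilationMaxMemFromProgramSize(1u << 20) == kRe2HardMaxMemBytes);
// Exact boundary points: the floor holds through 8192, the cap starts at 262144.
static_assert(clampCompilationMaxMemFromProgramSize(8192) == kRe2DefaultMaxMemBytes);
static_assert(clampCompilationMaxMemFromProgramSize(262144) == kRe2HardMaxMemBytes);
```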
/retest
Commit Message:
regex: bound RE2 compilation memory during matcher initialization
Additional Description:
Bound RE2 compilation memory up front using explicit `re2::RE2::Options` derived from existing Envoy max program size limits, preventing excessive resource usage during regex compilation.
Risk Level:
Medium
Testing:
`source/common/common/regex.cc`, observing bounded compilation behavior
Docs Changes:
None
Release Notes:
None
Platform Specific Features:
None