Problem
There is often a mismatch between the data format wirefilter expects and the raw data available in the application. This requires conversion, and both the conversion process and the allocation/deallocation of the transformed data incur overhead.
```rust
use std::collections::HashMap;

struct Request<'a> {
    url: &'a [u8],
    // raw header data
    header: HashMap<&'a [u8], Vec<&'a [u8]>>,
    // the same headers converted into wirefilter's value types
    wirefilter_header: TypedMap<'a, TypedArray<'a, Bytes<'a>>>,
}
```
Although ast.uses() can filter out variables that are not referenced by the expression, it does not address the issue of short-circuit evaluation. For instance:
```
url matches "/uncommon_url" && any(header["foo"][*] matches "bar")
```
Here, the header variable appears in the expression, so ast.uses() will report it, yet it is never actually evaluated whenever the URL does not match: the && short-circuits before the right-hand side runs. In those cases, the conversion for header is wasted effort.
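To make the cost concrete, here is a minimal plain-Rust sketch (not wirefilter code) of the same filter evaluated two ways. The helpers convert_headers, header_has, eval_eager and eval_lazy are hypothetical, exact byte equality stands in for the regex matches operator, and a Cell counter records how many times the conversion actually runs.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical stand-in for the raw-to-wirefilter header conversion.
/// `calls` counts how many times the (expensive) conversion actually runs.
fn convert_headers(
    raw: &HashMap<&[u8], Vec<&[u8]>>,
    calls: &Cell<u32>,
) -> HashMap<Vec<u8>, Vec<Vec<u8>>> {
    calls.set(calls.get() + 1);
    raw.iter()
        .map(|(k, vs)| (k.to_vec(), vs.iter().map(|v| v.to_vec()).collect()))
        .collect()
}

/// `any(header[name][*] == needle)`, with `==` standing in for `matches`.
fn header_has(header: &HashMap<Vec<u8>, Vec<Vec<u8>>>, name: &[u8], needle: &[u8]) -> bool {
    header
        .get(name)
        .map_or(false, |vs| vs.iter().any(|v| v.as_slice() == needle))
}

/// Eager: headers are converted up front, even if `&&` never reads them.
fn eval_eager(url: &[u8], raw: &HashMap<&[u8], Vec<&[u8]>>, calls: &Cell<u32>) -> bool {
    let header = convert_headers(raw, calls);
    url == b"/uncommon_url" && header_has(&header, b"foo", b"bar")
}

/// Lazy: the conversion sits on the right of `&&`, so it is skipped
/// whenever the URL check is false.
fn eval_lazy(url: &[u8], raw: &HashMap<&[u8], Vec<&[u8]>>, calls: &Cell<u32>) -> bool {
    url == b"/uncommon_url" && header_has(&convert_headers(raw, calls), b"foo", b"bar")
}
```

For a request to /index.html, eval_eager pays for one conversion and then discards it, while eval_lazy performs none; only a request to /uncommon_url makes the lazy version convert.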
Workarounds (and their limitations)
One idea is to replace the variable with a function, so expressions could use get_header()["foo"][*] instead of header["foo"][*]. This would defer the conversion until the function is actually called. Unfortunately, wirefilter functions currently can only return a fixed value and are unable to pull data from context or user_data to construct a result dynamically, making this approach infeasible.
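```rust
use std::collections::HashMap;
use std::sync::OnceLock;

struct Request<'a> {
    url: &'a [u8],
    header: HashMap<&'a [u8], Vec<&'a [u8]>>,
    // converted lazily, on first access
    wirefilter_header: OnceLock<TypedMap<'a, TypedArray<'a, Bytes<'a>>>>,
}

impl<'a> Request<'a> {
    // Runs the conversion the first time it is called and caches the result.
    fn get_wirefilter_header(&self) -> &TypedMap<'a, TypedArray<'a, Bytes<'a>>> {
        self.wirefilter_header.get_or_init(|| {
            todo!() // convert `self.header` into wirefilter's types
        })
    }
}
```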
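The caching half of this workaround can be checked in isolation. The sketch below is a runnable variant of the accessor above under one stated assumption: a plain HashMap<Vec<u8>, Vec<Vec<u8>>> (the hypothetical ConvertedHeader alias) stands in for TypedMap<'a, TypedArray<'a, Bytes<'a>>>, and a conversions counter is added purely for illustration. What it cannot show is the part the issue says is missing: a wirefilter function has no way to reach this accessor.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::sync::OnceLock;

/// Simplified stand-in for `TypedMap<'a, TypedArray<'a, Bytes<'a>>>` (an assumption).
type ConvertedHeader = HashMap<Vec<u8>, Vec<Vec<u8>>>;

struct Request<'a> {
    // raw header data
    header: HashMap<&'a [u8], Vec<&'a [u8]>>,
    // converted header data, filled in on first access
    converted: OnceLock<ConvertedHeader>,
    // counts conversions, for illustration only
    conversions: Cell<u32>,
}

impl<'a> Request<'a> {
    fn new(header: HashMap<&'a [u8], Vec<&'a [u8]>>) -> Self {
        Request { header, converted: OnceLock::new(), conversions: Cell::new(0) }
    }

    /// Converts the raw headers on the first call; later calls return the cached map.
    fn get_converted_header(&self) -> &ConvertedHeader {
        self.converted.get_or_init(|| {
            self.conversions.set(self.conversions.get() + 1);
            self.header
                .iter()
                .map(|(k, vs)| (k.to_vec(), vs.iter().map(|v| v.to_vec()).collect()))
                .collect()
        })
    }
}
```

Constructing a Request costs nothing beyond storing the raw map; the first call to get_converted_header does the conversion once, and every later call reuses it.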
Proposed Solution
A more comprehensive approach would be to add something akin to Python's property mechanism: a variable that looks like an ordinary field from the expression's perspective, but is actually backed by a function that can pull the necessary context at evaluation time to construct its return value on the fly.
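A minimal sketch of the idea, in plain Rust rather than wirefilter's actual API: a field slot that is either a precomputed value or a resolver receiving the per-request context, run at most once on first read and then cached. Ctx, Field, Field::lazy, convert_headers and eval are all hypothetical names, and eval hand-writes the example filter with exact byte equality in place of matches.

```rust
use std::cell::OnceCell;
use std::collections::HashMap;

type Headers = HashMap<Vec<u8>, Vec<Vec<u8>>>;

/// Hypothetical per-request context that a property resolver may read at evaluation time.
struct Ctx<'a> {
    url: &'a [u8],
    raw_header: HashMap<&'a [u8], Vec<&'a [u8]>>,
}

/// Hypothetical field slot: either a precomputed value, or a resolver that is
/// run at most once, on first read, and then cached (Python-`property`-like).
enum Field<T> {
    Eager(T),
    Lazy {
        resolve: Box<dyn Fn(&Ctx<'_>) -> T>,
        cache: OnceCell<T>,
    },
}

impl<T: 'static> Field<T> {
    fn lazy(resolve: impl Fn(&Ctx<'_>) -> T + 'static) -> Self {
        Field::Lazy { resolve: Box::new(resolve), cache: OnceCell::new() }
    }

    /// Returns the value, running the resolver only if it has not run yet.
    fn get(&self, ctx: &Ctx<'_>) -> &T {
        match self {
            Field::Eager(value) => value,
            Field::Lazy { resolve, cache } => cache.get_or_init(|| resolve(ctx)),
        }
    }
}

/// Resolver that builds the converted header map from the raw headers in `ctx`.
fn convert_headers(ctx: &Ctx<'_>) -> Headers {
    ctx.raw_header
        .iter()
        .map(|(k, vs)| (k.to_vec(), vs.iter().map(|v| v.to_vec()).collect()))
        .collect()
}

/// Hand-written `url matches "/uncommon_url" && any(header["foo"][*] matches "bar")`,
/// using exact byte equality in place of `matches`. `header` is only read if the
/// URL check passes, so a lazy field is never resolved otherwise.
fn eval(ctx: &Ctx<'_>, header: &Field<Headers>) -> bool {
    ctx.url == b"/uncommon_url"
        && header
            .get(ctx)
            .get(&b"foo"[..])
            .map_or(false, |vs| vs.iter().any(|v| v.as_slice() == b"bar"))
}
```

The expression side sees header as an ordinary value either way; only the slot behind it decides whether the data was built eagerly or on demand, which is the property-like behaviour described above.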
I also noticed a hacky attempt at solving this problem here: Stealthiumio/wirefilter@9d66964
Since Cloudflare's internal use cases are likely similar to the example above, I suspect you've run into the same kind of overhead, and I would love to hear how you approach this optimization internally.