Feature Request: Lazy / deferred variable initialization (property-like mechanism) #178

@virusdefender

Problem

There is often a mismatch between the data format wirefilter expects and the raw data available in the application. This requires conversion, and both the conversion process and the allocation/deallocation of the transformed data incur overhead.

struct Request<'a> {
    url: &'a [u8],
    // raw header data
    header: HashMap<&'a [u8], Vec<&'a [u8]>>,
    // wirefilter header data
    wirefilter_header: TypedMap<'a, TypedArray<'a, Bytes<'a>>>,
}

Although ast.uses() can filter out variables that are not referenced by the expression, it does not address the issue of short-circuit evaluation. For instance:

url matches "/uncommon_url" && any(header["foo"][*] matches "bar")

Here, the header variable appears in the expression and therefore will be included by ast.uses(), but it may never actually be evaluated (because the first condition short-circuits). In this case, the conversion for header is wasted effort.

Workarounds (and their limitations)

One idea is to replace the variable with a function, so that expressions use get_header()["foo"][*] instead of header["foo"][*]. This would defer the conversion until the function is actually called. Unfortunately, wirefilter functions can currently only return a fixed value; they cannot pull data from context or user_data to construct a result dynamically, which makes this approach infeasible. The lazy initialization itself is easy to write on the application side; what is missing is a way for wirefilter to call it:

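use std::collections::HashMap;
use std::sync::OnceLock;

struct Request<'a> {
    url: &'a [u8],
    // raw header data
    header: HashMap<&'a [u8], Vec<&'a [u8]>>,
    // wirefilter header data, converted at most once, on first access
    wirefilter_header: OnceLock<TypedMap<'a, TypedArray<'a, Bytes<'a>>>>,
}

impl<'a> Request<'a> {
    // Returns the converted headers; the conversion runs only on the first call.
    fn get_wirefilter_header(&self) -> &TypedMap<'a, TypedArray<'a, Bytes<'a>>> {
        self.wirefilter_header.get_or_init(|| {
            // placeholder: convert self.header into the wirefilter representation
            todo!()
        })
    }
}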
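
To make the cost argument concrete, here is a minimal, self-contained sketch of the same lazy pattern that compiles without wirefilter. It is not wirefilter code: ConvertedHeaders is a hypothetical stand-in for TypedMap, matches_rule is a hand-written equivalent of the example expression (with matches simplified to a substring search), and the conversion counter exists only to show when the conversion runs.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for wirefilter's TypedMap of TypedArray of Bytes,
/// so the sketch builds with the standard library only.
type ConvertedHeaders = HashMap<Vec<u8>, Vec<Vec<u8>>>;

struct Request<'a> {
    url: &'a [u8],
    // raw header data
    header: HashMap<&'a [u8], Vec<&'a [u8]>>,
    // converted header data, filled in on first access
    converted: OnceLock<ConvertedHeaders>,
    // counts how often the conversion ran (illustration only)
    conversions: AtomicUsize,
}

impl<'a> Request<'a> {
    fn new(url: &'a [u8]) -> Self {
        Request {
            url,
            header: HashMap::new(),
            converted: OnceLock::new(),
            conversions: AtomicUsize::new(0),
        }
    }

    /// Adds a raw header value; meant to be called before any evaluation.
    fn add_header(&mut self, name: &'a [u8], value: &'a [u8]) {
        self.header.entry(name).or_default().push(value);
    }

    /// Converts the raw headers on first use and caches the result.
    fn converted_header(&self) -> &ConvertedHeaders {
        self.converted.get_or_init(|| {
            self.conversions.fetch_add(1, Ordering::Relaxed);
            self.header
                .iter()
                .map(|(k, vs)| {
                    let values = vs.iter().map(|v| v.to_vec()).collect::<Vec<Vec<u8>>>();
                    (k.to_vec(), values)
                })
                .collect()
        })
    }

    fn conversion_count(&self) -> usize {
        self.conversions.load(Ordering::Relaxed)
    }

    /// Hand-written equivalent of
    /// url matches "/uncommon_url" && any(header["foo"][*] matches "bar").
    /// The right-hand side, and therefore the conversion, only runs when
    /// the URL check succeeds.
    fn matches_rule(&self) -> bool {
        contains(self.url, b"/uncommon_url")
            && self
                .converted_header()
                .get(&b"foo"[..])
                .is_some_and(|vals| vals.iter().any(|v| contains(v, b"bar")))
    }
}

/// Naive substring search over bytes, standing in for a regex match.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    needle.is_empty() || haystack.windows(needle.len()).any(|w| w == needle)
}
```

With this shape, a request whose URL does not contain "/uncommon_url" never pays for the header conversion, and a request that does pays for it once no matter how often the rule is evaluated. This is the behavior the issue would like to get when the expression is evaluated by wirefilter itself.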

Proposed Solution

A more comprehensive approach would be to add something akin to Python's property mechanism: a variable that looks like an ordinary field from the expression's perspective, but is actually backed by a function that can pull the necessary context at evaluation time to construct its return value on the fly.
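
As a rough illustration of the idea, the sketch below models a tiny evaluation context in which a field is either set eagerly or backed by a getter that reads user data on first access. Everything here is hypothetical and exists only for the example: Context, Field, set_lazy, Value, RawRequest and header_names are not wirefilter APIs, and a real design would have to fit wirefilter's scheme and value types.

```rust
use std::cell::{Cell, OnceCell};
use std::collections::HashMap;

/// Hypothetical value type; a real integration would use wirefilter values.
type Value = Vec<String>;

/// A field is either provided up front or computed from user data on demand.
enum Field<U> {
    Eager(Value),
    Lazy {
        getter: fn(&U) -> Value,
        cache: OnceCell<Value>,
    },
}

/// Hypothetical evaluation context holding the fields and the user data.
struct Context<'u, U> {
    user_data: &'u U,
    fields: HashMap<&'static str, Field<U>>,
}

impl<'u, U> Context<'u, U> {
    fn new(user_data: &'u U) -> Self {
        Context { user_data, fields: HashMap::new() }
    }

    /// Ordinary variable: the value is computed by the caller in advance.
    fn set(&mut self, name: &'static str, value: Value) {
        self.fields.insert(name, Field::Eager(value));
    }

    /// Property-like variable: only the getter is stored.
    fn set_lazy(&mut self, name: &'static str, getter: fn(&U) -> Value) {
        self.fields.insert(name, Field::Lazy { getter, cache: OnceCell::new() });
    }

    /// Looks the same to the caller for both kinds of field; a lazy field
    /// runs its getter on the first lookup and is served from cache afterwards.
    fn get(&self, name: &str) -> Option<&Value> {
        match self.fields.get(name)? {
            Field::Eager(value) => Some(value),
            Field::Lazy { getter, cache } => {
                Some(cache.get_or_init(|| getter(self.user_data)))
            }
        }
    }
}

/// Example user data: raw header lines plus a counter for the illustration.
struct RawRequest {
    header_lines: Vec<&'static str>,
    conversions: Cell<u32>,
}

impl RawRequest {
    fn new(header_lines: Vec<&'static str>) -> Self {
        RawRequest { header_lines, conversions: Cell::new(0) }
    }

    fn conversions(&self) -> u32 {
        self.conversions.get()
    }
}

/// Example getter: derives lowercase header names from the raw lines.
fn header_names(raw: &RawRequest) -> Value {
    raw.conversions.set(raw.conversions.get() + 1);
    raw.header_lines
        .iter()
        .filter_map(|line| line.split_once(':'))
        .map(|(name, _)| name.trim().to_lowercase())
        .collect()
}
```

Registering header_names with set_lazy costs nothing until an evaluator calls get for that field, so a short-circuited expression that never reaches the field never triggers the getter. The cache keeps repeated lookups within one evaluation from redoing the work.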


I also noticed a hacky attempt at solving this problem here: Stealthiumio/wirefilter@9d66964


Cloudflare's internal use cases are fairly similar to the example above, so I suspect you have encountered the same kind of overhead. I would love to hear how you approach this optimization internally.
