Approximate prefix cache scorer should incorporate the absolute match length when calculating the score #2561

@ahg-g

Description

What would you like to be added:
Currently, the approximate prefix cache scorer calculates the score solely from the match percentage, i.e. the fraction of the prompt that matches a cached prefix. The absolute prompt length should also be a factor in the score.

Why is this needed:
The current approach may not be ideal for short sequences. Because the denominator is small, they typically get a high match percentage and therefore a high prefix match score, which could make them more susceptible to hot spots.
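
To make the concern concrete, here is a minimal sketch, not the project's actual scorer. It compares a pure ratio score with one hypothetical way of folding in the absolute match length from the issue title. The function names, the use of "blocks" as the unit of length, and the `saturation` parameter are all assumptions made for illustration.

```go
package main

import "fmt"

// ratioScore models the behaviour described above: the score is the
// fraction of the prompt that matched. "Blocks" is an assumed unit.
func ratioScore(matched, total int) float64 {
	if total <= 0 {
		return 0
	}
	return float64(matched) / float64(total)
}

// lengthAwareScore is a hypothetical alternative: the ratio is scaled by
// a weight that grows with the absolute number of matched blocks and
// saturates at 1 once `saturation` blocks have matched.
func lengthAwareScore(matched, total, saturation int) float64 {
	if total <= 0 || saturation <= 0 {
		return 0
	}
	weight := float64(matched) / float64(saturation)
	if weight > 1 {
		weight = 1
	}
	return ratioScore(matched, total) * weight
}

func main() {
	const saturation = 32 // assumed tuning knob, not an existing setting

	// Short prompt: 2 of 2 blocks matched.
	fmt.Printf("short: ratio=%.4f lengthAware=%.4f\n",
		ratioScore(2, 2), lengthAwareScore(2, 2, saturation))
	// prints "short: ratio=1.0000 lengthAware=0.0625"

	// Long prompt: 40 of 64 blocks matched.
	fmt.Printf("long:  ratio=%.4f lengthAware=%.4f\n",
		ratioScore(40, 64), lengthAwareScore(40, 64, saturation))
	// prints "long:  ratio=0.6250 lengthAware=0.6250"
}
```

With the ratio alone, the two-block prompt outranks the long prompt even though far less cached work is reused. The saturating weight is only one possible shape; a logarithmic or additive term would be equally plausible.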

Metadata

Assignees

No one assigned

Labels

needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
