feat: optimize Gemini context caching boundary #5522
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Response from ADK Triaging Agent Hello @Itish2003, thank you for creating this PR! In order to get this PR reviewed and merged, could you please:
This information will help reviewers review your PR more efficiently. Thanks!
Hi @Itish2003, thank you for your contribution! It appears you haven't yet signed the Contributor License Agreement (CLA). Please visit https://cla.developers.google.com/ to complete the signing process. Once the CLA is signed, we'll be able to proceed with the review of your PR. Thank you!
This PR optimizes the `GeminiContextCacheManager` by implementing a 'last user batch' boundary strategy instead of naively caching the entire history. This cleanly isolates the stable prefix (system instructions, tools, prior conversation) from the transient suffix (the latest user message), preventing cache misses on multi-turn conversations and drastically improving token efficiency.

Changes:
- Added `_find_count_of_contents_to_cache` to dynamically determine the boundary before the last continuous batch of user messages.
- Removed the hardcoded `4096` token limit in favor of the cache's minimum token limits.
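To illustrate the boundary strategy, here is a minimal sketch of what such a search could look like. This is a hypothetical standalone version, not the actual ADK implementation: the `Content` dataclass and the function body are assumptions, showing only the core idea of walking backward past the trailing contiguous batch of user messages so the cached prefix stays stable across turns.

```python
# Hypothetical sketch of a "last user batch" boundary search.
# Assumes each content item carries a `role` field ("user" or "model");
# this is NOT the actual adk-python implementation.
from dataclasses import dataclass


@dataclass
class Content:
    role: str
    text: str


def find_count_of_contents_to_cache(contents: list[Content]) -> int:
    """Return how many leading contents form the stable cacheable prefix.

    Walks backward from the end, skipping the trailing contiguous batch
    of user messages (the transient suffix). Everything before that
    boundary is a stable prefix, so the cache key does not change just
    because the user appended a new message.
    """
    i = len(contents)
    while i > 0 and contents[i - 1].role == "user":
        i -= 1
    return i


history = [
    Content("user", "hi"),
    Content("model", "hello"),
    Content("user", "part 1"),
    Content("user", "part 2"),  # trailing user batch: these two messages
]
print(find_count_of_contents_to_cache(history))  # 2: cache up to the model reply
```

Under this scheme, the suffix excluded from the cache is exactly the latest batch of user messages, which is what changes every turn; the prefix (system instructions, tools, prior turns) is re-served from the cache.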