Add startupProbe to vLLM config templates to prevent premature pod kills (#1161)
Conversation
Port upstream kserve/kserve PR kserve#5063 to release-v0.15. vLLM pods were being killed by liveness probes before model loading completed, causing consistent e2e test failures. The startupProbe gates liveness/readiness checks until vLLM is actually serving, allowing up to 600s for startup. Signed-off-by: Killian Golds <kgolds@redhat.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: KillianGolds. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment. |
|
|
/retest |
|
/group-test |
|
/retest |
|
/group-test |
|
@KillianGolds: The following tests failed, say `/retest` to rerun all failed tests:
What this PR does / why we need it:
Ports the startupProbe changes from upstream kserve/kserve PR kserve#5063 to release-v0.15.
vLLM pods are consistently killed by liveness probes before model loading completes, causing a 100% failure rate on `test_llm_auth_enabled_requires_token` in the odh-model-controller e2e CI (see opendatahub-io/odh-model-controller#698). The root cause is that liveness probes (with `initialDelaySeconds`) begin checking before vLLM finishes loading models, and with TLS cert rotation happening during startup, the probes never pass in time.

This PR:
- Adds a `startupProbe` to all 6 vLLM config templates, giving vLLM up to 600s (failureThreshold: 60 × periodSeconds: 10) to start before liveness/readiness probes kick in
- Removes `initialDelaySeconds` from the liveness and readiness probes on the main vLLM container (no longer needed with startupProbe gating)
- Uses `scheme: HTTPS` (since TLS is enabled on this branch)
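Putting the numbers above together, the added probe is expected to take roughly this shape. This is a sketch, not the exact template diff from the PR: the `failureThreshold`, `periodSeconds`, and `scheme: HTTPS` values come from the description, while the `httpGet` path and port are assumptions (vLLM commonly exposes a `/health` endpoint).

```yaml
# Sketch of the startupProbe described in this PR.
# path and port are assumptions, not copied from the templates.
startupProbe:
  httpGet:
    path: /health       # assumed vLLM health endpoint
    port: 8443          # assumed HTTPS serving port
    scheme: HTTPS       # TLS is enabled on this branch
  failureThreshold: 60  # 60 checks × 10s period = up to 600s to start
  periodSeconds: 10
```

While the startupProbe is failing, Kubernetes holds off the liveness and readiness probes entirely, which is why the `initialDelaySeconds` on those probes becomes unnecessary.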
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):

Fixes #
Feature/Issue validation/testing:
- `go test ./pkg/controller/llmisvc/... -run TestPresetFiles -v` — all 8 subtests pass
- No `initialDelaySeconds` remains on the main vLLM containers (only sidecar containers retain theirs)
- `startupProbe` is present in all 6 config templates
- `test_llm_auth_enabled_requires_token` should pass with this change (previously failing 100% because the liveness probe killed vLLM before it was ready)

Special notes for your reviewer:
- Sidecar containers (`llm-d-routing-sidecar`) and headless worker containers are untouched
- The `make precommit` poetry-lock step fails due to the local Python version (3.14 vs required <3.13); this is a pre-existing environment issue unrelated to this change. All Go targets (vet, codegen, tests) pass

Checklist:
Release note: