Add startupProbe to vLLM config templates to prevent premature pod kills (#1161)
Conversation
Port upstream kserve/kserve PR kserve#5063 to release-v0.15. vLLM pods were being killed by liveness probes before model loading completed, causing consistent e2e test failures. The startupProbe gates liveness/readiness checks until vLLM is actually serving, allowing up to 600s for startup. Signed-off-by: Killian Golds <kgolds@redhat.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: KillianGolds. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment. |
|
|
/retest |
|
/group-test |
|
/retest |
|
/group-test |
|
@KillianGolds: The following tests failed, say `/retest` to rerun all failed tests:
What this PR does / why we need it:
Ports the startupProbe changes from upstream kserve/kserve PR kserve#5063 to release-v0.15.
vLLM pods are consistently killed by liveness probes before model loading completes, causing a 100% failure rate on `test_llm_auth_enabled_requires_token` in the odh-model-controller e2e CI (see opendatahub-io/odh-model-controller#698). The root cause is that liveness probes (with `initialDelaySeconds`) begin checking before vLLM finishes loading models, and with TLS cert rotation happening during startup, the probes never pass in time.

This PR:
- Adds a `startupProbe` to all 6 vLLM config templates, giving vLLM up to 600s (failureThreshold: 60 × periodSeconds: 10) to start before liveness/readiness probes kick in
- Removes `initialDelaySeconds` from the liveness and readiness probes on the main vLLM container (no longer needed with startupProbe gating)
- Uses `scheme: HTTPS` (since TLS is enabled on this branch)
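Putting the numbers above together, the added probe is expected to take roughly this shape. This is a sketch, not the exact template diff from the PR: the `failureThreshold`, `periodSeconds`, and `scheme: HTTPS` values come from the description, while the `httpGet` path and port are assumptions (vLLM commonly exposes a `/health` endpoint).

```yaml
# Sketch of the startupProbe described in this PR.
# path and port are assumptions, not copied from the templates.
startupProbe:
  httpGet:
    path: /health       # assumed vLLM health endpoint
    port: 8443          # assumed HTTPS serving port
    scheme: HTTPS       # TLS is enabled on this branch
  failureThreshold: 60  # 60 checks × 10s period = up to 600s to start
  periodSeconds: 10
```

While the startupProbe is failing, Kubernetes holds off the liveness and readiness probes entirely, which is why the `initialDelaySeconds` on those probes becomes unnecessary.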
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):

Fixes #
Feature/Issue validation/testing:
- `go test ./pkg/controller/llmisvc/... -run TestPresetFiles -v` — all 8 subtests pass
- No `initialDelaySeconds` remains on the main vLLM containers (only sidecar containers retain theirs)
- `startupProbe` is present in all 6 config templates
- `test_llm_auth_enabled_requires_token` should pass with this change (previously failing 100% because the liveness probe killed vLLM before it was ready)

Special notes for your reviewer:
- Sidecar containers (`llm-d-routing-sidecar`) and headless worker containers are untouched
- The `make precommit` poetry-lock step fails due to the local Python version (3.14 vs required <3.13); this is a pre-existing environment issue unrelated to this change. All Go targets (vet, codegen, tests) pass

Checklist:
Release note: