Fix CI setup for llmisvc by andresllh · Pull Request #1196 · opendatahub-io/kserve

andresllh · 2026-03-13T16:46:54Z

Summary by CodeRabbit

Tests
- Improved test setup flow with staged resource application, webhook readiness waits, and additional logging.
- Adjusted test execution mode by changing the CI workflow's test-run argument from 1 to 0.
- Updated fixtures to enforce non-root security context.
- Enhanced auth-related tests to auto-add a disabled-auth annotation when missing and handle LLM service config updates.

Signed-off-by: Andres Llausas <allausas@redhat.com>

coderabbitai · 2026-03-13T16:47:21Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

Updates the OpenShift CI test setup script to perform per-image substitutions, apply resources in multiple stages (CRDs first, then resources that may trigger webhook validation), wait for the llmisvc-controller-manager to be ready before applying webhook-validated resources, re-apply LLMInferenceServiceConfig after webhook readiness, and conditionally patch/restart the controller when running llminferenceservice tests; adds related logging and waits. Test fixtures for LLMInferenceServiceConfig now set securityContext.runAsNonRoot: true and remove runAsUser fields. test_llm_inference_service.py now ensures metadata.annotations exist and sets security.opendatahub.io/enable-auth = "false" when annotations are missing before creating services. The CI workflow invocation arg changed from 1 to 0.
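
The staged ordering described in the walkthrough can be sketched as a small partitioning function. This is only an illustration in Python, not the script's actual code (the setup script is shell): it assumes the manifests are already parsed into dicts, and the names `stage_manifests` and `WEBHOOK_VALIDATED_KINDS` are hypothetical.

```python
from typing import Any

# Assumption: kinds whose creation is validated by the llmisvc webhook.
# Only LLMInferenceServiceConfig is named in the walkthrough.
WEBHOOK_VALIDATED_KINDS = {"LLMInferenceServiceConfig"}


def stage_manifests(docs: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Split parsed manifests into three apply stages.

    crds:     applied first so later kinds are known to the API server
    regular:  applied next; this includes the controller Deployment itself
    deferred: applied only once the controller's webhook is ready
    """
    stages: dict[str, list[dict[str, Any]]] = {"crds": [], "regular": [], "deferred": []}
    for doc in docs:
        kind = doc.get("kind", "")
        if kind == "CustomResourceDefinition":
            stages["crds"].append(doc)
        elif kind in WEBHOOK_VALIDATED_KINDS:
            stages["deferred"].append(doc)
        else:
            stages["regular"].append(doc)
    return stages
```

A caller would apply `crds`, then `regular`, wait for llmisvc-controller-manager to report ready, and only then apply `deferred`.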

Note: automatically disabling auth in tests is a potential security concern (see CWE-306: Missing Authentication).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title 'Fix CI setup for llmisvc' is related to the changeset but is overly vague and doesn't clearly convey the main purpose: addressing a race condition in llmisvc configuration. | Revise to a more specific title such as 'Fix race condition in llmisvc config application by sequencing webhook readiness' to better reflect the actual changes made. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/scripts/openshift-ci/setup-e2e-tests.sh`:
- Around line 136-138: The current unconditional suppression of errors after the
oc apply call hides real failures; change the oc apply invocation in the setup
script (the line using oc apply --server-side=true --force-conflicts -f -) to
capture its exit code and output, log any failure to stderr (e.g., echo or
printf with the captured output and exit code), and only swallow/continue for
expected webhook/validation errors by checking the output text for
webhook/validation-related messages before allowing the script to proceed; for
any other failures exit non-zero so problems like auth, malformed YAML, missing
CRDs, quotas, or network errors are surfaced.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 238f208f-076c-44b1-a5b9-7ae58cb3645f

📥 Commits

Reviewing files that changed from the base of the PR and between ae20ef2 and 79a08ae.

📒 Files selected for processing (1)

test/scripts/openshift-ci/setup-e2e-tests.sh

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

test/scripts/openshift-ci/setup-e2e-tests.sh (1)

136-138: ⚠️ Potential issue | 🟠 Major

Do not suppress all oc apply failures with || true.

Line 138 masks unrelated failures (auth, malformed manifests, missing APIs, network) and makes CI results non-diagnostic. Restrict continuation to expected webhook/validation failures only.

Proposed fix

-  echo "$ODH_MANIFESTS" | oc apply --server-side=true --force-conflicts -f - || true
+  APPLY_OUTPUT="$(echo "$ODH_MANIFESTS" | oc apply --server-side=true --force-conflicts -f - 2>&1)" || APPLY_RC=$?
+  APPLY_RC="${APPLY_RC:-0}"
+  if [[ "$APPLY_RC" -ne 0 ]]; then
+    if grep -qiE 'webhook|admission|denied the request|validation' <<<"$APPLY_OUTPUT"; then
+      echo "⚠️ Initial apply failed due to expected webhook/validation timing. Retrying after controller readiness."
+    else
+      echo "$APPLY_OUTPUT" >&2
+      exit "$APPLY_RC"
+    fi
+  fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/scripts/openshift-ci/setup-e2e-tests.sh` around lines 136 - 138, The
current line piping "$ODH_MANIFESTS" into oc apply ends with "|| true", which
masks all failures; change this to run oc apply without unconditional
suppression, capture its stderr/exit code, and only ignore/retry when the
failure matches the expected webhook/validation error (e.g., contains
"validation webhook" or the specific LLMInferenceServiceConfig webhook message).
Concretely: run oc apply --server-side=true --force-conflicts -f - and if it
fails, inspect the output for the known webhook/validation error text and retry
a bounded number of times; for any other error (auth, malformed manifest,
missing API, network), propagate the non-zero exit so CI fails and logs the real
error. Use the existing ODH_MANIFESTS variable and the oc apply command as the
location to implement this conditional retry/exit behavior.
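
The bounded retry requested above can be sketched as follows. This is a Python illustration of the control flow only; the real script is bash. The regular expression reuses the alternation from the proposed `grep`, while `apply_with_retry`, `apply_fn`, `max_attempts`, and `delay_seconds` are hypothetical names. `apply_fn` stands in for running `oc apply` and is assumed to return an `(exit_code, combined_output)` pair.

```python
import re
import time
from typing import Callable

# Same alternation as the grep in the proposed fix above.
EXPECTED_FAILURE = re.compile(
    r"webhook|admission|denied the request|validation", re.IGNORECASE
)


def is_expected_webhook_failure(output: str) -> bool:
    """True when the apply output looks like a webhook/validation timing error."""
    return EXPECTED_FAILURE.search(output) is not None


def apply_with_retry(
    apply_fn: Callable[[], tuple[int, str]],
    max_attempts: int = 5,
    delay_seconds: float = 10.0,
) -> None:
    """Retry only expected webhook failures; surface everything else at once."""
    output = ""
    for attempt in range(1, max_attempts + 1):
        rc, output = apply_fn()
        if rc == 0:
            return
        if not is_expected_webhook_failure(output):
            # Auth, missing API, network, and similar errors fail the run.
            raise RuntimeError(f"apply failed (rc={rc}): {output}")
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"apply still failing after {max_attempts} attempts: {output}")
```

One caveat with the pattern itself: the broad `validation` alternative may also match client-side schema errors (kubectl reports some malformed manifests with a `ValidationError(...)` message), so a narrower pattern tied to the specific webhook message is probably safer.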

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@config/default/manager_image_patch.yaml`:
- Line 11: Replace the unqualified mutable image reference "image:
kserve-controller" with a fully qualified, pinned image (registry hostname,
repository, and immutable tag or digest) so the manager image is deterministic;
update the value in config/default/manager_image_patch.yaml where "image:
kserve-controller" appears and ensure it matches the image naming convention
used by your kustomization (or add an images: transform in
config/default/kustomization.yaml) so deployments use the exact
registry/repository:tag or `@sha256`:digest.

In `@test/e2e/llmisvc/test_resources.py`:
- Line 31: Tests hardcode the gatewayClassName value ("openshift-default")
instead of using the existing env-driven GATEWAY_CLASS_NAME, breaking
non-OpenShift runs; replace the literal "openshift-default" in the JSON/dict
entries (both occurrences) with the GATEWAY_CLASS_NAME variable so the test
reads the environment-driven value, ensuring GATEWAY_CLASS_NAME is in scope
where gatewayClassName is constructed and updating both places referenced in
test_resources.py.
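
A minimal sketch of the env-driven approach this comment asks for. The actual definition of `GATEWAY_CLASS_NAME` in the test suite is not shown in this thread, so the default value and the helper names `resolve_gateway_class_name` and `gateway_spec` are assumptions for illustration.

```python
import os


def resolve_gateway_class_name() -> str:
    # Assumption: fall back to the literal flagged in the review when the
    # variable is unset; the real suite may choose a different default.
    return os.environ.get("GATEWAY_CLASS_NAME", "openshift-default")


def gateway_spec() -> dict:
    """Hypothetical helper: build a Gateway spec without a hardcoded class name."""
    return {
        "gatewayClassName": resolve_gateway_class_name(),
        "listeners": [{"name": "http", "port": 80, "protocol": "HTTP"}],
    }
```

With this shape, a non-OpenShift run only needs to export `GATEWAY_CLASS_NAME` before invoking the tests.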

In `@test/scripts/openshift-ci/setup-e2e-tests.sh`:
- Line 150: The shell commands calling oc (e.g., the oc patch configmap
inferenceservice-config invocation) pass the KSERVE_NAMESPACE variable unquoted;
update those invocations to quote the namespace variable as
"${KSERVE_NAMESPACE}" wherever it’s used (both occurrences flagged) to prevent
word-splitting/globbing and command injection risks.

ℹ️ Review info

Run ID: 198d4ce8-9310-43bb-afcf-bd60b92dfa17

📥 Commits

Reviewing files that changed from the base of the PR and between 79a08ae and 94bf125.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum, !**/*.sum

📒 Files selected for processing (6)

config/default/manager_image_patch.yaml
pkg/controller/v1alpha2/llmisvc/workload_storage.go
pkg/utils/storage.go
test/e2e/llmisvc/fixtures.py
test/e2e/llmisvc/test_resources.py
test/scripts/openshift-ci/setup-e2e-tests.sh

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/llmisvc/test_llm_inference_service.py`:
- Around line 381-385: The guard only checks if
test_case.llm_service.metadata.annotations is falsy, missing the case where
annotations exist but the key "security.opendatahub.io/enable-auth" is absent;
update the logic around test_case.llm_service.metadata.annotations to ensure
annotations is a dict (create if None) and then explicitly set
test_case.llm_service.metadata.annotations["security.opendatahub.io/enable-auth"]
= "false" only when that specific key is not present, using the existing symbols
test_case.llm_service.metadata.annotations and the annotation key to locate and
modify the code.
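
The guard described in this comment could look roughly like the sketch below. The helper name `ensure_auth_annotation` is hypothetical; only the annotation key and the `"false"` value come from the review.

```python
from typing import Optional

AUTH_ANNOTATION = "security.opendatahub.io/enable-auth"


def ensure_auth_annotation(annotations: Optional[dict]) -> dict:
    """Return a copy of annotations with the auth key defaulted to "false".

    Handles both cases from the review: annotations missing entirely (None
    or empty) and annotations present without the auth key. An explicit
    value set by a test case is left untouched.
    """
    result = dict(annotations or {})
    result.setdefault(AUTH_ANNOTATION, "false")
    return result
```

In the test this would be applied as `test_case.llm_service.metadata.annotations = ensure_auth_annotation(test_case.llm_service.metadata.annotations)` before the service is created.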

ℹ️ Review info

Run ID: a28b493b-467c-4f54-906b-8e508dea39e8

📥 Commits

Reviewing files that changed from the base of the PR and between 94bf125 and a471d38.

📒 Files selected for processing (4)

config/overlays/odh-test/configmap/inferenceservice.yaml
test/e2e/llmisvc/fixtures.py
test/e2e/llmisvc/test_llm_inference_service.py
test/scripts/openshift-ci/setup-e2e-tests.sh

🚧 Files skipped from review as they are similar to previous changes (1)

test/e2e/llmisvc/fixtures.py

pierDipi · 2026-03-20T18:22:44Z

⏳ waiting for authorino-operator to be ready.…
error: timed out waiting for the condition on kuadrants/kuadrant

/test e2e-llm-inference-service

pierDipi · 2026-03-20T18:55:44Z

/test e2e-graph

bartoszmajsak · 2026-03-20T20:23:16Z

/retest-required

pierDipi · 2026-03-21T07:35:03Z

FAILED llmisvc/test_storage_version_migration.py::TestStorageVersionMigration::test_storage_version_migration_after_simulated_upgrade - subprocess.CalledProcessError: Command '['oc', 'rollout', 'restart', 'deployment/llmisvc-controller-manager', '-n', 'opendatahub']' returned non-zero exit status 1.

The controller is running in the kserve namespace, not opendatahub:

  name: llmisvc-controller-manager-5d5d6dffd5-6l4p5
  namespace: kserve
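
One way to avoid this kind of mismatch is to derive the namespace from the environment instead of hardcoding it. The sketch below assumes a `KSERVE_NAMESPACE` variable is exported to the tests (the name appears in the title of #1253, referenced in this thread; how the real test consumes it is not shown here). The helper names and the `kserve` fallback are hypothetical.

```python
import os
import subprocess


def rollout_restart_cmd(deployment: str = "llmisvc-controller-manager") -> list[str]:
    # Assumption: KSERVE_NAMESPACE is set in the test environment; the
    # "kserve" fallback matches the namespace in the pod metadata above.
    namespace = os.environ.get("KSERVE_NAMESPACE", "kserve")
    return ["oc", "rollout", "restart", f"deployment/{deployment}", "-n", namespace]


def restart_controller() -> None:
    # check=True raises subprocess.CalledProcessError on a non-zero exit,
    # the same exception type reported by the failing test.
    subprocess.run(rollout_restart_cmd(), check=True)
```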

pierDipi · 2026-03-23T07:08:54Z

The isvc test failures can be addressed separately.

/override ci/prow/e2e-raw
/override ci/prow/e2e-predictor
/override ci/prow/e2e-graph

openshift-ci · 2026-03-23T07:09:05Z

@pierDipi: Overrode contexts on behalf of pierDipi: ci/prow/e2e-graph, ci/prow/e2e-predictor, ci/prow/e2e-raw

pierDipi

/lgtm
/approve

openshift-ci · 2026-03-23T07:09:14Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andresllh, pierDipi

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [andresllh,pierDipi]

openshift-cherrypick-robot · 2026-03-23T07:11:43Z

@pierDipi: #1196 failed to apply on top of branch "release-v0.17":

Applying: Fixing race condition for llmisvcconfigs
Applying: Rebasing midstream/master onto my branch
Using index info to reconstruct a base tree...
M	go.sum
M	pkg/controller/v1alpha2/llmisvc/workload_storage.go
M	pkg/utils/storage.go
M	test/e2e/llmisvc/fixtures.py
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/llmisvc/fixtures.py
Auto-merging pkg/utils/storage.go
Auto-merging pkg/controller/v1alpha2/llmisvc/workload_storage.go
Auto-merging go.sum
CONFLICT (content): Merge conflict in go.sum
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0002 Rebasing midstream/master onto my branch

Details

In response to this:

/cherry-pick release-v0.17

Fixing race condition for llmisvcconfigs

79a08ae

Signed-off-by: Andres Llausas <allausas@redhat.com>

github-project-automation bot added this to ODH Model Serving Planning Mar 13, 2026

github-project-automation bot moved this to New/Backlog in ODH Model Serving Planning Mar 13, 2026

openshift-ci bot added the approved label Mar 13, 2026

coderabbitai bot reviewed Mar 13, 2026

Comment thread test/scripts/openshift-ci/setup-e2e-tests.sh

andresllh added 2 commits March 16, 2026 12:18

Merge branch 'master' of github.com:opendatahub-io/kserve into fix-race-condition-llmisvc-tests-master

c846903

Rebasing midstream/master onto my branch

94bf125

coderabbitai bot reviewed Mar 16, 2026

Comment thread config/default/manager_image_patch.yaml Outdated

Comment thread test/e2e/llmisvc/test_resources.py Outdated

Comment thread test/scripts/openshift-ci/setup-e2e-tests.sh Outdated

pierDipi reviewed Mar 16, 2026

Comment thread pkg/controller/v1alpha2/llmisvc/workload_storage.go Outdated

pierDipi reviewed Mar 16, 2026

Comment thread pkg/utils/storage.go Outdated

Most tests passing now, still need to fix a couple errors

a471d38

Signed-off-by: Andres Llausas <allausas@redhat.com>

andresllh force-pushed the fix-race-condition-llmisvc-tests-master branch from 8a9813b to a471d38 Compare March 16, 2026 20:12

coderabbitai bot reviewed Mar 16, 2026

Comment thread test/e2e/llmisvc/test_llm_inference_service.py Outdated

pierDipi reviewed Mar 16, 2026

Comment thread go.sum

pierDipi reviewed Mar 17, 2026

Comment thread pkg/controller/v1alpha2/llmisvc/workload_storage.go Outdated

Apply suggestion from @pierDipi

f504a26

Signed-off-by: Pierangelo Di Pilato <pierangelodipilato@gmail.com>

openshift-merge-robot added the needs-rebase label Mar 17, 2026

Apply suggestion from @pierDipi

5f58011

Signed-off-by: Pierangelo Di Pilato <pierangelodipilato@gmail.com>

pierDipi reviewed Mar 17, 2026

Comment thread test/e2e/llmisvc/test_resources.py Outdated

Apply suggestion from @pierDipi

901b8ac

Signed-off-by: Pierangelo Di Pilato <pierangelodipilato@gmail.com>

pierDipi reviewed Mar 17, 2026

Comment thread pkg/controller/v1alpha2/llmisvc/workload_storage.go Outdated

Apply suggestion from @pierDipi

29f25b4

Signed-off-by: Pierangelo Di Pilato <pierangelodipilato@gmail.com>

pierDipi reviewed Mar 17, 2026

Comment thread test/e2e/llmisvc/test_resources.py Outdated

pierDipi and others added 2 commits March 17, 2026 10:00

Apply suggestion from @pierDipi

5a1a20b

Signed-off-by: Pierangelo Di Pilato <pierangelodipilato@gmail.com>

Merge branch 'odh-master' into fix-race-condition-llmisvc-tests-master

34869f1

openshift-merge-robot removed the needs-rebase label Mar 17, 2026

pierDipi added 2 commits March 17, 2026 10:05

Update annotation injection logic

495f874

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

Handle all cases for image replacement

9a24b67

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

pierDipi mentioned this pull request Mar 17, 2026

[kserve] Add v0.17 CI and llmisvc tests on master openshift/release#76351

Merged

bartoszmajsak mentioned this pull request Mar 19, 2026

fix(llmisvc): defer migration until webhooks are serving #1251

Merged

Image is not getting replaced

146b83d

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

pierDipi force-pushed the fix-race-condition-llmisvc-tests-master branch from 6792d67 to 146b83d Compare March 20, 2026 06:57

bartoszmajsak mentioned this pull request Mar 20, 2026

fix(ci): inject KSERVE_NAMESPACE into E2E test environment #1253

Merged

4 tasks

pierDipi force-pushed the fix-race-condition-llmisvc-tests-master branch from c932e29 to 5444a04 Compare March 20, 2026 08:17

fix: executable file cat not found in $PATH: No such file or directory

d07db8d

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

pierDipi force-pushed the fix-race-condition-llmisvc-tests-master branch from 5444a04 to d07db8d Compare March 20, 2026 09:21

pierDipi added 3 commits March 20, 2026 17:18

fix: HOME permissions

ffc0c00

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

Merge branch 'odh-master' into fix-race-condition-llmisvc-tests-master

76eecca

fix: kubectl command doesn't exist in CI

df866f7

Switch to `oc`

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

pierDipi force-pushed the fix-race-condition-llmisvc-tests-master branch from d881c16 to df866f7 Compare March 20, 2026 17:20

opendatahub -> kserve

c2bd34e

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

pierDipi force-pushed the fix-race-condition-llmisvc-tests-master branch from a966ca2 to c2bd34e Compare March 21, 2026 07:37

pierDipi added 2 commits March 21, 2026 14:33

fix: re-introduce cluster storage container

0f8351c

Fix error `no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"`

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

fix CA race condition flake

76d118f

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

pierDipi approved these changes Mar 23, 2026

openshift-ci bot added the lgtm label Mar 23, 2026

pierDipi merged commit 1ac3660 into opendatahub-io:master Mar 23, 2026
35 of 37 checks passed

github-project-automation bot moved this from New/Backlog to Done in ODH Model Serving Planning Mar 23, 2026

pierDipi mentioned this pull request Mar 23, 2026

fix: storage initializer permissions error on downloading from HF red-hat-data-services/kserve#4279

Merged

andresllh deleted the fix-race-condition-llmisvc-tests-master branch March 23, 2026 13:52

bartoszmajsak mentioned this pull request Mar 25, 2026

Merge odh-master to release-v0.17 #1267

Closed

4 tasks

5 participants
