Fix CI setup for llmisvc#1196
Conversation
Signed-off-by: Andres Llausas <allausas@redhat.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughUpdates the OpenShift CI test setup script to perform per-image substitutions, apply resources in multiple stages (CRDs first, then resources that may trigger webhook validation), wait for the llmisvc-controller-manager to be ready before applying webhook-validated resources, re-apply LLMInferenceServiceConfig after webhook readiness, and conditionally patch/restart the controller when running llminferenceservice tests; adds related logging and waits. Test fixtures for LLMInferenceServiceConfig now set securityContext.runAsNonRoot: true and remove runAsUser fields. test_llm_inference_service.py now ensures metadata.annotations exist and sets security.opendatahub.io/enable-auth = "false" when annotations are missing before creating services. The CI workflow invocation arg changed from 1 to 0. Note: automatically disabling auth in tests is a potential security concern (see CWE-306: Missing Authentication). Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can use Trivy to scan for security misconfigurations and secrets in Infrastructure as Code files.Add a .trivyignore file to your project to customize which findings Trivy reports. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/scripts/openshift-ci/setup-e2e-tests.sh`:
- Around line 136-138: The current unconditional suppression of errors after the
oc apply call hides real failures; change the oc apply invocation in the setup
script (the line using oc apply --server-side=true --force-conflicts -f -) to
capture its exit code and output, log any failure to stderr (e.g., echo or
printf with the captured output and exit code), and only swallow/continue for
expected webhook/validation errors by checking the output text for
webhook/validation-related messages before allowing the script to proceed; for
any other failures exit non-zero so problems like auth, malformed YAML, missing
CRDs, quotas, or network errors are surfaced.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 238f208f-076c-44b1-a5b9-7ae58cb3645f
📒 Files selected for processing (1)
test/scripts/openshift-ci/setup-e2e-tests.sh
…ce-condition-llmisvc-tests-master
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
test/scripts/openshift-ci/setup-e2e-tests.sh (1)
136-138:⚠️ Potential issue | 🟠 MajorDo not suppress all
oc applyfailures with|| true.Line 138 masks unrelated failures (auth, malformed manifests, missing APIs, network) and makes CI results non-diagnostic. Restrict continuation to expected webhook/validation failures only.
Proposed fix
- echo "$ODH_MANIFESTS" | oc apply --server-side=true --force-conflicts -f - || true + APPLY_OUTPUT="$(echo "$ODH_MANIFESTS" | oc apply --server-side=true --force-conflicts -f - 2>&1)" || APPLY_RC=$? + APPLY_RC="${APPLY_RC:-0}" + if [[ "$APPLY_RC" -ne 0 ]]; then + if grep -qiE 'webhook|admission|denied the request|validation' <<<"$APPLY_OUTPUT"; then + echo "⚠️ Initial apply failed due to expected webhook/validation timing. Retrying after controller readiness." + else + echo "$APPLY_OUTPUT" >&2 + exit "$APPLY_RC" + fi + fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/scripts/openshift-ci/setup-e2e-tests.sh` around lines 136 - 138, The current line piping "$ODH_MANIFESTS" into oc apply ends with "|| true", which masks all failures; change this to run oc apply without unconditional suppression, capture its stderr/exit code, and only ignore/ retry when the failure matches the expected webhook/validation error (e.g., contains "validation webhook" or the specific LLMInferenceServiceConfig webhook message). Concretely: run oc apply --server-side=true --force-conflicts -f - and if it fails, inspect the output for the known webhook/validation error text and retry a bounded number of times; for any other error (auth, malformed manifest, missing API, network), propagate the non-zero exit so CI fails and logs the real error. Use the existing ODH_MANIFESTS variable and the oc apply command as the location to implement this conditional retry/exit behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/default/manager_image_patch.yaml`:
- Line 11: Replace the unqualified mutable image reference "image:
kserve-controller" with a fully qualified, pinned image (registry hostname,
repository, and immutable tag or digest) so the manager image is deterministic;
update the value in config/default/manager_image_patch.yaml where "image:
kserve-controller" appears and ensure it matches the image naming convention
used by your kustomization (or add an images: transform in
config/default/kustomization.yaml) so deployments use the exact
registry/repository:tag or `@sha256`:digest.
In `@test/e2e/llmisvc/test_resources.py`:
- Line 31: Tests hardcode the gatewayClassName value ("openshift-default")
instead of using the existing env-driven GATEWAY_CLASS_NAME, breaking
non-OpenShift runs; replace the literal "openshift-default" in the JSON/dict
entries (both occurrences) with the GATEWAY_CLASS_NAME variable so the test
reads the environment-driven value, ensuring GATEWAY_CLASS_NAME is in scope
where gatewayClassName is constructed and updating both places referenced in
test_resources.py.
In `@test/scripts/openshift-ci/setup-e2e-tests.sh`:
- Line 150: The shell commands calling oc (e.g., the oc patch configmap
inferenceservice-config invocation) pass the KSERVE_NAMESPACE variable unquoted;
update those invocations to quote the namespace variable as
"${KSERVE_NAMESPACE}" wherever it’s used (both occurrences flagged) to prevent
word-splitting/globbing and command injection risks.
---
Duplicate comments:
In `@test/scripts/openshift-ci/setup-e2e-tests.sh`:
- Around line 136-138: The current line piping "$ODH_MANIFESTS" into oc apply
ends with "|| true", which masks all failures; change this to run oc apply
without unconditional suppression, capture its stderr/exit code, and only
ignore/ retry when the failure matches the expected webhook/validation error
(e.g., contains "validation webhook" or the specific LLMInferenceServiceConfig
webhook message). Concretely: run oc apply --server-side=true --force-conflicts
-f - and if it fails, inspect the output for the known webhook/validation error
text and retry a bounded number of times; for any other error (auth, malformed
manifest, missing API, network), propagate the non-zero exit so CI fails and
logs the real error. Use the existing ODH_MANIFESTS variable and the oc apply
command as the location to implement this conditional retry/exit behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 198d4ce8-9310-43bb-afcf-bd60b92dfa17
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum,!**/*.sum
📒 Files selected for processing (6)
config/default/manager_image_patch.yamlpkg/controller/v1alpha2/llmisvc/workload_storage.gopkg/utils/storage.gotest/e2e/llmisvc/fixtures.pytest/e2e/llmisvc/test_resources.pytest/scripts/openshift-ci/setup-e2e-tests.sh
Signed-off-by: Andres Llausas <allausas@redhat.com>
8a9813b to
a471d38
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/llmisvc/test_llm_inference_service.py`:
- Around line 381-385: The guard only checks if
test_case.llm_service.metadata.annotations is falsy, missing the case where
annotations exist but the key "security.opendatahub.io/enable-auth" is absent;
update the logic around test_case.llm_service.metadata.annotations to ensure
annotations is a dict (create if None) and then explicitly set
test_case.llm_service.metadata.annotations["security.opendatahub.io/enable-auth"]
= "false" only when that specific key is not present, using the existing symbols
test_case.llm_service.metadata.annotations and the annotation key to locate and
modify the code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: a28b493b-467c-4f54-906b-8e508dea39e8
📒 Files selected for processing (4)
config/overlays/odh-test/configmap/inferenceservice.yamltest/e2e/llmisvc/fixtures.pytest/e2e/llmisvc/test_llm_inference_service.pytest/scripts/openshift-ci/setup-e2e-tests.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- test/e2e/llmisvc/fixtures.py
Signed-off-by: Pierangelo Di Pilato <pierangelodipilato@gmail.com>
Signed-off-by: Pierangelo Di Pilato <pierangelodipilato@gmail.com>
Signed-off-by: Pierangelo Di Pilato <pierangelodipilato@gmail.com>
Signed-off-by: Pierangelo Di Pilato <pierangelodipilato@gmail.com>
Signed-off-by: Pierangelo Di Pilato <pierangelodipilato@gmail.com>
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
6792d67 to
146b83d
Compare
c932e29 to
5444a04
Compare
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
5444a04 to
d07db8d
Compare
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Switch to `oc` Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
d881c16 to
df866f7
Compare
/test e2e-llm-inference-service |
|
/test e2e-graph |
|
/retest-required |
controller is running in Kserve namespace |
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
a966ca2 to
c2bd34e
Compare
Fix error `no matches for kind \"ClusterServingRuntime\" in version \"serving.kserve.io/v1alpha1\"` Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
|
The isvc tests failures can be addressed separately /override ci/prow/e2e-raw |
|
@pierDipi: Overrode contexts on behalf of pierDipi: ci/prow/e2e-graph, ci/prow/e2e-predictor, ci/prow/e2e-raw DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andresllh, pierDipi The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@pierDipi: #1196 failed to apply on top of branch "release-v0.17": DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Summary by CodeRabbit