Skip to content

RSPEED-2943: add auth monitoring metrics#1638

Closed
major wants to merge 3 commits intolightspeed-core:mainfrom
major:split/auth-metrics
Closed

RSPEED-2943: add auth monitoring metrics#1638
major wants to merge 3 commits intolightspeed-core:mainfrom
major:split/auth-metrics

Conversation

@major
Copy link
Copy Markdown
Contributor

@major major commented Apr 29, 2026

Stack: 1/3 - Merge this first, then #1639, then #1640.

Description

Adds bounded Prometheus metrics for authentication attempts and latency across the configured auth dependencies. This keeps auth SLO dashboards consistent across noop, token, JWK, Kubernetes, API key, and rh-identity modes without adding user identifiers as labels.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by: OpenCode
  • Generated by: N/A

Related Tickets & Documents

  • Related Issue # RSPEED-2943
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • `uv run pytest tests/unit/metrics/test_recording.py tests/unit/authentication/test_utils.py tests/unit/authentication/test_noop.py tests/unit/authentication/test_noop_with_token.py tests/unit/authentication/test_api_key_token.py tests/unit/authentication/test_jwk_token.py tests/unit/authentication/test_k8s.py`
  • `uv run radon cc -s src/authentication/api_key_token.py src/authentication/jwk_token.py src/authentication/k8s.py src/authentication/noop.py src/authentication/noop_with_token.py src/authentication/rh_identity.py src/authentication/utils.py src/metrics/recording.py`
  • `uv run make verify`

Summary by CodeRabbit

  • New Features

    • Added per-request authentication metrics and timing (attempt/result/reason + duration) with shared metric buckets; auth flows now emit bounded outcome/reason labels.
    • No-op-with-token path now returns an additional skip-userid flag and records metrics for empty user IDs and token extraction categories.
    • RH Identity and Kubernetes auth now normalize probe/metrics skip behavior and report structured auth/authorization outcomes and errors.
  • Tests

    • Expanded unit tests for auth flows and metrics: recording, label normalization, error resilience, and new failure-path assertions.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Review Change Stack

Warning

Rate limit exceeded

@major has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 49 minutes and 29 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 05a34c6e-4a18-43b8-8124-faa9bc4e7c52

📥 Commits

Reviewing files that changed from the base of the PR and between 7916763 and 0685826.

📒 Files selected for processing (24)
  • src/app/endpoints/responses.py
  • src/app/endpoints/responses_telemetry.py
  • src/authentication/api_key_token.py
  • src/authentication/jwk_token.py
  • src/authentication/k8s.py
  • src/authentication/noop.py
  • src/authentication/noop_with_token.py
  • src/authentication/rh_identity.py
  • src/authentication/utils.py
  • src/metrics/__init__.py
  • src/metrics/recording.py
  • src/observability/__init__.py
  • src/observability/splunk.py
  • tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml
  • tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-gpu.yaml
  • tests/e2e-prow/rhoai/scripts/e2e-ops.sh
  • tests/e2e/features/environment.py
  • tests/unit/app/endpoints/test_responses_splunk.py
  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/authentication/test_rh_identity.py
  • tests/unit/authentication/test_utils.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/observability/test_splunk.py

Walkthrough

Adds per-request monotonic timing and centralized authentication metrics across auth dependencies, new Prometheus auth metrics and safe recording utilities, refactors JWK/K8S auth flows into helpers, and updates tests to assert bounded metric labels and durable error handling.

Changes

Auth metrics & instrumentation

Cohort / File(s) Summary
Metrics Core
src/metrics/__init__.py
Adds AUTH_DURATION_BUCKETS, auth_attempts_total counter and auth_duration_seconds histogram; updates llm histogram typing.
Metrics Recording
src/metrics/recording.py
Adds bounded auth label constants, normalization helpers, and safe recording: record_auth_attempt, record_auth_duration.
Auth Utilities
src/authentication/utils.py
Adds record_auth_metrics(auth_module, result, reason, start_time) that delegates to recording helpers and logs warnings on failures.
JWK Auth
src/authentication/jwk_token.py
Refactors JWK auth into helpers (_get_jwk_set_for_auth, _decode_jwk_claims, _validate_jwk_claims, _get_required_claim), instruments per-request timing, and records detailed metric labels for failure/success paths.
API Key Auth
src/authentication/api_key_token.py
Instruments API-key auth with monotonic timing and records metrics for missing header/token, invalid key, and success.
Kubernetes Auth
src/authentication/k8s.py
Adds timing and record_auth_metrics across K8s flow; refactors UID resolution and SubjectAccessReview creation into helpers that centralize error classification and ensure metrics are recorded before raising HTTP errors; get_user_info now raises on unexpected TokenReview errors.
RH Identity Auth
src/authentication/rh_identity.py
Adds timing and canonicalized probe/metrics skip helper; records detailed failure reasons (missing/oversized/invalid base64/JSON/non-object), classifies entitlement vs identity errors, and records success on authentication.
No-op Auth
src/authentication/noop.py
Adds timing and metrics; records failure/empty_user_id on missing user_id and skipped/no_auth_required on success.
No-op-with-token Auth
src/authentication/noop_with_token.py
Return value updated to a 4-tuple (user_id, DEFAULT_USER_NAME, skip_userid_check, user_token); adds timing and metrics for missing_token, empty_user_id, and no_auth_required.
Auth Tests
tests/unit/authentication/*
Adds/updates tests asserting metric recording on success and failure paths, verifies record_auth_metrics swallows metric errors and logs warnings, and expands RH Identity probe/metrics path parametrizations and nested-payload validation coverage.
Metrics Tests
tests/unit/metrics/test_recording.py
Adds tests for record_auth_attempt and record_auth_duration covering label normalization, successful updates, and exception-safe logging behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Auth as AuthDependency
    participant Ext as External (JWK/K8S/RH)
    participant Metrics as Metrics Recorder
    Client->>Auth: Request with token/header
    Auth->>Auth: start_time = time.monotonic()
    Auth->>Ext: Validate token / Fetch JWK or TokenReview or parse identity
    Ext-->>Auth: Validation result / JWK / Error
    alt validation success
        Auth->>Metrics: record_auth_metrics(module, "success", "authenticated", start_time)
        Auth-->>Client: Authenticated (user_id, ...)
    else validation failure
        Auth->>Metrics: record_auth_metrics(module, "failure", "<reason>", start_time)
        Auth-->>Client: HTTPException (mapped status & cause)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: adding authentication monitoring metrics across multiple authentication modules.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@major major mentioned this pull request Apr 29, 2026
19 tasks
@major major force-pushed the split/auth-metrics branch from e54eae1 to 046a1a5 Compare April 30, 2026 03:53
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/authentication/test_utils.py (1)

47-75: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add a failure-path test for record_auth_metrics exception swallowing.

Line 47 and Line 62 cover happy paths only; please also assert that an exception from one metric backend call is swallowed and logged, so auth flow stays unaffected.

Suggested test addition
+def test_record_auth_metrics_swallows_metric_errors(mocker: MockerFixture) -> None:
+    """Test helper does not propagate metric backend failures."""
+    mocker.patch(
+        "authentication.utils.recording.record_auth_attempt",
+        side_effect=ValueError("boom"),
+    )
+    mock_logger = mocker.patch("authentication.utils.logger")
+
+    record_auth_metrics("jwk-token", "failure", "missing_token", 10.0)
+
+    mock_logger.warning.assert_called_once_with(
+        "Failed to record authentication metrics for module %s with result %s",
+        "jwk-token",
+        "failure",
+        exc_info=True,
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/authentication/test_utils.py` around lines 47 - 75, Add a test
that verifies record_auth_metrics swallows exceptions from metric backends by
patching authentication.utils.recording.record_auth_attempt (or
record_auth_duration) to raise an exception and patching the module logger to
assert logger.exception is called; call record_auth_metrics("jwk-token",
"success", "authenticated", <start_time>) and assert the other metric function
(record_auth_duration or record_auth_attempt) was still invoked with the
expected args and that no exception propagates out of record_auth_metrics.
Reference the helper function record_auth_metrics and the backend functions
record_auth_attempt and record_auth_duration, and assert logger.exception (or
logger.error) was invoked to confirm the error was logged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/authentication/api_key_token.py`:
- Around line 67-73: The current except block for extract_user_token collapses
all HTTPException cases into a single "missing_token" reason; change it to catch
HTTPException as e and derive a specific metric reason from e (for example
inspect e.status_code or e.detail to map to "missing_token", "malformed_header",
"invalid_scheme", etc.), then call record_auth_metrics(AUTH_MOD_APIKEY_TOKEN,
"failure", derived_reason, start_time) before re-raising the exception so
telemetry reflects the actual extraction failure; reference extract_user_token,
record_auth_metrics and AUTH_MOD_APIKEY_TOKEN when making this change.

In `@src/authentication/k8s.py`:
- Around line 559-574: The code treats any None return from get_user_info() as
an invalid/expired token and emits a 401, but get_user_info() can return None on
internal/token-review failures; change the handling so a None result is
classified as a token-review/internal error instead of an "invalid_token": call
record_auth_metrics(AUTH_MOD_K8S, "failure", "token_review_error", start_time)
and return or raise an appropriate server error (e.g., 500/502) instead of
constructing UnauthorizedResponse; keep extract_user_token, get_user_info,
record_auth_metrics and UnauthorizedResponse references to locate and update the
branch that currently maps user_info is None to "invalid_token".

In `@src/authentication/rh_identity.py`:
- Around line 381-394: The code assumes decoded JSON is a dict and passes it
straight into RHIdentityData, which can raise TypeError for non-object payloads
(null, number, list) before the HTTPException handler runs; before constructing
RHIdentityData in this block, explicitly validate that identity_data is a
mapping (use isinstance(identity_data, dict)) and that nested keys like identity
(and user if expected) are dicts; if validation fails, call
_record_rh_identity_auth("failure", "invalid_identity", start_time) and raise an
HTTPException with a non-403 status (e.g., HTTPException(status_code=400)) so
the existing except HTTPException branch handles it and records the metric, then
proceed to construct RHIdentityData and call validate_entitlements as before.

In `@src/metrics/__init__.py`:
- Around line 77-89: Annotate the module-level metric constants with Final per
guidelines: update auth_attempts_total and auth_duration_seconds to have Final
type hints (e.g., Final[Counter] and Final[Histogram]) and ensure typing.Final
is imported (from typing import Final) so these constants are clearly declared
immutable while keeping their current assignments and label/bucket arguments
intact.

In `@src/metrics/recording.py`:
- Around line 31-58: ALLOWED_AUTH_REASONS currently omits the RH Identity
failure labels so the per-reason metrics collapse to "unknown"; update the
ALLOWED_AUTH_REASONS frozenset in src/metrics/recording.py to include the four
new keys recorded by src/authentication/rh_identity.py: "header_too_large",
"invalid_base64", "invalid_identity", and "entitlement_missing" (add them
alongside the existing entries in the ALLOWED_AUTH_REASONS declaration so the
labels from rh_identity.py are preserved in metrics).

---

Outside diff comments:
In `@tests/unit/authentication/test_utils.py`:
- Around line 47-75: Add a test that verifies record_auth_metrics swallows
exceptions from metric backends by patching
authentication.utils.recording.record_auth_attempt (or record_auth_duration) to
raise an exception and patching the module logger to assert logger.exception is
called; call record_auth_metrics("jwk-token", "success", "authenticated",
<start_time>) and assert the other metric function (record_auth_duration or
record_auth_attempt) was still invoked with the expected args and that no
exception propagates out of record_auth_metrics. Reference the helper function
record_auth_metrics and the backend functions record_auth_attempt and
record_auth_duration, and assert logger.exception (or logger.error) was invoked
to confirm the error was logged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ebf76500-ba75-4275-857b-3e54468c5606

📥 Commits

Reviewing files that changed from the base of the PR and between ca125c4 and 046a1a5.

📒 Files selected for processing (14)
  • src/authentication/api_key_token.py
  • src/authentication/jwk_token.py
  • src/authentication/k8s.py
  • src/authentication/noop.py
  • src/authentication/noop_with_token.py
  • src/authentication/rh_identity.py
  • src/authentication/utils.py
  • src/metrics/__init__.py
  • src/metrics/recording.py
  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/authentication/test_rh_identity.py
  • tests/unit/authentication/test_utils.py
  • tests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: Pylinter
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

  • src/authentication/utils.py
  • src/authentication/api_key_token.py
  • src/authentication/noop.py
  • src/metrics/recording.py
  • src/authentication/noop_with_token.py
  • src/metrics/__init__.py
  • src/authentication/k8s.py
  • src/authentication/rh_identity.py
  • src/authentication/jwk_token.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

  • tests/unit/authentication/test_utils.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/authentication/test_rh_identity.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

  • tests/unit/authentication/test_utils.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/authentication/test_rh_identity.py
src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files contain brief package descriptions

Files:

  • src/metrics/__init__.py
🧠 Learnings (5)
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests

Applied to files:

  • tests/unit/authentication/test_utils.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • src/authentication/noop.py
  • src/authentication/noop_with_token.py
  • tests/unit/authentication/test_rh_identity.py
📚 Learning: 2026-04-13T13:34:51.052Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:377-381
Timestamp: 2026-04-13T13:34:51.052Z
Learning: In `tests/e2e/features/environment.py` (lightspeed-core/lightspeed-stack), the hardcoded token `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva` is intentionally a dummy/truncated JWT (the canonical jwt.io documentation example, missing its signature segment). It is not a real credential and does not need to be replaced or loaded from an environment variable. Only a scanner-suppression comment may be warranted if CI noise becomes an issue.

Applied to files:

  • tests/unit/authentication/test_jwk_token.py
📚 Learning: 2026-04-27T23:16:37.085Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1606
File: src/authentication/k8s.py:444-447
Timestamp: 2026-04-27T23:16:37.085Z
Learning: In `src/authentication/k8s.py`, the metrics skip check (`skip_for_metrics`) intentionally uses `request.url.path in ("/metrics",)` (exact match via `in`) to stay consistent with the adjacent health probe skip pattern `request.url.path in ("/readiness", "/liveness")`. This differs from `rh_identity.py` which uses `endswith("/metrics")`, but the intra-file consistency was a deliberate style choice by the author.

Applied to files:

  • tests/unit/authentication/test_rh_identity.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.

Applied to files:

  • tests/unit/authentication/test_rh_identity.py
  • src/authentication/rh_identity.py
🔇 Additional comments (5)
tests/unit/authentication/test_jwk_token.py (1)

507-527: Nice coverage for error telemetry and exception-chain behavior.

These additions validate both the unexpected-error metrics path and preserved root-cause chaining for invalid required claims.

Also applies to: 594-610

src/authentication/utils.py (1)

44-66: Shared auth metric helper is implemented safely.

Good call on isolating metric write failures from request handling.

tests/unit/authentication/test_k8s.py (1)

1141-1201: Good parametrized coverage for API error mapping + metric emission.

This closes an important observability/error-handling gap for _create_subject_access_review.

tests/unit/metrics/test_recording.py (1)

165-247: Auth metric recording test coverage is solid.

Nice balance across success, normalization, and logger-backed failure paths.

src/authentication/noop.py (1)

54-67: No-op auth instrumentation is clean and behavior-preserving.

The new failure/skipped metric emission paths are correctly placed.

Comment thread src/authentication/api_key_token.py
Comment thread src/authentication/k8s.py
Comment thread src/authentication/rh_identity.py
Comment thread src/metrics/__init__.py Outdated
Comment thread src/metrics/recording.py
@major major force-pushed the split/auth-metrics branch from 046a1a5 to 0388b99 Compare April 30, 2026 21:03
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/authentication/rh_identity.py (1)

380-406: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The new identity guard still misses nested non-dict payloads.

Line 381 only validates the top-level JSON object. Payloads like {"identity": 1} or {"identity": {"type": "User", "user": 1}} still raise TypeError inside RHIdentityData, bypass this except HTTPException, and return a 500 without the invalid_identity metric.

Proposed fix
-        if not isinstance(identity_data, dict):
+        if (
+            not isinstance(identity_data, dict)
+            or not isinstance(identity_data.get("identity"), dict)
+            or (
+                identity_data["identity"].get("type") == "User"
+                and not isinstance(identity_data["identity"].get("user"), dict)
+            )
+            or (
+                identity_data["identity"].get("type") == "System"
+                and not isinstance(identity_data["identity"].get("system"), dict)
+            )
+        ):
             logger.warning(
-                "x-rh-identity decoded to non-dict type: %s",
+                "x-rh-identity decoded to invalid structure: %s",
                 type(identity_data).__name__,
             )
             _record_rh_identity_auth("failure", "invalid_identity", start_time)
             raise HTTPException(
                 status_code=400,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/authentication/rh_identity.py` around lines 380 - 406, The top-level
guard only checks identity_data is a dict but misses nested non-dict shapes that
cause TypeError inside RHIdentityData and surface as 500s; before constructing
RHIdentityData (or immediately after), validate that
identity_data.get("identity") is a dict and that nested expected fields (e.g.,
"type", "user", "account") are the correct types, and if not call
_record_rh_identity_auth("failure","invalid_identity", start_time) and raise
HTTPException(400, detail="Invalid identity data in x-rh-identity header");
alternatively, wrap the RHIdentityData(...) and
rh_identity.validate_entitlements() calls in a broader except Exception as e
that converts TypeError/ValueError into an HTTPException with the above metric
recording so nested shape errors do not produce 500s.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/authentication/k8s.py`:
- Around line 431-493: The call to K8sClientSingleton.get_authz_api() can raise
during client init and is currently caught by the broad Exception handler
(mislabeling it as an unexpected error); wrap the get_authz_api() call in its
own try/except that catches ApiException (and generic client-init exceptions if
needed) and handle them like Kubernetes API outages: call
record_auth_metrics(AUTH_MOD_K8S, "failure", "authorization_check_error",
start_time), log the error, construct a ServiceUnavailableResponse
(backend_name="Kubernetes API", cause including e.reason or str(e)) and raise
HTTPException(**response.model_dump()) so client-init failures are classified as
service unavailable rather than unexpected_error, leaving the subsequent
SubjectAccessReview creation/error handling unchanged.

---

Duplicate comments:
In `@src/authentication/rh_identity.py`:
- Around line 380-406: The top-level guard only checks identity_data is a dict
but misses nested non-dict shapes that cause TypeError inside RHIdentityData and
surface as 500s; before constructing RHIdentityData (or immediately after),
validate that identity_data.get("identity") is a dict and that nested expected
fields (e.g., "type", "user", "account") are the correct types, and if not call
_record_rh_identity_auth("failure","invalid_identity", start_time) and raise
HTTPException(400, detail="Invalid identity data in x-rh-identity header");
alternatively, wrap the RHIdentityData(...) and
rh_identity.validate_entitlements() calls in a broader except Exception as e
that converts TypeError/ValueError into an HTTPException with the above metric
recording so nested shape errors do not produce 500s.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c1eab94e-5401-4574-a556-c43d5bb15b54

📥 Commits

Reviewing files that changed from the base of the PR and between 046a1a5 and 0388b99.

📒 Files selected for processing (14)
  • src/authentication/api_key_token.py
  • src/authentication/jwk_token.py
  • src/authentication/k8s.py
  • src/authentication/noop.py
  • src/authentication/noop_with_token.py
  • src/authentication/rh_identity.py
  • src/authentication/utils.py
  • src/metrics/__init__.py
  • src/metrics/recording.py
  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/authentication/test_rh_identity.py
  • tests/unit/authentication/test_utils.py
  • tests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: spectral
  • GitHub Check: Pylinter
  • GitHub Check: build-pr
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

  • src/authentication/utils.py
  • src/authentication/noop.py
  • src/authentication/api_key_token.py
  • src/metrics/__init__.py
  • src/authentication/noop_with_token.py
  • src/authentication/jwk_token.py
  • src/authentication/k8s.py
  • src/authentication/rh_identity.py
  • src/metrics/recording.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

  • tests/unit/authentication/test_utils.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/authentication/test_rh_identity.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

  • tests/unit/authentication/test_utils.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/authentication/test_rh_identity.py
src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files contain brief package descriptions

Files:

  • src/metrics/__init__.py
🧠 Learnings (7)
📚 Learning: 2026-04-13T13:34:51.052Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:377-381
Timestamp: 2026-04-13T13:34:51.052Z
Learning: In `tests/e2e/features/environment.py` (lightspeed-core/lightspeed-stack), the hardcoded token `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva` is intentionally a dummy/truncated JWT (the canonical jwt.io documentation example, missing its signature segment). It is not a real credential and does not need to be replaced or loaded from an environment variable. Only a scanner-suppression comment may be warranted if CI noise becomes an issue.

Applied to files:

  • tests/unit/authentication/test_jwk_token.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • src/authentication/noop.py
  • tests/unit/authentication/test_rh_identity.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/**/*.py : Use Final[type] as type hint for all constants

Applied to files:

  • src/metrics/__init__.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/constants.py : Central `constants.py` for shared constants with descriptive comments

Applied to files:

  • src/metrics/__init__.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Check `constants.py` for shared constants before defining new ones

Applied to files:

  • src/metrics/__init__.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.

Applied to files:

  • src/authentication/rh_identity.py
  • tests/unit/authentication/test_rh_identity.py
  • src/metrics/recording.py
📚 Learning: 2026-04-27T23:16:37.085Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1606
File: src/authentication/k8s.py:444-447
Timestamp: 2026-04-27T23:16:37.085Z
Learning: In `src/authentication/k8s.py`, the metrics skip check (`skip_for_metrics`) intentionally uses `request.url.path in ("/metrics",)` (exact match via `in`) to stay consistent with the adjacent health probe skip pattern `request.url.path in ("/readiness", "/liveness")`. This differs from `rh_identity.py` which uses `endswith("/metrics")`, but the intra-file consistency was a deliberate style choice by the author.

Applied to files:

  • tests/unit/authentication/test_rh_identity.py
🔇 Additional comments (10)
tests/unit/authentication/test_jwk_token.py (2)

507-527: Strong coverage for unexpected-error metrics in JWK auth path.

This test cleanly verifies the failure metric payload and preserves the original exception flow.


594-610: Exception-chain assertion is a valuable regression guard.

Verifying __cause__ here protects root-cause visibility for invalid-claim failures.

src/authentication/utils.py (1)

44-66: record_auth_metrics implementation is robust and aligned with non-disruptive telemetry goals.

The helper correctly keeps auth execution safe even if metric recording fails.

tests/unit/authentication/test_utils.py (1)

47-92: Good focused unit coverage for the shared auth-metrics helper.

The added cases validate both duration math and graceful degradation on recorder failures.

src/authentication/noop.py (1)

54-67: No-op auth metrics instrumentation is well-placed and behavior-preserving.

Failure and skipped outcomes are both captured with consistent timing context.

tests/unit/metrics/test_recording.py (1)

221-296: Nice expansion of bounded-label and failure-path coverage for auth metric recorders.

These tests provide solid guardrails for normalization and logging behavior.

tests/unit/authentication/test_k8s.py (1)

1141-1201: Good direct coverage of SubjectAccessReview error mapping plus metric emission.

The new parametrized assertions strengthen confidence in K8s failure classification.

src/authentication/api_key_token.py (2)

67-77: Great fix for differentiating token extraction failure reasons in metrics.

This improves auth telemetry fidelity without altering exception behavior.


83-92: Success/failure metric hooks around API-key validation look correct.

The placement ensures invalid-key and valid-key outcomes are both consistently timed and recorded.

tests/unit/authentication/test_rh_identity.py (1)

507-640: Path-parametrization updates are well aligned with normalized skip-path logic.

Including trailing-slash and non-skip variants improves regression safety for probe/metrics bypass behavior.

Comment thread src/authentication/k8s.py
@major major force-pushed the split/auth-metrics branch from 0388b99 to 2e84669 Compare May 1, 2026 11:36
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
src/authentication/rh_identity.py (1)

414-427: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard nested container types before constructing RHIdentityData.

The new top-level dict check still lets payloads like {"identity": 1} or {"identity": {"type": "User", "user": []}} raise TypeError inside RHIdentityData. Those bypass this except HTTPException branch, return 500, and skip the invalid_identity metric.

Suggested fix
         try:
+            identity = identity_data.get("identity")
+            if not isinstance(identity, dict):
+                raise HTTPException(
+                    status_code=400,
+                    detail="Invalid identity data in x-rh-identity header",
+                )
+            if identity.get("type") == "User" and not isinstance(
+                identity.get("user"), dict
+            ):
+                raise HTTPException(
+                    status_code=400,
+                    detail="Invalid identity data in x-rh-identity header",
+                )
+            if identity.get("type") == "System" and not isinstance(
+                identity.get("system"), dict
+            ):
+                raise HTTPException(
+                    status_code=400,
+                    detail="Invalid identity data in x-rh-identity header",
+                )
             rh_identity = RHIdentityData(
                 identity_data,
                 required_entitlements=self.required_entitlements,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/authentication/rh_identity.py` around lines 414 - 427, Before
constructing RHIdentityData, explicitly validate that identity_data is a dict
and that identity_data.get("identity") is also a dict (and any nested container
fields RHIdentityData expects, e.g., "user" or "org", are dicts when present);
if these checks fail, call _record_rh_identity_auth("failure",
"invalid_identity", start_time) and raise an HTTPException (non-403 status) so
the existing except HTTPException branch runs; update the block around
RHIdentityData(...) and rh_identity.validate_entitlements() to perform these
isinstance checks and short-circuit with the metric + HTTPException on
type-mismatch to avoid internal TypeError and 500s.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/authentication/jwk_token.py`:
- Around line 216-223: The validator currently only rejects empty strings but
accepts whitespace-only claims; update the check in the function handling claim
validation (the block that calls record_auth_metrics/AUTH_MOD_JWK_TOKEN and
raises UnauthorizedResponse/HTTPException) to treat strings that are all
whitespace as invalid by testing e.g. value.strip() before accepting it, so that
whitespace-only values trigger the same invalid_claim metric and raise the same
UnauthorizedResponse/HTTPException as empty strings.

In `@src/authentication/noop_with_token.py`:
- Around line 72-78: The catch block for extract_user_token(request.headers)
should preserve the exception's detail instead of always using "missing_token":
catch the HTTPException as e in the try/except around extract_user_token, derive
a failure_reason from e.detail (or str(e) if detail missing) and pass that into
record_auth_metrics(AUTH_MOD_NOOP_WITH_TOKEN, "failure", failure_reason,
start_time), then re-raise the original exception so the header-vs-malformed
distinction is retained; reference extract_user_token, record_auth_metrics,
AUTH_MOD_NOOP_WITH_TOKEN, request.headers, and start_time when making the
change.

In `@src/authentication/utils.py`:
- Around line 44-66: Update the public function record_auth_metrics docstring to
follow Google Python style: replace the "Args:" section with a "Parameters"
section listing auth_module (str), result (str), reason (str), and start_time
(float) with concise descriptions, add a "Returns" section documenting that it
returns None, and optionally include a "Raises" note only if the function may
raise exceptions; keep the existing overall one-sentence summary and the
parameter descriptions aligned with the rest of the repo's docstrings for
consistency.

---

Duplicate comments:
In `@src/authentication/rh_identity.py`:
- Around line 414-427: Before constructing RHIdentityData, explicitly validate
that identity_data is a dict and that identity_data.get("identity") is also a
dict (and any nested container fields RHIdentityData expects, e.g., "user" or
"org", are dicts when present); if these checks fail, call
_record_rh_identity_auth("failure", "invalid_identity", start_time) and raise an
HTTPException (non-403 status) so the existing except HTTPException branch runs;
update the block around RHIdentityData(...) and
rh_identity.validate_entitlements() to perform these isinstance checks and
short-circuit with the metric + HTTPException on type-mismatch to avoid internal
TypeError and 500s.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6f170dd6-9029-4db1-8967-bf72610f16a2

📥 Commits

Reviewing files that changed from the base of the PR and between 0388b99 and 2e84669.

📒 Files selected for processing (14)
  • src/authentication/api_key_token.py
  • src/authentication/jwk_token.py
  • src/authentication/k8s.py
  • src/authentication/noop.py
  • src/authentication/noop_with_token.py
  • src/authentication/rh_identity.py
  • src/authentication/utils.py
  • src/metrics/__init__.py
  • src/metrics/recording.py
  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/authentication/test_rh_identity.py
  • tests/unit/authentication/test_utils.py
  • tests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: Pylinter
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

  • src/authentication/utils.py
  • src/authentication/api_key_token.py
  • src/authentication/noop.py
  • src/authentication/noop_with_token.py
  • src/metrics/__init__.py
  • src/authentication/jwk_token.py
  • src/metrics/recording.py
  • src/authentication/rh_identity.py
  • src/authentication/k8s.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/authentication/test_utils.py
  • tests/unit/authentication/test_rh_identity.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/authentication/test_utils.py
  • tests/unit/authentication/test_rh_identity.py
src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files contain brief package descriptions

Files:

  • src/metrics/__init__.py
🧠 Learnings (6)
📚 Learning: 2026-04-13T13:34:51.052Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:377-381
Timestamp: 2026-04-13T13:34:51.052Z
Learning: In `tests/e2e/features/environment.py` (lightspeed-core/lightspeed-stack), the hardcoded token `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva` is intentionally a dummy/truncated JWT (the canonical jwt.io documentation example, missing its signature segment). It is not a real credential and does not need to be replaced or loaded from an environment variable. Only a scanner-suppression comment may be warranted if CI noise becomes an issue.

Applied to files:

  • tests/unit/authentication/test_jwk_token.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/**/*.py : Use Final[type] as type hint for all constants

Applied to files:

  • src/metrics/__init__.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.

Applied to files:

  • src/metrics/recording.py
  • src/authentication/rh_identity.py
  • tests/unit/authentication/test_rh_identity.py
📚 Learning: 2026-04-27T23:16:37.085Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1606
File: src/authentication/k8s.py:444-447
Timestamp: 2026-04-27T23:16:37.085Z
Learning: In `src/authentication/k8s.py`, the metrics skip check (`skip_for_metrics`) intentionally uses `request.url.path in ("/metrics",)` (exact match via `in`) to stay consistent with the adjacent health probe skip pattern `request.url.path in ("/readiness", "/liveness")`. This differs from `rh_identity.py` which uses `endswith("/metrics")`, but the intra-file consistency was a deliberate style choice by the author.

Applied to files:

  • tests/unit/authentication/test_rh_identity.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/unit/authentication/test_rh_identity.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling

Applied to files:

  • src/authentication/k8s.py
🔇 Additional comments (11)
tests/unit/authentication/test_jwk_token.py (1)

16-16: Good addition: the new auth-module constant and tests line up with the shared metrics contract.

The import keeps the assertions on the canonical module label, and the new cases cover both the unexpected extraction failure metric and the invalid-user_id exception chain.

Also applies to: 507-610

src/authentication/noop.py (1)

3-67: Looks good.

The monotonic timer and record_auth_metrics calls preserve the existing 400/return behavior while adding the requested observability.

tests/unit/authentication/test_utils.py (1)

6-91: Solid coverage for the shared metrics helper.

These tests validate the success/failure paths and the swallowed-error behavior without overfitting to the internal metric implementation.

src/authentication/api_key_token.py (1)

64-91: Nice split of extraction vs validation failures.

The new metrics calls preserve the existing HTTPExceptions while giving the auth telemetry enough detail to separate missing headers, malformed tokens, and invalid API keys.

tests/unit/metrics/test_recording.py (1)

221-296: Good coverage for the new auth metric helpers.

The added cases exercise both bounded-label normalization and the swallowed-error behavior that the auth wrappers depend on.

tests/unit/authentication/test_k8s.py (1)

11-24: Good addition of direct SubjectAccessReview coverage.

The new import and tests pin the authz error mapping and the init-failure metric recording without changing the production flow.

Also applies to: 1141-1223

src/metrics/__init__.py (1)

98-111: Nice bounded metric surface.

The new auth counter and histogram keep the label set small and line up with the recording helpers’ (auth_module, result[, reason]) shape, which should keep the auth dashboards stable across backends.

tests/unit/authentication/test_rh_identity.py (1)

507-645: Good skip-path regression coverage.

Covering both canonical and trailing-slash probe/metrics routes, plus nearby non-skip routes, matches the normalized-path behavior and protects against false positives.

src/authentication/k8s.py (1)

557-610: Nice separation of K8S auth outcomes.

The updated flow keeps skip, token-review, kube:admin, SAR, authorization, and success paths recording one bounded reason each, which makes the new auth metrics much easier to trust.

src/authentication/rh_identity.py (1)

47-63: Nice path normalization helper.

Canonicalizing the request path before exact comparisons lets /readiness/ and /metrics/ share the same skip behavior without widening the match beyond those endpoints.

src/metrics/recording.py (1)

24-84: The normalization layer looks solid.

Centralizing allowed auth results and reasons here keeps call sites simple while still bounding label cardinality before values reach Prometheus.

Comment thread src/authentication/jwk_token.py
Comment thread src/authentication/jwk_token.py Outdated
Comment thread src/authentication/noop_with_token.py
Comment thread src/authentication/utils.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/authentication/jwk_token.py`:
- Around line 290-296: The catch-all except for
extract_user_token(request.headers) currently records every failure as
"missing_token"; change the handler to capture the HTTPException (e.g., except
HTTPException as exc) and derive a meaningful reason from it (use exc.detail or
str(exc) or map exc.status_code to a reason) when calling
record_auth_metrics(AUTH_MOD_JWK_TOKEN, "failure", reason, start_time), then
re-raise the original exception; update the block around extract_user_token and
the record_auth_metrics invocation so malformed-scheme and missing-bearer are
recorded distinctly instead of always "missing_token".

In `@src/authentication/k8s.py`:
- Around line 575-579: The K8s path always records extract_user_token failures
as "missing_token" losing the original reason; change the except block to
capture the exception (except HTTPException as e), extract a preserved reason
from e.detail (or e.description) if present, fall back to "missing_token"
otherwise, then call record_auth_metrics(AUTH_MOD_K8S, "failure", reason,
start_time) before re-raising; update references to extract_user_token,
record_auth_metrics and AUTH_MOD_K8S accordingly so the original
"missing_header" reason is preserved.

In `@src/authentication/rh_identity.py`:
- Around line 432-445: The code calls
RHIdentityData(...).validate_entitlements() but malformed entitlements (e.g.
entitlements=[], or entries that are not mappings) can raise
AttributeError/TypeError and escape the HTTPException handler, causing 500s and
missing metrics; wrap or augment validation so type/shape errors are converted
into an HTTPException before they escape. Specifically, in the
RHIdentityData.validate_entitlements() call (or just before it) detect
non-mapping entitlements and non-mapping service entries and raise an
HTTPException (use the same 403 entitlement_missing semantics used in the except
block) so _record_rh_identity_auth("failure", "entitlement_missing", start_time)
runs and the raised HTTPException is propagated instead of an unhandled
AttributeError/TypeError.
- Around line 375-379: The skip-path logic should run unconditionally so
readiness/metrics bypass works even when an x-rh-identity header exists or is
malformed: call _get_auth_skip_tuple(request, start_time) immediately and if it
returns a non-None tuple return it before attempting to read or validate
identity_header; update the block around
identity_header/request.headers.get("x-rh-identity") and the auth_skip handling
so _get_auth_skip_tuple is invoked first and short-circuits processing when
appropriate.

In `@src/authentication/utils.py`:
- Around line 60-71: The current try block wraps both
recording.record_auth_attempt(...) and recording.record_auth_duration(...), so a
failure in record_auth_attempt prevents record_auth_duration from running;
change this by calling recording.record_auth_attempt(auth_module, result,
reason) inside its own try/except that logs via logger.warning (preserve
exc_info=True and the same message format), then separately call
recording.record_auth_duration(auth_module, result, time.monotonic() -
start_time) inside a second try/except with the same logging behavior; keep the
same parameter names (auth_module, result, reason, start_time) and message text
for consistency.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6c591d00-e3af-4078-8cb1-1f1ee5f6f8e4

📥 Commits

Reviewing files that changed from the base of the PR and between 2e84669 and 7916763.

📒 Files selected for processing (14)
  • src/authentication/api_key_token.py
  • src/authentication/jwk_token.py
  • src/authentication/k8s.py
  • src/authentication/noop.py
  • src/authentication/noop_with_token.py
  • src/authentication/rh_identity.py
  • src/authentication/utils.py
  • src/metrics/__init__.py
  • src/metrics/recording.py
  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/authentication/test_rh_identity.py
  • tests/unit/authentication/test_utils.py
  • tests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/authentication/api_key_token.py
  • src/metrics/__init__.py
  • src/authentication/utils.py
  • src/authentication/k8s.py
  • src/authentication/jwk_token.py
  • src/metrics/recording.py
  • src/authentication/noop.py
  • src/authentication/noop_with_token.py
  • src/authentication/rh_identity.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/authentication/test_rh_identity.py
  • tests/unit/authentication/test_utils.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/authentication/test_k8s.py
src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files must contain brief package descriptions

Files:

  • src/metrics/__init__.py

Comment thread src/authentication/jwk_token.py
Comment thread src/authentication/k8s.py
Comment on lines +575 to +579
try:
token = extract_user_token(request.headers)
except HTTPException:
record_auth_metrics(AUTH_MOD_K8S, "failure", "missing_token", start_time)
raise
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve missing_header in the K8s extraction path too.

This branch records every extract_user_token() failure as missing_token, so requests with no Authorization header lose the same distinction you already added in api_key_token.py. That makes the new per-reason auth dashboards inconsistent across auth modes.

Suggested change
-        try:
-            token = extract_user_token(request.headers)
-        except HTTPException:
-            record_auth_metrics(AUTH_MOD_K8S, "failure", "missing_token", start_time)
-            raise
+        try:
+            token = extract_user_token(request.headers)
+        except HTTPException as exc:
+            reason = "missing_token"
+            if isinstance(exc.detail, dict) and "No Authorization header" in exc.detail.get(
+                "cause", ""
+            ):
+                reason = "missing_header"
+            record_auth_metrics(AUTH_MOD_K8S, "failure", reason, start_time)
+            raise
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
token = extract_user_token(request.headers)
except HTTPException:
record_auth_metrics(AUTH_MOD_K8S, "failure", "missing_token", start_time)
raise
try:
token = extract_user_token(request.headers)
except HTTPException as exc:
reason = "missing_token"
if isinstance(exc.detail, dict) and "No Authorization header" in exc.detail.get(
"cause", ""
):
reason = "missing_header"
record_auth_metrics(AUTH_MOD_K8S, "failure", reason, start_time)
raise
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/authentication/k8s.py` around lines 575 - 579, The K8s path always
records extract_user_token failures as "missing_token" losing the original
reason; change the except block to capture the exception (except HTTPException
as e), extract a preserved reason from e.detail (or e.description) if present,
fall back to "missing_token" otherwise, then call
record_auth_metrics(AUTH_MOD_K8S, "failure", reason, start_time) before
re-raising; update references to extract_user_token, record_auth_metrics and
AUTH_MOD_K8S accordingly so the original "missing_header" reason is preserved.

Comment thread src/authentication/rh_identity.py
Comment on lines +432 to +445
try:
rh_identity = RHIdentityData(
identity_data,
required_entitlements=self.required_entitlements,
)

# Validate entitlements if configured
rh_identity.validate_entitlements()
# Validate entitlements if configured
rh_identity.validate_entitlements()
except HTTPException as exc:
reason = (
"entitlement_missing" if exc.status_code == 403 else "invalid_identity"
)
_record_rh_identity_auth("failure", reason, start_time)
raise
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate malformed entitlements payloads before they escape as 500s.

validate_entitlements() still assumes entitlements and each service entry are mappings. Payloads like "entitlements": [] or "entitlements": {"rhel": 1} will raise AttributeError here, bypass this except HTTPException, and miss the new failure metric.

Suggested fix
 def has_entitlement(self, service: str) -> bool:
     """Check if user has a specific service entitlement."""
     entitlements = self.identity_data.get("entitlements", {})
+    if not isinstance(entitlements, dict):
+        logger.warning(
+            "Identity validation failed: 'entitlements' is %s, expected dict",
+            type(entitlements).__name__,
+        )
+        raise HTTPException(status_code=400, detail="Invalid identity data")
+
     service_entitlement = entitlements.get(service, {})
+    if not isinstance(service_entitlement, dict):
+        logger.warning(
+            "Identity validation failed: entitlement '%s' is %s, expected dict",
+            service,
+            type(service_entitlement).__name__,
+        )
+        raise HTTPException(status_code=400, detail="Invalid identity data")
+
     return service_entitlement.get("is_entitled", False)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/authentication/rh_identity.py` around lines 432 - 445, The code calls
RHIdentityData(...).validate_entitlements() but malformed entitlements (e.g.
entitlements=[], or entries that are not mappings) can raise
AttributeError/TypeError and escape the HTTPException handler, causing 500s and
missing metrics; wrap or augment validation so type/shape errors are converted
into an HTTPException before they escape. Specifically, in the
RHIdentityData.validate_entitlements() call (or just before it) detect
non-mapping entitlements and non-mapping service entries and raise an
HTTPException (use the same 403 entitlement_missing semantics used in the except
block) so _record_rh_identity_auth("failure", "entitlement_missing", start_time)
runs and the raised HTTPException is propagated instead of an unhandled
AttributeError/TypeError.

Comment on lines +60 to +71
try:
recording.record_auth_attempt(auth_module, result, reason)
recording.record_auth_duration(
auth_module, result, time.monotonic() - start_time
)
except Exception: # pylint: disable=broad-exception-caught
logger.warning(
"Failed to record authentication metrics for module %s with result %s",
auth_module,
result,
exc_info=True,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Isolate attempt and duration recording failures.

If record_auth_attempt() raises unexpectedly here, record_auth_duration() is skipped for the same request because both calls share one try block. That leaves you with neither series on the exact fallback path this helper is supposed to protect. Split these into independent guarded calls so one recorder failure does not suppress the other.

Suggested change
-    try:
-        recording.record_auth_attempt(auth_module, result, reason)
-        recording.record_auth_duration(
-            auth_module, result, time.monotonic() - start_time
-        )
-    except Exception:  # pylint: disable=broad-exception-caught
-        logger.warning(
-            "Failed to record authentication metrics for module %s with result %s",
-            auth_module,
-            result,
-            exc_info=True,
-        )
+    duration = time.monotonic() - start_time
+
+    try:
+        recording.record_auth_attempt(auth_module, result, reason)
+    except Exception:  # pylint: disable=broad-exception-caught
+        logger.warning(
+            "Failed to record authentication attempt metric for module %s with result %s",
+            auth_module,
+            result,
+            exc_info=True,
+        )
+
+    try:
+        recording.record_auth_duration(auth_module, result, duration)
+    except Exception:  # pylint: disable=broad-exception-caught
+        logger.warning(
+            "Failed to record authentication duration metric for module %s with result %s",
+            auth_module,
+            result,
+            exc_info=True,
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/authentication/utils.py` around lines 60 - 71, The current try block
wraps both recording.record_auth_attempt(...) and
recording.record_auth_duration(...), so a failure in record_auth_attempt
prevents record_auth_duration from running; change this by calling
recording.record_auth_attempt(auth_module, result, reason) inside its own
try/except that logs via logger.warning (preserve exc_info=True and the same
message format), then separately call
recording.record_auth_duration(auth_module, result, time.monotonic() -
start_time) inside a second try/except with the same logging behavior; keep the
same parameter names (auth_module, result, reason, start_time) and message text
for consistency.

major and others added 3 commits May 8, 2026 07:18
Move telemetry functions from responses.py into a dedicated
responses_telemetry.py module. Centralize fire-and-forget dispatch
logic in observability/splunk.py as dispatch_splunk_event().

No behavioral changes to request handling. Reduces responses.py
from 1201 to 1057 lines.

Signed-off-by: Major Hayden <major@redhat.com>
…t restores llama-stack (lightspeed-core#1628)

* Add diagnostic pod logs on e2e failure and remove disrupt-once optimization

* Increase vLLM max-model-len to 35936 (GPU memory limit)

* Accept 503 as valid port-forward proof in e2e connectivity check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Major Hayden <major@redhat.com>
@major major force-pushed the split/auth-metrics branch from 7916763 to 0685826 Compare May 8, 2026 12:19
@major
Copy link
Copy Markdown
Contributor Author

major commented May 8, 2026

Superseded by #1640 which now contains all three metrics commits as a stack.

@major major closed this May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants