RSPEED-2943: add auth monitoring metrics by major · Pull Request #1638 · lightspeed-core/lightspeed-stack

major · 2026-04-29T23:21:13Z

Stack: 1/3 - Merge this first, then #1639, then #1640.

Description

Adds bounded Prometheus metrics for authentication attempts and latency across the configured auth dependencies. This keeps auth SLO dashboards consistent across noop, token, JWK, Kubernetes, API key, and rh-identity modes without adding user identifiers as labels.

Type of change

Tools used to create PR

Assisted-by: OpenCode
Generated by: N/A

Related Tickets & Documents

Related Issue # RSPEED-2943
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

`uv run pytest tests/unit/metrics/test_recording.py tests/unit/authentication/test_utils.py tests/unit/authentication/test_noop.py tests/unit/authentication/test_noop_with_token.py tests/unit/authentication/test_api_key_token.py tests/unit/authentication/test_jwk_token.py tests/unit/authentication/test_k8s.py`
`uv run radon cc -s src/authentication/api_key_token.py src/authentication/jwk_token.py src/authentication/k8s.py src/authentication/noop.py src/authentication/noop_with_token.py src/authentication/rh_identity.py src/authentication/utils.py src/metrics/recording.py`
`uv run make verify`

Summary by CodeRabbit

New Features
- Added per-request authentication metrics and timing (attempt/result/reason + duration) with shared metric buckets; auth flows now emit bounded outcome/reason labels.
- No-op-with-token path now returns an additional skip-userid flag and records metrics for empty user IDs and token extraction categories.
- RH Identity and Kubernetes auth now normalize probe/metrics skip behavior and report structured auth/authorization outcomes and errors.
Tests
- Expanded unit tests for auth flows and metrics: recording, label normalization, error resilience, and new failure-path assertions.

coderabbitai · 2026-04-29T23:21:19Z

Warning

Rate limit exceeded

@major has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 49 minutes and 29 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 05a34c6e-4a18-43b8-8124-faa9bc4e7c52

📥 Commits

Reviewing files that changed from the base of the PR and between 7916763 and 0685826.

📒 Files selected for processing (24)

src/app/endpoints/responses.py
src/app/endpoints/responses_telemetry.py
src/authentication/api_key_token.py
src/authentication/jwk_token.py
src/authentication/k8s.py
src/authentication/noop.py
src/authentication/noop_with_token.py
src/authentication/rh_identity.py
src/authentication/utils.py
src/metrics/__init__.py
src/metrics/recording.py
src/observability/__init__.py
src/observability/splunk.py
tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml
tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-gpu.yaml
tests/e2e-prow/rhoai/scripts/e2e-ops.sh
tests/e2e/features/environment.py
tests/unit/app/endpoints/test_responses_splunk.py
tests/unit/authentication/test_jwk_token.py
tests/unit/authentication/test_k8s.py
tests/unit/authentication/test_rh_identity.py
tests/unit/authentication/test_utils.py
tests/unit/metrics/test_recording.py
tests/unit/observability/test_splunk.py

Walkthrough

Adds per-request monotonic timing and centralized authentication metrics across auth dependencies, new Prometheus auth metrics and safe recording utilities, refactors JWK/K8S auth flows into helpers, and updates tests to assert bounded metric labels and durable error handling.

Changes

Auth metrics & instrumentation

Cohort / File(s)	Summary
Metrics Core `src/metrics/__init__.py`	Adds `AUTH_DURATION_BUCKETS`, `auth_attempts_total` counter and `auth_duration_seconds` histogram; updates llm histogram typing.
Metrics Recording `src/metrics/recording.py`	Adds bounded auth label constants, normalization helpers, and safe recording: `record_auth_attempt`, `record_auth_duration`.
Auth Utilities `src/authentication/utils.py`	Adds `record_auth_metrics(auth_module, result, reason, start_time)` that delegates to recording helpers and logs warnings on failures.
JWK Auth `src/authentication/jwk_token.py`	Refactors JWK auth into helpers (`_get_jwk_set_for_auth`, `_decode_jwk_claims`, `_validate_jwk_claims`, `_get_required_claim`), instruments per-request timing, and records detailed metric labels for failure/success paths.
API Key Auth `src/authentication/api_key_token.py`	Instruments API-key auth with monotonic timing and records metrics for missing header/token, invalid key, and success.
Kubernetes Auth `src/authentication/k8s.py`	Adds timing and `record_auth_metrics` across K8s flow; refactors UID resolution and SubjectAccessReview creation into helpers that centralize error classification and ensure metrics are recorded before raising HTTP errors; `get_user_info` now raises on unexpected TokenReview errors.
RH Identity Auth `src/authentication/rh_identity.py`	Adds timing and canonicalized probe/metrics skip helper; records detailed failure reasons (missing/oversized/invalid base64/JSON/non-object), classifies entitlement vs identity errors, and records success on authentication.
No-op Auth `src/authentication/noop.py`	Adds timing and metrics; records `failure/empty_user_id` on missing user_id and `skipped/no_auth_required` on success.
No-op-with-token Auth `src/authentication/noop_with_token.py`	Return value updated to a 4-tuple `(user_id, DEFAULT_USER_NAME, skip_userid_check, user_token)`; adds timing and metrics for `missing_token`, `empty_user_id`, and `no_auth_required`.
Auth Tests `tests/unit/authentication/*`	Adds/updates tests asserting metric recording on success and failure paths, verifies `record_auth_metrics` swallows metric errors and logs warnings, and expands RH Identity probe/metrics path parametrizations and nested-payload validation coverage.
Metrics Tests `tests/unit/metrics/test_recording.py`	Adds tests for `record_auth_attempt` and `record_auth_duration` covering label normalization, successful updates, and exception-safe logging behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Auth as AuthDependency
    participant Ext as External (JWK/K8S/RH)
    participant Metrics as Metrics Recorder
    Client->>Auth: Request with token/header
    Auth->>Auth: start_time = time.monotonic()
    Auth->>Ext: Validate token / Fetch JWK or TokenReview or parse identity
    Ext-->>Auth: Validation result / JWK / Error
    alt validation success
        Auth->>Metrics: record_auth_metrics(module, "success", "authenticated", start_time)
        Auth-->>Client: Authenticated (user_id, ...)
    else validation failure
        Auth->>Metrics: record_auth_metrics(module, "failure", "<reason>", start_time)
        Auth-->>Client: HTTPException (mapped status & cause)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main change: adding authentication monitoring metrics across multiple authentication modules.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/unit/authentication/test_utils.py (1)

47-75: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add a failure-path test for record_auth_metrics exception swallowing.

Line 47 and Line 62 cover happy paths only; please also assert that an exception from one metric backend call is swallowed and logged, so auth flow stays unaffected.

Suggested test addition

+def test_record_auth_metrics_swallows_metric_errors(mocker: MockerFixture) -> None:
+    """Test helper does not propagate metric backend failures."""
+    mocker.patch(
+        "authentication.utils.recording.record_auth_attempt",
+        side_effect=ValueError("boom"),
+    )
+    mock_logger = mocker.patch("authentication.utils.logger")
+
+    record_auth_metrics("jwk-token", "failure", "missing_token", 10.0)
+
+    mock_logger.warning.assert_called_once_with(
+        "Failed to record authentication metrics for module %s with result %s",
+        "jwk-token",
+        "failure",
+        exc_info=True,
+    )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/unit/authentication/test_utils.py` around lines 47 - 75, Add a test
that verifies record_auth_metrics swallows exceptions from metric backends by
patching authentication.utils.recording.record_auth_attempt (or
record_auth_duration) to raise an exception and patching the module logger to
assert logger.exception is called; call record_auth_metrics("jwk-token",
"success", "authenticated", <start_time>) and assert the other metric function
(record_auth_duration or record_auth_attempt) was still invoked with the
expected args and that no exception propagates out of record_auth_metrics.
Reference the helper function record_auth_metrics and the backend functions
record_auth_attempt and record_auth_duration, and assert logger.exception (or
logger.error) was invoked to confirm the error was logged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/authentication/api_key_token.py`:
- Around line 67-73: The current except block for extract_user_token collapses
all HTTPException cases into a single "missing_token" reason; change it to catch
HTTPException as e and derive a specific metric reason from e (for example
inspect e.status_code or e.detail to map to "missing_token", "malformed_header",
"invalid_scheme", etc.), then call record_auth_metrics(AUTH_MOD_APIKEY_TOKEN,
"failure", derived_reason, start_time) before re-raising the exception so
telemetry reflects the actual extraction failure; reference extract_user_token,
record_auth_metrics and AUTH_MOD_APIKEY_TOKEN when making this change.

In `@src/authentication/k8s.py`:
- Around line 559-574: The code treats any None return from get_user_info() as
an invalid/expired token and emits a 401, but get_user_info() can return None on
internal/token-review failures; change the handling so a None result is
classified as a token-review/internal error instead of an "invalid_token": call
record_auth_metrics(AUTH_MOD_K8S, "failure", "token_review_error", start_time)
and return or raise an appropriate server error (e.g., 500/502) instead of
constructing UnauthorizedResponse; keep extract_user_token, get_user_info,
record_auth_metrics and UnauthorizedResponse references to locate and update the
branch that currently maps user_info is None to "invalid_token".

In `@src/authentication/rh_identity.py`:
- Around line 381-394: The code assumes decoded JSON is a dict and passes it
straight into RHIdentityData, which can raise TypeError for non-object payloads
(null, number, list) before the HTTPException handler runs; before constructing
RHIdentityData in this block, explicitly validate that identity_data is a
mapping (use isinstance(identity_data, dict)) and that nested keys like identity
(and user if expected) are dicts; if validation fails, call
_record_rh_identity_auth("failure", "invalid_identity", start_time) and raise an
HTTPException with a non-403 status (e.g., HTTPException(status_code=400)) so
the existing except HTTPException branch handles it and records the metric, then
proceed to construct RHIdentityData and call validate_entitlements as before.

In `@src/metrics/__init__.py`:
- Around line 77-89: Annotate the module-level metric constants with Final per
guidelines: update auth_attempts_total and auth_duration_seconds to have Final
type hints (e.g., Final[Counter] and Final[Histogram]) and ensure typing.Final
is imported (from typing import Final) so these constants are clearly declared
immutable while keeping their current assignments and label/bucket arguments
intact.

In `@src/metrics/recording.py`:
- Around line 31-58: ALLOWED_AUTH_REASONS currently omits the RH Identity
failure labels so the per-reason metrics collapse to "unknown"; update the
ALLOWED_AUTH_REASONS frozenset in src/metrics/recording.py to include the four
new keys recorded by src/authentication/rh_identity.py: "header_too_large",
"invalid_base64", "invalid_identity", and "entitlement_missing" (add them
alongside the existing entries in the ALLOWED_AUTH_REASONS declaration so the
labels from rh_identity.py are preserved in metrics).

---

Outside diff comments:
In `@tests/unit/authentication/test_utils.py`:
- Around line 47-75: Add a test that verifies record_auth_metrics swallows
exceptions from metric backends by patching
authentication.utils.recording.record_auth_attempt (or record_auth_duration) to
raise an exception and patching the module logger to assert logger.exception is
called; call record_auth_metrics("jwk-token", "success", "authenticated",
<start_time>) and assert the other metric function (record_auth_duration or
record_auth_attempt) was still invoked with the expected args and that no
exception propagates out of record_auth_metrics. Reference the helper function
record_auth_metrics and the backend functions record_auth_attempt and
record_auth_duration, and assert logger.exception (or logger.error) was invoked
to confirm the error was logged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ebf76500-ba75-4275-857b-3e54468c5606

📥 Commits

Reviewing files that changed from the base of the PR and between ca125c4 and 046a1a5.

📒 Files selected for processing (14)

src/authentication/api_key_token.py
src/authentication/jwk_token.py
src/authentication/k8s.py
src/authentication/noop.py
src/authentication/noop_with_token.py
src/authentication/rh_identity.py
src/authentication/utils.py
src/metrics/__init__.py
src/metrics/recording.py
tests/unit/authentication/test_jwk_token.py
tests/unit/authentication/test_k8s.py
tests/unit/authentication/test_rh_identity.py
tests/unit/authentication/test_utils.py
tests/unit/metrics/test_recording.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: build-pr
GitHub Check: unit_tests (3.12)
GitHub Check: Pylinter
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 3

🧰 Additional context used

📓 Path-based instructions (4)

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

src/authentication/utils.py
src/authentication/api_key_token.py
src/authentication/noop.py
src/metrics/recording.py
src/authentication/noop_with_token.py
src/metrics/__init__.py
src/authentication/k8s.py
src/authentication/rh_identity.py
src/authentication/jwk_token.py

tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

tests/unit/authentication/test_utils.py
tests/unit/authentication/test_k8s.py
tests/unit/authentication/test_jwk_token.py
tests/unit/metrics/test_recording.py
tests/unit/authentication/test_rh_identity.py

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

tests/unit/authentication/test_utils.py
tests/unit/authentication/test_k8s.py
tests/unit/authentication/test_jwk_token.py
tests/unit/metrics/test_recording.py
tests/unit/authentication/test_rh_identity.py

src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files contain brief package descriptions

Files:

src/metrics/__init__.py

🧠 Learnings (5)

📚 Learning: 2026-04-29T15:45:22.854Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests

Applied to files:

tests/unit/authentication/test_utils.py

📚 Learning: 2025-09-02T11:09:40.404Z

Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

src/authentication/noop.py
src/authentication/noop_with_token.py
tests/unit/authentication/test_rh_identity.py

📚 Learning: 2026-04-13T13:34:51.052Z

Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:377-381
Timestamp: 2026-04-13T13:34:51.052Z
Learning: In `tests/e2e/features/environment.py` (lightspeed-core/lightspeed-stack), the hardcoded token `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva` is intentionally a dummy/truncated JWT (the canonical jwt.io documentation example, missing its signature segment). It is not a real credential and does not need to be replaced or loaded from an environment variable. Only a scanner-suppression comment may be warranted if CI noise becomes an issue.

Applied to files:

tests/unit/authentication/test_jwk_token.py

📚 Learning: 2026-04-27T23:16:37.085Z

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1606
File: src/authentication/k8s.py:444-447
Timestamp: 2026-04-27T23:16:37.085Z
Learning: In `src/authentication/k8s.py`, the metrics skip check (`skip_for_metrics`) intentionally uses `request.url.path in ("/metrics",)` (exact match via `in`) to stay consistent with the adjacent health probe skip pattern `request.url.path in ("/readiness", "/liveness")`. This differs from `rh_identity.py` which uses `endswith("/metrics")`, but the intra-file consistency was a deliberate style choice by the author.

Applied to files:

tests/unit/authentication/test_rh_identity.py

📚 Learning: 2026-04-20T15:09:48.726Z

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.

Applied to files:

tests/unit/authentication/test_rh_identity.py
src/authentication/rh_identity.py

🔇 Additional comments (5)

tests/unit/authentication/test_jwk_token.py (1)

507-527: Nice coverage for error telemetry and exception-chain behavior.

These additions validate both the unexpected-error metrics path and preserved root-cause chaining for invalid required claims.

Also applies to: 594-610

src/authentication/utils.py (1)

44-66: Shared auth metric helper is implemented safely.

Good call on isolating metric write failures from request handling.

tests/unit/authentication/test_k8s.py (1)

1141-1201: Good parametrized coverage for API error mapping + metric emission.

This closes an important observability/error-handling gap for _create_subject_access_review.

tests/unit/metrics/test_recording.py (1)

165-247: Auth metric recording test coverage is solid.

Nice balance across success, normalization, and logger-backed failure paths.

src/authentication/noop.py (1)

54-67: No-op auth instrumentation is clean and behavior-preserving.

The new failure/skipped metric emission paths are correctly placed.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

src/authentication/rh_identity.py (1)

380-406: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The new identity guard still misses nested non-dict payloads.

Line 381 only validates the top-level JSON object. Payloads like {"identity": 1} or {"identity": {"type": "User", "user": 1}} still raise TypeError inside RHIdentityData, bypass this except HTTPException, and return a 500 without the invalid_identity metric.

Proposed fix

-        if not isinstance(identity_data, dict):
+        if (
+            not isinstance(identity_data, dict)
+            or not isinstance(identity_data.get("identity"), dict)
+            or (
+                identity_data["identity"].get("type") == "User"
+                and not isinstance(identity_data["identity"].get("user"), dict)
+            )
+            or (
+                identity_data["identity"].get("type") == "System"
+                and not isinstance(identity_data["identity"].get("system"), dict)
+            )
+        ):
             logger.warning(
-                "x-rh-identity decoded to non-dict type: %s",
+                "x-rh-identity decoded to invalid structure: %s",
                 type(identity_data).__name__,
             )
             _record_rh_identity_auth("failure", "invalid_identity", start_time)
             raise HTTPException(
                 status_code=400,

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/authentication/rh_identity.py` around lines 380 - 406, The top-level
guard only checks identity_data is a dict but misses nested non-dict shapes that
cause TypeError inside RHIdentityData and surface as 500s; before constructing
RHIdentityData (or immediately after), validate that
identity_data.get("identity") is a dict and that nested expected fields (e.g.,
"type", "user", "account") are the correct types, and if not call
_record_rh_identity_auth("failure","invalid_identity", start_time) and raise
HTTPException(400, detail="Invalid identity data in x-rh-identity header");
alternatively, wrap the RHIdentityData(...) and
rh_identity.validate_entitlements() calls in a broader except Exception as e
that converts TypeError/ValueError into an HTTPException with the above metric
recording so nested shape errors do not produce 500s.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/authentication/k8s.py`:
- Around line 431-493: The call to K8sClientSingleton.get_authz_api() can raise
during client init and is currently caught by the broad Exception handler
(mislabeling it as an unexpected error); wrap the get_authz_api() call in its
own try/except that catches ApiException (and generic client-init exceptions if
needed) and handle them like Kubernetes API outages: call
record_auth_metrics(AUTH_MOD_K8S, "failure", "authorization_check_error",
start_time), log the error, construct a ServiceUnavailableResponse
(backend_name="Kubernetes API", cause including e.reason or str(e)) and raise
HTTPException(**response.model_dump()) so client-init failures are classified as
service unavailable rather than unexpected_error, leaving the subsequent
SubjectAccessReview creation/error handling unchanged.

---

Duplicate comments:
In `@src/authentication/rh_identity.py`:
- Around line 380-406: The top-level guard only checks identity_data is a dict
but misses nested non-dict shapes that cause TypeError inside RHIdentityData and
surface as 500s; before constructing RHIdentityData (or immediately after),
validate that identity_data.get("identity") is a dict and that nested expected
fields (e.g., "type", "user", "account") are the correct types, and if not call
_record_rh_identity_auth("failure","invalid_identity", start_time) and raise
HTTPException(400, detail="Invalid identity data in x-rh-identity header");
alternatively, wrap the RHIdentityData(...) and
rh_identity.validate_entitlements() calls in a broader except Exception as e
that converts TypeError/ValueError into an HTTPException with the above metric
recording so nested shape errors do not produce 500s.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c1eab94e-5401-4574-a556-c43d5bb15b54

📥 Commits

Reviewing files that changed from the base of the PR and between 046a1a5 and 0388b99.

📒 Files selected for processing (14)

src/authentication/api_key_token.py
src/authentication/jwk_token.py
src/authentication/k8s.py
src/authentication/noop.py
src/authentication/noop_with_token.py
src/authentication/rh_identity.py
src/authentication/utils.py
src/metrics/__init__.py
src/metrics/recording.py
tests/unit/authentication/test_jwk_token.py
tests/unit/authentication/test_k8s.py
tests/unit/authentication/test_rh_identity.py
tests/unit/authentication/test_utils.py
tests/unit/metrics/test_recording.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: spectral
GitHub Check: Pylinter
GitHub Check: build-pr
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E Tests for Lightspeed Evaluation job

🧰 Additional context used

📓 Path-based instructions (4)

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

src/authentication/utils.py
src/authentication/noop.py
src/authentication/api_key_token.py
src/metrics/__init__.py
src/authentication/noop_with_token.py
src/authentication/jwk_token.py
src/authentication/k8s.py
src/authentication/rh_identity.py
src/metrics/recording.py

tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

tests/unit/authentication/test_utils.py
tests/unit/authentication/test_k8s.py
tests/unit/authentication/test_jwk_token.py
tests/unit/metrics/test_recording.py
tests/unit/authentication/test_rh_identity.py

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

tests/unit/authentication/test_utils.py
tests/unit/authentication/test_k8s.py
tests/unit/authentication/test_jwk_token.py
tests/unit/metrics/test_recording.py
tests/unit/authentication/test_rh_identity.py

src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files contain brief package descriptions

Files:

src/metrics/__init__.py

🧠 Learnings (7)

📚 Learning: 2026-04-13T13:34:51.052Z

Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:377-381
Timestamp: 2026-04-13T13:34:51.052Z
Learning: In `tests/e2e/features/environment.py` (lightspeed-core/lightspeed-stack), the hardcoded token `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva` is intentionally a dummy/truncated JWT (the canonical jwt.io documentation example, missing its signature segment). It is not a real credential and does not need to be replaced or loaded from an environment variable. Only a scanner-suppression comment may be warranted if CI noise becomes an issue.

Applied to files:

tests/unit/authentication/test_jwk_token.py

📚 Learning: 2025-09-02T11:09:40.404Z

Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

src/authentication/noop.py
tests/unit/authentication/test_rh_identity.py

📚 Learning: 2026-04-29T15:45:22.854Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/**/*.py : Use Final[type] as type hint for all constants

Applied to files:

src/metrics/__init__.py

📚 Learning: 2026-04-29T15:45:22.854Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/constants.py : Central `constants.py` for shared constants with descriptive comments

Applied to files:

src/metrics/__init__.py

📚 Learning: 2026-04-29T15:45:22.854Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Check `constants.py` for shared constants before defining new ones

Applied to files:

src/metrics/__init__.py

📚 Learning: 2026-04-20T15:09:48.726Z

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.

Applied to files:

src/authentication/rh_identity.py
tests/unit/authentication/test_rh_identity.py
src/metrics/recording.py

📚 Learning: 2026-04-27T23:16:37.085Z

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1606
File: src/authentication/k8s.py:444-447
Timestamp: 2026-04-27T23:16:37.085Z
Learning: In `src/authentication/k8s.py`, the metrics skip check (`skip_for_metrics`) intentionally uses `request.url.path in ("/metrics",)` (exact match via `in`) to stay consistent with the adjacent health probe skip pattern `request.url.path in ("/readiness", "/liveness")`. This differs from `rh_identity.py` which uses `endswith("/metrics")`, but the intra-file consistency was a deliberate style choice by the author.

Applied to files:

tests/unit/authentication/test_rh_identity.py

🔇 Additional comments (10)

tests/unit/authentication/test_jwk_token.py (2)

507-527: Strong coverage for unexpected-error metrics in JWK auth path.

This test cleanly verifies the failure metric payload and preserves the original exception flow.

594-610: Exception-chain assertion is a valuable regression guard.

Verifying __cause__ here protects root-cause visibility for invalid-claim failures.

src/authentication/utils.py (1)

44-66: record_auth_metrics implementation is robust and aligned with non-disruptive telemetry goals.

The helper correctly keeps auth execution safe even if metric recording fails.

tests/unit/authentication/test_utils.py (1)

47-92: Good focused unit coverage for the shared auth-metrics helper.

The added cases validate both duration math and graceful degradation on recorder failures.

src/authentication/noop.py (1)

54-67: No-op auth metrics instrumentation is well-placed and behavior-preserving.

Failure and skipped outcomes are both captured with consistent timing context.

tests/unit/metrics/test_recording.py (1)

221-296: Nice expansion of bounded-label and failure-path coverage for auth metric recorders.

These tests provide solid guardrails for normalization and logging behavior.

tests/unit/authentication/test_k8s.py (1)

1141-1201: Good direct coverage of SubjectAccessReview error mapping plus metric emission.

The new parametrized assertions strengthen confidence in K8s failure classification.

src/authentication/api_key_token.py (2)

67-77: Great fix for differentiating token extraction failure reasons in metrics.

This improves auth telemetry fidelity without altering exception behavior.

83-92: Success/failure metric hooks around API-key validation look correct.

The placement ensures invalid-key and valid-key outcomes are both consistently timed and recorded.

tests/unit/authentication/test_rh_identity.py (1)

507-640: Path-parametrization updates are well aligned with normalized skip-path logic.

Including trailing-slash and non-skip variants improves regression safety for probe/metrics bypass behavior.

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (1)

src/authentication/rh_identity.py (1)

414-427: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard nested container types before constructing RHIdentityData.

The new top-level dict check still lets payloads like {"identity": 1} or {"identity": {"type": "User", "user": []}} raise TypeError inside RHIdentityData. Those bypass this except HTTPException branch, return 500, and skip the invalid_identity metric.

Suggested fix

         try:
+            identity = identity_data.get("identity")
+            if not isinstance(identity, dict):
+                raise HTTPException(
+                    status_code=400,
+                    detail="Invalid identity data in x-rh-identity header",
+                )
+            if identity.get("type") == "User" and not isinstance(
+                identity.get("user"), dict
+            ):
+                raise HTTPException(
+                    status_code=400,
+                    detail="Invalid identity data in x-rh-identity header",
+                )
+            if identity.get("type") == "System" and not isinstance(
+                identity.get("system"), dict
+            ):
+                raise HTTPException(
+                    status_code=400,
+                    detail="Invalid identity data in x-rh-identity header",
+                )
             rh_identity = RHIdentityData(
                 identity_data,
                 required_entitlements=self.required_entitlements,
             )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/authentication/rh_identity.py` around lines 414 - 427, Before
constructing RHIdentityData, explicitly validate that identity_data is a dict
and that identity_data.get("identity") is also a dict (and any nested container
fields RHIdentityData expects, e.g., "user" or "org", are dicts when present);
if these checks fail, call _record_rh_identity_auth("failure",
"invalid_identity", start_time) and raise an HTTPException (non-403 status) so
the existing except HTTPException branch runs; update the block around
RHIdentityData(...) and rh_identity.validate_entitlements() to perform these
isinstance checks and short-circuit with the metric + HTTPException on
type-mismatch to avoid internal TypeError and 500s.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/authentication/jwk_token.py`:
- Around line 216-223: The validator currently only rejects empty strings but
accepts whitespace-only claims; update the check in the function handling claim
validation (the block that calls record_auth_metrics/AUTH_MOD_JWK_TOKEN and
raises UnauthorizedResponse/HTTPException) to treat strings that are all
whitespace as invalid by testing e.g. value.strip() before accepting it, so that
whitespace-only values trigger the same invalid_claim metric and raise the same
UnauthorizedResponse/HTTPException as empty strings.

In `@src/authentication/noop_with_token.py`:
- Around line 72-78: The catch block for extract_user_token(request.headers)
should preserve the exception's detail instead of always using "missing_token":
catch the HTTPException as e in the try/except around extract_user_token, derive
a failure_reason from e.detail (or str(e) if detail missing) and pass that into
record_auth_metrics(AUTH_MOD_NOOP_WITH_TOKEN, "failure", failure_reason,
start_time), then re-raise the original exception so the header-vs-malformed
distinction is retained; reference extract_user_token, record_auth_metrics,
AUTH_MOD_NOOP_WITH_TOKEN, request.headers, and start_time when making the
change.

In `@src/authentication/utils.py`:
- Around line 44-66: Update the public function record_auth_metrics docstring to
follow Google Python style: replace the "Args:" section with a "Parameters"
section listing auth_module (str), result (str), reason (str), and start_time
(float) with concise descriptions, add a "Returns" section documenting that it
returns None, and optionally include a "Raises" note only if the function may
raise exceptions; keep the existing overall one-sentence summary and the
parameter descriptions aligned with the rest of the repo's docstrings for
consistency.

---

Duplicate comments:
In `@src/authentication/rh_identity.py`:
- Around line 414-427: Before constructing RHIdentityData, explicitly validate
that identity_data is a dict and that identity_data.get("identity") is also a
dict (and any nested container fields RHIdentityData expects, e.g., "user" or
"org", are dicts when present); if these checks fail, call
_record_rh_identity_auth("failure", "invalid_identity", start_time) and raise an
HTTPException (non-403 status) so the existing except HTTPException branch runs;
update the block around RHIdentityData(...) and
rh_identity.validate_entitlements() to perform these isinstance checks and
short-circuit with the metric + HTTPException on type-mismatch to avoid internal
TypeError and 500s.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6f170dd6-9029-4db1-8967-bf72610f16a2

📥 Commits

Reviewing files that changed from the base of the PR and between 0388b99 and 2e84669.

📒 Files selected for processing (14)

src/authentication/api_key_token.py
src/authentication/jwk_token.py
src/authentication/k8s.py
src/authentication/noop.py
src/authentication/noop_with_token.py
src/authentication/rh_identity.py
src/authentication/utils.py
src/metrics/__init__.py
src/metrics/recording.py
tests/unit/authentication/test_jwk_token.py
tests/unit/authentication/test_k8s.py
tests/unit/authentication/test_rh_identity.py
tests/unit/authentication/test_utils.py
tests/unit/metrics/test_recording.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: build-pr
GitHub Check: unit_tests (3.13)
GitHub Check: unit_tests (3.12)
GitHub Check: Pylinter
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E Tests for Lightspeed Evaluation job

🧰 Additional context used

📓 Path-based instructions (4)

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

src/authentication/utils.py
src/authentication/api_key_token.py
src/authentication/noop.py
src/authentication/noop_with_token.py
src/metrics/__init__.py
src/authentication/jwk_token.py
src/metrics/recording.py
src/authentication/rh_identity.py
src/authentication/k8s.py

tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

tests/unit/authentication/test_jwk_token.py
tests/unit/metrics/test_recording.py
tests/unit/authentication/test_k8s.py
tests/unit/authentication/test_utils.py
tests/unit/authentication/test_rh_identity.py

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

tests/unit/authentication/test_jwk_token.py
tests/unit/metrics/test_recording.py
tests/unit/authentication/test_k8s.py
tests/unit/authentication/test_utils.py
tests/unit/authentication/test_rh_identity.py

src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files contain brief package descriptions

Files:

src/metrics/__init__.py

🧠 Learnings (6)

📚 Learning: 2026-04-13T13:34:51.052Z

Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:377-381
Timestamp: 2026-04-13T13:34:51.052Z
Learning: In `tests/e2e/features/environment.py` (lightspeed-core/lightspeed-stack), the hardcoded token `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva` is intentionally a dummy/truncated JWT (the canonical jwt.io documentation example, missing its signature segment). It is not a real credential and does not need to be replaced or loaded from an environment variable. Only a scanner-suppression comment may be warranted if CI noise becomes an issue.

Applied to files:

tests/unit/authentication/test_jwk_token.py

📚 Learning: 2026-04-29T15:45:22.854Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/**/*.py : Use Final[type] as type hint for all constants

Applied to files:

src/metrics/__init__.py

📚 Learning: 2026-04-20T15:09:48.726Z

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.

Applied to files:

src/metrics/recording.py
src/authentication/rh_identity.py
tests/unit/authentication/test_rh_identity.py

📚 Learning: 2026-04-27T23:16:37.085Z

Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1606
File: src/authentication/k8s.py:444-447
Timestamp: 2026-04-27T23:16:37.085Z
Learning: In `src/authentication/k8s.py`, the metrics skip check (`skip_for_metrics`) intentionally uses `request.url.path in ("/metrics",)` (exact match via `in`) to stay consistent with the adjacent health probe skip pattern `request.url.path in ("/readiness", "/liveness")`. This differs from `rh_identity.py` which uses `endswith("/metrics")`, but the intra-file consistency was a deliberate style choice by the author.

Applied to files:

tests/unit/authentication/test_rh_identity.py

📚 Learning: 2025-09-02T11:09:40.404Z

Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

tests/unit/authentication/test_rh_identity.py

📚 Learning: 2026-04-29T15:45:22.854Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling

Applied to files:

src/authentication/k8s.py

🔇 Additional comments (11)

tests/unit/authentication/test_jwk_token.py (1)

16-16: Good addition: the new auth-module constant and tests line up with the shared metrics contract.

The import keeps the assertions on the canonical module label, and the new cases cover both the unexpected extraction failure metric and the invalid-user_id exception chain.

Also applies to: 507-610

src/authentication/noop.py (1)

3-67: Looks good.

The monotonic timer and record_auth_metrics calls preserve the existing 400/return behavior while adding the requested observability.

tests/unit/authentication/test_utils.py (1)

6-91: Solid coverage for the shared metrics helper.

These tests validate the success/failure paths and the swallowed-error behavior without overfitting to the internal metric implementation.

src/authentication/api_key_token.py (1)

64-91: Nice split of extraction vs validation failures.

The new metrics calls preserve the existing HTTPExceptions while giving the auth telemetry enough detail to separate missing headers, malformed tokens, and invalid API keys.

tests/unit/metrics/test_recording.py (1)

221-296: Good coverage for the new auth metric helpers.

The added cases exercise both bounded-label normalization and the swallowed-error behavior that the auth wrappers depend on.

tests/unit/authentication/test_k8s.py (1)

11-24: Good addition of direct SubjectAccessReview coverage.

The new import and tests pin the authz error mapping and the init-failure metric recording without changing the production flow.

Also applies to: 1141-1223

src/metrics/__init__.py (1)

98-111: Nice bounded metric surface.

The new auth counter and histogram keep the label set small and line up with the recording helpers’ (auth_module, result[, reason]) shape, which should keep the auth dashboards stable across backends.

tests/unit/authentication/test_rh_identity.py (1)

507-645: Good skip-path regression coverage.

Covering both canonical and trailing-slash probe/metrics routes, plus nearby non-skip routes, matches the normalized-path behavior and protects against false positives.

src/authentication/k8s.py (1)

557-610: Nice separation of K8S auth outcomes.

The updated flow keeps skip, token-review, kube:admin, SAR, authorization, and success paths recording one bounded reason each, which makes the new auth metrics much easier to trust.

src/authentication/rh_identity.py (1)

47-63: Nice path normalization helper.

Canonicalizing the request path before exact comparisons lets /readiness/ and /metrics/ share the same skip behavior without widening the match beyond those endpoints.

src/metrics/recording.py (1)

24-84: The normalization layer looks solid.

Centralizing allowed auth results and reasons here keeps call sites simple while still bounding label cardinality before values reach Prometheus.

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/authentication/jwk_token.py`:
- Around line 290-296: The catch-all except for
extract_user_token(request.headers) currently records every failure as
"missing_token"; change the handler to capture the HTTPException (e.g., except
HTTPException as exc) and derive a meaningful reason from it (use exc.detail or
str(exc) or map exc.status_code to a reason) when calling
record_auth_metrics(AUTH_MOD_JWK_TOKEN, "failure", reason, start_time), then
re-raise the original exception; update the block around extract_user_token and
the record_auth_metrics invocation so malformed-scheme and missing-bearer are
recorded distinctly instead of always "missing_token".

In `@src/authentication/k8s.py`:
- Around line 575-579: The K8s path always records extract_user_token failures
as "missing_token" losing the original reason; change the except block to
capture the exception (except HTTPException as e), extract a preserved reason
from e.detail (or e.description) if present, fall back to "missing_token"
otherwise, then call record_auth_metrics(AUTH_MOD_K8S, "failure", reason,
start_time) before re-raising; update references to extract_user_token,
record_auth_metrics and AUTH_MOD_K8S accordingly so the original
"missing_header" reason is preserved.

In `@src/authentication/rh_identity.py`:
- Around line 432-445: The code calls
RHIdentityData(...).validate_entitlements() but malformed entitlements (e.g.
entitlements=[], or entries that are not mappings) can raise
AttributeError/TypeError and escape the HTTPException handler, causing 500s and
missing metrics; wrap or augment validation so type/shape errors are converted
into an HTTPException before they escape. Specifically, in the
RHIdentityData.validate_entitlements() call (or just before it) detect
non-mapping entitlements and non-mapping service entries and raise an
HTTPException (use the same 403 entitlement_missing semantics used in the except
block) so _record_rh_identity_auth("failure", "entitlement_missing", start_time)
runs and the raised HTTPException is propagated instead of an unhandled
AttributeError/TypeError.
- Around line 375-379: The skip-path logic should run unconditionally so
readiness/metrics bypass works even when an x-rh-identity header exists or is
malformed: call _get_auth_skip_tuple(request, start_time) immediately and if it
returns a non-None tuple return it before attempting to read or validate
identity_header; update the block around
identity_header/request.headers.get("x-rh-identity") and the auth_skip handling
so _get_auth_skip_tuple is invoked first and short-circuits processing when
appropriate.

In `@src/authentication/utils.py`:
- Around line 60-71: The current try block wraps both
recording.record_auth_attempt(...) and recording.record_auth_duration(...), so a
failure in record_auth_attempt prevents record_auth_duration from running;
change this by calling recording.record_auth_attempt(auth_module, result,
reason) inside its own try/except that logs via logger.warning (preserve
exc_info=True and the same message format), then separately call
recording.record_auth_duration(auth_module, result, time.monotonic() -
start_time) inside a second try/except with the same logging behavior; keep the
same parameter names (auth_module, result, reason, start_time) and message text
for consistency.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6c591d00-e3af-4078-8cb1-1f1ee5f6f8e4

📥 Commits

Reviewing files that changed from the base of the PR and between 2e84669 and 7916763.

📒 Files selected for processing (14)

src/authentication/api_key_token.py
src/authentication/jwk_token.py
src/authentication/k8s.py
src/authentication/noop.py
src/authentication/noop_with_token.py
src/authentication/rh_identity.py
src/authentication/utils.py
src/metrics/__init__.py
src/metrics/recording.py
tests/unit/authentication/test_jwk_token.py
tests/unit/authentication/test_k8s.py
tests/unit/authentication/test_rh_identity.py
tests/unit/authentication/test_utils.py
tests/unit/metrics/test_recording.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 3

🧰 Additional context used

📓 Path-based instructions (3)

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/authentication/api_key_token.py
src/metrics/__init__.py
src/authentication/utils.py
src/authentication/k8s.py
src/authentication/jwk_token.py
src/metrics/recording.py
src/authentication/noop.py
src/authentication/noop_with_token.py
src/authentication/rh_identity.py

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

tests/unit/authentication/test_jwk_token.py
tests/unit/authentication/test_rh_identity.py
tests/unit/authentication/test_utils.py
tests/unit/metrics/test_recording.py
tests/unit/authentication/test_k8s.py

src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files must contain brief package descriptions

Files:

src/metrics/__init__.py

coderabbitai · 2026-05-08T12:17:52Z

+        try:
+            token = extract_user_token(request.headers)
+        except HTTPException:
+            record_auth_metrics(AUTH_MOD_K8S, "failure", "missing_token", start_time)
+            raise


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve missing_header in the K8s extraction path too.

This branch records every extract_user_token() failure as missing_token, so requests with no Authorization header lose the same distinction you already added in api_key_token.py. That makes the new per-reason auth dashboards inconsistent across auth modes.

Suggested change

- try: - token = extract_user_token(request.headers) - except HTTPException: - record_auth_metrics(AUTH_MOD_K8S, "failure", "missing_token", start_time) - raise + try: + token = extract_user_token(request.headers) + except HTTPException as exc: + reason = "missing_token" + if isinstance(exc.detail, dict) and "No Authorization header" in exc.detail.get( + "cause", "" + ): + reason = "missing_header" + record_auth_metrics(AUTH_MOD_K8S, "failure", reason, start_time) + raise

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

try:

token = extract_user_token(request.headers)

except HTTPException:

record_auth_metrics(AUTH_MOD_K8S, "failure", "missing_token", start_time)

raise

try:

token = extract_user_token(request.headers)

except HTTPException as exc:

reason = "missing_token"

if isinstance(exc.detail, dict) and "No Authorization header" in exc.detail.get(

"cause", ""

):

reason = "missing_header"

record_auth_metrics(AUTH_MOD_K8S, "failure", reason, start_time)

raise

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/authentication/k8s.py` around lines 575 - 579, The K8s path always records extract_user_token failures as "missing_token" losing the original reason; change the except block to capture the exception (except HTTPException as e), extract a preserved reason from e.detail (or e.description) if present, fall back to "missing_token" otherwise, then call record_auth_metrics(AUTH_MOD_K8S, "failure", reason, start_time) before re-raising; update references to extract_user_token, record_auth_metrics and AUTH_MOD_K8S accordingly so the original "missing_header" reason is preserved.

coderabbitai · 2026-05-08T12:17:52Z

+        try:
+            rh_identity = RHIdentityData(
+                identity_data,
+                required_entitlements=self.required_entitlements,
+            )

-        # Validate entitlements if configured
-        rh_identity.validate_entitlements()
+            # Validate entitlements if configured
+            rh_identity.validate_entitlements()
+        except HTTPException as exc:
+            reason = (
+                "entitlement_missing" if exc.status_code == 403 else "invalid_identity"
+            )
+            _record_rh_identity_auth("failure", reason, start_time)
+            raise


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate malformed entitlements payloads before they escape as 500s.

validate_entitlements() still assumes entitlements and each service entry are mappings. Payloads like "entitlements": [] or "entitlements": {"rhel": 1} will raise AttributeError here, bypass this except HTTPException, and miss the new failure metric.

Suggested fix

def has_entitlement(self, service: str) -> bool: """Check if user has a specific service entitlement.""" entitlements = self.identity_data.get("entitlements", {}) + if not isinstance(entitlements, dict): + logger.warning( + "Identity validation failed: 'entitlements' is %s, expected dict", + type(entitlements).__name__, + ) + raise HTTPException(status_code=400, detail="Invalid identity data") + service_entitlement = entitlements.get(service, {}) + if not isinstance(service_entitlement, dict): + logger.warning( + "Identity validation failed: entitlement '%s' is %s, expected dict", + service, + type(service_entitlement).__name__, + ) + raise HTTPException(status_code=400, detail="Invalid identity data") + return service_entitlement.get("is_entitled", False)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/authentication/rh_identity.py` around lines 432 - 445, The code calls RHIdentityData(...).validate_entitlements() but malformed entitlements (e.g. entitlements=[], or entries that are not mappings) can raise AttributeError/TypeError and escape the HTTPException handler, causing 500s and missing metrics; wrap or augment validation so type/shape errors are converted into an HTTPException before they escape. Specifically, in the RHIdentityData.validate_entitlements() call (or just before it) detect non-mapping entitlements and non-mapping service entries and raise an HTTPException (use the same 403 entitlement_missing semantics used in the except block) so _record_rh_identity_auth("failure", "entitlement_missing", start_time) runs and the raised HTTPException is propagated instead of an unhandled AttributeError/TypeError.

coderabbitai · 2026-05-08T12:17:52Z

+    try:
+        recording.record_auth_attempt(auth_module, result, reason)
+        recording.record_auth_duration(
+            auth_module, result, time.monotonic() - start_time
+        )
+    except Exception:  # pylint: disable=broad-exception-caught
+        logger.warning(
+            "Failed to record authentication metrics for module %s with result %s",
+            auth_module,
+            result,
+            exc_info=True,
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Isolate attempt and duration recording failures.

If record_auth_attempt() raises unexpectedly here, record_auth_duration() is skipped for the same request because both calls share one try block. That leaves you with neither series on the exact fallback path this helper is supposed to protect. Split these into independent guarded calls so one recorder failure does not suppress the other.

Suggested change

- try: - recording.record_auth_attempt(auth_module, result, reason) - recording.record_auth_duration( - auth_module, result, time.monotonic() - start_time - ) - except Exception: # pylint: disable=broad-exception-caught - logger.warning( - "Failed to record authentication metrics for module %s with result %s", - auth_module, - result, - exc_info=True, - ) + duration = time.monotonic() - start_time + + try: + recording.record_auth_attempt(auth_module, result, reason) + except Exception: # pylint: disable=broad-exception-caught + logger.warning( + "Failed to record authentication attempt metric for module %s with result %s", + auth_module, + result, + exc_info=True, + ) + + try: + recording.record_auth_duration(auth_module, result, duration) + except Exception: # pylint: disable=broad-exception-caught + logger.warning( + "Failed to record authentication duration metric for module %s with result %s", + auth_module, + result, + exc_info=True, + )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/authentication/utils.py` around lines 60 - 71, The current try block wraps both recording.record_auth_attempt(...) and recording.record_auth_duration(...), so a failure in record_auth_attempt prevents record_auth_duration from running; change this by calling recording.record_auth_attempt(auth_module, result, reason) inside its own try/except that logs via logger.warning (preserve exc_info=True and the same message format), then separately call recording.record_auth_duration(auth_module, result, time.monotonic() - start_time) inside a second try/except with the same logging behavior; keep the same parameter names (auth_module, result, reason, start_time) and message text for consistency.

Move telemetry functions from responses.py into a dedicated responses_telemetry.py module. Centralize fire-and-forget dispatch logic in observability/splunk.py as dispatch_splunk_event(). No behavioral changes to request handling. Reduces responses.py from 1201 to 1057 lines. Signed-off-by: Major Hayden <major@redhat.com>

…t restores llama-stack (lightspeed-core#1628) * Add diagnostic pod logs on e2e failure and remove disrupt-once optimization * Increase vLLM max-model-len to 35936 (GPU memory limit) * Accept 503 as valid port-forward proof in e2e connectivity check Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: Major Hayden <major@redhat.com>

major · 2026-05-08T16:15:41Z

Superseded by #1640 which now contains all three metrics commits as a stack.

major mentioned this pull request Apr 29, 2026

RSPEED-2943: add SLO monitoring metrics #1637

Closed

19 tasks

major force-pushed the split/auth-metrics branch from e54eae1 to 046a1a5 Compare April 30, 2026 03:53

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread src/authentication/api_key_token.py

Comment thread src/authentication/k8s.py

Comment thread src/authentication/rh_identity.py

Comment thread src/metrics/__init__.py Outdated

Comment thread src/metrics/recording.py

major force-pushed the split/auth-metrics branch from 046a1a5 to 0388b99 Compare April 30, 2026 21:03

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread src/authentication/k8s.py

major force-pushed the split/auth-metrics branch from 0388b99 to 2e84669 Compare May 1, 2026 11:36

coderabbitai Bot reviewed May 1, 2026

View reviewed changes

Comment thread src/authentication/jwk_token.py

Comment thread src/authentication/jwk_token.py Outdated

Comment thread src/authentication/noop_with_token.py

Comment thread src/authentication/utils.py

major force-pushed the split/auth-metrics branch 5 times, most recently from f85f0fa to 7916763 Compare May 8, 2026 12:09

This was referenced May 8, 2026

RSPEED-2943: add authorization monitoring metrics #1639

Closed

RSPEED-2943: add observability metrics for auth, authorization, and quota #1640

Open

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

major and others added 3 commits May 8, 2026 07:18

feat: add auth monitoring metrics

0685826

Signed-off-by: Major Hayden <major@redhat.com>

major force-pushed the split/auth-metrics branch from 7916763 to 0685826 Compare May 8, 2026 12:19

major closed this May 8, 2026

Conversation

major commented Apr 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

major commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

major commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading