RSPEED-2943: add auth monitoring metrics#1638
RSPEED-2943: add auth monitoring metrics#1638major wants to merge 3 commits intolightspeed-core:mainfrom
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (24)
WalkthroughAdds per-request monotonic timing and centralized authentication metrics across auth dependencies, new Prometheus auth metrics and safe recording utilities, refactors JWK/K8S auth flows into helpers, and updates tests to assert bounded metric labels and durable error handling. ChangesAuth metrics & instrumentation
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Auth as AuthDependency
participant Ext as External (JWK/K8S/RH)
participant Metrics as Metrics Recorder
Client->>Auth: Request with token/header
Auth->>Auth: start_time = time.monotonic()
Auth->>Ext: Validate token / Fetch JWK or TokenReview or parse identity
Ext-->>Auth: Validation result / JWK / Error
alt validation success
Auth->>Metrics: record_auth_metrics(module, "success", "authenticated", start_time)
Auth-->>Client: Authenticated (user_id, ...)
else validation failure
Auth->>Metrics: record_auth_metrics(module, "failure", "<reason>", start_time)
Auth-->>Client: HTTPException (mapped status & cause)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
e54eae1 to
046a1a5
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/authentication/test_utils.py (1)
47-75: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winAdd a failure-path test for
record_auth_metricsexception swallowing.Line 47 and Line 62 cover happy paths only; please also assert that an exception from one metric backend call is swallowed and logged, so auth flow stays unaffected.
Suggested test addition
+def test_record_auth_metrics_swallows_metric_errors(mocker: MockerFixture) -> None: + """Test helper does not propagate metric backend failures.""" + mocker.patch( + "authentication.utils.recording.record_auth_attempt", + side_effect=ValueError("boom"), + ) + mock_logger = mocker.patch("authentication.utils.logger") + + record_auth_metrics("jwk-token", "failure", "missing_token", 10.0) + + mock_logger.warning.assert_called_once_with( + "Failed to record authentication metrics for module %s with result %s", + "jwk-token", + "failure", + exc_info=True, + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/authentication/test_utils.py` around lines 47 - 75, Add a test that verifies record_auth_metrics swallows exceptions from metric backends by patching authentication.utils.recording.record_auth_attempt (or record_auth_duration) to raise an exception and patching the module logger to assert logger.exception is called; call record_auth_metrics("jwk-token", "success", "authenticated", <start_time>) and assert the other metric function (record_auth_duration or record_auth_attempt) was still invoked with the expected args and that no exception propagates out of record_auth_metrics. Reference the helper function record_auth_metrics and the backend functions record_auth_attempt and record_auth_duration, and assert logger.exception (or logger.error) was invoked to confirm the error was logged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/authentication/api_key_token.py`:
- Around line 67-73: The current except block for extract_user_token collapses
all HTTPException cases into a single "missing_token" reason; change it to catch
HTTPException as e and derive a specific metric reason from e (for example
inspect e.status_code or e.detail to map to "missing_token", "malformed_header",
"invalid_scheme", etc.), then call record_auth_metrics(AUTH_MOD_APIKEY_TOKEN,
"failure", derived_reason, start_time) before re-raising the exception so
telemetry reflects the actual extraction failure; reference extract_user_token,
record_auth_metrics and AUTH_MOD_APIKEY_TOKEN when making this change.
In `@src/authentication/k8s.py`:
- Around line 559-574: The code treats any None return from get_user_info() as
an invalid/expired token and emits a 401, but get_user_info() can return None on
internal/token-review failures; change the handling so a None result is
classified as a token-review/internal error instead of an "invalid_token": call
record_auth_metrics(AUTH_MOD_K8S, "failure", "token_review_error", start_time)
and return or raise an appropriate server error (e.g., 500/502) instead of
constructing UnauthorizedResponse; keep extract_user_token, get_user_info,
record_auth_metrics and UnauthorizedResponse references to locate and update the
branch that currently maps user_info is None to "invalid_token".
In `@src/authentication/rh_identity.py`:
- Around line 381-394: The code assumes decoded JSON is a dict and passes it
straight into RHIdentityData, which can raise TypeError for non-object payloads
(null, number, list) before the HTTPException handler runs; before constructing
RHIdentityData in this block, explicitly validate that identity_data is a
mapping (use isinstance(identity_data, dict)) and that nested keys like identity
(and user if expected) are dicts; if validation fails, call
_record_rh_identity_auth("failure", "invalid_identity", start_time) and raise an
HTTPException with a non-403 status (e.g., HTTPException(status_code=400)) so
the existing except HTTPException branch handles it and records the metric, then
proceed to construct RHIdentityData and call validate_entitlements as before.
In `@src/metrics/__init__.py`:
- Around line 77-89: Annotate the module-level metric constants with Final per
guidelines: update auth_attempts_total and auth_duration_seconds to have Final
type hints (e.g., Final[Counter] and Final[Histogram]) and ensure typing.Final
is imported (from typing import Final) so these constants are clearly declared
immutable while keeping their current assignments and label/bucket arguments
intact.
In `@src/metrics/recording.py`:
- Around line 31-58: ALLOWED_AUTH_REASONS currently omits the RH Identity
failure labels so the per-reason metrics collapse to "unknown"; update the
ALLOWED_AUTH_REASONS frozenset in src/metrics/recording.py to include the four
new keys recorded by src/authentication/rh_identity.py: "header_too_large",
"invalid_base64", "invalid_identity", and "entitlement_missing" (add them
alongside the existing entries in the ALLOWED_AUTH_REASONS declaration so the
labels from rh_identity.py are preserved in metrics).
---
Outside diff comments:
In `@tests/unit/authentication/test_utils.py`:
- Around line 47-75: Add a test that verifies record_auth_metrics swallows
exceptions from metric backends by patching
authentication.utils.recording.record_auth_attempt (or record_auth_duration) to
raise an exception and patching the module logger to assert logger.exception is
called; call record_auth_metrics("jwk-token", "success", "authenticated",
<start_time>) and assert the other metric function (record_auth_duration or
record_auth_attempt) was still invoked with the expected args and that no
exception propagates out of record_auth_metrics. Reference the helper function
record_auth_metrics and the backend functions record_auth_attempt and
record_auth_duration, and assert logger.exception (or logger.error) was invoked
to confirm the error was logged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ebf76500-ba75-4275-857b-3e54468c5606
📒 Files selected for processing (14)
src/authentication/api_key_token.pysrc/authentication/jwk_token.pysrc/authentication/k8s.pysrc/authentication/noop.pysrc/authentication/noop_with_token.pysrc/authentication/rh_identity.pysrc/authentication/utils.pysrc/metrics/__init__.pysrc/metrics/recording.pytests/unit/authentication/test_jwk_token.pytests/unit/authentication/test_k8s.pytests/unit/authentication/test_rh_identity.pytests/unit/authentication/test_utils.pytests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.12)
- GitHub Check: Pylinter
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax:str | int
UseOptional[Type]for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Useasync deffor I/O operations and external API calls
Uselogger.debug()for detailed diagnostic information
Uselogger.info()for general information about program execution
Uselogger.warning()for unexpected events or potential problems
Uselogger.error()for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes use ABC with@abstractmethoddecorators
Complete type annotations for all class attributes, use specific types, notAny
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection
Files:
src/authentication/utils.pysrc/authentication/api_key_token.pysrc/authentication/noop.pysrc/metrics/recording.pysrc/authentication/noop_with_token.pysrc/metrics/__init__.pysrc/authentication/k8s.pysrc/authentication/rh_identity.pysrc/authentication/jwk_token.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/unit/**/*.py: Use pytest for all unit and integration tests
Usepytest-mockfor AsyncMock objects in unit tests
Use markerpytest.mark.asynciofor async tests
Files:
tests/unit/authentication/test_utils.pytests/unit/authentication/test_k8s.pytests/unit/authentication/test_jwk_token.pytests/unit/metrics/test_recording.pytests/unit/authentication/test_rh_identity.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Do not use unittest - pytest is the standard testing framework for this project
Files:
tests/unit/authentication/test_utils.pytests/unit/authentication/test_k8s.pytests/unit/authentication/test_jwk_token.pytests/unit/metrics/test_recording.pytests/unit/authentication/test_rh_identity.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package
__init__.pyfiles contain brief package descriptions
Files:
src/metrics/__init__.py
🧠 Learnings (5)
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests
Applied to files:
tests/unit/authentication/test_utils.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
src/authentication/noop.pysrc/authentication/noop_with_token.pytests/unit/authentication/test_rh_identity.py
📚 Learning: 2026-04-13T13:34:51.052Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:377-381
Timestamp: 2026-04-13T13:34:51.052Z
Learning: In `tests/e2e/features/environment.py` (lightspeed-core/lightspeed-stack), the hardcoded token `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva` is intentionally a dummy/truncated JWT (the canonical jwt.io documentation example, missing its signature segment). It is not a real credential and does not need to be replaced or loaded from an environment variable. Only a scanner-suppression comment may be warranted if CI noise becomes an issue.
Applied to files:
tests/unit/authentication/test_jwk_token.py
📚 Learning: 2026-04-27T23:16:37.085Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1606
File: src/authentication/k8s.py:444-447
Timestamp: 2026-04-27T23:16:37.085Z
Learning: In `src/authentication/k8s.py`, the metrics skip check (`skip_for_metrics`) intentionally uses `request.url.path in ("/metrics",)` (exact match via `in`) to stay consistent with the adjacent health probe skip pattern `request.url.path in ("/readiness", "/liveness")`. This differs from `rh_identity.py` which uses `endswith("/metrics")`, but the intra-file consistency was a deliberate style choice by the author.
Applied to files:
tests/unit/authentication/test_rh_identity.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.
Applied to files:
tests/unit/authentication/test_rh_identity.pysrc/authentication/rh_identity.py
🔇 Additional comments (5)
tests/unit/authentication/test_jwk_token.py (1)
507-527: Nice coverage for error telemetry and exception-chain behavior.These additions validate both the unexpected-error metrics path and preserved root-cause chaining for invalid required claims.
Also applies to: 594-610
src/authentication/utils.py (1)
44-66: Shared auth metric helper is implemented safely.Good call on isolating metric write failures from request handling.
tests/unit/authentication/test_k8s.py (1)
1141-1201: Good parametrized coverage for API error mapping + metric emission.This closes an important observability/error-handling gap for
_create_subject_access_review.tests/unit/metrics/test_recording.py (1)
165-247: Auth metric recording test coverage is solid.Nice balance across success, normalization, and logger-backed failure paths.
src/authentication/noop.py (1)
54-67: No-op auth instrumentation is clean and behavior-preserving.The new failure/skipped metric emission paths are correctly placed.
046a1a5 to
0388b99
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/authentication/rh_identity.py (1)
380-406:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThe new identity guard still misses nested non-dict payloads.
Line 381 only validates the top-level JSON object. Payloads like
{"identity": 1}or{"identity": {"type": "User", "user": 1}}still raiseTypeErrorinsideRHIdentityData, bypass thisexcept HTTPException, and return a 500 without theinvalid_identitymetric.Proposed fix
- if not isinstance(identity_data, dict): + if ( + not isinstance(identity_data, dict) + or not isinstance(identity_data.get("identity"), dict) + or ( + identity_data["identity"].get("type") == "User" + and not isinstance(identity_data["identity"].get("user"), dict) + ) + or ( + identity_data["identity"].get("type") == "System" + and not isinstance(identity_data["identity"].get("system"), dict) + ) + ): logger.warning( - "x-rh-identity decoded to non-dict type: %s", + "x-rh-identity decoded to invalid structure: %s", type(identity_data).__name__, ) _record_rh_identity_auth("failure", "invalid_identity", start_time) raise HTTPException( status_code=400,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/authentication/rh_identity.py` around lines 380 - 406, The top-level guard only checks identity_data is a dict but misses nested non-dict shapes that cause TypeError inside RHIdentityData and surface as 500s; before constructing RHIdentityData (or immediately after), validate that identity_data.get("identity") is a dict and that nested expected fields (e.g., "type", "user", "account") are the correct types, and if not call _record_rh_identity_auth("failure","invalid_identity", start_time) and raise HTTPException(400, detail="Invalid identity data in x-rh-identity header"); alternatively, wrap the RHIdentityData(...) and rh_identity.validate_entitlements() calls in a broader except Exception as e that converts TypeError/ValueError into an HTTPException with the above metric recording so nested shape errors do not produce 500s.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/authentication/k8s.py`:
- Around line 431-493: The call to K8sClientSingleton.get_authz_api() can raise
during client init and is currently caught by the broad Exception handler
(mislabeling it as an unexpected error); wrap the get_authz_api() call in its
own try/except that catches ApiException (and generic client-init exceptions if
needed) and handle them like Kubernetes API outages: call
record_auth_metrics(AUTH_MOD_K8S, "failure", "authorization_check_error",
start_time), log the error, construct a ServiceUnavailableResponse
(backend_name="Kubernetes API", cause including e.reason or str(e)) and raise
HTTPException(**response.model_dump()) so client-init failures are classified as
service unavailable rather than unexpected_error, leaving the subsequent
SubjectAccessReview creation/error handling unchanged.
---
Duplicate comments:
In `@src/authentication/rh_identity.py`:
- Around line 380-406: The top-level guard only checks identity_data is a dict
but misses nested non-dict shapes that cause TypeError inside RHIdentityData and
surface as 500s; before constructing RHIdentityData (or immediately after),
validate that identity_data.get("identity") is a dict and that nested expected
fields (e.g., "type", "user", "account") are the correct types, and if not call
_record_rh_identity_auth("failure","invalid_identity", start_time) and raise
HTTPException(400, detail="Invalid identity data in x-rh-identity header");
alternatively, wrap the RHIdentityData(...) and
rh_identity.validate_entitlements() calls in a broader except Exception as e
that converts TypeError/ValueError into an HTTPException with the above metric
recording so nested shape errors do not produce 500s.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c1eab94e-5401-4574-a556-c43d5bb15b54
📒 Files selected for processing (14)
src/authentication/api_key_token.pysrc/authentication/jwk_token.pysrc/authentication/k8s.pysrc/authentication/noop.pysrc/authentication/noop_with_token.pysrc/authentication/rh_identity.pysrc/authentication/utils.pysrc/metrics/__init__.pysrc/metrics/recording.pytests/unit/authentication/test_jwk_token.pytests/unit/authentication/test_k8s.pytests/unit/authentication/test_rh_identity.pytests/unit/authentication/test_utils.pytests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: spectral
- GitHub Check: Pylinter
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax:str | int
UseOptional[Type]for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Useasync deffor I/O operations and external API calls
Uselogger.debug()for detailed diagnostic information
Uselogger.info()for general information about program execution
Uselogger.warning()for unexpected events or potential problems
Uselogger.error()for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes use ABC with@abstractmethoddecorators
Complete type annotations for all class attributes, use specific types, notAny
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection
Files:
src/authentication/utils.pysrc/authentication/noop.pysrc/authentication/api_key_token.pysrc/metrics/__init__.pysrc/authentication/noop_with_token.pysrc/authentication/jwk_token.pysrc/authentication/k8s.pysrc/authentication/rh_identity.pysrc/metrics/recording.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/unit/**/*.py: Use pytest for all unit and integration tests
Usepytest-mockfor AsyncMock objects in unit tests
Use markerpytest.mark.asynciofor async tests
Files:
tests/unit/authentication/test_utils.pytests/unit/authentication/test_k8s.pytests/unit/authentication/test_jwk_token.pytests/unit/metrics/test_recording.pytests/unit/authentication/test_rh_identity.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Do not use unittest - pytest is the standard testing framework for this project
Files:
tests/unit/authentication/test_utils.pytests/unit/authentication/test_k8s.pytests/unit/authentication/test_jwk_token.pytests/unit/metrics/test_recording.pytests/unit/authentication/test_rh_identity.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package
__init__.pyfiles contain brief package descriptions
Files:
src/metrics/__init__.py
🧠 Learnings (7)
📚 Learning: 2026-04-13T13:34:51.052Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:377-381
Timestamp: 2026-04-13T13:34:51.052Z
Learning: In `tests/e2e/features/environment.py` (lightspeed-core/lightspeed-stack), the hardcoded token `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva` is intentionally a dummy/truncated JWT (the canonical jwt.io documentation example, missing its signature segment). It is not a real credential and does not need to be replaced or loaded from an environment variable. Only a scanner-suppression comment may be warranted if CI noise becomes an issue.
Applied to files:
tests/unit/authentication/test_jwk_token.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
src/authentication/noop.pytests/unit/authentication/test_rh_identity.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/**/*.py : Use Final[type] as type hint for all constants
Applied to files:
src/metrics/__init__.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/constants.py : Central `constants.py` for shared constants with descriptive comments
Applied to files:
src/metrics/__init__.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Check `constants.py` for shared constants before defining new ones
Applied to files:
src/metrics/__init__.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.
Applied to files:
src/authentication/rh_identity.pytests/unit/authentication/test_rh_identity.pysrc/metrics/recording.py
📚 Learning: 2026-04-27T23:16:37.085Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1606
File: src/authentication/k8s.py:444-447
Timestamp: 2026-04-27T23:16:37.085Z
Learning: In `src/authentication/k8s.py`, the metrics skip check (`skip_for_metrics`) intentionally uses `request.url.path in ("/metrics",)` (exact match via `in`) to stay consistent with the adjacent health probe skip pattern `request.url.path in ("/readiness", "/liveness")`. This differs from `rh_identity.py` which uses `endswith("/metrics")`, but the intra-file consistency was a deliberate style choice by the author.
Applied to files:
tests/unit/authentication/test_rh_identity.py
🔇 Additional comments (10)
tests/unit/authentication/test_jwk_token.py (2)
507-527: Strong coverage for unexpected-error metrics in JWK auth path.This test cleanly verifies the failure metric payload and preserves the original exception flow.
594-610: Exception-chain assertion is a valuable regression guard.Verifying
__cause__here protects root-cause visibility for invalid-claim failures.src/authentication/utils.py (1)
44-66:record_auth_metricsimplementation is robust and aligned with non-disruptive telemetry goals.The helper correctly keeps auth execution safe even if metric recording fails.
tests/unit/authentication/test_utils.py (1)
47-92: Good focused unit coverage for the shared auth-metrics helper.The added cases validate both duration math and graceful degradation on recorder failures.
src/authentication/noop.py (1)
54-67: No-op auth metrics instrumentation is well-placed and behavior-preserving.Failure and skipped outcomes are both captured with consistent timing context.
tests/unit/metrics/test_recording.py (1)
221-296: Nice expansion of bounded-label and failure-path coverage for auth metric recorders.These tests provide solid guardrails for normalization and logging behavior.
tests/unit/authentication/test_k8s.py (1)
1141-1201: Good direct coverage of SubjectAccessReview error mapping plus metric emission.The new parametrized assertions strengthen confidence in K8s failure classification.
src/authentication/api_key_token.py (2)
67-77: Great fix for differentiating token extraction failure reasons in metrics.This improves auth telemetry fidelity without altering exception behavior.
83-92: Success/failure metric hooks around API-key validation look correct.The placement ensures invalid-key and valid-key outcomes are both consistently timed and recorded.
tests/unit/authentication/test_rh_identity.py (1)
507-640: Path-parametrization updates are well aligned with normalized skip-path logic.Including trailing-slash and non-skip variants improves regression safety for probe/metrics bypass behavior.
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
src/authentication/rh_identity.py (1)
414-427:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard nested container types before constructing
RHIdentityData.The new top-level
dictcheck still lets payloads like{"identity": 1}or{"identity": {"type": "User", "user": []}}raiseTypeErrorinsideRHIdentityData. Those bypass thisexcept HTTPExceptionbranch, return 500, and skip theinvalid_identitymetric.Suggested fix
try: + identity = identity_data.get("identity") + if not isinstance(identity, dict): + raise HTTPException( + status_code=400, + detail="Invalid identity data in x-rh-identity header", + ) + if identity.get("type") == "User" and not isinstance( + identity.get("user"), dict + ): + raise HTTPException( + status_code=400, + detail="Invalid identity data in x-rh-identity header", + ) + if identity.get("type") == "System" and not isinstance( + identity.get("system"), dict + ): + raise HTTPException( + status_code=400, + detail="Invalid identity data in x-rh-identity header", + ) rh_identity = RHIdentityData( identity_data, required_entitlements=self.required_entitlements, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/authentication/rh_identity.py` around lines 414 - 427, Before constructing RHIdentityData, explicitly validate that identity_data is a dict and that identity_data.get("identity") is also a dict (and any nested container fields RHIdentityData expects, e.g., "user" or "org", are dicts when present); if these checks fail, call _record_rh_identity_auth("failure", "invalid_identity", start_time) and raise an HTTPException (non-403 status) so the existing except HTTPException branch runs; update the block around RHIdentityData(...) and rh_identity.validate_entitlements() to perform these isinstance checks and short-circuit with the metric + HTTPException on type-mismatch to avoid internal TypeError and 500s.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/authentication/jwk_token.py`:
- Around line 216-223: The validator currently only rejects empty strings but
accepts whitespace-only claims; update the check in the function handling claim
validation (the block that calls record_auth_metrics/AUTH_MOD_JWK_TOKEN and
raises UnauthorizedResponse/HTTPException) to treat strings that are all
whitespace as invalid by testing e.g. value.strip() before accepting it, so that
whitespace-only values trigger the same invalid_claim metric and raise the same
UnauthorizedResponse/HTTPException as empty strings.
In `@src/authentication/noop_with_token.py`:
- Around line 72-78: The catch block for extract_user_token(request.headers)
should preserve the exception's detail instead of always using "missing_token":
catch the HTTPException as e in the try/except around extract_user_token, derive
a failure_reason from e.detail (or str(e) if detail missing) and pass that into
record_auth_metrics(AUTH_MOD_NOOP_WITH_TOKEN, "failure", failure_reason,
start_time), then re-raise the original exception so the header-vs-malformed
distinction is retained; reference extract_user_token, record_auth_metrics,
AUTH_MOD_NOOP_WITH_TOKEN, request.headers, and start_time when making the
change.
In `@src/authentication/utils.py`:
- Around line 44-66: Update the public function record_auth_metrics docstring to
follow Google Python style: replace the "Args:" section with a "Parameters"
section listing auth_module (str), result (str), reason (str), and start_time
(float) with concise descriptions, add a "Returns" section documenting that it
returns None, and optionally include a "Raises" note only if the function may
raise exceptions; keep the existing overall one-sentence summary and the
parameter descriptions aligned with the rest of the repo's docstrings for
consistency.
---
Duplicate comments:
In `@src/authentication/rh_identity.py`:
- Around line 414-427: Before constructing RHIdentityData, explicitly validate
that identity_data is a dict and that identity_data.get("identity") is also a
dict (and any nested container fields RHIdentityData expects, e.g., "user" or
"org", are dicts when present); if these checks fail, call
_record_rh_identity_auth("failure", "invalid_identity", start_time) and raise an
HTTPException (non-403 status) so the existing except HTTPException branch runs;
update the block around RHIdentityData(...) and
rh_identity.validate_entitlements() to perform these isinstance checks and
short-circuit with the metric + HTTPException on type-mismatch to avoid internal
TypeError and 500s.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6f170dd6-9029-4db1-8967-bf72610f16a2
📒 Files selected for processing (14)
src/authentication/api_key_token.pysrc/authentication/jwk_token.pysrc/authentication/k8s.pysrc/authentication/noop.pysrc/authentication/noop_with_token.pysrc/authentication/rh_identity.pysrc/authentication/utils.pysrc/metrics/__init__.pysrc/metrics/recording.pytests/unit/authentication/test_jwk_token.pytests/unit/authentication/test_k8s.pytests/unit/authentication/test_rh_identity.pytests/unit/authentication/test_utils.pytests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: Pylinter
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax:str | int
UseOptional[Type]for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Useasync deffor I/O operations and external API calls
Uselogger.debug()for detailed diagnostic information
Uselogger.info()for general information about program execution
Uselogger.warning()for unexpected events or potential problems
Uselogger.error()for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes use ABC with@abstractmethoddecorators
Complete type annotations for all class attributes, use specific types, notAny
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection
Files:
src/authentication/utils.pysrc/authentication/api_key_token.pysrc/authentication/noop.pysrc/authentication/noop_with_token.pysrc/metrics/__init__.pysrc/authentication/jwk_token.pysrc/metrics/recording.pysrc/authentication/rh_identity.pysrc/authentication/k8s.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/unit/**/*.py: Use pytest for all unit and integration tests
Usepytest-mockfor AsyncMock objects in unit tests
Use markerpytest.mark.asynciofor async tests
Files:
tests/unit/authentication/test_jwk_token.pytests/unit/metrics/test_recording.pytests/unit/authentication/test_k8s.pytests/unit/authentication/test_utils.pytests/unit/authentication/test_rh_identity.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Do not use unittest - pytest is the standard testing framework for this project
Files:
tests/unit/authentication/test_jwk_token.pytests/unit/metrics/test_recording.pytests/unit/authentication/test_k8s.pytests/unit/authentication/test_utils.pytests/unit/authentication/test_rh_identity.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package
__init__.pyfiles contain brief package descriptions
Files:
src/metrics/__init__.py
🧠 Learnings (6)
📚 Learning: 2026-04-13T13:34:51.052Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:377-381
Timestamp: 2026-04-13T13:34:51.052Z
Learning: In `tests/e2e/features/environment.py` (lightspeed-core/lightspeed-stack), the hardcoded token `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva` is intentionally a dummy/truncated JWT (the canonical jwt.io documentation example, missing its signature segment). It is not a real credential and does not need to be replaced or loaded from an environment variable. Only a scanner-suppression comment may be warranted if CI noise becomes an issue.
Applied to files:
tests/unit/authentication/test_jwk_token.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/**/*.py : Use Final[type] as type hint for all constants
Applied to files:
src/metrics/__init__.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.
Applied to files:
src/metrics/recording.pysrc/authentication/rh_identity.pytests/unit/authentication/test_rh_identity.py
📚 Learning: 2026-04-27T23:16:37.085Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1606
File: src/authentication/k8s.py:444-447
Timestamp: 2026-04-27T23:16:37.085Z
Learning: In `src/authentication/k8s.py`, the metrics skip check (`skip_for_metrics`) intentionally uses `request.url.path in ("/metrics",)` (exact match via `in`) to stay consistent with the adjacent health probe skip pattern `request.url.path in ("/readiness", "/liveness")`. This differs from `rh_identity.py` which uses `endswith("/metrics")`, but the intra-file consistency was a deliberate style choice by the author.
Applied to files:
tests/unit/authentication/test_rh_identity.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
tests/unit/authentication/test_rh_identity.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Applied to files:
src/authentication/k8s.py
🔇 Additional comments (11)
tests/unit/authentication/test_jwk_token.py (1)
16-16: Good addition: the new auth-module constant and tests line up with the shared metrics contract.The import keeps the assertions on the canonical module label, and the new cases cover both the unexpected extraction failure metric and the invalid-
user_idexception chain.Also applies to: 507-610
src/authentication/noop.py (1)
3-67: Looks good.The monotonic timer and
record_auth_metricscalls preserve the existing 400/return behavior while adding the requested observability.tests/unit/authentication/test_utils.py (1)
6-91: Solid coverage for the shared metrics helper.These tests validate the success/failure paths and the swallowed-error behavior without overfitting to the internal metric implementation.
src/authentication/api_key_token.py (1)
64-91: Nice split of extraction vs validation failures.The new metrics calls preserve the existing HTTPExceptions while giving the auth telemetry enough detail to separate missing headers, malformed tokens, and invalid API keys.
tests/unit/metrics/test_recording.py (1)
221-296: Good coverage for the new auth metric helpers.The added cases exercise both bounded-label normalization and the swallowed-error behavior that the auth wrappers depend on.
tests/unit/authentication/test_k8s.py (1)
11-24: Good addition of direct SubjectAccessReview coverage.The new import and tests pin the authz error mapping and the init-failure metric recording without changing the production flow.
Also applies to: 1141-1223
src/metrics/__init__.py (1)
98-111: Nice bounded metric surface.The new auth counter and histogram keep the label set small and line up with the recording helpers’
(auth_module, result[, reason])shape, which should keep the auth dashboards stable across backends.tests/unit/authentication/test_rh_identity.py (1)
507-645: Good skip-path regression coverage.Covering both canonical and trailing-slash probe/metrics routes, plus nearby non-skip routes, matches the normalized-path behavior and protects against false positives.
src/authentication/k8s.py (1)
557-610: Nice separation of K8S auth outcomes.The updated flow keeps skip, token-review, kube:admin, SAR, authorization, and success paths recording one bounded reason each, which makes the new auth metrics much easier to trust.
src/authentication/rh_identity.py (1)
47-63: Nice path normalization helper.Canonicalizing the request path before exact comparisons lets
/readiness/and/metrics/share the same skip behavior without widening the match beyond those endpoints.src/metrics/recording.py (1)
24-84: The normalization layer looks solid.Centralizing allowed auth results and reasons here keeps call sites simple while still bounding label cardinality before values reach Prometheus.
f85f0fa to
7916763
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/authentication/jwk_token.py`:
- Around line 290-296: The catch-all except for
extract_user_token(request.headers) currently records every failure as
"missing_token"; change the handler to capture the HTTPException (e.g., except
HTTPException as exc) and derive a meaningful reason from it (use exc.detail or
str(exc) or map exc.status_code to a reason) when calling
record_auth_metrics(AUTH_MOD_JWK_TOKEN, "failure", reason, start_time), then
re-raise the original exception; update the block around extract_user_token and
the record_auth_metrics invocation so malformed-scheme and missing-bearer are
recorded distinctly instead of always "missing_token".
In `@src/authentication/k8s.py`:
- Around line 575-579: The K8s path always records extract_user_token failures
as "missing_token" losing the original reason; change the except block to
capture the exception (except HTTPException as e), extract a preserved reason
from e.detail (or e.description) if present, fall back to "missing_token"
otherwise, then call record_auth_metrics(AUTH_MOD_K8S, "failure", reason,
start_time) before re-raising; update references to extract_user_token,
record_auth_metrics and AUTH_MOD_K8S accordingly so the original
"missing_header" reason is preserved.
In `@src/authentication/rh_identity.py`:
- Around line 432-445: The code calls
RHIdentityData(...).validate_entitlements() but malformed entitlements (e.g.
entitlements=[], or entries that are not mappings) can raise
AttributeError/TypeError and escape the HTTPException handler, causing 500s and
missing metrics; wrap or augment validation so type/shape errors are converted
into an HTTPException before they escape. Specifically, in the
RHIdentityData.validate_entitlements() call (or just before it) detect
non-mapping entitlements and non-mapping service entries and raise an
HTTPException (use the same 403 entitlement_missing semantics used in the except
block) so _record_rh_identity_auth("failure", "entitlement_missing", start_time)
runs and the raised HTTPException is propagated instead of an unhandled
AttributeError/TypeError.
- Around line 375-379: The skip-path logic should run unconditionally so
readiness/metrics bypass works even when an x-rh-identity header exists or is
malformed: call _get_auth_skip_tuple(request, start_time) immediately and if it
returns a non-None tuple return it before attempting to read or validate
identity_header; update the block around
identity_header/request.headers.get("x-rh-identity") and the auth_skip handling
so _get_auth_skip_tuple is invoked first and short-circuits processing when
appropriate.
In `@src/authentication/utils.py`:
- Around line 60-71: The current try block wraps both
recording.record_auth_attempt(...) and recording.record_auth_duration(...), so a
failure in record_auth_attempt prevents record_auth_duration from running;
change this by calling recording.record_auth_attempt(auth_module, result,
reason) inside its own try/except that logs via logger.warning (preserve
exc_info=True and the same message format), then separately call
recording.record_auth_duration(auth_module, result, time.monotonic() -
start_time) inside a second try/except with the same logging behavior; keep the
same parameter names (auth_module, result, reason, start_time) and message text
for consistency.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6c591d00-e3af-4078-8cb1-1f1ee5f6f8e4
📒 Files selected for processing (14)
src/authentication/api_key_token.pysrc/authentication/jwk_token.pysrc/authentication/k8s.pysrc/authentication/noop.pysrc/authentication/noop_with_token.pysrc/authentication/rh_identity.pysrc/authentication/utils.pysrc/metrics/__init__.pysrc/metrics/recording.pytests/unit/authentication/test_jwk_token.pytests/unit/authentication/test_k8s.pytests/unit/authentication/test_rh_identity.pytests/unit/authentication/test_utils.pytests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/authentication/api_key_token.pysrc/metrics/__init__.pysrc/authentication/utils.pysrc/authentication/k8s.pysrc/authentication/jwk_token.pysrc/metrics/recording.pysrc/authentication/noop.pysrc/authentication/noop_with_token.pysrc/authentication/rh_identity.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/authentication/test_jwk_token.pytests/unit/authentication/test_rh_identity.pytests/unit/authentication/test_utils.pytests/unit/metrics/test_recording.pytests/unit/authentication/test_k8s.py
src/**/__init__.py
📄 CodeRabbit inference engine (AGENTS.md)
Package
__init__.pyfiles must contain brief package descriptions
Files:
src/metrics/__init__.py
| try: | ||
| token = extract_user_token(request.headers) | ||
| except HTTPException: | ||
| record_auth_metrics(AUTH_MOD_K8S, "failure", "missing_token", start_time) | ||
| raise |
There was a problem hiding this comment.
Preserve missing_header in the K8s extraction path too.
This branch records every extract_user_token() failure as missing_token, so requests with no Authorization header lose the same distinction you already added in api_key_token.py. That makes the new per-reason auth dashboards inconsistent across auth modes.
Suggested change
- try:
- token = extract_user_token(request.headers)
- except HTTPException:
- record_auth_metrics(AUTH_MOD_K8S, "failure", "missing_token", start_time)
- raise
+ try:
+ token = extract_user_token(request.headers)
+ except HTTPException as exc:
+ reason = "missing_token"
+ if isinstance(exc.detail, dict) and "No Authorization header" in exc.detail.get(
+ "cause", ""
+ ):
+ reason = "missing_header"
+ record_auth_metrics(AUTH_MOD_K8S, "failure", reason, start_time)
+ raise📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| token = extract_user_token(request.headers) | |
| except HTTPException: | |
| record_auth_metrics(AUTH_MOD_K8S, "failure", "missing_token", start_time) | |
| raise | |
| try: | |
| token = extract_user_token(request.headers) | |
| except HTTPException as exc: | |
| reason = "missing_token" | |
| if isinstance(exc.detail, dict) and "No Authorization header" in exc.detail.get( | |
| "cause", "" | |
| ): | |
| reason = "missing_header" | |
| record_auth_metrics(AUTH_MOD_K8S, "failure", reason, start_time) | |
| raise |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/authentication/k8s.py` around lines 575 - 579, The K8s path always
records extract_user_token failures as "missing_token" losing the original
reason; change the except block to capture the exception (except HTTPException
as e), extract a preserved reason from e.detail (or e.description) if present,
fall back to "missing_token" otherwise, then call
record_auth_metrics(AUTH_MOD_K8S, "failure", reason, start_time) before
re-raising; update references to extract_user_token, record_auth_metrics and
AUTH_MOD_K8S accordingly so the original "missing_header" reason is preserved.
| try: | ||
| rh_identity = RHIdentityData( | ||
| identity_data, | ||
| required_entitlements=self.required_entitlements, | ||
| ) | ||
|
|
||
| # Validate entitlements if configured | ||
| rh_identity.validate_entitlements() | ||
| # Validate entitlements if configured | ||
| rh_identity.validate_entitlements() | ||
| except HTTPException as exc: | ||
| reason = ( | ||
| "entitlement_missing" if exc.status_code == 403 else "invalid_identity" | ||
| ) | ||
| _record_rh_identity_auth("failure", reason, start_time) | ||
| raise |
There was a problem hiding this comment.
Validate malformed entitlements payloads before they escape as 500s.
validate_entitlements() still assumes entitlements and each service entry are mappings. Payloads like "entitlements": [] or "entitlements": {"rhel": 1} will raise AttributeError here, bypass this except HTTPException, and miss the new failure metric.
Suggested fix
def has_entitlement(self, service: str) -> bool:
"""Check if user has a specific service entitlement."""
entitlements = self.identity_data.get("entitlements", {})
+ if not isinstance(entitlements, dict):
+ logger.warning(
+ "Identity validation failed: 'entitlements' is %s, expected dict",
+ type(entitlements).__name__,
+ )
+ raise HTTPException(status_code=400, detail="Invalid identity data")
+
service_entitlement = entitlements.get(service, {})
+ if not isinstance(service_entitlement, dict):
+ logger.warning(
+ "Identity validation failed: entitlement '%s' is %s, expected dict",
+ service,
+ type(service_entitlement).__name__,
+ )
+ raise HTTPException(status_code=400, detail="Invalid identity data")
+
return service_entitlement.get("is_entitled", False)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/authentication/rh_identity.py` around lines 432 - 445, The code calls
RHIdentityData(...).validate_entitlements() but malformed entitlements (e.g.
entitlements=[], or entries that are not mappings) can raise
AttributeError/TypeError and escape the HTTPException handler, causing 500s and
missing metrics; wrap or augment validation so type/shape errors are converted
into an HTTPException before they escape. Specifically, in the
RHIdentityData.validate_entitlements() call (or just before it) detect
non-mapping entitlements and non-mapping service entries and raise an
HTTPException (use the same 403 entitlement_missing semantics used in the except
block) so _record_rh_identity_auth("failure", "entitlement_missing", start_time)
runs and the raised HTTPException is propagated instead of an unhandled
AttributeError/TypeError.
| try: | ||
| recording.record_auth_attempt(auth_module, result, reason) | ||
| recording.record_auth_duration( | ||
| auth_module, result, time.monotonic() - start_time | ||
| ) | ||
| except Exception: # pylint: disable=broad-exception-caught | ||
| logger.warning( | ||
| "Failed to record authentication metrics for module %s with result %s", | ||
| auth_module, | ||
| result, | ||
| exc_info=True, | ||
| ) |
There was a problem hiding this comment.
Isolate attempt and duration recording failures.
If record_auth_attempt() raises unexpectedly here, record_auth_duration() is skipped for the same request because both calls share one try block. That leaves you with neither series on the exact fallback path this helper is supposed to protect. Split these into independent guarded calls so one recorder failure does not suppress the other.
Suggested change
- try:
- recording.record_auth_attempt(auth_module, result, reason)
- recording.record_auth_duration(
- auth_module, result, time.monotonic() - start_time
- )
- except Exception: # pylint: disable=broad-exception-caught
- logger.warning(
- "Failed to record authentication metrics for module %s with result %s",
- auth_module,
- result,
- exc_info=True,
- )
+ duration = time.monotonic() - start_time
+
+ try:
+ recording.record_auth_attempt(auth_module, result, reason)
+ except Exception: # pylint: disable=broad-exception-caught
+ logger.warning(
+ "Failed to record authentication attempt metric for module %s with result %s",
+ auth_module,
+ result,
+ exc_info=True,
+ )
+
+ try:
+ recording.record_auth_duration(auth_module, result, duration)
+ except Exception: # pylint: disable=broad-exception-caught
+ logger.warning(
+ "Failed to record authentication duration metric for module %s with result %s",
+ auth_module,
+ result,
+ exc_info=True,
+ )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/authentication/utils.py` around lines 60 - 71, The current try block
wraps both recording.record_auth_attempt(...) and
recording.record_auth_duration(...), so a failure in record_auth_attempt
prevents record_auth_duration from running; change this by calling
recording.record_auth_attempt(auth_module, result, reason) inside its own
try/except that logs via logger.warning (preserve exc_info=True and the same
message format), then separately call
recording.record_auth_duration(auth_module, result, time.monotonic() -
start_time) inside a second try/except with the same logging behavior; keep the
same parameter names (auth_module, result, reason, start_time) and message text
for consistency.
Move telemetry functions from responses.py into a dedicated responses_telemetry.py module. Centralize fire-and-forget dispatch logic in observability/splunk.py as dispatch_splunk_event(). No behavioral changes to request handling. Reduces responses.py from 1201 to 1057 lines. Signed-off-by: Major Hayden <major@redhat.com>
…t restores llama-stack (lightspeed-core#1628) * Add diagnostic pod logs on e2e failure and remove disrupt-once optimization * Increase vLLM max-model-len to 35936 (GPU memory limit) * Accept 503 as valid port-forward proof in e2e connectivity check Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Major Hayden <major@redhat.com>
|
Superseded by #1640 which now contains all three metrics commits as a stack. |
Description
Adds bounded Prometheus metrics for authentication attempts and latency across the configured auth dependencies. This keeps auth SLO dashboards consistent across noop, token, JWK, Kubernetes, API key, and rh-identity modes without adding user identifiers as labels.
Type of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Tests