RSPEED-2943: add authorization monitoring metrics#1639

Closed
major wants to merge 4 commits into lightspeed-core:main from major:split/authz-metrics
Conversation

@major major commented Apr 29, 2026

Stack: 2/3 - Depends on #1638. Merge after #1638, then merge #1640.

Description

Adds bounded Prometheus metrics for authorization checks and latency by protected action. This gives SLO dashboards visibility into authorization success, denial, and unexpected error paths without adding user or request identifiers as labels.
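The bounded-metric design described above can be sketched roughly as follows, assuming `prometheus_client`. The metric names match the PR walkthrough; the bucket boundaries and help strings are illustrative assumptions, not the PR's actual values:

```python
# Rough sketch of bounded authorization metrics (assumes prometheus_client).
# Metric names follow the PR summary; buckets and help text are illustrative.
from prometheus_client import Counter, Histogram

authorization_checks_total = Counter(
    "authorization_checks_total",
    "Authorization checks by protected action and result.",
    ["action", "result"],  # bounded label sets only; no user or request IDs
)

authorization_duration_seconds = Histogram(
    "authorization_duration_seconds",
    "Authorization check latency in seconds.",
    ["action", "result"],
    buckets=(0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0),
)
```

Because the labels are restricted to a small fixed vocabulary, time-series cardinality stays bounded regardless of traffic or user population.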

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by: OpenCode
  • Generated by: N/A

Related Tickets & Documents

  • Related Issue # RSPEED-2943
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • `uv run pytest tests/unit/metrics/test_recording.py tests/unit/authorization/test_middleware.py`
  • `uv run radon cc -s src/authorization/middleware.py src/metrics/recording.py`
  • `uv run make verify`

coderabbitai Bot commented Apr 29, 2026

Warning

Rate limit exceeded

@major has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 49 minutes and 28 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b02527bf-e174-47ff-8edf-8d2f12aceabe

📥 Commits

Reviewing files that changed from the base of the PR and between 4d7fe24 and 4cfe046.

📒 Files selected for processing (26)
  • src/app/endpoints/responses.py
  • src/app/endpoints/responses_telemetry.py
  • src/authentication/api_key_token.py
  • src/authentication/jwk_token.py
  • src/authentication/k8s.py
  • src/authentication/noop.py
  • src/authentication/noop_with_token.py
  • src/authentication/rh_identity.py
  • src/authentication/utils.py
  • src/authorization/middleware.py
  • src/metrics/__init__.py
  • src/metrics/recording.py
  • src/observability/__init__.py
  • src/observability/splunk.py
  • tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml
  • tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-gpu.yaml
  • tests/e2e-prow/rhoai/scripts/e2e-ops.sh
  • tests/e2e/features/environment.py
  • tests/unit/app/endpoints/test_responses_splunk.py
  • tests/unit/authentication/test_jwk_token.py
  • tests/unit/authentication/test_k8s.py
  • tests/unit/authentication/test_rh_identity.py
  • tests/unit/authentication/test_utils.py
  • tests/unit/authorization/test_middleware.py
  • tests/unit/metrics/test_recording.py
  • tests/unit/observability/test_splunk.py

Walkthrough

Authorization latency is now measured and recorded throughout the authorization middleware. A new metrics module provides helper functions that normalize authorization labels and safely record both check counts and duration observations. Metrics are recorded in a finally block so they are emitted even when authorization errors occur.

Changes

Cohort / File(s) and Summary:

  • Authorization Metrics Recording (src/authorization/middleware.py): Adds latency measurement and metric recording to authorization checks via a new _record_authorization_metrics helper; records check results and duration in a finally block so metrics are emitted even on errors, including when authorization is denied or credentials are missing.
  • Metrics Infrastructure (src/metrics/__init__.py): Introduces Prometheus metrics for authorization: an authorization_checks_total counter and an authorization_duration_seconds histogram tracking authorization attempts by action and result, with configurable duration buckets.
  • Metrics Recording Helpers (src/metrics/recording.py): Adds authorization-specific recording functions that normalize action/result labels to bounded sets, record checks and durations, and gracefully handle metric-recording exceptions with warning logs.
  • Test Coverage (tests/unit/authorization/test_middleware.py, tests/unit/metrics/test_recording.py): Unit tests validating that metric-recording failures do not disrupt the authorization flow, that recording helpers normalize unexpected labels, and that exceptions are logged without propagating.
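The finally-block recording pattern described for the middleware can be sketched as follows. This is a hedged illustration, not the PR's actual code: the function and parameter names are hypothetical (the real helper is `_record_authorization_metrics`), but the core idea is the same, i.e. the result label is decided along each path and recording happens in `finally`, with recording failures logged rather than propagated:

```python
# Hypothetical sketch of middleware-side metric recording; names are assumed.
import asyncio
import logging
import time
from collections.abc import Awaitable, Callable

logger = logging.getLogger(__name__)


async def authorize(
    action: str,
    do_check: Callable[[str], Awaitable[bool]],
    record: Callable[[str, str, float], None],
) -> None:
    """Check authorization for an action, always recording metrics in finally."""
    start = time.monotonic()
    result = "error"  # assume the worst until the check completes
    try:
        if await do_check(action):
            result = "success"
        else:
            result = "denied"
            raise PermissionError(f"authorization denied for {action}")
    finally:
        try:
            record(action, result, time.monotonic() - start)
        except Exception:  # a broken recorder must never mask the real outcome
            logger.warning("failed to record authorization metrics", exc_info=True)
```

The key property the unit tests above verify is visible here: a recorder that raises is caught and logged, so a metrics outage cannot turn a successful authorization into a 500 or suppress a 403.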

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant AuthMiddleware as Auth Middleware
    participant RoleResolver as Role Resolver
    participant MetricsRecorder as Metrics Recorder
    participant Prometheus as Prometheus Metrics

    Client->>AuthMiddleware: Authorization Request
    activate AuthMiddleware

    Note over AuthMiddleware: Start timing (monotonic)

    AuthMiddleware->>RoleResolver: Resolve Role
    activate RoleResolver
    RoleResolver-->>AuthMiddleware: Role/Result
    deactivate RoleResolver

    alt Authorization Success
        AuthMiddleware->>AuthMiddleware: Set result = "success"
    else Authorization Denied
        AuthMiddleware->>AuthMiddleware: Set result = "denied"
        Note over AuthMiddleware: Raise 403
    end

    AuthMiddleware->>MetricsRecorder: Record Check (action, result)
    activate MetricsRecorder
    MetricsRecorder->>Prometheus: Increment Counter
    Prometheus-->>MetricsRecorder: OK
    deactivate MetricsRecorder

    AuthMiddleware->>MetricsRecorder: Record Duration (action, result, latency)
    activate MetricsRecorder
    MetricsRecorder->>Prometheus: Observe Histogram
    Prometheus-->>MetricsRecorder: OK
    deactivate MetricsRecorder

    AuthMiddleware-->>Client: Response/Exception
    deactivate AuthMiddleware
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 5 passed

  • Title check (✅ Passed): The title directly and clearly summarizes the main change: adding authorization monitoring metrics. It is concise, specific, and matches the core objective described in the PR description.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which exceeds the required threshold of 80.00%.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Description check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.


@major major mentioned this pull request Apr 29, 2026
19 tasks
@major major force-pushed the split/authz-metrics branch from 0469516 to 4d7fe24 on April 30, 2026 at 02:48

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/metrics/__init__.py`:
- Around line 78-90: The new module-level metrics authorization_checks_total and
authorization_duration_seconds must be annotated as constants using Final;
import Final from typing if not present and add type annotations like
Final[Counter] for authorization_checks_total and Final[Histogram] for
authorization_duration_seconds so they follow the project's constant-typing
rule; leave the instantiation expressions unchanged and ensure the import of
Final is added at the top of the module if missing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7933fdef-3790-4918-b1ff-b17e571bddad

📥 Commits

Reviewing files that changed from the base of the PR and between ca125c4 and 4d7fe24.

📒 Files selected for processing (5)
  • src/authorization/middleware.py
  • src/metrics/__init__.py
  • src/metrics/recording.py
  • tests/unit/authorization/test_middleware.py
  • tests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Pylinter
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: radon
  • GitHub Check: Pyright
  • GitHub Check: build-pr
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (4)
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

  • tests/unit/authorization/test_middleware.py
  • tests/unit/metrics/test_recording.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

  • tests/unit/authorization/test_middleware.py
  • tests/unit/metrics/test_recording.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

  • src/metrics/__init__.py
  • src/authorization/middleware.py
  • src/metrics/recording.py
src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files contain brief package descriptions

Files:

  • src/metrics/__init__.py
🔇 Additional comments (4)
tests/unit/authorization/test_middleware.py (1)

325-355: Good resilience coverage for metric failure in the success path.

This test correctly verifies that authorization success is not masked by metric failures and that warning logging plus duration recording still occur.

tests/unit/metrics/test_recording.py (1)

191-290: Strong coverage for authorization recorder behavior.

The new parametrized tests and normalization assertions validate both happy-path recording and swallowed metric-update failures cleanly.

src/authorization/middleware.py (1)

32-55: Resilient instrumentation in finally is implemented correctly.

The result-state transitions and isolated metric-recording error handling preserve authorization outcomes while still emitting observability data for all paths.

Also applies to: 154-195

src/metrics/recording.py (1)

18-33: Bounded-label normalization and record helpers look solid.

The new normalization + recorder functions are clear, type-annotated, and aligned with the intended bounded-label metric strategy.

Also applies to: 35-61, 160-195
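The bounded-label normalization praised here can be sketched as follows. The allowed sets and fallback value are assumptions for illustration; the real ones live in src/metrics/recording.py:

```python
# Hypothetical bounded label sets; actual values are defined in the PR itself.
from typing import Final

_KNOWN_ACTIONS: Final[frozenset[str]] = frozenset({"query", "feedback", "config"})
_KNOWN_RESULTS: Final[frozenset[str]] = frozenset({"success", "denied", "error"})
_FALLBACK: Final[str] = "other"


def normalize_authorization_labels(action: str, result: str) -> tuple[str, str]:
    """Clamp labels to known sets so metric cardinality stays bounded."""
    return (
        action if action in _KNOWN_ACTIONS else _FALLBACK,
        result if result in _KNOWN_RESULTS else _FALLBACK,
    )
```

Any unexpected value collapses to a single fallback label, so a misbehaving caller (or an attacker-controlled string) can never inflate the number of Prometheus time series.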

Comment thread src/metrics/__init__.py Outdated
major and others added 4 commits May 8, 2026 07:18
Move telemetry functions from responses.py into a dedicated
responses_telemetry.py module. Centralize fire-and-forget dispatch
logic in observability/splunk.py as dispatch_splunk_event().

No behavioral changes to request handling. Reduces responses.py
from 1201 to 1057 lines.

Signed-off-by: Major Hayden <major@redhat.com>
…t restores llama-stack (lightspeed-core#1628)

* Add diagnostic pod logs on e2e failure and remove disrupt-once optimization

* Increase vLLM max-model-len to 35936 (GPU memory limit)

* Accept 503 as valid port-forward proof in e2e connectivity check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Major Hayden <major@redhat.com>
Signed-off-by: Major Hayden <major@redhat.com>
@major major force-pushed the split/authz-metrics branch from 91ff835 to 4cfe046 on May 8, 2026 at 12:19

major commented May 8, 2026

Superseded by #1640 which now contains all three metrics commits as a stack.

@major major closed this May 8, 2026

2 participants