Skip to content

RSPEED-2943: add quota monitoring metrics#1640

Open
major wants to merge 1 commit intolightspeed-core:mainfrom
major:split/quota-metrics
Open

RSPEED-2943: add quota monitoring metrics#1640
major wants to merge 1 commit intolightspeed-core:mainfrom
major:split/quota-metrics

Conversation

@major
Copy link
Copy Markdown
Contributor

@major major commented Apr 29, 2026

Description

Adds bounded Prometheus metrics for pre-request quota checks on /v1/responses and /v1/infer. The labels capture endpoint, quota type, and result without exposing the actual quota subject identifier.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by: OpenCode
  • Generated by: N/A

Related Tickets & Documents

  • Related Issue # RSPEED-2943
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • uv run pytest tests/unit/metrics/test_recording.py tests/unit/app/endpoints/test_responses.py tests/unit/app/endpoints/test_rlsapi_v1.py
  • uv run radon cc -s src/app/endpoints/responses.py src/app/endpoints/rlsapi_v1.py src/metrics/recording.py
  • uv run make verify

Summary by CodeRabbit

  • Chores

    • Enhanced quota check monitoring and metrics collection across the system. Quota operations now include instrumentation to track performance and outcomes, improving system observability and reliability.
  • Tests

    • Added comprehensive unit tests for quota check instrumentation and metrics recording to ensure accuracy of collected data.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Warning

Rate limit exceeded

@major has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 19 seconds before requesting another review.

To continue reviewing without waiting, purchase usage credits in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a7f98123-7951-4e63-84fe-f0a441fc54a0

📥 Commits

Reviewing files that changed from the base of the PR and between 7668c4b and 6971abe.

📒 Files selected for processing (7)
  • src/app/endpoints/responses.py
  • src/app/endpoints/rlsapi_v1.py
  • src/metrics/__init__.py
  • src/metrics/recording.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/app/endpoints/test_rlsapi_v1.py
  • tests/unit/metrics/test_recording.py

Walkthrough

This PR adds quota-check instrumentation across endpoints by introducing dedicated quota-check helper functions that measure check latency and record bounded metrics labeled by endpoint path, quota type, and result status. Metrics infrastructure includes new Prometheus Counter and Histogram with normalized label values for quota types and result outcomes.

Changes

Cohort / File(s) Summary
Endpoint Quota Instrumentation
src/app/endpoints/responses.py, src/app/endpoints/rlsapi_v1.py
Introduces _check_response_quota and _check_infer_quota helpers that wrap quota availability checks, record metrics with elapsed time and normalized labels, and re-raise original exceptions while preserving error behavior.
Metrics Infrastructure
src/metrics/__init__.py, src/metrics/recording.py
Defines new Prometheus Counter and Histogram metrics for quota checks with sub-second buckets. Adds quota type and result label normalization constants and helper functions to map out-of-set inputs to bounded fallback labels.
Test Coverage
tests/unit/app/endpoints/test_responses.py, tests/unit/metrics/test_recording.py
Adds unit tests for quota-check helper exception handling and metric recording with expected labels, plus parametrized tests for metric recording success/failure paths and label normalization behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding quota monitoring metrics, which is the primary objective across all modified files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@major major mentioned this pull request Apr 29, 2026
19 tasks
@major major force-pushed the split/quota-metrics branch from 2221254 to 7668c4b Compare April 30, 2026 01:47
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/rlsapi_v1.py`:
- Around line 535-562: The _check_infer_quota function currently logs raw
strings ("skipped","failure","success") and only catches HTTPException; change
it to use the exported recording constants (e.g.,
recording.QUOTA_RESULT_SKIPPED, recording.QUOTA_RESULT_FAILURE,
recording.QUOTA_RESULT_SUCCESS) in the three record_quota_check calls, and add a
broad except Exception block around check_tokens_available that records
recording.QUOTA_RESULT_ERROR with the elapsed time before re-raising the
exception; keep the existing HTTPException except as-is but use the constant
there too. Use the existing symbols _check_infer_quota, check_tokens_available,
and recording.record_quota_check to locate and modify the code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e9a8ec74-e971-491a-8e93-5dd57f734cd2

📥 Commits

Reviewing files that changed from the base of the PR and between ca125c4 and 7668c4b.

📒 Files selected for processing (6)
  • src/app/endpoints/responses.py
  • src/app/endpoints/rlsapi_v1.py
  • src/metrics/__init__.py
  • src/metrics/recording.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/metrics/test_recording.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Pylinter
  • GitHub Check: mypy
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (6)
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/metrics/test_recording.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/metrics/test_recording.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

  • src/app/endpoints/rlsapi_v1.py
  • src/metrics/__init__.py
  • src/app/endpoints/responses.py
  • src/metrics/recording.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends

Files:

  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/responses.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/responses.py
src/**/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Package __init__.py files contain brief package descriptions

Files:

  • src/metrics/__init__.py
🧠 Learnings (2)
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.

Applied to files:

  • src/app/endpoints/rlsapi_v1.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/rlsapi_v1.py
  • src/app/endpoints/responses.py
🔇 Additional comments (8)
tests/unit/app/endpoints/test_responses.py (1)

223-244: LGTM!

The test correctly verifies that unexpected quota backend failures (RuntimeError) are re-raised after recording metrics with the appropriate arguments (endpoint_path="/v1/responses", quota_type="user_id", result="error", and a non-negative duration).

src/app/endpoints/responses.py (2)

141-170: LGTM!

The quota check wrapper correctly:

  • Times the quota check using time.monotonic()
  • Records metrics with bounded labels before re-raising exceptions
  • Uses the exported constants for type safety (QUOTA_TYPE_USER_ID, QUOTA_RESULT_*)
  • Handles both expected (HTTPException) and unexpected (Exception) quota backend failures

311-316: LGTM!

Moving endpoint_path assignment earlier and using it consistently for quota metrics is clean.

src/metrics/__init__.py (2)

11-24: LGTM!

The bucket distribution covers sub-second latencies well (1ms to 5s), which is appropriate for quota backend checks. Including float("inf") as the final bucket follows Prometheus histogram best practices.


76-92: LGTM!

The quota metrics are well-designed:

  • Labels are documented as bounded for cardinality safety
  • Counter and Histogram share the same label set for consistent querying
  • Descriptive metric names follow the existing ls_ prefix convention
tests/unit/metrics/test_recording.py (1)

165-218: LGTM!

Excellent test coverage for the new record_quota_check function:

  • The fixture cleanly patches the logger for failure assertions
  • Parametrized test efficiently covers both counter and histogram failure paths
  • The bounds test verifies that unexpected inputs are normalized to safe fallback values ("customer-123" → "user_id", "timeout" → "error")
src/metrics/recording.py (2)

17-55: LGTM!

Well-designed label bounding:

  • Constants provide a single source of truth for valid label values
  • frozenset ensures immutability of the allowed sets
  • Normalization functions guarantee bounded cardinality for Prometheus metrics
  • Fallback values ("user_id" and "error") are sensible defaults

155-177: LGTM!

The record_quota_check helper follows the established pattern from other recording functions:

  • Normalizes inputs before recording to prevent cardinality explosion
  • Updates both counter and histogram atomically
  • Catches expected metric update exceptions and logs them without disrupting the request

Comment thread src/app/endpoints/rlsapi_v1.py
@major major force-pushed the split/quota-metrics branch from 7668c4b to a45bbd9 Compare April 30, 2026 21:19
Copy link
Copy Markdown
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ignore docs/demos directory completely - the sources are incorrect there on purpose.

@major major force-pushed the split/quota-metrics branch 2 times, most recently from c64fde1 to 17bdda5 Compare May 5, 2026 13:21
@major major requested a review from tisnik May 5, 2026 19:58
Signed-off-by: Major Hayden <major@redhat.com>
@major major force-pushed the split/quota-metrics branch from 17bdda5 to 6971abe Compare May 7, 2026 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants