sfc-gh-bnisco commented Jan 13, 2026

TL;DR

Add component directory infrastructure with scripts for validation, enrichment, and ranking.

What changed?

  • Added .gitattributes to mark compiled files as generated
  • Implemented Python scripts for:
    • Validating component submissions against schemas
    • Building a compiled catalog from individual component files
    • Enriching components with data from GitHub, PyPI, and pypistats
    • Computing ranking scores based on configured weights
    • Checking image URLs for stability and accessibility
    • Running the full pipeline in a single command
  • Created a ranking configuration file with weights for stars, recency, contributors, and downloads
  • Added utility modules for HTTP requests, GitHub API interaction, time handling, and JSON operations
  • Added requirements.txt with jsonschema and requests dependencies
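
A minimal sketch of the catalog-building step listed above, assuming each submission is a standalone JSON file in one directory. The function names, the "components" key, and the directory layout are hypothetical; the actual build_catalog.py may be organized differently.

```python
import json
from pathlib import Path


def build_catalog(components_dir: Path) -> dict:
    """Merge every *.json file in components_dir into one catalog dict.

    The flat directory layout and the "components" key are assumptions
    made for this sketch, not the structure used by build_catalog.py.
    """
    components = []
    # Sort paths so the compiled output is deterministic across runs.
    for path in sorted(components_dir.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            components.append(json.load(handle))
    return {"components": components}


def write_catalog(catalog: dict, output_path: Path) -> None:
    """Write the catalog with stable key order so generated diffs stay small."""
    text = json.dumps(catalog, indent=2, sort_keys=True) + "\n"
    output_path.write_text(text, encoding="utf-8")
```

Sorting the input paths and the output keys keeps the compiled file reproducible, which fits a file that .gitattributes marks as generated.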

How to test?

  1. Install dependencies: pip install -r requirements.txt
  2. Run the full pipeline: python directory/scripts/run_pipeline.py
  3. For GitHub enrichment, set a GitHub token: export GH_TOKEN=your_token
  4. To validate only: python directory/scripts/validate.py
  5. To check images: python directory/scripts/enrich_images.py --check-only
  6. To run enrichment only: python directory/scripts/enrich.py

Why make this change?

This infrastructure enables automated processing of component submissions with:

  • Consistent validation to ensure quality and completeness
  • Enrichment with real-time metrics from GitHub and PyPI
  • Ranking calculation to surface the most relevant components
  • Image validation to prevent broken links
  • A unified pipeline for CI/CD integration

The system is designed to be maintainable, with separate modules for different concerns and configurable parameters for flexibility.

sfc-gh-bnisco commented Jan 13, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

Copilot AI left a comment

Pull request overview

This PR adds a comprehensive infrastructure for managing component submissions with automated validation, enrichment, and ranking capabilities.

Changes:

  • Added Python scripts for validating component submissions, building compiled catalogs, enriching with external metrics (GitHub, PyPI, pypistats), and computing ranking scores
  • Implemented utility modules for HTTP requests, GitHub API interaction, time handling, JSON I/O, and enrichment orchestration
  • Added ranking configuration with configurable weights for stars, recency, contributors, and downloads

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 5 comments.

File                                  Description
requirements.txt                      Added jsonschema and requests dependencies
directory/scripts/validate.py         Component and compiled catalog validation with policy checks
directory/scripts/run_pipeline.py     Orchestrates full pipeline with configurable steps
directory/scripts/enrich_images.py    Validates image URLs for stability and fetchability
directory/scripts/enrich.py           Coordinates enrichment from multiple services
directory/scripts/compute_ranking.py  Computes ranking scores based on configured weights
directory/scripts/build_catalog.py    Compiles individual submissions into single catalog
directory/scripts/_utils/*.py         Shared utility modules for common operations
directory/scripts/_enrichers/*.py     Service-specific enrichers for GitHub, PyPI, and pypistats
directory/ranking_config.json         Ranking algorithm configuration
.gitattributes                        Marks compiled files as generated

sfc-gh-bnisco force-pushed the 01-13-_feat_add_component_directory_scripts branch from 11f9985 to c290ecc on January 16, 2026 17:50
sfc-gh-bnisco force-pushed the 01-13-_feat_add_jsonschemas branch from a9e06e1 to 91e2fa0 on January 16, 2026 17:50
sfc-gh-bnisco requested a review from Copilot on January 16, 2026 17:51

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 9 comments.


sfc-gh-bnisco force-pushed the 01-13-_feat_add_jsonschemas branch from 91e2fa0 to a2a0b92 on January 16, 2026 18:13
sfc-gh-bnisco force-pushed the 01-13-_feat_add_component_directory_scripts branch from c290ecc to 648f054 on January 16, 2026 18:13
sfc-gh-bnisco requested a review from Copilot on January 16, 2026 18:14

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 1 comment.


sfc-gh-bnisco force-pushed the 01-13-_feat_add_component_directory_scripts branch from 648f054 to afb4c3d on January 16, 2026 18:24
sfc-gh-bnisco force-pushed the 01-13-_feat_add_jsonschemas branch from a2a0b92 to 518eaab on January 16, 2026 18:24
sfc-gh-bnisco force-pushed the 01-13-_feat_add_component_directory_scripts branch from afb4c3d to a6e0687 on January 16, 2026 18:32

Comment on lines +4 to +9
  "weights": {
    "stars": 1.0,
    "recency": 2.0,
    "contributors": 0.5,
    "downloads": 0.35
  }

question: Similar question to a previous PR: are we just vibe-weighting here? Any reason downloads is weighted so much lower?
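
To make the weighting question concrete, here is a minimal sketch of how weights like these could combine into a score. The formula is an assumption, not taken from compute_ranking.py: counts are damped with log1p, and recency decays with a hypothetical 180-day half-life. The metric key names are also illustrative.

```python
import math

# Weights copied from the ranking_config.json snippet under review.
WEIGHTS = {"stars": 1.0, "recency": 2.0, "contributors": 0.5, "downloads": 0.35}

# Hypothetical half-life for the recency term; not a value from the PR.
RECENCY_HALF_LIFE_DAYS = 180


def compute_score(metrics: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of log-damped counts plus a recency term in (0, 1].

    Missing metrics default to 0, so a component with no data still
    receives the full recency contribution for "updated today".
    """
    stars = math.log1p(metrics.get("stars", 0))
    contributors = math.log1p(metrics.get("contributors", 0))
    downloads = math.log1p(metrics.get("downloads", 0))
    days = metrics.get("days_since_update", 0)
    # Exponential decay: the term halves every RECENCY_HALF_LIFE_DAYS.
    recency = 0.5 ** (days / RECENCY_HALF_LIFE_DAYS)
    return (
        weights["stars"] * stars
        + weights["recency"] * recency
        + weights["contributors"] * contributors
        + weights["downloads"] * downloads
    )
```

Under this assumed formula, download counts can be orders of magnitude larger than star counts, so a smaller weight is one way to keep them from dominating the sum. Whether that is the reasoning behind 0.35 is for the author to confirm.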

_LINK_LAST_RE = re.compile(r'<([^>]+)>;\s*rel="last"')


def _parse_last_page_from_link_header(link: str | None) -> int | None:

praise: On the whole, I like how you've encapsulated the scraping/parsing of these various sources into this _enrichers pattern. I wish some of it could be less painful like this function, but I think that's just the nature of scraping, and if it works it works. 🙈
