
@rlratzel
Contributor

Adds an image curation benchmark to the nightly run. The benchmark uses the image curation "getting started" tutorial.

rlratzel and others added 14 commits December 12, 2025 21:46
…images with :latest by default, adds session name to slack report.

Signed-off-by: rlratzel <[email protected]>
…g script to allow for more flexibility.

Signed-off-by: rlratzel <[email protected]>
…n-readable output is needed, updates paths to benchmark output dir.

Signed-off-by: rlratzel <[email protected]>
@copy-pr-bot

copy-pr-bot bot commented Dec 21, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rlratzel rlratzel marked this pull request as ready for review January 7, 2026 17:44
@greptile-apps
Contributor

greptile-apps bot commented Jan 7, 2026

Greptile Summary

This PR adds an image curation benchmark to the nightly benchmark suite, along with refactoring the placeholder substitution logic in the benchmark runner to support the new {curator_repo_dir} placeholder.

Key Changes:

  • Added image_curation benchmark entry that runs the image curation tutorial script
  • Added mscoco and mscoco_model_weights dataset definitions
  • Refactored Entry.substitute_paths_in_cmd() into two separate methods: substitute_reserved_placeholders() (for {curator_repo_dir}, {session_entry_dir}, {dataset:...}) and substitute_container_or_host_paths() (for PathResolver paths)
  • Added support for {curator_repo_dir} placeholder to reference scripts outside the benchmarking/scripts directory
  • Simplified get_obj_for_json() by removing unused conversion cases
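The reserved-placeholder step described above comes down to string substitution on the command line before container/host path resolution. The sketch below is illustrative only and is not the code in benchmarking/runner/entry.py: the function substitute_reserved, the dict keyed by (dataset name, format), and the regex for {dataset:<name>,<format>} are hypothetical.

```python
import re

# Hypothetical pattern for placeholders of the form {dataset:<name>,<format>}.
DATASET_RE = re.compile(r"\{dataset:([^,{}]+),([^{}]+)\}")


def substitute_reserved(
    cmd: str,
    curator_repo_dir: str,
    session_entry_dir: str,
    datasets: dict[tuple[str, str], str],
) -> str:
    """Replace reserved placeholders in a command string (illustrative sketch)."""
    cmd = cmd.replace("{curator_repo_dir}", curator_repo_dir)
    cmd = cmd.replace("{session_entry_dir}", session_entry_dir)

    def _resolve(match: re.Match) -> str:
        key = (match.group(1), match.group(2))
        if key not in datasets:
            # Fail loudly instead of passing an unresolved placeholder to the script.
            raise KeyError(f"unknown dataset placeholder: {match.group(0)}")
        return datasets[key]

    return DATASET_RE.sub(_resolve, cmd)
```

For example, with a made-up mapping {("mscoco", "wds"): "/datasets/mscoco/wds"}, the token {dataset:mscoco,wds} becomes /datasets/mscoco/wds, and {curator_repo_dir} expands to whatever repository path is passed in. Keeping this step separate from path resolution means the expanded paths can then be mapped between container and host in one place.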

Critical Issue:

  • The image_curation benchmark entry is missing the ray: configuration block to allocate GPUs. The image curation pipeline requires GPUs for 4 stages (each using 0.25 GPUs per worker), but without this config, the benchmark will default to 0 GPUs (benchmarking/run.py:161) and fail or run incorrectly.
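The defaulting rule behind this issue can be sketched as follows. This is a hypothetical illustration that assumes the entry is a plain dict loaded from the YAML. The function name requested_ray_resources is made up, and the actual logic lives around benchmarking/run.py:161.

```python
from typing import Any


def requested_ray_resources(entry: dict[str, Any]) -> dict[str, Any]:
    """Return the Ray settings an entry asks for (hypothetical sketch).

    A missing ray: block, or a ray: block without num_gpus, yields 0 GPUs,
    matching the default described in the review.
    """
    ray_cfg = entry.get("ray") or {}
    return {
        # None means "not specified"; the real runner's CPU default is not modeled here.
        "num_cpus": ray_cfg.get("num_cpus"),
        "num_gpus": ray_cfg.get("num_gpus", 0),
        "enable_object_spilling": ray_cfg.get("enable_object_spilling"),
    }


without_ray = {"name": "image_curation", "enabled": True}
with_ray = {
    "name": "image_curation",
    "enabled": True,
    "ray": {"num_cpus": 64, "num_gpus": 4, "enable_object_spilling": False},
}
```

Under this sketch, requested_ray_resources(without_ray)["num_gpus"] is 0, while with_ray yields 4. That difference is exactly what adding the ray: block to the entry changes.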

Confidence Score: 2/5

  • This PR has a critical configuration issue that will cause the benchmark to fail
  • The refactoring in benchmarking/runner/entry.py is solid and the code changes are clean. However, the missing GPU configuration in the new image_curation benchmark entry is a critical issue that will cause the benchmark to fail or run with 0 GPUs when it requires GPUs for multiple pipeline stages. This must be fixed before merging.
  • benchmarking/nightly-benchmark.yaml requires GPU configuration for the image_curation entry

Important Files Changed

Filename | Overview
benchmarking/nightly-benchmark.yaml | Adds the image_curation benchmark entry and mscoco datasets; the missing ray GPU configuration will cause the GPU-dependent pipeline to fail or run with 0 GPUs
benchmarking/runner/entry.py | Refactors placeholder substitution by splitting it into separate methods for reserved placeholders and path resolution; adds support for the {curator_repo_dir} placeholder

Sequence Diagram

sequenceDiagram
    participant Runner as Benchmark Runner
    participant Session as Session
    participant Entry as Entry
    participant PathRes as PathResolver
    participant DataRes as DatasetResolver
    participant Ray as Ray Cluster
    participant Script as Image Curation Script

    Runner->>Runner: Load YAML config
    Runner->>Session: create_from_dict(config)
    Session->>PathRes: Create PathResolver
    Session->>DataRes: Create DatasetResolver
    Session->>Session: Create Entry objects
    
    Runner->>Entry: get_command_to_run()
    Entry->>Entry: substitute_reserved_placeholders()<br/>{curator_repo_dir}, {session_entry_dir}, {dataset:...}
    Entry->>PathRes: substitute_container_or_host_paths()<br/>resolve paths for container/host mapping
    Entry-->>Runner: Return resolved command
    
    Runner->>Ray: setup_ray_cluster_and_env()<br/>with num_gpus from entry.ray config
    Note over Ray: Defaults to 0 GPUs if not specified
    
    Runner->>Script: Execute python command
    Script->>Script: create_image_curation_pipeline()
    Note over Script: Pipeline stages use num_gpus_per_worker<br/>ImageReaderStage: 0.25<br/>ImageEmbeddingStage: 0.25<br/>ImageAestheticFilterStage: 0.25<br/>ImageNSFWFilterStage: 0.25
    Script-->>Runner: Return exit code
    
    Runner->>Runner: get_entry_script_persisted_data()<br/>Read metrics.json, params.json, tasks.pkl
    Runner->>Runner: check_requirements_update_results()
    Runner->>Ray: teardown_ray_cluster_and_env()
    Runner->>Runner: Write results.json

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

Comment on lines +182 to +197
- name: image_curation
  enabled: true
  script: "{curator_repo_dir}/tutorials/image/getting-started/image_curation_example.py"
  args: >-
    --input-wds-dataset-dir {dataset:mscoco,wds}
    --output-dataset-dir {session_entry_dir}/results_truncated_100K_mscoco
    --model-dir {dataset:mscoco_model_weights,files}
    --batch-size 100
    --embedding-batch-size 100
    --aesthetic-batch-size 100
    --nsfw-batch-size 100
    --tar-files-per-partition 10
    --aesthetic-threshold 0.9
    --nsfw-threshold 0.9
    --skip-download
    --verbose

Missing the ray: configuration block to allocate GPUs.

The script requires GPUs for multiple stages (ImageReaderStage uses 0.25 GPUs; ImageEmbeddingStage, ImageAestheticFilterStage, and ImageNSFWFilterStage each default to 0.25 GPUs per worker).
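Fractional values such as 0.25 let several workers share one GPU. The capacity estimate below is simplified: it assumes each whole GPU is shared by workers requesting the same fraction and ignores CPU, memory, and per-node packing. It shows why a 0-GPU cluster leaves no room for these stages, while 4 GPUs leave room for many workers. The helper names are hypothetical and are not part of Ray or the benchmark runner.

```python
import math


def workers_per_gpu(gpus_per_worker: float) -> int:
    """Number of workers requesting `gpus_per_worker` that fit on one GPU."""
    if not 0 < gpus_per_worker <= 1:
        raise ValueError("expected a fractional GPU request in (0, 1]")
    # A small tolerance guards against float error in the division.
    return math.floor(1 / gpus_per_worker + 1e-9)


def max_concurrent_workers(num_gpus: int, gpus_per_worker: float) -> int:
    """Upper bound on concurrently scheduled GPU workers (simplified model)."""
    return num_gpus * workers_per_gpu(gpus_per_worker)


# Per-worker GPU requests of the four stages named in the review.
STAGE_GPU_REQUESTS = {
    "ImageReaderStage": 0.25,
    "ImageEmbeddingStage": 0.25,
    "ImageAestheticFilterStage": 0.25,
    "ImageNSFWFilterStage": 0.25,
}
```

With this model, max_concurrent_workers(4, 0.25) is 16 and max_concurrent_workers(0, 0.25) is 0. Summing STAGE_GPU_REQUESTS gives 1.0, so running one worker of every stage at the same time already needs a full GPU.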

Other GPU benchmarks, such as domain_classification_raydata (lines 75-78), include:

ray:
  num_cpus: 64
  num_gpus: 4
  enable_object_spilling: false
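In the quoted entry, args is a YAML folded block scalar (>-). Its lines are joined with single spaces and the trailing newline is stripped, so the script receives one flat argument string. The sketch below shows that folding plus a shell-style split into an argv list. It assumes, hypothetically, that the runner invokes the script as "python <script> <args>". The names fold_args and build_argv are illustrative, and the folding handles only the simple case of equally indented, non-blank lines.

```python
import shlex


def fold_args(lines: list[str]) -> str:
    """Mimic '>-' folding for simple, equally indented, non-blank lines.

    YAML folds blank and more-indented lines differently; that is not handled.
    """
    return " ".join(line.strip() for line in lines)


def build_argv(script: str, folded_args: str) -> list[str]:
    """Split the folded args shell-style and prepend the interpreter and script."""
    return ["python", script, *shlex.split(folded_args)]


ARGS_LINES = [
    "--input-wds-dataset-dir {dataset:mscoco,wds}",
    "--batch-size 100",
    "--aesthetic-threshold 0.9",
    "--skip-download",
]
```

Here fold_args(ARGS_LINES) produces a single space-separated string, and build_argv turns it into one list element per flag and value. Placeholders such as {dataset:mscoco,wds} contain no shell metacharacters, so they survive the split as single tokens.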

@greptile-apps greptile-apps bot left a comment

5 files reviewed, 1 comment

Comment on lines +182 to +197
- name: image_curation
  enabled: true
  script: "{curator_repo_dir}/tutorials/image/getting-started/image_curation_example.py"
  args: >-
    --input-wds-dataset-dir {dataset:mscoco,wds}
    --output-dataset-dir {session_entry_dir}/results_truncated_100K_mscoco
    --model-dir {dataset:mscoco_model_weights,files}
    --batch-size 100
    --embedding-batch-size 100
    --aesthetic-batch-size 100
    --nsfw-batch-size 100
    --tar-files-per-partition 10
    --aesthetic-threshold 0.9
    --nsfw-threshold 0.9
    --skip-download
    --verbose

logic: Missing the ray: configuration block for GPU allocation.

The image curation script requires GPUs for multiple stages (ImageReaderStage, ImageEmbeddingStage, ImageAestheticFilterStage, and ImageNSFWFilterStage each use 0.25 GPUs per worker by default).

Add a configuration like the other GPU benchmarks:

Suggested change
- name: image_curation
  enabled: true
  script: "{curator_repo_dir}/tutorials/image/getting-started/image_curation_example.py"
  args: >-
    --input-wds-dataset-dir {dataset:mscoco,wds}
    --output-dataset-dir {session_entry_dir}/results_truncated_100K_mscoco
    --model-dir {dataset:mscoco_model_weights,files}
    --batch-size 100
    --embedding-batch-size 100
    --aesthetic-batch-size 100
    --nsfw-batch-size 100
    --tar-files-per-partition 10
    --aesthetic-threshold 0.9
    --nsfw-threshold 0.9
    --skip-download
    --verbose
  ray:
    num_cpus: 64
    num_gpus: 4
    enable_object_spilling: false

Signed-off-by: rlratzel <[email protected]>

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 1 comment

Comment on lines +182 to +197
- name: image_curation
  enabled: true
  script: "{curator_repo_dir}/tutorials/image/getting-started/image_curation_example.py"
  args: >-
    --input-wds-dataset-dir {dataset:mscoco,wds}
    --output-dataset-dir {session_entry_dir}/results_truncated_100K_mscoco
    --model-dir {dataset:mscoco_model_weights,files}
    --batch-size 100
    --embedding-batch-size 100
    --aesthetic-batch-size 100
    --nsfw-batch-size 100
    --tar-files-per-partition 10
    --aesthetic-threshold 0.9
    --nsfw-threshold 0.9
    --skip-download
    --verbose

logic: Missing the ray: configuration block for GPU allocation.

The image curation pipeline requires GPUs: four stages use 0.25 GPUs per worker by default (ImageReaderStage, ImageEmbeddingStage, ImageAestheticFilterStage, and ImageNSFWFilterStage).

Without this config, the benchmark will use 0 GPUs (the default from benchmarking/run.py:161) and will likely fail or run very slowly.

Add a GPU config like the other benchmarks (e.g., lines 75-78):

Suggested change
- name: image_curation
  enabled: true
  script: "{curator_repo_dir}/tutorials/image/getting-started/image_curation_example.py"
  args: >-
    --input-wds-dataset-dir {dataset:mscoco,wds}
    --output-dataset-dir {session_entry_dir}/results_truncated_100K_mscoco
    --model-dir {dataset:mscoco_model_weights,files}
    --batch-size 100
    --embedding-batch-size 100
    --aesthetic-batch-size 100
    --nsfw-batch-size 100
    --tar-files-per-partition 10
    --aesthetic-threshold 0.9
    --nsfw-threshold 0.9
    --skip-download
    --verbose
  ray:
    num_cpus: 64
    num_gpus: 4
    enable_object_spilling: false

@praateekmahajan praateekmahajan enabled auto-merge (squash) January 14, 2026 00:25
@praateekmahajan praateekmahajan merged commit d9ade75 into NVIDIA-NeMo:main Jan 14, 2026
18 checks passed

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 1 comment

Comment on lines +182 to +197
- name: image_curation
  enabled: true
  script: "{curator_repo_dir}/tutorials/image/getting-started/image_curation_example.py"
  args: >-
    --input-wds-dataset-dir {dataset:mscoco,wds}
    --output-dataset-dir {session_entry_dir}/results_truncated_100K_mscoco
    --model-dir {dataset:mscoco_model_weights,files}
    --batch-size 100
    --embedding-batch-size 100
    --aesthetic-batch-size 100
    --nsfw-batch-size 100
    --tar-files-per-partition 10
    --aesthetic-threshold 0.9
    --nsfw-threshold 0.9
    --skip-download
    --verbose

logic: Missing the ray: configuration block to allocate GPUs.

The image curation pipeline uses GPUs in 4 stages: ImageReaderStage (0.25), ImageEmbeddingStage (0.25), ImageAestheticFilterStage (0.25), and ImageNSFWFilterStage (0.25); see tutorials/image/getting-started/image_curation_example.py:50,56,65,74.

Without this config, benchmarking/run.py:161 defaults to 0 GPUs, causing the pipeline to fail or run incorrectly.

Add GPU allocation like the other GPU benchmarks (e.g., lines 75-78):

Suggested change
- name: image_curation
  enabled: true
  script: "{curator_repo_dir}/tutorials/image/getting-started/image_curation_example.py"
  args: >-
    --input-wds-dataset-dir {dataset:mscoco,wds}
    --output-dataset-dir {session_entry_dir}/results_truncated_100K_mscoco
    --model-dir {dataset:mscoco_model_weights,files}
    --batch-size 100
    --embedding-batch-size 100
    --aesthetic-batch-size 100
    --nsfw-batch-size 100
    --tar-files-per-partition 10
    --aesthetic-threshold 0.9
    --nsfw-threshold 0.9
    --skip-download
    --verbose
  ray:
    num_cpus: 64
    num_gpus: 4
    enable_object_spilling: false

Labels

None yet

Projects

None yet

4 participants