Possible inconsistency: final score remains 0 although all exported metric columns are 1 after human_penalty_filter #14

@Frank-LuHao

Hi, thanks for maintaining this project.

While running the navtest evaluation, I noticed a possible inconsistency in the exported CSV/XLSX results: in one row, every visible metric column equals 1.0, yet the final score is reported as 0.0.

Example row:

token,valid,no_at_fault_collisions,drivable_area_compliance,driving_direction_compliance,traffic_light_compliance,ego_progress,time_to_collision_within_bound,lane_keeping,history_comfort,two_frame_extended_comfort,score
fa0f2e54ad7259b0,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0

From my understanding of the EPDMS / human-penalty filtering mechanism, when the human agent also fails a metric, the corresponding agent metric may be overwritten to 1.0 to avoid false-positive penalties. However, it seems that only the per-metric columns are updated, while some derived fields used for the final score calculation may remain stale.

In particular, in navsim/evaluate/pdm_score.py, the human_penalty_filter block appears to update per-metric columns but skips derived fields such as:

multiplicative_metrics_prod
weighted_metrics
weighted_metrics_array

If the final score is later computed from these derived fields, the exported result can become inconsistent: the visible metrics are all 1.0, but the final score still reflects the pre-filter values.
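
To see how many rows are affected, a quick check over the exported CSV might look like the sketch below. It uses only the column names from the example row above. The helper name `find_inconsistent_rows` is hypothetical, and the sketch assumes every row has numeric values in all metric columns (summary or invalid rows may need to be skipped first).

```python
import csv
import io

# Metric columns as they appear in the exported CSV header shown above.
METRIC_COLUMNS = [
    "no_at_fault_collisions",
    "drivable_area_compliance",
    "driving_direction_compliance",
    "traffic_light_compliance",
    "ego_progress",
    "time_to_collision_within_bound",
    "lane_keeping",
    "history_comfort",
    "two_frame_extended_comfort",
]


def find_inconsistent_rows(csv_text):
    """Return tokens whose metric columns are all 1.0 while score is below 1.0."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flagged = []
    for row in reader:
        all_metrics_one = all(float(row[col]) == 1.0 for col in METRIC_COLUMNS)
        if all_metrics_one and float(row["score"]) < 1.0:
            flagged.append(row["token"])
    return flagged


# The example row from this issue, reproduced as a small in-memory CSV.
sample = (
    "token,valid,no_at_fault_collisions,drivable_area_compliance,"
    "driving_direction_compliance,traffic_light_compliance,ego_progress,"
    "time_to_collision_within_bound,lane_keeping,history_comfort,"
    "two_frame_extended_comfort,score\n"
    "fa0f2e54ad7259b0,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0\n"
)

print(find_inconsistent_rows(sample))  # ['fa0f2e54ad7259b0']
```

The check does not depend on any particular weighting: if every multiplier and every weighted metric is 1.0, a product-times-normalized-average score should also be 1.0.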

Could you please confirm whether this is the intended behavior?

If not, would it make sense to recompute the derived fields after applying the human-penalty overrides, for example:

  1. recompute multiplicative_metrics_prod from the filtered multiplier metrics;
  2. recompute weighted_metrics / weighted_metrics_array from the filtered weighted metrics;
  3. update the final pdm_score / score accordingly?

This would make the exported per-metric columns and the final score consistent.
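
To illustrate the three steps above, here is a minimal standalone sketch, not the actual code in navsim/evaluate/pdm_score.py. The split into multiplier and weighted metrics and the weights (5, 5, 2, 2, 2) are my assumptions about the EPDMS definition and should be checked against the repository's configuration. `recompute_derived_fields` is a hypothetical helper name.

```python
import math

# ASSUMPTION: which metrics act as multipliers. Verify against the repo.
MULTIPLIER_METRICS = [
    "no_at_fault_collisions",
    "drivable_area_compliance",
    "driving_direction_compliance",
    "traffic_light_compliance",
]

# ASSUMPTION: weighted metrics and their weights. Verify against the repo.
WEIGHTED_METRICS = {
    "ego_progress": 5.0,
    "time_to_collision_within_bound": 5.0,
    "lane_keeping": 2.0,
    "history_comfort": 2.0,
    "two_frame_extended_comfort": 2.0,
}


def recompute_derived_fields(metrics):
    """Rebuild the derived fields from (already filtered) per-metric values."""
    # Step 1: product of the multiplier metrics.
    multiplicative_metrics_prod = math.prod(metrics[m] for m in MULTIPLIER_METRICS)
    # Step 2: normalized weighted average of the weighted metrics.
    weighted_sum = sum(w * metrics[m] for m, w in WEIGHTED_METRICS.items())
    weighted_metrics = weighted_sum / sum(WEIGHTED_METRICS.values())
    # Step 3: final score from the refreshed derived fields.
    return {
        "multiplicative_metrics_prod": multiplicative_metrics_prod,
        "weighted_metrics": weighted_metrics,
        "score": multiplicative_metrics_prod * weighted_metrics,
    }


# Pre-filter values: the agent fails one multiplier metric.
pre_filter = {name: 1.0 for name in MULTIPLIER_METRICS + list(WEIGHTED_METRICS)}
pre_filter["drivable_area_compliance"] = 0.0

# Post-filter values: the failed metric is overridden to 1.0.
post_filter = dict(pre_filter, drivable_area_compliance=1.0)

print(recompute_derived_fields(pre_filter)["score"])   # 0.0
print(recompute_derived_fields(post_filter)["score"])  # 1.0
```

If the score is computed once before the override and never refreshed, the exported row keeps the 0.0 from the first call while the metric columns show the post-filter values, which matches the row reported here.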

Thanks!
