Possible inconsistency: final score remains 0 although all exported metric columns are 1 after human_penalty_filter #14

Hi, thanks for maintaining this project.

While running navtest evaluation, I observed a possible inconsistency in the exported CSV/XLSX results. In one case, all visible metric columns are equal to 1.0, but the final score is still reported as 0.0.

Example row:

```csv
token,valid,no_at_fault_collisions,drivable_area_compliance,driving_direction_compliance,traffic_light_compliance,ego_progress,time_to_collision_within_bound,lane_keeping,history_comfort,two_frame_extended_comfort,score
fa0f2e54ad7259b0,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
```
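One way to check whether other scenes are affected is to scan the exported CSV for rows where every metric column is 1.0 but `score` is not. Below is a minimal sketch that uses only the standard library. It assumes the column names in the header above and assumes the `valid` column holds the string `True`.

```python
import csv
import io

# Metric columns as they appear in the exported CSV header above.
METRIC_COLUMNS = [
    "no_at_fault_collisions",
    "drivable_area_compliance",
    "driving_direction_compliance",
    "traffic_light_compliance",
    "ego_progress",
    "time_to_collision_within_bound",
    "lane_keeping",
    "history_comfort",
    "two_frame_extended_comfort",
]


def find_inconsistent_rows(csv_file, tol=1e-9):
    """Return tokens whose metric columns are all 1.0 but whose score is not 1.0."""
    suspicious = []
    for row in csv.DictReader(csv_file):
        if row.get("valid") != "True":
            continue  # ignore rows marked as invalid
        metrics = [float(row[col]) for col in METRIC_COLUMNS]
        score = float(row["score"])
        if all(abs(m - 1.0) <= tol for m in metrics) and abs(score - 1.0) > tol:
            suspicious.append(row["token"])
    return suspicious


# The example row from this issue.
EXAMPLE_CSV = (
    "token,valid,no_at_fault_collisions,drivable_area_compliance,"
    "driving_direction_compliance,traffic_light_compliance,ego_progress,"
    "time_to_collision_within_bound,lane_keeping,history_comfort,"
    "two_frame_extended_comfort,score\n"
    "fa0f2e54ad7259b0,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0\n"
)
print(find_inconsistent_rows(io.StringIO(EXAMPLE_CSV)))  # ['fa0f2e54ad7259b0']
```

To check a real export, pass an open file instead, e.g. `with open(path, newline="") as f: find_inconsistent_rows(f)`, where `path` points to your results CSV.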

From my understanding of the EPDMS / human-penalty filtering mechanism, when the human agent also fails a metric, the corresponding agent metric is overwritten with 1.0. This prevents the agent from being penalized for a failure that the human reference shares (a false positive). However, it seems that only the per-metric columns are updated, while some derived fields used to compute the final score may remain stale.
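The overwrite described above can be sketched roughly as follows. This is a hypothetical helper that operates on plain dicts, not the actual code in `navsim/evaluate/pdm_score.py`. It assumes that a metric counts as "failed" when its value is 0.0 and that the set of filtered metrics is passed in explicitly.

```python
def apply_human_penalty_filter(agent_metrics, human_metrics, filtered_metrics):
    """Return a copy of agent_metrics in which human-failed metrics are set to 1.0.

    Hypothetical illustration: "failed" is assumed to mean a value of 0.0, and
    filtered_metrics lists the metric names the filter applies to.
    """
    result = dict(agent_metrics)
    for name in filtered_metrics:
        if human_metrics.get(name) == 0.0:
            # The human reference fails this metric as well, so the agent's
            # failure is treated as a false positive and not penalized.
            result[name] = 1.0
    return result


agent = {"drivable_area_compliance": 0.0, "ego_progress": 0.8}
human = {"drivable_area_compliance": 0.0}
print(apply_human_penalty_filter(agent, human, ["drivable_area_compliance"]))
# {'drivable_area_compliance': 1.0, 'ego_progress': 0.8}
```

The helper only changes the per-metric values. Any value derived from them earlier, such as a product or a weighted sum, is left untouched.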

In particular, in `navsim/evaluate/pdm_score.py`, the `human_penalty_filter` block appears to update per-metric columns but skips derived fields such as:

- `multiplicative_metrics_prod`
- `weighted_metrics`
- `weighted_metrics_array`
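For context, EPDMS is usually described as a product of multiplicative penalties multiplied by a weighted average of the remaining metrics. My reading of the field names (not confirmed against the code) is that `multiplicative_metrics_prod` corresponds to the product and `weighted_metrics` to the weighted average:

```math
\text{score} = \left( \prod_{m \in \mathcal{M}} s_m \right) \cdot \frac{\sum_{w \in \mathcal{W}} \lambda_w \, s_w}{\sum_{w \in \mathcal{W}} \lambda_w}
```

Here the product would run over the multiplicative metrics (`no_at_fault_collisions`, `drivable_area_compliance`, `driving_direction_compliance`, `traffic_light_compliance`). The weighted average would run over the remaining metric columns, using the configured weights $\lambda_w$. Under this structure, one stale 0 in the product is enough to force the score to exactly 0.0, whatever the exported columns show.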

If the final score is later computed from these derived fields, the exported result can become inconsistent: the visible metrics are all 1.0, but the final score still reflects the pre-filter values.

Could you please confirm whether this is the intended behavior?

If not, would it make sense to recompute the derived fields after applying the human-penalty overrides, for example:

1. recompute `multiplicative_metrics_prod` from the filtered multiplier metrics;
2. recompute `weighted_metrics` / `weighted_metrics_array` from the filtered weighted metrics;
3. update the final `pdm_score` / `score` accordingly?
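The three steps above could look roughly like this. The sketch rests on several assumptions. The metric grouping follows the EPDMS description. The weights are illustrative rather than read from the navsim config. `weighted_metrics_array` is assumed to hold the per-metric weighted values. Rows are plain dicts rather than the actual data structures in `pdm_score.py`.

```python
import math

# Assumed EPDMS grouping; weights are illustrative, not read from the navsim config.
MULTIPLICATIVE = (
    "no_at_fault_collisions",
    "drivable_area_compliance",
    "driving_direction_compliance",
    "traffic_light_compliance",
)
WEIGHTS = {
    "ego_progress": 5.0,
    "time_to_collision_within_bound": 5.0,
    "lane_keeping": 2.0,
    "history_comfort": 2.0,
    "two_frame_extended_comfort": 2.0,
}


def recompute_derived(row):
    """Refresh the derived fields and the final score from the per-metric values (in place)."""
    # Step 1: product of the (already filtered) multiplicative metrics.
    row["multiplicative_metrics_prod"] = math.prod(row[m] for m in MULTIPLICATIVE)
    # Step 2: per-metric weighted values (assumed layout) and their normalized sum.
    row["weighted_metrics_array"] = [WEIGHTS[w] * row[w] for w in WEIGHTS]
    row["weighted_metrics"] = sum(row["weighted_metrics_array"]) / sum(WEIGHTS.values())
    # Step 3: final score computed from the same values that get exported.
    row["score"] = row["multiplicative_metrics_prod"] * row["weighted_metrics"]
    return row


# The agent fails drivable_area_compliance before filtering.
row = {name: 1.0 for name in (*MULTIPLICATIVE, *WEIGHTS)}
row["drivable_area_compliance"] = 0.0
recompute_derived(row)
print(row["score"])  # 0.0

# Override without refreshing: the column now reads 1.0, but the score stays 0.0.
row["drivable_area_compliance"] = 1.0
print(row["score"])  # 0.0 (stale)

# Recomputing after the override restores consistency.
recompute_derived(row)
print(row["score"])  # 1.0
```

The key design point is ordering. The derived fields are computed only after all overrides have been applied, so the exported columns and the score come from the same values.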

This would make the exported per-metric columns and the final score consistent.

Thanks!

