Hi, thanks for maintaining this project.
While running navtest evaluation, I observed a possible inconsistency in the exported CSV/XLSX results. In one case, all visible metric columns are equal to 1.0, but the final score is still reported as 0.0.
Example row:
token,valid,no_at_fault_collisions,drivable_area_compliance,driving_direction_compliance,traffic_light_compliance,ego_progress,time_to_collision_within_bound,lane_keeping,history_comfort,two_frame_extended_comfort,score
fa0f2e54ad7259b0,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
From my understanding of the EPDMS / human-penalty filtering mechanism, when the human agent also fails a metric, the corresponding agent metric may be overwritten to 1.0 to avoid false-positive penalties. However, it seems that only the per-metric columns are updated, while some derived fields used for the final score calculation may remain stale.
In particular, in navsim/evaluate/pdm_score.py, the human_penalty_filter block appears to update per-metric columns but skips derived fields such as:
- multiplicative_metrics_prod
- weighted_metrics
- weighted_metrics_array
If the final score is later computed from these derived fields, the exported result can become inconsistent: the visible metrics are all 1.0, but the final score still reflects the pre-filter values.
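To find affected rows in an export, a small check along these lines may help. It is only a sketch: it assumes every column other than token, valid, and score is a metric, and that a row whose metrics are all 1.0 should also score 1.0. The find_inconsistent_rows helper is hypothetical and not part of navsim.

```python
import csv
import io

# The example row from this report, embedded so the snippet is self-contained.
EXPORTED = """token,valid,no_at_fault_collisions,drivable_area_compliance,driving_direction_compliance,traffic_light_compliance,ego_progress,time_to_collision_within_bound,lane_keeping,history_comfort,two_frame_extended_comfort,score
fa0f2e54ad7259b0,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
"""

# Assumption: every other column is a per-metric value.
NON_METRIC_COLUMNS = {"token", "valid", "score"}


def find_inconsistent_rows(csv_text):
    """Return tokens whose metric columns are all 1.0 but whose score is not 1.0."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        metrics = [float(v) for k, v in row.items() if k not in NON_METRIC_COLUMNS]
        if all(m == 1.0 for m in metrics) and float(row["score"]) != 1.0:
            flagged.append(row["token"])
    return flagged


print(find_inconsistent_rows(EXPORTED))  # → ['fa0f2e54ad7259b0']
```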
Could you please confirm whether this is the intended behavior?
If not, would it make sense to recompute the derived fields after applying the human-penalty overrides, for example:
- recompute multiplicative_metrics_prod from the filtered multiplier metrics;
- recompute weighted_metrics / weighted_metrics_array from the filtered weighted metrics;
- update the final pdm_score / score accordingly?
This would make the exported per-metric columns and the final score consistent.
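A minimal sketch of the recomputation I have in mind, assuming the final score is the product of the multiplier metrics times a weighted average of the remaining metrics. The metric groupings, the weights, and the recompute_score helper are illustrative assumptions and are not copied from pdm_score.py.

```python
from math import prod

# Assumed grouping of the exported columns (illustrative, not taken from navsim).
MULTIPLIER_METRICS = [
    "no_at_fault_collisions",
    "drivable_area_compliance",
    "driving_direction_compliance",
    "traffic_light_compliance",
]

# Assumed weights for the weighted-average part (illustrative).
WEIGHTED_METRICS = {
    "ego_progress": 5.0,
    "time_to_collision_within_bound": 5.0,
    "lane_keeping": 2.0,
    "history_comfort": 2.0,
    "two_frame_extended_comfort": 2.0,
}


def recompute_score(row):
    """Recompute the derived fields and the final score from filtered metrics."""
    multiplicative_metrics_prod = prod(row[name] for name in MULTIPLIER_METRICS)
    total_weight = sum(WEIGHTED_METRICS.values())
    weighted_metrics = (
        sum(row[name] * weight for name, weight in WEIGHTED_METRICS.items())
        / total_weight
    )
    return multiplicative_metrics_prod * weighted_metrics


# Per-metric values after the human-penalty override, as in the example row.
filtered_row = {name: 1.0 for name in MULTIPLIER_METRICS + list(WEIGHTED_METRICS)}
print(recompute_score(filtered_row))  # → 1.0
```

Running this step after the human-penalty overrides, rather than before, would keep the score in sync with the visible columns.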
Thanks!