Description
Many evaluation metrics do not require the extra context provided by prompts. In fact, the prompt may confuse the metric. For example, a non_advice metric might start checking the response for instruction-following, mistaking the prompt's instructions for extensions of the non_advice criteria.
My use case for evaluating responses without relying on a prompt is currently not supported.
When hardcoding an LLMMetric whose prompt template contains only a {response} field (see the sketch after this list):
- supplying an eval_dataset with only a response column raises EvalDatasetSchema.UNKNOWN, and
- supplying an eval_dataset with both prompt and response columns raises INVALID_ARGUMENT.
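
A minimal sketch of what I am trying to do. The exact import paths, class fields, and the evaluate entry point here are my assumptions based on the names mentioned above (LLMMetric, eval_dataset, evaluate) and may not match the released SDK exactly:

```python
import pandas as pd
import vertexai
from vertexai import types  # assumed import path for LLMMetric

client = vertexai.Client(project="my-project", location="us-central1")

# A metric whose prompt template only looks at the response.
non_advice_metric = types.LLMMetric(
    name="non_advice",
    prompt_template=(
        "Rate whether the following text gives professional advice "
        "(1 = no advice, 0 = gives advice).\n\nText: {response}"
    ),
)

# Response-only dataset -- no prompt column.
eval_dataset = pd.DataFrame(
    {"response": ["Drink plenty of water.", "Buy this stock now."]}
)

# Currently fails: the schema is detected as EvalDatasetSchema.UNKNOWN;
# adding a prompt column instead raises INVALID_ARGUMENT.
result = client.evals.evaluate(dataset=eval_dataset, metrics=[non_advice_metric])
```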
The current workarounds (supplying a dummy prompt, or adding extra "ignore the prompt" instructions to the judge) are cumbersome.
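
For reference, the dummy-prompt workaround looks roughly like this (same assumed API surface as the sketch above):

```python
# Workaround: satisfy the schema check with a placeholder prompt column
# and tell the judge to ignore it -- both of which add noise to the evaluation.
eval_dataset["prompt"] = "N/A"
non_advice_metric.prompt_template += "\n\n(Ignore the prompt field entirely.)"

result = client.evals.evaluate(dataset=eval_dataset, metrics=[non_advice_metric])
```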
I propose adding infrastructure to support the bare-minimum response-only input, in the form of a more flexible evaluate, thereby addressing the INVALID_ARGUMENT error.
This would be a welcome addition to the existing infrastructure supporting extra inputs.