Commit a43c257

fix: add support for running line by line evaluations
1 parent 730ccc2 commit a43c257

4 files changed

Lines changed: 1693 additions & 4 deletions

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+# Line-by-Line Evaluation Sample
+
+This sample demonstrates the line-by-line evaluation feature for output evaluators.
+
+## Overview
+
+Line-by-line evaluation allows evaluators to:
+- Split multi-line outputs by a configurable delimiter (e.g., `\n`)
+- Evaluate each line independently
+- Provide partial credit based on the percentage of correct lines
+- Return detailed per-line feedback
+
+## Features Demonstrated
+
+- **Partial Credit Scoring**: Get 0.67 for 2/3 correct lines instead of 0.0
+- **Per-Line Feedback**: See exactly which lines passed or failed
+- **Configurable Delimiter**: Use `\n`, `|`, or any custom delimiter
+- **Comparison**: Side-by-side comparison with regular evaluation
+
+## Installation
+
+This sample uses the UiPath package from TestPyPI:
+
+```bash
+# Install dependencies
+uv sync
+
+# Or manually install
+uv pip install --index-url https://test.pypi.org/simple/ "uipath>=2.10.30.dev1014810000,<2.10.30.dev1014820000"
+```
+
+## Usage
+
+### Run the agent
+
+```bash
+uv run uipath run main '{"items": ["apple", "banana", "cherry"]}'
+```
+
+### Run evaluations
+
+```bash
+uv run uipath eval main evaluations/eval-sets/default.json --workers 1
+```
+
+## Evaluation Results
+
+The sample includes three test cases:
+
+1. **All lines match exactly** - Both evaluators score 1.0
+2. **One line doesn't match** - Line-by-line: 0.67, Regular: 0.0 (shows partial credit!)
+3. **Single item** - Both evaluators score 1.0
+
+Expected output:
+```
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
+┃ Evaluation                    ┃  LineByLineExactMatch  ┃  RegularExactMatch  ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
+│ Test all lines match          │                    1.0 │                 1.0 │
+│ Test when one line doesn't    │                    0.7 │                 0.0 │  ← Key difference!
+│ Test with single item         │                    1.0 │                 1.0 │
+├───────────────────────────────┼────────────────────────┼─────────────────────┤
+│ Average                       │                    0.9 │                 0.7 │
+└───────────────────────────────┴────────────────────────┴─────────────────────┘
+```
+
+## Configuration
+
+### Evaluator Configuration
+
+The line-by-line evaluator is configured in `evaluations/evaluators/line-by-line-exact-match.json`:
+
+```json
+{
+  "evaluatorConfig": {
+    "name": "LineByLineExactMatch",
+    "targetOutputKey": "result",
+    "lineByLineEvaluator": true,
+    "lineDelimiter": "\n"
+  }
+}
+```
+
+Key options:
+- `lineByLineEvaluator`: Enable line-by-line evaluation (default: `false`)
+- `lineDelimiter`: Delimiter to split lines (default: `"\n"`)
+
+### Custom Delimiters
+
+You can use any delimiter, such as `|` for pipe-separated values:
+
+```json
+{
+  "evaluatorConfig": {
+    "lineByLineEvaluator": true,
+    "lineDelimiter": "|"
+  }
+}
+```
+
+## File Structure
+
+```
+line_by_line_test/
+├── main.py                                 # Simple agent that outputs one item per line
+├── uipath.json                             # Agent configuration
+├── pyproject.toml                          # Dependencies (uses TestPyPI)
+└── evaluations/
+    ├── evaluators/
+    │   ├── line-by-line-exact-match.json   # Line-by-line evaluator
+    │   └── regular-exact-match.json        # Regular evaluator (for comparison)
+    └── eval-sets/
+        └── default.json                    # Test cases
+```
+
+## Learn More
+
+- [UiPath Python SDK Documentation](https://docs.uipath.com/)
+- [Evaluation Framework Guide](../../src/uipath/_resources/eval.md)
Lines changed: 37 additions & 4 deletions
@@ -1,5 +1,38 @@
 {
-  "$schema": "https://cloud.uipath.com/draft/2024-12/entry-point",
-  "$id": "entry-points.json",
-  "entryPoints": []
-}
+  "$schema": "https://cloud.uipath.com/draft/2024-12/entry-point",
+  "$id": "entry-points.json",
+  "entryPoints": [
+    {
+      "filePath": "main",
+      "uniqueId": "main",
+      "type": "function",
+      "input": {
+        "type": "object",
+        "properties": {
+          "items": {
+            "type": "array",
+            "items": {
+              "type": "string"
+            }
+          }
+        },
+        "description": "Input schema.",
+        "required": [
+          "items"
+        ]
+      },
+      "output": {
+        "type": "object",
+        "properties": {
+          "result": {
+            "type": "string"
+          }
+        },
+        "description": "Output schema.",
+        "required": [
+          "result"
+        ]
+      }
+    }
+  ]
+}
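
The entry point above declares a function `main` that takes an `items` array of strings and returns a `result` string. `main.py` itself is not shown in this excerpt. A hypothetical sketch that is consistent with this schema, and with the README's description of an agent that outputs one item per line, might look like the following. The `Input` and `Output` names, and the choice of dataclasses to carry the payload, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Input:
    """Mirrors the declared input schema: a required list of strings."""
    items: list[str]


@dataclass
class Output:
    """Mirrors the declared output schema: a required string."""
    result: str


def main(payload: Input) -> Output:
    # Emit one item per line so a "\n" delimiter can split the result again.
    return Output(result="\n".join(payload.items))


if __name__ == "__main__":
    out = main(Input(items=["apple", "banana", "cherry"]))
    print(out.result)
```

With the sample input from the README's run command, `result` would be the three items joined by newlines, which is the shape a `"\n"` line delimiter expects.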
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+[project]
+name = "line-by-line-test"
+version = "0.1.0"
+description = "Sample agent demonstrating line-by-line evaluation"
+authors = [{ name = "John Doe", email = "john.doe@myemail.com" }]
+requires-python = ">=3.11"
+dependencies = [
+    # Use the version from your PR on TestPyPI
+    "uipath>=2.10.30.dev1014810000,<2.10.30.dev1014820000"
+]
+
+[[tool.uv.index]]
+name = "testpypi"
+url = "https://test.pypi.org/simple/"
+publish-url = "https://test.pypi.org/legacy/"
+explicit = true
+
+[tool.uv.sources]
+uipath = { index = "testpypi" }
