# Fromager Benchmarks

Performance benchmarks for Fromager, a tool for rebuilding complete dependency trees of Python wheels from source.

## Why Benchmarks?

Fromager recursively resolves and builds entire dependency trees from source. A typical bootstrap processes hundreds of packages, each requiring version resolution, source acquisition, patching, and wheel building. Performance regressions in Fromager's core logic compound across these operations.

**The challenge:** Wall-clock benchmarks in shared CI environments vary 10-20% due to noise. A genuine 5% regression becomes indistinguishable from CI variance. These benchmarks focus on pure Python operations where measurements are stable and regressions are detectable.

---

## Quick Start

```bash
# Install dependencies
uv sync --extra benchmark

# Run all benchmarks
uv run pytest benchmarks/

# Fast benchmarks only (skip slow and integration)
uv run pytest benchmarks/ -m "not slow and not integration"

# Integration benchmarks only
uv run pytest benchmarks/ -m "integration"

# With memory profiling
uv run pytest benchmarks/ --memray

# Compare against baseline
uv run pytest benchmarks/ --benchmark-save=baseline
# ... make changes ...
uv run pytest benchmarks/ --benchmark-compare=baseline

# Export to JSON
uv run pytest benchmarks/ --benchmark-json=results.json
```

---

## Understanding Output

```
-------------------------------- benchmark: 3 tests --------------------------------
Name                                  Mean      StdDev    Rounds
-------------------------------------------------------------------------------------
test_constraint_add_and_check         0.85ms    0.05ms    100
test_graph_serialization              1.20ms    0.08ms    100
test_python_version_matching_hot      0.12ms    0.01ms    200
-------------------------------------------------------------------------------------
```

**Key metrics:**
- **Mean** — Primary comparison metric
- **StdDev** — Low values indicate reliable measurements
- **Rounds** — More rounds = more statistical confidence

**Comparison output:**
```
Name                                  Mean (now)    Mean (base)    Ratio
-------------------------------------------------------------------------
test_constraint_add_and_check         0.87ms        0.85ms         1.02x
```

- **Ratio < 1.0** — Faster (improvement)
- **Ratio > 1.15** — Investigate before merging (this threshold can also be enforced automatically; see below)

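Rather than reviewing ratios by hand, pytest-benchmark can fail the run when results regress past a threshold relative to a saved baseline. A minimal example matching the 1.15x guideline above:

```bash
# Fail the run if any benchmark's mean regresses by more than 15%
# compared to the saved "baseline" run
uv run pytest benchmarks/ --benchmark-compare=baseline --benchmark-compare-fail=mean:15%
```
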
---

## Adding Benchmarks

Create test functions in `test_*.py` files using the `benchmark` fixture:

```python
def test_constraint_satisfaction(benchmark):
    """Benchmark Fromager's constraint checking."""
    from fromager.constraints import Constraints
    from packaging.version import Version

    constraints = Constraints()
    constraints.add_constraint("numpy>=1.20,<2.0")

    versions = [Version(v) for v in ["1.19.0", "1.25.0", "2.0.0"]]

    def check_all():
        return [constraints.is_satisfied_by("numpy", v) for v in versions]

    result = benchmark(check_all)
    assert result == [False, True, False]
```

**Guidelines:**
- Keep setup outside the benchmark function
- Assert correctness to ensure the benchmark actually works
- Mark slow benchmarks with `@pytest.mark.slow`
- Add metadata with `benchmark.extra_info["key"] = value` (this and the `slow` marker are demonstrated in the sketch below)

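The last two guidelines look like this in practice. A minimal sketch; the test name and workload are illustrative and not part of the existing suite:

```python
import pytest

from packaging.version import Version


@pytest.mark.slow
def test_version_sort_large(benchmark):
    """Illustrative slow benchmark: sort a large list of parsed versions."""
    # Setup stays outside the timed callable.
    versions = [Version(f"1.{i}.{i % 10}") for i in range(5000)]

    result = benchmark(sorted, versions)

    # Attach metadata that is stored alongside the saved/JSON results.
    benchmark.extra_info["version_count"] = len(versions)

    assert result[0] == Version("1.0.0")
```
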
---

## Advanced Features

### Benchmark Categories

| Category | File | Characteristics |
|----------|------|-----------------|
| Component | `test_resolution.py` | Fast, pure Python, no subprocess |
| Integration | `test_integration.py` | Slow, uses fixtures, network-isolated |
| Memory | Any with `--memray` | Tracks allocations and peak memory |

### Markers

- `@pytest.mark.slow` — Skip with `-m "not slow"`
- `@pytest.mark.integration` — Requires fixtures (local PyPI, uv shim)
- `@pytest.mark.memory` — Memory-focused benchmarks

### Integration Fixtures

The `fixtures/` module provides isolation for realistic benchmarks (see the sketch after this list):

- **`local_pypi`** — Session-scoped local PyPI server for network isolation
- **`configured_env`** — Configures the environment to use the local PyPI server
- **`uv_shim`** — Creates a mock uv binary for subprocess isolation
- **`subprocess_timer`** — Measures subprocess execution time and overhead

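An integration benchmark typically requests these fixtures directly. The sketch below only illustrates the shape of such a test: it assumes `local_pypi` exposes the server address as a `url` attribute, which may differ from the fixture's actual interface.

```python
import urllib.request

import pytest


@pytest.mark.integration
def test_local_index_roundtrip(benchmark, local_pypi, configured_env):
    """Illustrative integration benchmark against the local index."""

    def fetch_simple_index():
        # All traffic stays on the local PyPI server started by the fixture
        # (assumed `url` attribute; adjust to the real fixture interface).
        with urllib.request.urlopen(f"{local_pypi.url}/simple/") as response:
            return response.status

    assert benchmark(fetch_simple_index) == 200
```
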
### Memory Profiling

Memory benchmarks use pytest-memray (not available on Windows):

```bash
uv run pytest benchmarks/ --memray
uv run pytest benchmarks/ --memray --memray-bin-path=./memray-results
```

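In addition to the per-test allocation reports, pytest-memray markers can assert an allocation budget, enforced when running with `--memray`. A small sketch; the limit and workload are made up for illustration:

```python
import pytest


@pytest.mark.memory
@pytest.mark.limit_memory("50 MB")  # pytest-memray marker; fails if the test allocates more
def test_graph_dict_allocations():
    """Illustrative memory check for building a large in-memory mapping."""
    graph = {f"package-{i}": list(range(50)) for i in range(10_000)}
    assert len(graph) == 10_000
```
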
### CI Integration

Benchmarks run automatically via GitHub Actions:

- **`benchmarks.yml`** — Runs on PRs with the `run-benchmarks` label and on pushes to main
- **`benchmarks-nightly.yml`** — Nightly integration benchmarks (2 AM UTC)

CodSpeed provides noise-resistant CI measurements by counting CPU instructions rather than timing wall-clock execution.

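Assuming the pytest-codspeed plugin is available in your environment (an assumption; check the benchmark extra), the same benchmarks can also be exercised locally through its instrumentation:

```bash
# Run the suite under CodSpeed's instrumentation (pytest-codspeed plugin)
uv run pytest benchmarks/ --codspeed
```
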
---

## Directory Structure

```
benchmarks/
├── README.md              # This file
├── conftest.py            # Shared fixtures and markers
├── pytest.ini             # Benchmark configuration
├── fixtures/              # Reusable fixture modules
│   ├── __init__.py
│   ├── pypi_server.py     # Local PyPI server
│   ├── uv_shim.py         # Subprocess isolation
│   └── metrics.py         # Timing collectors
├── requirements/          # Package requirements for local PyPI
│   └── packages.txt
├── test_resolution.py     # Component benchmarks
└── test_integration.py    # Integration benchmarks (slow)
```

---

## Troubleshooting

**High variance:** Close resource-intensive applications. Increase rounds:
```bash
uv run pytest benchmarks/ --benchmark-min-rounds=20
```

**Missing module:** Install dependencies with `uv sync --extra benchmark`

**Debug without timing:** Run benchmarks as regular tests:
```bash
uv run pytest benchmarks/ --benchmark-disable
```

---

## Resources

- [pytest-benchmark documentation](https://pytest-benchmark.readthedocs.io/)
- [CodSpeed documentation](https://docs.codspeed.io/)
- [pytest-memray documentation](https://pytest-memray.readthedocs.io/)