End-to-End Benchmarks — Python
Python-specific E2E benchmark tooling. Read ${CLAUDE_PLUGIN_ROOT}/references/shared/e2e-benchmarks.md first for the language-agnostic framework.
Detection: codeflash compare
Check at session start:
# Is codeflash installed?
$RUNNER -c "import codeflash" 2>/dev/null && echo "codeflash available" || echo "not available"
# Is benchmarks-root configured?
grep -A5 '\[tool\.codeflash\]' pyproject.toml | grep benchmarks.root
If both checks pass, codeflash compare is available. Record in .codeflash/setup.md:
## E2E Benchmarks
codeflash compare: available
benchmarks-root: <path>
If either check fails, record:
## E2E Benchmarks
codeflash compare: not available (reason: <no benchmarks-root | codeflash not installed>)
fallback: ad-hoc micro-benchmarks + pytest durations
How codeflash compare Works
codeflash compare <base_ref> <head_ref>:
- Auto-detects changed functions from git diff (line-level overlap, not just file-level)
- Creates isolated git worktrees for each ref
- Instruments target functions with @codeflash_trace
- Runs benchmarks via trace_benchmarks_pytest
- Produces per-function nanosecond timings and a side-by-side comparison table
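To make "line-level overlap" concrete, here is a minimal sketch of how changed line numbers could be matched against top-level function spans using the standard ast module. It is not codeflash's implementation. The helper name changed_top_level_functions and the sample source are hypothetical, and the changed line numbers are assumed to come from a parsed git diff.

```python
import ast


def changed_top_level_functions(source: str, changed_lines: set[int]) -> list[str]:
    """Names of top-level functions whose line span overlaps changed_lines."""
    hits = []
    # Only the module body is scanned, so methods inside classes are skipped.
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            span = range(node.lineno, node.end_lineno + 1)
            if any(line in span for line in changed_lines):
                hits.append(node.name)
    return hits


SOURCE = '''\
def parse(data):
    return data.split(",")

def render(rows):
    return ", ".join(rows)

class Cache:
    def get(self, key):
        return key
'''

# Line 5 is inside render; line 9 is inside the method Cache.get.
print(changed_top_level_functions(SOURCE, {5, 9}))  # ['render']
```

The change on line 9 is not reported because it falls inside a class method, which mirrors the default exclusion described under Known Limitations.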
Usage
After every KEEP commit
# Compare the commit before your optimization with HEAD
$RUNNER -m codeflash compare <pre-optimization-sha> HEAD --timeout 120
# Or to measure cumulative improvement since the session baseline
$RUNNER -m codeflash compare <baseline-sha> HEAD --timeout 120
Explicit function targeting
When auto-detection misses functions (e.g., methods inside classes are excluded by default), use --functions:
$RUNNER -m codeflash compare <base> HEAD --functions "src/module.py::func1,func2;src/other.py::func3"
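The --functions value in the example groups function names by file: files are separated by ";", a file and its functions by "::", and functions by ",". A small helper can assemble that string from a mapping. This is a sketch inferred from the example above only. The helper name build_functions_arg is hypothetical, and the syntax for targeting methods is not shown here, so it is not covered.

```python
def build_functions_arg(targets: dict[str, list[str]]) -> str:
    """Join {path: [function, ...]} into the format shown in the example above."""
    return ";".join(f"{path}::{','.join(names)}" for path, names in targets.items())


arg = build_functions_arg({
    "src/module.py": ["func1", "func2"],
    "src/other.py": ["func3"],
})
print(arg)  # src/module.py::func1,func2;src/other.py::func3
```

Quote the resulting string when passing it on the command line, since ";" is a command separator in most shells.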
At milestones
# Cumulative e2e measurement
$RUNNER -m codeflash compare <baseline-sha> HEAD --timeout 120
Fallback: When codeflash compare Is Not Available
- Use ad-hoc micro-benchmarks as the primary measurement (see micro-benchmark.md)
- Use pytest --durations for test suite wall-clock as a secondary signal
- Use cProfile cumtime comparisons for project-function-level attribution
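For the cProfile fallback, a before/after cumtime comparison might look like the sketch below. The helper cumtime_by_function and the two sample functions are hypothetical stand-ins for a project function before and after an optimization. Real measurements should be repeated several times, since a single profiled call is noisy.

```python
import cProfile
import pstats


def cumtime_by_function(func, *args, **kwargs) -> dict[str, float]:
    """Profile one call and return cumulative seconds keyed by function name."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, name) -> (cc, nc, tottime, cumtime, callers).
    # Keying by name alone means same-named functions in different files collide.
    return {name: entry[3] for (_file, _line, name), entry in stats.stats.items()}


def slow_sum(n):
    # Baseline: sums squares with a generator.
    return sum(i * i for i in range(n))


def fast_sum(n):
    # Optimized: closed-form sum of squares below n.
    return (n - 1) * n * (2 * n - 1) // 6


before = cumtime_by_function(slow_sum, 200_000)["slow_sum"]
after = cumtime_by_function(fast_sum, 200_000)["fast_sum"]
print(f"before: {before:.6f}s  after: {after:.6f}s")
```

Filtering the returned entries to files under the project root is one way to keep the attribution at project-function level rather than library level.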
Known Limitations
- Only top-level functions are auto-detected and instrumented. Class methods are excluded because @codeflash_trace pickles self on every call, which is catastrophic when self holds large objects. Use --functions to explicitly target methods when needed.
- Requires committed code: Works on git refs, so changes must be committed before benchmarking.
- Benchmark files must exist in benchmarks-root. If the project has no benchmarks yet, fall back to ad-hoc measurement.