Micro-Benchmark — Shared Template
For any optimization, test in isolation first. Call the target function directly — not through the full application — to isolate its true impact.
Role
Micro-benchmarks are a fast pre-screen — validate that an optimization is worth committing before investing in a full codeflash compare run. See e2e-benchmarks.md for how this fits into the two-phase measurement workflow and for fallback behavior when codeflash compare is not available.
A/B Pattern
# /tmp/micro_bench_<name>.py
import sys

def bench_a():
    """Current approach."""
    # ... original code with real input

def bench_b():
    """Optimized approach."""
    # ... optimized code with same input

if __name__ == "__main__":
    # Pick the variant from the command line: "a" or "b"
    {"a": bench_a, "b": bench_b}[sys.argv[1]]()
Running
Domain agents adapt the runner to their measurement tool:
- Memory: memray run --native --trace-python-allocators -o /tmp/micro_{a,b}.bin /tmp/micro_bench_<name>.py {a,b}, then memray stats
- CPU / Data Structures: wrap with timeit.timeit(fn, number=1000) inside the script
- Async: wrap with asyncio.run(fn()) and time.perf_counter() for wall-clock
- Structure: timeit.timeit(bench_import, number=10) with sys.modules cache clearing
$RUNNER /tmp/micro_bench_<name>.py a
$RUNNER /tmp/micro_bench_<name>.py b
Adapt commands for the project's specific setup (virtualenv, PYTHONPATH, working directory, etc.).
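For the CPU / Data Structures case, the in-script timing could look roughly like the sketch below. The `measure` helper and the two placeholder workloads are assumptions made for illustration; `timeit.repeat` is used instead of a single `timeit.timeit` call so that the minimum of several runs can be reported, which tends to be less sensitive to background noise.

```python
# Minimal timing sketch; measure(), work_a() and work_b() are hypothetical names.
import timeit


def work_a():
    """Placeholder for the current approach."""
    return [x * x for x in range(200)]


def work_b():
    """Placeholder for the optimized approach."""
    return list(map(lambda x: x * x, range(200)))


def measure(fn, number=1000, repeat=5):
    """Return the best observed per-call time in seconds."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number


if __name__ == "__main__":
    for label, fn in (("a", work_a), ("b", work_b)):
        print(f"{label}: {measure(fn) * 1e6:.2f} us per call")
```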
Micro-Benchmark-Only Keeps
Keep a change on micro-benchmark evidence alone only if ALL of the following hold:
- Micro-benchmark shows clear, repeatable improvement above the domain threshold
- Full tests still pass (always run for correctness)
- Change is simple, doesn't add complexity
- Function is confirmed on the hot path / exercised by the target test
Domain thresholds:
- Memory: >10 MiB or >10%
- CPU / Data Structures: >20% or >2x, hot path confirmed by cProfile
- Async: >20% or >2x at representative concurrency
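The thresholds above can be expressed as a small check, sketched below. The function names are hypothetical, and the sketch assumes that the percentage figures refer to the relative reduction against the baseline measurement and that both measurements are positive.

```python
# Sketch of the domain thresholds; function names are hypothetical.
MIB = 1024 * 1024


def memory_keep(before_bytes, after_bytes):
    """Memory: saving of more than 10 MiB or more than 10% of the baseline."""
    saved = before_bytes - after_bytes
    return saved > 10 * MIB or saved / before_bytes > 0.10


def time_keep(before_s, after_s):
    """CPU / Async: more than 20% faster or more than 2x speedup."""
    improvement = (before_s - after_s) / before_s
    speedup = before_s / after_s
    return improvement > 0.20 or speedup > 2.0
```

For example, `memory_keep(200 * MIB, 185 * MIB)` passes on the absolute 15 MiB saving even though the relative saving is only 7.5%, while `time_keep(1.0, 0.9)` fails both conditions. The hot-path and concurrency conditions still have to be confirmed separately.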
These savings compound: as dominant bottlenecks shrink, previously buried savings surface.
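A small arithmetic sketch of this effect, using made-up numbers: a fixed 5 s saving is 5% of a 100 s run, but once a dominant bottleneck has been cut and the run takes 40 s, the same 5 s is 12.5% of the total.

```python
# Illustrative numbers only; share_of_total() is a hypothetical helper.
def share_of_total(saving_s, total_s):
    """Fraction of total runtime that a fixed saving represents."""
    return saving_s / total_s


before = share_of_total(5.0, 100.0)  # dominant bottleneck still present
after = share_of_total(5.0, 40.0)    # dominant bottleneck already reduced
print(f"{before:.1%} -> {after:.1%}")
```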