Micro-Benchmark — Shared Template

For any optimization, test in isolation first. Call the target function directly — not through the full application — to isolate its true impact.

Role

Micro-benchmarks are a fast pre-screen — validate that an optimization is worth committing before investing in a full codeflash compare run. See e2e-benchmarks.md for how this fits into the two-phase measurement workflow and for fallback behavior when codeflash compare is not available.

A/B Pattern

# /tmp/micro_bench_<name>.py
import sys

def bench_a():
    """Current approach."""
    # ... original code with real input

def bench_b():
    """Optimized approach."""
    # ... optimized code with same input

if __name__ == "__main__":
    {"a": bench_a, "b": bench_b}[sys.argv[1]]()
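As a concrete illustration, here is one way the template might be filled in for the CPU case. This is a sketch only: the file name, the WORDS input, and the string-building workload are hypothetical stand-ins for the real function and real input, and the run helper is not part of the template.

```python
# /tmp/micro_bench_join.py  (hypothetical name and workload)
import sys
import timeit

WORDS = [str(i) for i in range(10_000)]  # stand-in for real input


def bench_a():
    """Current approach: repeated string concatenation."""
    out = ""
    for w in WORDS:
        out += w
    return out


def bench_b():
    """Optimized approach: a single str.join call."""
    return "".join(WORDS)


def run(variant, number=100):
    """Time one variant with timeit and return seconds per call."""
    fn = {"a": bench_a, "b": bench_b}[variant]
    return timeit.timeit(fn, number=number) / number


if __name__ == "__main__" and len(sys.argv) > 1:
    print(f"{sys.argv[1]}: {run(sys.argv[1]) * 1e6:.1f} us/call")
```

Both variants return their result so that the outputs can be compared for equality before any timing is trusted.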

Running

Domain agents adapt the runner to their measurement tool:

Memory: memray run --native --trace-python-allocators -o /tmp/micro_a.bin /tmp/micro_bench_<name>.py a, then the same command with b and /tmp/micro_b.bin, then memray stats on each .bin file
CPU / Data Structures: wrap with timeit.timeit(fn, number=1000) inside the script
Async: wrap with asyncio.run(fn()) and time.perf_counter() for wall-clock
Structure: timeit.timeit(bench_import, number=10) with sys.modules cache clearing

$RUNNER /tmp/micro_bench_<name>.py a
$RUNNER /tmp/micro_bench_<name>.py b
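For the Structure entry in the list above, "sys.modules cache clearing" could look roughly like the sketch below. The target module (json) is a hypothetical placeholder, and the eviction strategy is an assumption about what the entry intends: it evicts only the target package, so dependencies that stay cached are not re-imported.

```python
import importlib
import sys
import timeit

MODULE = "json"  # hypothetical target; substitute the module under test


def bench_import():
    """Import MODULE cold by evicting it and its submodules first."""
    for name in list(sys.modules):
        if name == MODULE or name.startswith(MODULE + "."):
            del sys.modules[name]
    # Dependencies outside MODULE remain cached and are not re-executed.
    importlib.import_module(MODULE)


def run(number=10):
    """Return seconds per cold import of MODULE."""
    return timeit.timeit(bench_import, number=number) / number


if __name__ == "__main__":
    print(f"{MODULE}: {run() * 1e3:.2f} ms/import")
```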

Adapt commands for the project's specific setup (virtualenv, PYTHONPATH, working directory, etc.).
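The Async entry can be sketched in the same A/B shape. Everything here is illustrative: fetch is a hypothetical stand-in for a real awaitable call, the sequential-versus-gather change is an assumed example optimization, and wall_clock is a helper name introduced for this sketch.

```python
import asyncio
import sys
import time


async def fetch(delay):
    """Hypothetical stand-in for an awaitable I/O call."""
    await asyncio.sleep(delay)
    return delay


async def bench_a(n=20, delay=0.001):
    """Current approach: await each call in sequence."""
    return [await fetch(delay) for _ in range(n)]


async def bench_b(n=20, delay=0.001):
    """Optimized approach: run the calls concurrently."""
    return await asyncio.gather(*(fetch(delay) for _ in range(n)))


def wall_clock(coro_fn):
    """Return (seconds, result) for one asyncio.run of coro_fn."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return time.perf_counter() - start, result


if __name__ == "__main__" and len(sys.argv) > 1:
    seconds, _ = wall_clock({"a": bench_a, "b": bench_b}[sys.argv[1]])
    print(f"{sys.argv[1]}: {seconds * 1e3:.2f} ms")
```

The measured time includes event-loop startup from asyncio.run, so very short workloads should be repeated or enlarged before comparing variants.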

Micro-Benchmark-Only Keeps

Keep on micro alone if ALL hold:

Micro-benchmark shows clear, repeatable improvement above the domain threshold
Full tests still pass (always run for correctness)
Change is simple and adds no complexity
Function is confirmed on the hot path / exercised by the target test

Domain thresholds:

Memory: >10 MiB or >10%
CPU / Data Structures: >20% or >2x, hot path confirmed by cProfile
Async: >20% or >2x at representative concurrency
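A small helper can make the threshold check mechanical. The sketch below assumes the percentages refer to a reduction relative to the baseline measurement, which the thresholds above do not state explicitly; under that reading a 2x speedup already exceeds 20%, so only the percentage is checked for time. All function names are hypothetical.

```python
MIB = 1024 * 1024


def improvement(before, after):
    """Fractional reduction from before to after (0.25 means 25% lower)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


def clears_time_threshold(before_s, after_s, min_fraction=0.20):
    """CPU / Async: true if the time reduction exceeds min_fraction."""
    return improvement(before_s, after_s) > min_fraction


def clears_memory_threshold(before_bytes, after_bytes,
                            min_bytes=10 * MIB, min_fraction=0.10):
    """Memory: true if the saving exceeds 10 MiB or 10% of the baseline."""
    saved = before_bytes - after_bytes
    return (saved > min_bytes
            or improvement(before_bytes, after_bytes) > min_fraction)
```

For example, clears_time_threshold(1.0, 0.7) is true (30% lower), while clears_time_threshold(1.0, 0.9) is false.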

These savings compound: as the dominant bottlenecks shrink, savings that were previously buried beneath them become visible.
