codeflash-agent/plugin/references/shared/micro-benchmark.md
2026-04-03 17:36:50 -05:00

Micro-Benchmark — Shared Template

For any optimization, test in isolation first. Call the target function directly — not through the full application — to isolate its true impact.

Role

Micro-benchmarks are a fast pre-screen — validate that an optimization is worth committing before investing in a full codeflash compare run. See e2e-benchmarks.md for how this fits into the two-phase measurement workflow and for fallback behavior when codeflash compare is not available.

A/B Pattern

# /tmp/micro_bench_<name>.py
import sys

def bench_a():
    """Current approach."""
    # ... original code with real input

def bench_b():
    """Optimized approach."""
    # ... optimized code with same input

if __name__ == "__main__":
    {"a": bench_a, "b": bench_b}[sys.argv[1]]()
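
As a minimal sketch of how the template might be filled in for a CPU-bound case, the script below compares a list scan against a set lookup. The workload, the file name, and the `time_variant` helper are hypothetical and chosen only for illustration; a real micro-benchmark should call the actual target function with real input.

```python
# /tmp/micro_bench_lookup.py  (hypothetical file name)
import sys
import timeit

# Hypothetical input, built once so both variants see identical data.
ITEMS = list(range(500))
NEEDLES = list(range(0, 1000, 4))


def bench_a():
    """Current approach: linear scan of a list for every lookup."""
    return sum(1 for n in NEEDLES if n in ITEMS)


def bench_b():
    """Optimized approach: build a set, then use hash lookups."""
    lookup = set(ITEMS)
    return sum(1 for n in NEEDLES if n in lookup)


def time_variant(fn, number=20, repeat=5):
    """Return the best per-call time in seconds across `repeat` runs."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number


if __name__ == "__main__":
    variants = {"a": bench_a, "b": bench_b}
    name = sys.argv[1] if len(sys.argv) > 1 else "a"
    if name in variants:
        per_call = time_variant(variants[name])
        print(f"{name}: {per_call * 1e6:.1f} us per call")
    else:
        print("usage: micro_bench_lookup.py {a,b}")
```

Taking the minimum over several repeats is one common way to reduce scheduling noise. Both variants return the same count, which doubles as a quick check that the optimization did not change behavior.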

Running

Domain agents adapt the runner to their measurement tool:

  • Memory: memray run --native --trace-python-allocators -o /tmp/micro_a.bin /tmp/micro_bench_<name>.py a (repeat with b and /tmp/micro_b.bin), then memray stats on each output file
  • CPU / Data Structures: wrap with timeit.timeit(fn, number=1000) inside the script
  • Async: wrap with asyncio.run(fn()) and time.perf_counter() for wall-clock
  • Structure: timeit.timeit(bench_import, number=10), clearing the sys.modules cache before each import

$RUNNER /tmp/micro_bench_<name>.py a
$RUNNER /tmp/micro_bench_<name>.py b
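
For the Async case listed above, a wall-clock measurement might look like the following sketch. `fetch_one` is a hypothetical stand-in for an awaitable I/O call, and `wall_clock` is an illustrative helper, not part of any codeflash API. The sequential and concurrent variants are only one example of an A/B pair.

```python
import asyncio
import time


async def fetch_one(delay=0.01):
    """Hypothetical stand-in for an awaitable I/O call."""
    await asyncio.sleep(delay)
    return 1


async def bench_a(n=20):
    """Current approach: await each call one after another."""
    total = 0
    for _ in range(n):
        total += await fetch_one()
    return total


async def bench_b(n=20):
    """Optimized approach: schedule all calls concurrently."""
    results = await asyncio.gather(*(fetch_one() for _ in range(n)))
    return sum(results)


def wall_clock(coro_fn):
    """Run coro_fn() to completion; return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for name, fn in (("a", bench_a), ("b", bench_b)):
        result, elapsed = wall_clock(fn)
        print(f"{name}: result={result} elapsed={elapsed * 1e3:.1f} ms")
```

The elapsed time here includes event loop startup and teardown from `asyncio.run`, so very short coroutines may need to be repeated inside a single run to get a stable reading.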

Adapt commands for the project's specific setup (virtualenv, PYTHONPATH, working directory, etc.).
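
For the Structure case, one possible shape for `bench_import` is sketched below. The module name `colorsys` is a placeholder for the module under test, and `clear_cached` is a hypothetical helper. The sketch only evicts the target module and its submodules, so dependencies that are already imported stay cached and are not counted.

```python
import importlib
import sys
import timeit

MODULE = "colorsys"  # placeholder; substitute the module under test


def clear_cached(prefix):
    """Remove the module and its submodules from the import cache."""
    for name in list(sys.modules):
        if name == prefix or name.startswith(prefix + "."):
            del sys.modules[name]


def bench_import():
    """Force a fresh import by clearing the cache entry first."""
    clear_cached(MODULE)
    importlib.import_module(MODULE)


if __name__ == "__main__":
    runs = 10
    total = timeit.timeit(bench_import, number=runs)
    print(f"{MODULE}: {total / runs * 1e3:.3f} ms per import")
```

Because compiled bytecode on disk is still reused, this approximates a warm-disk import rather than a first-ever import.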

Micro-Benchmark-Only Keeps

Keep a change on micro-benchmark evidence alone only if ALL of the following hold:

  1. Micro-benchmark shows clear, repeatable improvement above the domain threshold
  2. Full tests still pass (always run for correctness)
  3. Change is simple and adds no complexity
  4. Function is confirmed on the hot path / exercised by the target test

Domain thresholds:

  • Memory: >10 MiB or >10%
  • CPU / Data Structures: >20% or >2x, hot path confirmed by cProfile
  • Async: >20% or >2x at representative concurrency
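
The thresholds above can be checked mechanically. The sketch below assumes one particular reading: percentages are the fraction of the baseline saved, ">2x" is the ratio of baseline to optimized time, and the memory figures are peak bytes. The function names are hypothetical, and the hot-path and concurrency conditions still need to be confirmed separately.

```python
MIB = 1024 * 1024


def clears_memory_threshold(baseline_bytes, optimized_bytes):
    """Memory: more than 10 MiB saved, or more than 10% of baseline."""
    saved = baseline_bytes - optimized_bytes
    return saved > 10 * MIB or saved / baseline_bytes > 0.10


def clears_time_threshold(baseline_s, optimized_s):
    """CPU / Async: more than 20% time saved, or more than 2x speedup."""
    saved_fraction = (baseline_s - optimized_s) / baseline_s
    speedup = baseline_s / optimized_s
    return saved_fraction > 0.20 or speedup > 2.0


if __name__ == "__main__":
    # 15 MiB saved out of 200 MiB: passes on the absolute criterion.
    print(clears_memory_threshold(200 * MIB, 185 * MIB))
    # 10% faster: below the 20% bar.
    print(clears_time_threshold(1.0, 0.9))
```

Under this reading, any speedup above 2x also saves more than 20%, so the second clause is kept only to mirror the list.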

These savings compound: as dominant bottlenecks shrink, previously buried savings surface.