
Pre-Submit Self-Review

Before sending [complete], run this checklist against your full diff (git diff <base>..<branch>). Fix any findings before finalizing. This catches the issues that reviewers consistently flag on performance PRs.

1. Resource Ownership

For every del, .close(), .free(), or early-return that drops a reference:

  • Is this object caller-owned? If it was passed as a parameter, the caller may still need it after your function returns. Grep for all call sites of the function and check if any caller uses the object afterward.
  • Is this object shared? If it's accessed via self., a module global, or a cache — other code paths may reference it concurrently.
  • Is this object behind a feature flag or alternate code path? Check for if config.FEATURE_X, if os.environ.get(...), conditional imports, or monkey-patch consumers. Regressions behind feature flags are invisible in standard tests.
# List the files you modified, then search the whole repo for call sites;
# grepping only the changed files would miss external callers
git diff <base>..<branch> --name-only
grep -rn "function_name(" --include="*.py" .
# For each caller: does it use the object after the call?

Fix pattern: If you don't own the object, don't close/free it. Use scoped cleanup (context managers, try/finally) only for objects your function creates.
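
A minimal sketch of that fix pattern; load_config and the file-handle names are illustrative, not from any particular codebase:

import json
from typing import IO

def load_config(path: str) -> dict:
    # This function opens the handle, so it owns cleanup: the
    # context manager closes it even if json.load raises.
    with open(path) as fh:
        return json.load(fh)

def load_config_from(fh: IO[str]) -> dict:
    # Caller-owned handle: do NOT close it here. The caller may
    # keep using it after this function returns.
    return json.load(fh)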

2. Concurrency & Production Safety

Assume this runs in a high-concurrency web service with multiple threads/async tasks.

  • Shared mutable state: Does your change modify module-level variables, class attributes, or globals? Is access thread-safe?
  • Locking scope: If you hold a lock, is the critical section minimal? Are you doing I/O (disk, network) under the lock?
  • asyncio.run() from an existing loop: Never call asyncio.run() in code that may already be inside a running event loop; it raises RuntimeError. Detect a running loop with asyncio.get_running_loop() first and, if one exists, await the coroutine directly (or hand blocking work to run_in_executor); see the sketch after this list.
  • Resource lifecycle under concurrency: File handles, images, arrays, model sessions — what happens if many requests hit this code simultaneously? Is each request getting its own resources, or sharing?
  • Partial failure: If your optimization crashes mid-way, does it leave resources leaked or state corrupted? Check for missing finally/except cleanup.
  • Idempotency: Can your changed function be called multiple times with the same input safely? Does it accumulate state across calls?
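
Two of these pitfalls (locking scope, asyncio.run() from a running loop) fit a short sketch; the cache and download names are hypothetical. The first function keeps the critical section minimal by doing I/O outside the lock; the second guards against starting a loop inside a loop.

import asyncio
import threading

_cache: dict[str, bytes] = {}
_cache_lock = threading.Lock()

def download(key: str) -> bytes:
    # Stand-in for a real network fetch.
    return key.encode()

def refresh(key: str) -> bytes:
    # Do the slow I/O outside the lock...
    payload = download(key)
    # ...and hold the lock only for the cheap dict update.
    with _cache_lock:
        _cache[key] = payload
    return payload

def run_coro(coro):
    # asyncio.run() raises RuntimeError if this thread already has a
    # running loop, so detect that case explicitly.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)  # no loop running: safe to start one
    raise RuntimeError("already inside an event loop; await the coroutine instead")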

3. Correctness vs Intent

Cross-check your implementation against what the PR claims:

  • Every claim in results.tsv has evidence. If you recorded "45% speedup", the benchmark output should show it. If you recorded "KEEP", the tests must have passed.
  • No silent behavior changes. If your optimization changes output format, error handling, logging, or edge case behavior — even slightly — document it. Reviewers will diff the behavior, not just the code.
  • Quality tradeoffs disclosed. If your change trades accuracy for speed (e.g., rule-based vs model-based), latency for memory, or precision for throughput — quantify both sides in your commit message and HANDOFF.md. Don't leave this for the reviewer to discover.
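
For example, a commit-message tradeoff note might look like this; the subject line is illustrative and the angle-bracket values are placeholders to fill in from your actual measurements:

Perf: replace model-based language detection with rule-based heuristic

Speedup: <X>x on the microbenchmark (see the matching row in results.tsv).
Tradeoff: accuracy drops from <A>% to <B>% on the eval set; short
inputs are most affected. Also noted in HANDOFF.md.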

4. Abstraction Boundaries

  • API contracts preserved. If you changed a function's behavior (e.g., it now closes an image it didn't before, or returns a different type), check all callers. The function's implicit contract includes what it does NOT do.
  • No code duplication across paths. If you added a parallel implementation (e.g., an async version of a sync function), it will drift. Prefer making the existing function handle both cases over duplicating logic (see the sketch after this list).
  • Don't inline what should stay encapsulated. If you copied logic from a helper to avoid a function call — the next person who fixes a bug in the helper won't know to fix your copy too.
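
One way to avoid that drift, sketched with hypothetical names: keep the logic in a single core function and make the sync and async entry points thin wrappers around it.

import asyncio

def _normalize(records: list[dict]) -> list[dict]:
    # Single home for the real logic; bug fixes land here once.
    return [{**r, "id": str(r["id"])} for r in records]

def normalize(records: list[dict]) -> list[dict]:
    return _normalize(records)

async def normalize_async(records: list[dict]) -> list[dict]:
    # Offload so a large batch doesn't block the event loop; the
    # logic is still the single function above.
    return await asyncio.to_thread(_normalize, records)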

5. Test Coverage of the Actual Path

  • Tests exercise the production code path, not a test-local approximation. If the optimization goes through a monkey-patch, factory, or plugin loader in production, the test must too.
  • Tests cover the alternate paths your change affects. If the function is called from both a sync endpoint and an async endpoint, test both.
  • Regression tests for edge cases mentioned in your analysis (e.g., "empty input", "single element", "concurrent access").
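
A sketch of such regression tests in pytest style; mypkg and dedupe are hypothetical stand-ins for your module and optimized function:

import threading

from mypkg import dedupe  # hypothetical optimized function

def test_empty_input():
    assert dedupe([]) == []

def test_single_element():
    assert dedupe([1]) == [1]

def test_concurrent_calls_are_safe():
    results, errors = [], []

    def worker():
        try:
            results.append(dedupe([1, 2, 2, 3]))
        except Exception as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert not errors
    assert all(r == [1, 2, 3] for r in results)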

How to Run

At the end of the experiment loop, before sending [complete]:

# 1. Get the full diff
git diff <base-branch>..HEAD

# 2. Get the list of modified files
git diff <base-branch>..HEAD --name-only

# 3. For each function you modified, find all call sites
# (focus on functions where you added del/close/free)
grep -rn "function_name(" --include="*.py" .
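
If grep turns up too much noise (comments, substrings, string literals), a small standard-library AST pass can list call sites more precisely. This is an optional sketch, not part of the required workflow:

import ast
import pathlib

def find_calls(name: str, root: str = ".") -> list[tuple[str, int]]:
    hits = []
    for path in pathlib.Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Call):
                # Matches both foo(...) and obj.foo(...).
                called = getattr(node.func, "id", None) or getattr(node.func, "attr", None)
                if called == name:
                    hits.append((str(path), node.lineno))
    return hits

print(find_calls("function_name"))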

Walk through each KEEP commit against this checklist. If you find an issue:

  1. Fix it
  2. Re-run tests
  3. Amend the commit, or add a new one
  4. Update results.tsv if metrics changed
  5. Note the fix in HANDOFF.md under "Pre-submit review findings"

Only send [complete] after all checklist items pass.