Agent Base Protocol

Shared operational rules for all Codeflash domain optimization agents (CPU, async, memory, structure). Each agent reads this file at session start. Domain-specific overrides live in the agent prompt itself.

Context Management

Use Explore subagents for ALL codebase investigation — reading unfamiliar code, searching for patterns, understanding architecture. Only read code directly when you are about to edit it. Do NOT run more than 2 background tasks simultaneously — over-parallelization leads to timeouts, killed tasks, and losing track of what's running. Sequential, focused work produces better results than scattered parallel work.

Experiment Discipline

  • Always profile before fixing. This is mandatory — never skip. Your first action after setup must be running an actual profiler to get quantified, per-function evidence. Reading source code and guessing at bottlenecks is not profiling. Running tests and looking at wall-clock time is not profiling. (A profiler-invocation sketch follows this list.)
  • One fix per experiment. NEVER batch multiple fixes into one edit. Each iteration targets exactly one function/allocation/pattern. This discipline is essential — you cannot rank, skip, or reprofile if you change everything at once.
  • LOCK your measurement methodology at baseline time. Do NOT change profiling flags, test filters, benchmark parameters, or tool settings mid-experiment. Changing methodology creates uninterpretable results. If you need different parameters, record a new baseline first and note the methodology change in HANDOFF.md.
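
A profiler-invocation sketch — assuming a Python project tested with pytest and stdlib cProfile; the real profiler, flags, and filters are fixed by the domain agent's baseline method, and commands should be prefixed with the runner from setup.md where applicable:

```bash
# Record per-function timings while exercising the test suite (paths hypothetical).
python -m cProfile -o .codeflash/baseline.prof -m pytest tests/ -q

# Rank by cumulative time — quantified, per-function evidence, not guesswork.
python -c "import pstats; pstats.Stats('.codeflash/baseline.prof').sort_stats('cumulative').print_stats(15)"
```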

Commit Rules

After each KEEP, stage ONLY the files you changed: git add <specific files> && git commit -m "<domain-prefix>: <one-line summary>". Do NOT use git add -A or git add . — these stage scratch files, benchmarks, and user work. Each optimization gets its own commit so they can be reverted or cherry-picked independently. Do NOT commit discards. If the project has pre-commit hooks (check for .pre-commit-config.yaml), run pre-commit run --all-files before committing — CI failures from forgotten linting waste time.
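
A hypothetical KEEP commit in the memory domain (file names are illustrative):

```bash
# Stage only the files this one experiment touched — never -A or "." .
git add src/cache.py tests/test_cache.py
# Run hooks first if the project defines them.
pre-commit run --all-files
git commit -m "mem: reuse buffer in cache.get to avoid per-call allocation"
```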

Domain commit prefixes: ds: (CPU/data structures), async: (async), mem: (memory), struct: (structure), perf: (deep/cross-domain).

Stuck State Recovery

If you hit 5+ consecutive discards (across all strategy rotations), run this recovery protocol before giving up:

  1. Re-read all in-scope files from scratch. Your mental model may have drifted — re-read the actual code, not your cached understanding.
  2. Re-read the full results log (.codeflash/results.tsv). Look for patterns: which files/functions appeared in successful experiments (focus there), which techniques worked (try variants on new targets), which approaches failed repeatedly (avoid them). (A log-mining sketch follows this list.)
  3. Re-read the original goal. Has the focus drifted from what the user asked for?
  4. Try combining 2-3 previously successful changes that might compound.
  5. Try the opposite of what hasn't worked. If fine-grained optimizations keep failing, try a coarser architectural change. If local changes keep failing, try a cross-function refactor.
  6. Check git history for hints: git log --oneline -20 --stat — do successful commits cluster in specific files or patterns?
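
A log-mining sketch for step 2 — this assumes a hypothetical column layout (field 3 = result, field 2 = target); check the real header with head -1 .codeflash/results.tsv first:

```bash
# Count which targets appear in successful (KEEP) experiments, most frequent first.
awk -F'\t' '$3 == "KEEP" {print $2}' .codeflash/results.tsv | sort | uniq -c | sort -rn
```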

If recovery still produces no improvement after 3 more experiments, stop and report with a summary of what was tried and why the codebase appears to be at its optimization floor for this domain.

Key Files

All session state lives in .codeflash/:

  • .codeflash/results.tsv — Experiment log. Read at startup, append after each experiment.
  • .codeflash/HANDOFF.md — Session state. Read at startup, update after each keep/discard. (A skeleton is sketched after this list.)
  • .codeflash/conventions.md — Maintainer preferences. Read at startup. Update when changes rejected.
  • .codeflash/setup.md — Runner, Python version, test commands, available tools. Written by setup agent.
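
The exact HANDOFF.md layout is up to the domain agent; as a purely hypothetical skeleton:

```
# HANDOFF — <domain> session
Environment: runner=<from setup.md>, python=<version>, branch=codeflash/optimize
Baseline: <profiling method + locked flags>
Current target: <function / file>
Keeps: <n> | Consecutive discards: <n>
Pre-submit review findings: <none yet>
```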

Session Resume

  1. Read .codeflash/HANDOFF.md, .codeflash/results.tsv, .codeflash/conventions.md.
  2. Confirm with user what to work on next.
  3. Continue the experiment loop.

Session Start — Common Steps

  1. Read setup. Read .codeflash/setup.md for the runner, Python version, and test command. Read .codeflash/conventions.md if it exists. Also check for org-level conventions at ../conventions.md (project-level overrides org-level). Read .codeflash/learnings.md if it exists — these are discoveries from previous sessions that prevent repeating dead ends. Read CLAUDE.md. Use the runner from setup.md everywhere you see $RUNNER.
  2. Create or switch to the optimization branch. git checkout -b codeflash/optimize (or git checkout codeflash/optimize if it already exists). All optimizations stack as commits on this single branch. (An idempotent one-liner follows this list.)
  3. Initialize HANDOFF.md with environment and discovery.
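
An idempotent form of step 2 — checkout succeeds if the branch already exists, otherwise the fallback creates it:

```bash
git checkout codeflash/optimize 2>/dev/null || git checkout -b codeflash/optimize
```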

Domain agents add domain-specific steps after these common steps (e.g., baseline profiling method, benchmark tier definition).

Constraints (shared)

  • Correctness: All previously-passing tests must still pass.
  • Simplicity: Simpler is better. Don't add complexity for marginal gains.
  • Style: Match existing project conventions. Don't introduce patterns maintainers will reject.

Domain agents add additional domain-specific constraints (e.g., performance measurement required for CPU, no new dependencies for memory).

Research Tools

context7: mcp__context7__resolve-library-id then mcp__context7__query-docs for library docs. Use it aggressively for API signatures — APIs change across versions.

WebFetch: For specific URLs when context7 doesn't cover a topic.

Explore subagents: For codebase investigation to keep your context clean.

Progress Reporting Protocol

When running as a named teammate, send progress messages to the team lead at these milestones. If SendMessage is unavailable (not in a team), skip this — the file-based logging is always the source of truth.

Standard message points (domain-specific content in each agent's prompt):

  1. After baseline profiling: Summary of profiling results
  2. After each experiment: Target, result (KEEP/DISCARD), metrics. (An example message follows this list.)
  3. Every 3 experiments: Periodic progress summary for user relay
  4. At milestones (every 3-5 keeps): Cumulative improvement
  5. At plateau/completion: Final summary
  6. When stuck (5+ consecutive discards): What's been tried
  7. Cross-domain discovery: Signal to router — do NOT fix cross-domain issues yourself
  8. File modification notification: After each KEEP commit, notify researcher per modified file: SendMessage(to: "researcher", summary: "File modified", message: "[modified <file-path>]"). This prevents the researcher from sending outdated analysis for code you've already changed.
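
For example, a per-experiment message (point 2) might look like the following, reusing the SendMessage shape from point 8 — the recipient name, metrics, and target are all illustrative:

```
SendMessage(to: "team-lead",
            summary: "Experiment 7: KEEP",
            message: "Target: parse_rows() in src/loader.py. Result: KEEP, 1.32x (410ms → 310ms, median of 5 runs). Committed.")
```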

Also update the shared task list when reaching phase boundaries:

  • After baseline: TaskUpdate("Baseline profiling" → completed)
  • At completion/plateau: TaskUpdate("Experiment loop" → completed)

Research Teammate Integration

A researcher agent ("researcher") may be running alongside you. Use it to reduce your read-think time:

  1. After baseline profiling, send your ranked target list to the researcher. Skip the top target (you'll work on it immediately) — send targets #2 through #5+. (A sketch follows this list.)
  2. Before each experiment, check if the researcher has sent findings for your current target. If a [research <function_name>] message is available, use it to skip source reading and pattern identification — go straight to the reasoning checklist.
  3. After re-profiling (new rankings), send updated targets to the researcher so it stays ahead of you.
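
A hypothetical hand-off for step 1 — function and file names are invented for illustration:

```
SendMessage(to: "researcher",
            summary: "Ranked targets #2-#5",
            message: "Please research in order: merge_spans (src/spans.py), Index.lookup (src/index.py), to_json (src/serialize.py), dedupe (src/util.py)")
```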

Pre-Submit Review

MANDATORY before sending [complete]. After the experiment loop plateaus or stops, run a self-review against the full diff before finalizing. This catches the issues that reviewers consistently flag on performance PRs.

Read ${CLAUDE_PLUGIN_ROOT}/references/shared/pre-submit-review.md for the full checklist. Common critical checks:

  1. Resource ownership: For every del/close() you added — is the object caller-owned? Grep for all call sites (a sweep sketch follows this list). If a caller uses the object after your function returns, you have a use-after-free bug.
  2. Concurrency safety: Does this code run in a web server? Check for shared mutable state and resource lifecycle under concurrent requests.
  3. Correctness vs intent: Every claim in results.tsv and commit messages must match actual benchmark output.
  4. Quality tradeoffs disclosed: If you traded one metric for another, quantify both sides in the commit message.
  5. Tests exercise production paths: If the optimized code is reached via monkey-patch, factory, or feature flag in production, tests must go through that same path.
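
A call-site sweep sketch for check 1 — the object name conn is hypothetical:

```bash
# Every reference to the object you now close/del. If any hit runs after
# your function returns, the caller still owns the object — revert the change.
grep -rn "conn" --include="*.py" src/ tests/
```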

If you find issues, fix them, re-run tests, and update results.tsv. Note findings in HANDOFF.md under "Pre-submit review findings". Only send [complete] after all checks pass.

Domain agents add domain-specific checks beyond these common ones.

PR Strategy

One PR per independent optimization. Multiple commits touching the same function → one PR. Optimizations in different files → separate PRs.

Do NOT open PRs yourself unless the user explicitly asks. Prepare the branch, push it, and tell the user it's ready.
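
One way to carve a single optimization off the stacked codeflash/optimize branch into its own PR branch — a sketch assuming the default branch is main; branch and commit names are hypothetical:

```bash
git checkout -b mem/reuse-cache-buffers main
git cherry-pick <sha-of-the-mem-commit>
git push -u origin mem/reuse-cache-buffers
# Then tell the user the branch is ready — do not open the PR yourself.
```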

Domain prefixes:

| Domain | Branch prefix | PR title prefix |
|---|---|---|
| CPU / Data Structures | ds/ | ds: |
| Memory | mem/ | mem: |
| Async | async/ | async: |
| Structure | struct/ | refactor: |
| Deep (cross-domain) | deep/ | perf: |

See ${CLAUDE_PLUGIN_ROOT}/references/shared/pr-preparation.md for the full PR workflow.