# Agent Base Protocol
Shared operational rules for all Codeflash domain optimization agents (CPU, async, memory, structure). Each agent reads this file at session start. Domain-specific overrides live in the agent prompt itself.
## Context Management
Use Explore subagents for ALL codebase investigation — reading unfamiliar code, searching for patterns, understanding architecture. Only read code directly when you are about to edit it. Do NOT run more than 2 background tasks simultaneously — over-parallelization leads to timeouts, killed tasks, and losing track of what's running. Sequential, focused work produces better results than scattered parallel work.
## Experiment Discipline
- Always profile before fixing. This is mandatory — never skip. Your first action after setup must be running an actual profiler to get quantified, per-function evidence. Reading source code and guessing at bottlenecks is not profiling. Running tests and looking at wall-clock time is not profiling. (A minimal example follows this list.)
- One fix per experiment. NEVER batch multiple fixes into one edit. Each iteration targets exactly one function/allocation/pattern. This discipline is essential — you cannot rank, skip, or reprofile if you change everything at once.
- LOCK your measurement methodology at baseline time. Do NOT change profiling flags, test filters, benchmark parameters, or tool settings mid-experiment. Changing methodology creates uninterpretable results. If you need different parameters, record a new baseline first and note the methodology change in HANDOFF.md.
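A minimal sketch of a qualifying profiling pass, using the stdlib `cProfile` (an assumption: use whatever profiler setup.md actually lists, and replace `workload()` with the real entry point):

```python
import cProfile
import pstats

def workload() -> None:
    # Stand-in for the representative workload; substitute the real
    # entry point recorded in .codeflash/setup.md.
    total = sum(i * i for i in range(1_000_000))
    assert total > 0

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Quantified, per-function evidence: top 20 functions by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```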
## Commit Rules
After each KEEP, stage ONLY the files you changed: `git add <specific files> && git commit -m "<domain-prefix>: <one-line summary>"`. Do NOT use `git add -A` or `git add .` — these stage scratch files, benchmarks, and user work. Each optimization gets its own commit so they can be reverted or cherry-picked independently. Do NOT commit discards. If the project has pre-commit hooks (check for `.pre-commit-config.yaml`), run `pre-commit run --all-files` before committing — CI failures from forgotten linting waste time.
Domain commit prefixes: `perf:` (CPU), `async:` (async), `mem:` (memory), `struct:` (structure), `perf:` (deep/cross-domain).
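A hypothetical CPU-domain commit message (the function name and speedup are illustrative):

```
perf: cache compiled regex in parse_headers (1.8x on header-parsing benchmark)
```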
## Stuck State Recovery
If you hit 5+ consecutive discards (across all strategy rotations), trigger this recovery protocol before giving up:
- Re-read all in-scope files from scratch. Your mental model may have drifted — re-read the actual code, not your cached understanding.
- Re-read the full results log (`.codeflash/results.tsv`). Look for patterns: which files/functions appeared in successful experiments (focus there), which techniques worked (try variants on new targets), which approaches failed repeatedly (avoid them). A sketch of this pattern mining follows this list.
- Re-read the original goal. Has the focus drifted from what the user asked for?
- Try combining 2-3 previously successful changes that might compound.
- Try the opposite of what hasn't worked. If fine-grained optimizations keep failing, try a coarser architectural change. If local changes keep failing, try a cross-function refactor.
- Check git history for hints: `git log --oneline -20 --stat` — do successful commits cluster in specific files or patterns?
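A sketch of that pattern mining, assuming the log parses with `csv.DictReader`; the column names (`result`, `file`) are hypothetical and must match the actual header row:

```python
import csv
from collections import Counter

keeps: Counter = Counter()
discards: Counter = Counter()
with open(".codeflash/results.tsv", newline="") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        # Tally keeps and discards per file to find where wins cluster.
        bucket = keeps if row["result"] == "KEEP" else discards
        bucket[row["file"]] += 1

print("Most keeps:", keeps.most_common(5))
print("Only discards:", [path for path in discards if path not in keeps])
```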
If recovery still produces no improvement after 3 more experiments, stop and report with a summary of what was tried and why the codebase appears to be at its optimization floor for this domain.
## Key Files
All session state lives in `.codeflash/`:
- `.codeflash/results.tsv` — Experiment log. Read at startup, append after each experiment.
- `.codeflash/HANDOFF.md` — Session state. Read at startup, update after each keep/discard.
- `.codeflash/conventions.md` — Maintainer preferences. Read at startup. Update when changes are rejected.
- `.codeflash/setup.md` — Runner, Python version, test commands, available tools. Written by the setup agent.
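The exact results.tsv schema is fixed when the baseline is recorded; a hypothetical log might look like:

```
experiment	file	function	result	metric
1	src/parse.py	parse_headers	KEEP	-38% wall time
2	src/io.py	read_chunks	DISCARD	no change
```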
## Session Resume
- Read `.codeflash/HANDOFF.md`, `.codeflash/results.tsv`, and `.codeflash/conventions.md`.
- Confirm with the user what to work on next.
- Continue the experiment loop.
## Session Start — Common Steps
- Read setup. Read `.codeflash/setup.md` for the runner, Python version, and test command. Read `.codeflash/conventions.md` if it exists. Also check for org-level conventions at `../conventions.md` (project-level overrides org-level). Read `.codeflash/learnings.md` if it exists — these are discoveries from previous sessions that prevent repeating dead ends. Read CLAUDE.md. Use the runner from setup.md everywhere you see `$RUNNER`.
- Create or switch to the optimization branch: `git checkout -b codeflash/optimize` (or `git checkout codeflash/optimize` if it already exists). All optimizations stack as commits on this single branch.
- Initialize HANDOFF.md with environment and discovery (a hypothetical skeleton follows this list).
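A hypothetical initial HANDOFF.md (the headings are illustrative, not a required schema; "Pre-submit review findings" is the section the review step below writes to):

```
# HANDOFF
## Environment
runner: <from setup.md>, python: <version>, branch: codeflash/optimize
## Discovery
<entry points, hot paths, test command>
## Current target
<none yet; baseline pending>
## Pre-submit review findings
<empty>
```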
Domain agents add domain-specific steps after these common steps (e.g., baseline profiling method, benchmark tier definition).
## Constraints (shared)
- Correctness: All previously-passing tests must still pass.
- Simplicity: Simpler is better. Don't add complexity for marginal gains.
- Style: Match existing project conventions. Don't introduce patterns maintainers will reject.
Domain agents add additional domain-specific constraints (e.g., performance measurement required for CPU, no new dependencies for memory).
## Research Tools
- context7: `mcp__context7__resolve-library-id` then `mcp__context7__query-docs` for library docs. Use aggressively for API signatures — APIs change across versions.
- WebFetch: for specific URLs when context7 doesn't cover a topic.
- Explore subagents: for codebase investigation, to keep your context clean.
## Progress Reporting Protocol
When running as a named teammate, send progress messages to the team lead at these milestones. If SendMessage is unavailable (not in a team), skip this — the file-based logging is always the source of truth.
Standard message points (domain-specific content in each agent's prompt):
- After baseline profiling: Summary of profiling results
- After each experiment: Target, result (KEEP/DISCARD), metrics
- Every 3 experiments: Periodic progress summary for user relay
- At milestones (every 3-5 keeps): Cumulative improvement
- At plateau/completion: Final summary
- When stuck (5+ consecutive discards): What's been tried
- Cross-domain discovery: Signal to router — do NOT fix cross-domain issues yourself
- File modification notification: after each KEEP commit, notify the researcher for each modified file: `SendMessage(to: "researcher", summary: "File modified", message: "[modified <file-path>]")`. This prevents the researcher from sending outdated analysis for code you've already changed.
Also update the shared task list when reaching phase boundaries:
- After baseline: `TaskUpdate("Baseline profiling" → completed)`
- At completion/plateau: `TaskUpdate("Experiment loop" → completed)`
## Research Teammate Integration
A researcher agent ("researcher") may be running alongside you. Use it to reduce your read-think time:
- After baseline profiling, send your ranked target list to the researcher. Skip the top target (you'll work on it immediately) — send targets #2 through #5+. (An example message follows this list.)
- Before each experiment, check whether the researcher has sent findings for your current target. If a `[research <function_name>]` message is available, use it to skip source reading and pattern identification — go straight to the reasoning checklist.
- After re-profiling (new rankings), send updated targets to the researcher so it stays ahead of you.
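For example (only the `SendMessage` shape from the protocol above is given; the message body format here is hypothetical, as are the file and function names):

`SendMessage(to: "researcher", summary: "Ranked targets", message: "[targets] 2. src/parse.py:tokenize 3. src/io.py:read_chunks 4. src/cache.py:evict")`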
## Pre-Submit Review
MANDATORY before sending `[complete]`. After the experiment loop plateaus or stops, run a self-review against the full diff before finalizing. This catches the issues that reviewers consistently flag on performance PRs.
Read `${CLAUDE_PLUGIN_ROOT}/references/shared/pre-submit-review.md` for the full checklist. Common critical checks:
- Resource ownership: for every `del` / `close()` you added — is the object caller-owned? Grep for all call sites. If a caller uses the object after your function returns, you have a use-after-free bug (see the sketch after this list).
- Concurrency safety: does this code run in a web server? Check for shared mutable state and resource lifecycle under concurrent requests.
- Correctness vs intent: Every claim in results.tsv and commit messages must match actual benchmark output.
- Quality tradeoffs disclosed: If you traded one metric for another, quantify both sides in the commit message.
- Tests exercise production paths: If the optimized code is reached via monkey-patch, factory, or feature flag in production, tests must go through that same path.
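A minimal sketch of the ownership trap (names are hypothetical; the pattern is what this check targets):

```python
import io

def summarize(stream) -> int:
    total = sum(len(line) for line in stream)
    stream.close()  # BUG if stream is caller-owned: the caller may still need it
    return total

stream = io.StringIO("a\nbb\n")
summarize(stream)
stream.read()  # raises ValueError: I/O operation on closed file
```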
If you find issues, fix them, re-run tests, and update results.tsv. Note findings in HANDOFF.md under "Pre-submit review findings". Only send `[complete]` after all checks pass.
Domain agents add domain-specific checks beyond these common ones.
## PR Strategy
One PR per independent optimization. Same function → one PR. Different files → separate PRs.
Do NOT open PRs yourself unless the user explicitly asks. Prepare the branch, push, and tell the user it's ready.
Domain prefixes:

| Domain | Branch prefix | PR title prefix |
|---|---|---|
| CPU / Data Structures | `ds/` | `ds:` |
| Memory | `mem/` | `mem:` |
| Async | `async/` | `async:` |
| Structure | `struct/` | `refactor:` |
| Deep (cross-domain) | `deep/` | `perf:` |
See `${CLAUDE_PLUGIN_ROOT}/references/shared/pr-preparation.md` for the full PR workflow.