codeflash-agent/plugin/references/shared/agent-base-protocol.md
Kevin Turcios 3b59d97647 squash
2026-04-13 14:12:17 -05:00


# Agent Base Protocol
Shared operational rules for all Codeflash domain optimization agents. Each agent reads this file at session start. Language-specific tooling (profilers, test runners, package managers) is in the language's own `agent-base-protocol.md`. Domain-specific overrides live in the agent prompt itself.
## Context Management
Use Explore subagents for ALL codebase investigation — reading unfamiliar code, searching for patterns, understanding architecture. Only read code directly when you are about to edit it. Do NOT run more than 2 background tasks simultaneously — over-parallelization leads to timeouts, killed tasks, and losing track of what's running. Sequential, focused work produces better results than scattered parallel work.
## Experiment Discipline
- **PROFILING GATE: You MUST run an actual profiler and print quantified, per-function output before entering the experiment loop.** Reading source code and guessing at bottlenecks is NOT profiling. Running tests and looking at wall-clock time is NOT profiling. If you have not printed profiler output with quantified metrics, STOP and profile first — do NOT enter the experiment loop. See your language's `agent-base-protocol.md` for which profilers to use.
- **One fix per experiment. NEVER batch multiple fixes into one edit.** Each iteration targets exactly one function/allocation/pattern. This discipline is essential — you cannot rank, skip, or reprofile if you change everything at once.
- **LOCK your measurement methodology at baseline time.** Do NOT change profiling flags, test filters, benchmark parameters, or tool settings mid-experiment. Changing methodology creates uninterpretable results. If you need different parameters, record a new baseline first and note the methodology change in HANDOFF.md.
## Commit Rules
After each KEEP, stage ONLY the files you changed: `git add <specific files> && git commit -m "<domain-prefix>: <one-line summary>"`. Do NOT use `git add -A` or `git add .` — these stage scratch files, benchmarks, and user work. Each optimization gets its own commit so they can be reverted or cherry-picked independently. Do NOT commit discards. If the project has pre-commit or pre-push hooks, run them before committing — CI failures from forgotten linting waste time.
Domain commit prefixes: `ds:` (CPU / data structures), `async:` (async), `mem:` (memory), `struct:` (structure), `perf:` (deep/cross-domain).
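The staging rule above can be sketched as a pair of commands — the file paths and the summary line here are illustrative, not prescribed:

```shell
# Stage ONLY the files this experiment changed (paths are examples),
# then commit with the domain prefix. Scratch files stay untracked.
git add src/cache.py tests/test_cache.py
git commit -m "mem: reuse read buffer in Cache.get"
```

Because each KEEP is its own commit touching only its own files, a later revert or cherry-pick affects exactly one optimization.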
## Stuck State Recovery
If you hit 5+ consecutive discards (across all strategy rotations), run this recovery protocol before giving up:
1. **Re-read all in-scope files from scratch.** Your mental model may have drifted — re-read the actual code, not your cached understanding.
2. **Re-read the full results log** (`.codeflash/results.tsv`). Look for patterns: which files/functions appeared in successful experiments (focus there), which techniques worked (try variants on new targets), which approaches failed repeatedly (avoid them).
3. **Re-read the original goal.** Has the focus drifted from what the user asked for?
4. **Try combining 2-3 previously successful changes** that might compound.
5. **Try the opposite** of what hasn't worked. If fine-grained optimizations keep failing, try a coarser architectural change. If local changes keep failing, try a cross-function refactor.
6. **Check git history for hints**: `git log --oneline -20 --stat` — do successful commits cluster in specific files or patterns?
If recovery still produces no improvement after 3 more experiments, **stop and report** with a summary of what was tried and why the codebase appears to be at its optimization floor for this domain.
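Step 2 can be sketched with standard text tools. This assumes `results.tsv` is tab-separated with a target column and a KEEP/DISCARD status column — the field numbers below are illustrative, so adjust them to the log's actual layout:

```shell
# How many keeps overall (rough signal that anything has worked)
grep -c KEEP .codeflash/results.tsv

# Keeps per target, most-successful first — focus new experiments there.
# Assumes column 2 = target, column 3 = status; adjust to your schema.
awk -F'\t' '$3 == "KEEP" {print $2}' .codeflash/results.tsv \
  | sort | uniq -c | sort -rn
```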
## I/O Ceiling Detection
After CPU optimization plateau, check if the remaining wall-clock time is dominated by I/O:
1. Run a profile with wall-clock timing (see language-specific protocol for the tool)
2. Compare CPU time to wall-clock time. If wall-clock >> CPU time, the gap is I/O wait.
3. If >50% of wall-clock is I/O, declare I/O ceiling:
```
[io-ceiling] >X% of wall-clock is I/O (network/disk). Further CPU optimization is futile.
Recommendations: async I/O, request batching, HTTP/2, connection pooling, or caching.
```
4. Record in HANDOFF.md and stop CPU experiments. Suggest the user consider async-domain optimization or architectural changes.
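Steps 2–3 reduce to one arithmetic check. The two timing values below are placeholders you would read from the profiler output, not real measurements:

```shell
# Placeholder numbers from a profile run: CPU time vs wall-clock time.
cpu_s=1.8
wall_s=6.0

# The gap between wall-clock and CPU time is I/O wait.
io_pct=$(awk -v c="$cpu_s" -v w="$wall_s" 'BEGIN { printf "%.0f", (w - c) / w * 100 }')

if [ "$io_pct" -gt 50 ]; then
  echo "[io-ceiling] ${io_pct}% of wall-clock is I/O (network/disk). Further CPU optimization is futile."
fi
```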
## Key Files
All session state lives in `.codeflash/`:
- **`.codeflash/results.tsv`** — Experiment log. Read at startup, append after each experiment.
- **`.codeflash/HANDOFF.md`** — Session state. Read at startup, update after each keep/discard.
- **`.codeflash/conventions.md`** — Maintainer preferences. Read at startup; update when a change is rejected.
- **`.codeflash/setup.md`** — Runner, language version, test commands, available tools. Written by setup agent.
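Appending to the experiment log can look like the line below — the four columns are illustrative, not a prescribed schema; follow whatever layout earlier rows in `results.tsv` already use:

```shell
# Append one experiment row after a keep/discard (columns are illustrative:
# timestamp, target, verdict, measured effect).
printf '%s\t%s\t%s\t%s\n' "$(date -u +%FT%TZ)" "src/cache.py:get" "KEEP" "-12% wall" \
  >> .codeflash/results.tsv
```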
## Session Resume
1. Read `.codeflash/HANDOFF.md`, `.codeflash/results.tsv`, `.codeflash/conventions.md`.
2. Confirm with user what to work on next.
3. Continue the experiment loop.
## Session Start — Common Steps
1. **Read setup.** Read `.codeflash/setup.md` for the runner, language version, and test command. Read `.codeflash/conventions.md` if it exists. Also check for org-level conventions at `../conventions.md` (project-level overrides org-level). Read `.codeflash/learnings.md` if it exists — these are discoveries from previous sessions that prevent repeating dead ends. Read CLAUDE.md. Use the runner from setup.md everywhere you see `$RUNNER`.
2. **Create or switch to optimization branch.** `git checkout -b codeflash/optimize` (or `git checkout codeflash/optimize` if it already exists). All optimizations stack as commits on this single branch.
3. **Initialize HANDOFF.md** with environment and discovery.
Domain agents add domain-specific steps after these common steps (e.g., baseline profiling method, benchmark tier definition).
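Step 2 of the common steps can be done idempotently in one line — the same command works on the first run and on every resume:

```shell
# Switch to the optimization branch, creating it only on the first run.
git checkout codeflash/optimize 2>/dev/null || git checkout -b codeflash/optimize
```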
## Constraints (shared)
- **Correctness**: All previously-passing tests must still pass.
- **Simplicity**: Simpler is better. Don't add complexity for marginal gains.
- **Style**: Match existing project conventions. Don't introduce patterns maintainers will reject.
Domain agents add additional domain-specific constraints (e.g., performance measurement required for CPU, no new dependencies for memory).
## Research Tools
**context7**: `mcp__context7__resolve-library-id` then `mcp__context7__query-docs` for library docs. Use aggressively for API signatures — APIs change across versions.
**WebFetch**: For specific URLs when context7 doesn't cover a topic.
**Explore subagents**: For codebase investigation to keep your context clean.
## Progress Reporting Protocol
When running as a named teammate, send progress messages to the team lead at these milestones. If `SendMessage` is unavailable (not in a team), skip this — the file-based logging is always the source of truth.
Standard message points (domain-specific content in each agent's prompt):
1. **After baseline profiling**: Summary of profiling results
2. **After each experiment**: Target, result (KEEP/DISCARD), metrics
3. **Every 3 experiments**: Periodic progress summary for user relay
4. **At milestones (every 3-5 keeps)**: Cumulative improvement
5. **At plateau/completion**: Final summary
6. **When stuck (5+ consecutive discards)**: What's been tried
7. **Cross-domain discovery**: Signal to router — do NOT fix cross-domain issues yourself
8. **File modification notification**: After each KEEP commit, notify researcher per modified file: `SendMessage(to: "researcher", summary: "File modified", message: "[modified <file-path>]")`. This prevents the researcher from sending outdated analysis for code you've already changed.
Also update the shared task list when reaching phase boundaries:
- After baseline: `TaskUpdate("Baseline profiling" → completed)`
- At completion/plateau: `TaskUpdate("Experiment loop" → completed)`
## Research Teammate Integration
A researcher agent ("researcher") may be running alongside you. Use it to reduce your read-think time:
1. **After baseline profiling**, send your ranked target list to the researcher. Skip the top target (you'll work on it immediately) — send targets #2 through #5+.
2. **Before each experiment**, check if the researcher has sent findings for your current target. If a `[research <function_name>]` message is available, use it to skip source reading and pattern identification — go straight to the reasoning checklist.
3. **After re-profiling** (new rankings), send updated targets to the researcher so it stays ahead of you.
## Adversarial Review Cadence
Run adversarial review at milestones (every 3-5 KEEPs), not just at session end. Early review catches correctness bugs before they compound.
- **After each milestone (every 3-5 KEEPs):** Review commits since last milestone
- **At session end:** Review full branch diff (catches interaction bugs between optimizations)
Use: `node "${CLAUDE_PLUGIN_ROOT}/vendor/codex/scripts/codex-companion.mjs" adversarial-review --scope branch --wait`
If a milestone review finds issues, fix them before continuing the experiment loop.
## Pre-Submit Review
**MANDATORY before sending `[complete]`.** After the experiment loop plateaus or stops, run a self-review against the full diff before finalizing. This catches the issues that reviewers consistently flag on performance PRs.
Read `${CLAUDE_PLUGIN_ROOT}/references/shared/pre-submit-review.md` for the shared checklist and your language's `pre-submit-review.md` for language-specific checks. Common critical checks:
1. **Resource ownership:** For every resource release you added — is the object caller-owned? Check all call sites.
2. **Concurrency safety:** Does this code run in a server? Check for shared mutable state and resource lifecycle under concurrent requests.
3. **Correctness vs intent:** Every claim in results.tsv and commit messages must match actual benchmark output.
4. **Quality tradeoffs disclosed:** If you traded one metric for another, quantify both sides in the commit message.
5. **Tests exercise production paths:** If the optimized code is reached through a framework-specific path in production, tests must go through that same path.
If you find issues, fix them, re-run tests, and update results.tsv. Note findings in HANDOFF.md under "Pre-submit review findings". Only send `[complete]` after all checks pass.
Domain agents add domain-specific checks beyond these common ones.
## PR Strategy
One PR per independent optimization: multiple optimizations to the same function go in one PR; optimizations in different files get separate PRs.
**Do NOT open PRs yourself** unless the user explicitly asks. Prepare the branch, push, tell user it's ready.
Domain prefixes:
| Domain | Branch prefix | PR title prefix |
|--------|--------------|-----------------|
| CPU / Data Structures | `ds/` | `ds:` |
| Memory | `mem/` | `mem:` |
| Async | `async/` | `async:` |
| Structure | `struct/` | `refactor:` |
| Deep (cross-domain) | `deep/` | `perf:` |
See `${CLAUDE_PLUGIN_ROOT}/references/shared/pr-preparation.md` for the full PR workflow.
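A minimal sketch of preparing (not opening) one PR, assuming the optimization commit sits on `codeflash/optimize` and the base branch is `main` — the branch name `mem/reuse-cache-buffer` is illustrative:

```shell
# Take one KEEP commit off codeflash/optimize onto its own
# domain-prefixed branch (here: memory domain, so mem/).
sha=$(git rev-parse codeflash/optimize)         # tip commit, for this sketch
git checkout -b mem/reuse-cache-buffer main
git cherry-pick "$sha"
git push -u origin mem/reuse-cache-buffer       # then tell the user it's ready
```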