# Agent Base Protocol

Shared operational rules for all Codeflash domain optimization agents. Each agent reads this file at session start. Language-specific tooling (profilers, test runners, package managers) is in the language's own `agent-base-protocol.md`. Domain-specific overrides live in the agent prompt itself.

## Context Management

Use Explore subagents for ALL codebase investigation — reading unfamiliar code, searching for patterns, understanding architecture. Only read code directly when you are about to edit it. Do NOT run more than 2 background tasks simultaneously — over-parallelization leads to timeouts, killed tasks, and losing track of what's running. Sequential, focused work produces better results than scattered parallel work.
## Experiment Discipline

- **PROFILING GATE: You MUST run an actual profiler and print quantified output before entering the experiment loop.** Your first action after setup must be a real profiling run that produces quantified, per-function evidence. Reading source code and guessing at bottlenecks is NOT profiling. Running tests and looking at wall-clock time is NOT profiling. If you have not printed profiler output with quantified metrics, STOP and profile first — do NOT enter the experiment loop. See your language's `agent-base-protocol.md` for which profilers to use.

- **One fix per experiment. NEVER batch multiple fixes into one edit.** Each iteration targets exactly one function, allocation, or pattern. This discipline is essential — you cannot rank, skip, or reprofile individual changes if you change everything at once.

- **LOCK your measurement methodology at baseline time.** Do NOT change profiling flags, test filters, benchmark parameters, or tool settings mid-experiment. Changing methodology makes results uninterpretable across experiments. If you need different parameters, record a new baseline first and note the methodology change in HANDOFF.md.
## Commit Rules

After each KEEP, stage ONLY the files you changed: `git add <specific files> && git commit -m "<domain-prefix>: <one-line summary>"`. Do NOT use `git add -A` or `git add .` — these stage scratch files, benchmarks, and user work. Each optimization gets its own commit so it can be reverted or cherry-picked independently. Do NOT commit discards. If the project has pre-commit or pre-push hooks, run them before committing — CI failures from forgotten linting waste time.
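A sketch of a KEEP commit, assuming the project uses pre-commit (the file name and commit message are illustrative):

```sh
# Hypothetical KEEP commit: run hooks on the changed file, stage it alone, commit.
pre-commit run --files src/cache.py
git add src/cache.py
git commit -m "mem: reuse scratch buffer in read_chunks()"
```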
Domain commit prefixes: `ds:` (CPU / data structures), `async:` (async), `mem:` (memory), `struct:` (structure), `perf:` (deep/cross-domain).
## Stuck State Recovery

After 5+ consecutive discards (across all strategy rotations), run this recovery protocol before giving up:

1. **Re-read all in-scope files from scratch.** Your mental model may have drifted — re-read the actual code, not your cached understanding.
2. **Re-read the full results log** (`.codeflash/results.tsv`). Look for patterns: which files/functions appeared in successful experiments (focus there), which techniques worked (try variants on new targets), which approaches failed repeatedly (avoid them). A mining sketch follows this list.
3. **Re-read the original goal.** Has the focus drifted from what the user asked for?
4. **Try combining 2-3 previously successful changes** that might compound.
5. **Try the opposite** of what hasn't worked. If fine-grained optimizations keep failing, try a coarser architectural change. If local changes keep failing, try a cross-function refactor.
6. **Check git history for hints**: `git log --oneline -20 --stat` — do successful commits cluster in specific files or patterns?
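A sketch for step 2, assuming `results.tsv` has the target in column 2 and KEEP/DISCARD in column 4 (adjust to the real schema):

```sh
# Count KEEPs per target to see where successes cluster.
awk -F'\t' '$4 == "KEEP" { print $2 }' .codeflash/results.tsv | sort | uniq -c | sort -rn | head
```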
If recovery still produces no improvement after 3 more experiments, **stop and report** with a summary of what was tried and why the codebase appears to be at its optimization floor for this domain.
## I/O Ceiling Detection

After the CPU optimization plateau, check whether the remaining wall-clock time is dominated by I/O:

1. Run a profile with wall-clock timing (see the language-specific protocol for the tool).
2. Compare CPU time to wall-clock time. If wall-clock >> CPU time, the gap is I/O wait (a measurement sketch follows this list).
3. If >50% of wall-clock is I/O, declare an I/O ceiling:

```
[io-ceiling] >X% of wall-clock is I/O (network/disk). Further CPU optimization is futile.
Recommendations: async I/O, request batching, HTTP/2, connection pooling, or caching.
```
4. Record it in HANDOFF.md and stop CPU experiments. Suggest the user consider async-domain optimization or architectural changes.
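A quick way to eyeball the CPU-vs-wall gap before reaching for the language profiler, assuming GNU time is available (the test command here is illustrative; use the one from setup.md):

```sh
# If "Elapsed (wall clock) time" far exceeds User + System time, the gap is I/O wait.
/usr/bin/time -v $RUNNER test 2>&1 | grep -E "Elapsed|User time|System time"
```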
## Key Files

All session state lives in `.codeflash/`:

- **`.codeflash/results.tsv`** — Experiment log. Read at startup, append after each experiment (example row after this list).
- **`.codeflash/HANDOFF.md`** — Session state. Read at startup, update after each keep/discard.
- **`.codeflash/conventions.md`** — Maintainer preferences. Read at startup. Update when changes rejected.
- **`.codeflash/setup.md`** — Runner, language version, test commands, available tools. Written by setup agent.
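A hypothetical tab-separated `results.tsv` row; the columns shown are illustrative, not a spec, so keep whatever schema earlier sessions established:

```
exp	target	change	result	baseline_ms	new_ms
12	parse_line	precompile regex	KEEP	148	11
```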
## Session Resume

1. Read `.codeflash/HANDOFF.md`, `.codeflash/results.tsv`, `.codeflash/conventions.md`.
2. Confirm with user what to work on next.
3. Continue the experiment loop.
## Session Start — Common Steps

1. **Read setup.** Read `.codeflash/setup.md` for the runner, language version, and test command. Read `.codeflash/conventions.md` if it exists. Also check for org-level conventions at `../conventions.md` (project-level overrides org-level). Read `.codeflash/learnings.md` if it exists — these are discoveries from previous sessions that prevent repeating dead ends. Read CLAUDE.md. Use the runner from setup.md everywhere you see `$RUNNER`.
2. **Create or switch to the optimization branch.** `git checkout -b codeflash/optimize` (or `git checkout codeflash/optimize` if it already exists; one-liner sketch after this list). All optimizations stack as commits on this single branch.
3. **Initialize HANDOFF.md** with environment and discovery.
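A one-liner equivalent of step 2 that works whether or not the branch exists yet:

```sh
# Switch to the optimization branch, creating it on first run.
git checkout codeflash/optimize 2>/dev/null || git checkout -b codeflash/optimize
```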
Domain agents add domain-specific steps after these common steps (e.g., baseline profiling method, benchmark tier definition).
## Constraints (shared)

- **Correctness**: All previously-passing tests must still pass.
- **Simplicity**: Simpler is better. Don't add complexity for marginal gains.
- **Style**: Match existing project conventions. Don't introduce patterns maintainers will reject.

Domain agents add additional domain-specific constraints (e.g., performance measurement required for CPU, no new dependencies for memory).
## Research Tools

**context7**: `mcp__context7__resolve-library-id` then `mcp__context7__query-docs` for library docs. Use aggressively for API signatures — APIs change across versions.

**WebFetch**: For specific URLs when context7 doesn't cover a topic.

**Explore subagents**: For codebase investigation, to keep your context clean.
## Progress Reporting Protocol

When running as a named teammate, send progress messages to the team lead at these milestones. If `SendMessage` is unavailable (not in a team), skip this — the file-based logging is always the source of truth.

Standard message points (domain-specific content in each agent's prompt):

1. **After baseline profiling**: Summary of profiling results
2. **After each experiment**: Target, result (KEEP/DISCARD), metrics
3. **Every 3 experiments**: Periodic progress summary for user relay
4. **At milestones (every 3-5 keeps)**: Cumulative improvement (example after this list)
5. **At plateau/completion**: Final summary
6. **When stuck (5+ consecutive discards)**: What's been tried
7. **Cross-domain discovery**: Signal to router — do NOT fix cross-domain issues yourself
8. **File modification notification**: After each KEEP commit, notify researcher per modified file: `SendMessage(to: "researcher", summary: "File modified", message: "[modified <file-path>]")`. This prevents the researcher from sending outdated analysis for code you've already changed.
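A milestone message sketch in the same `SendMessage` shape as point 8; the recipient name and message body are illustrative, not a fixed format:

```
SendMessage(to: "team-lead", summary: "Milestone: 4 keeps", message: "[milestone] 4 keeps / 2 discards; cumulative -38% on hot path; next target: serialize_rows")
```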
Also update the shared task list when reaching phase boundaries:

- After baseline: `TaskUpdate("Baseline profiling" → completed)`
- At completion/plateau: `TaskUpdate("Experiment loop" → completed)`
## Research Teammate Integration

A researcher agent ("researcher") may be running alongside you. Use it to reduce your read-think time:

1. **After baseline profiling**, send your ranked target list to the researcher (sketch after this list). Skip the top target (you'll work on it immediately) — send targets #2 through #5+.
2. **Before each experiment**, check whether the researcher has sent findings for your current target. If a `[research <function_name>]` message is available, use it to skip source reading and pattern identification — go straight to the reasoning checklist.
3. **After re-profiling** (new rankings), send updated targets to the researcher so it stays ahead of you.
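A target-list message sketch for step 1, reusing the `SendMessage` shape from the progress protocol; the `[targets]` tag and function names are hypothetical:

```
SendMessage(to: "researcher", summary: "Ranked targets", message: "[targets] 2. BufferPool.acquire 3. Row.to_dict 4. csv_escape 5. validate_schema")
```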
## Adversarial Review Cadence

Run adversarial review at milestones (every 3-5 KEEPs), not just at session end. Early review catches correctness bugs before they compound.

- **After each milestone (every 3-5 KEEPs):** Review commits since last milestone
- **At session end:** Review full branch diff (catches interaction bugs between optimizations)

Use: `node "${CLAUDE_PLUGIN_ROOT}/vendor/codex/scripts/codex-companion.mjs" adversarial-review --scope branch --wait`

If a milestone review finds issues, fix them before continuing the experiment loop.
## Pre-Submit Review

**MANDATORY before sending `[complete]`.** After the experiment loop plateaus or stops, run a self-review against the full diff before finalizing. This catches the issues that reviewers consistently flag on performance PRs.

Read `${CLAUDE_PLUGIN_ROOT}/references/shared/pre-submit-review.md` for the shared checklist and your language's `pre-submit-review.md` for language-specific checks. Common critical checks:

1. **Resource ownership:** For every resource release you added — is the object caller-owned? Check all call sites.
2. **Concurrency safety:** Does this code run in a server? Check for shared mutable state and resource lifecycle under concurrent requests.
3. **Correctness vs intent:** Every claim in results.tsv and commit messages must match actual benchmark output.
4. **Quality tradeoffs disclosed:** If you traded one metric for another, quantify both sides in the commit message.
5. **Tests exercise production paths:** If the optimized code is reached through a framework-specific path in production, tests must go through that same path.

If you find issues, fix them, re-run tests, and update results.tsv. Note findings in HANDOFF.md under "Pre-submit review findings". Only send `[complete]` after all checks pass.

Domain agents add domain-specific checks beyond these common ones.
## PR Strategy

One PR per independent optimization. Same function → one PR. Different files → separate PRs.

**Do NOT open PRs yourself** unless the user explicitly asks. Prepare the branch, push, tell user it's ready.
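A push sketch for a prepared branch (the branch name is illustrative; the prefix comes from the table below):

```sh
git push -u origin ds/precompile-regex
```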
Domain prefixes:

| Domain | Branch prefix | PR title prefix |
|--------|---------------|-----------------|
| CPU / Data Structures | `ds/` | `ds:` |
| Memory | `mem/` | `mem:` |
| Async | `async/` | `async:` |
| Structure | `struct/` | `refactor:` |
| Deep (cross-domain) | `deep/` | `perf:` |
See `${CLAUDE_PLUGIN_ROOT}/references/shared/pr-preparation.md` for the full PR workflow.