# Agent Base Protocol

Shared operational rules for all Codeflash domain optimization agents. Each agent reads this file at session start. Language-specific tooling (profilers, test runners, package managers) is in the language's own `agent-base-protocol.md`. Domain-specific overrides live in the agent prompt itself.

## Context Management

Use Explore subagents for ALL codebase investigation — reading unfamiliar code, searching for patterns, understanding architecture. Only read code directly when you are about to edit it.

Do NOT run more than 2 background tasks simultaneously — over-parallelization leads to timeouts, killed tasks, and losing track of what's running. Sequential, focused work produces better results than scattered parallel work.

## Blocked-State Diagnosis

When any tool (build, profiler, benchmark, test runner, shell command) returns a non-zero exit code, read `${CLAUDE_PLUGIN_ROOT}/references/shared/blocked-state.md` before declaring BLOCKED. **A wrapper exit code is never a root cause** — drill at least one level deeper (read the actual stderr; grep for `error:`, `No such file`, `Could not find artifact`, etc.) and attempt 1–2 environment or scope workarounds before recording any "blocked" status. Under AUTONOMOUS MODE, this protocol is mandatory.

## Experiment Discipline

- **PROFILING GATE: You MUST run an actual profiler and print quantified output before entering the experiment loop.** Your first action after setup must be running a profiler to get quantified, per-function evidence. Reading source code and guessing at bottlenecks is NOT profiling. Running tests and looking at wall-clock time is NOT profiling. If you have not printed profiler output with quantified metrics, STOP and profile first — do NOT enter the experiment loop. See your language's `agent-base-protocol.md` for which profilers to use.
- **One fix per experiment. NEVER batch multiple fixes into one edit.** Each iteration targets exactly one function/allocation/pattern.
  This discipline is essential — you cannot rank, skip, or reprofile if you change everything at once.
- **LOCK your measurement methodology at baseline time.** Do NOT change profiling flags, test filters, benchmark parameters, or tool settings mid-experiment. Changing methodology creates uninterpretable results. If you need different parameters, record a new baseline first and note the methodology change in HANDOFF.md.

## Commit Rules

After each KEEP, update `.codeflash/results.tsv` and `.codeflash/HANDOFF.md` with the measured evidence.

Do NOT commit by default. Commit creation is allowed only when the launch prompt or `.codeflash/conventions.md` explicitly says `ALLOW AUTONOMOUS COMMITS`, and it is forbidden if the prompt or conventions say `do not commit`, `do not create branches`, `no branches`, or equivalent.

If commits are explicitly allowed, stage ONLY the files you changed: `git add <files> && git commit -m "<prefix>: <description>"`. Do NOT use `git add -A` or `git add .` — these stage scratch files, benchmarks, and user work. Each optimization gets its own commit so it can be reverted or cherry-picked independently. Do NOT commit discards.

If the project has pre-commit or pre-push hooks, run them before committing — CI failures from forgotten linting waste time.

Domain commit prefixes: `perf:` (CPU), `async:` (async), `mem:` (memory), `struct:` (structure), `perf:` (deep/cross-domain).

## Git Operations Boundary

Domain agents do NOT create branches on the target project, and do NOT commit experiments that lack benchmark evidence. Branch creation is the router's job (or the user's) — it happens BEFORE you are launched. Commit authority is gated by the post-return audit defined in `${CLAUDE_PLUGIN_ROOT}/references/shared/router-base.md` (Cleanup step 0 / coordination-loop "Audit `results.tsv` before exit" bullet).

**Permitted git surface for domain agents:**

- `git status`, `git diff`, `git log`, `git branch --show-current` — read-only inspection.
- `git add <files> && git commit -m "..."` for a KEEP **only** when the launch prompt or `.codeflash/conventions.md` explicitly says `ALLOW AUTONOMOUS COMMITS` AND ALL of the following hold: (a) no user/session brief prohibition on branches or commits is present; (b) `results.tsv` records a numeric `improvement_pct` AND a real-measurement `optimized_metric` (ns/op, MiB, ms, throughput — not an identifier, range, estimate, or prose); (c) for any data-flow change, a behavioral-equivalence probe per `${CLAUDE_PLUGIN_ROOT}/references/shared/correctness-probe-patterns.md` was written to the project test tree and passes; (d) you are already on the optimization branch the router created.

**Forbidden without explicit user instruction:**

- `git checkout -b <branch>` — branch creation. If you are on `main` and the optimization branch does not exist, that is a router-level setup bug; report it back via the coordination loop. Do NOT paper over it by creating the branch yourself.
- `git commit` unless `ALLOW AUTONOMOUS COMMITS` is explicitly present in the launch prompt or `.codeflash/conventions.md`.
- `git branch -D <branch>`, `git branch -d <branch>` — branch deletion.
- `git push`, `git push --force`, `git push <remote> <branch>` — any remote publish. PR creation is the reviewer's step.
- `git commit` on an experiment whose `results.tsv` row has `improvement_pct=N/A` (or any non-numeric value), or whose change altered data flow without a committed correctness probe. The router's Post-Return Keep Audit will downgrade such a row to `blocked` — do not race ahead of the audit.
- `git reset --hard`, `git clean -fd`, `git checkout -- <file>` on files you did not create this session — destructive ops on the target working tree.

**When tempted to cross the boundary:** produce the diff, the benchmark numbers, the call-site list, and the correctness probe. Hand them to the reviewer. Let the reviewer — and the post-return audit — decide whether the keep graduates to a branch and a commit.
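One way to package that hand-off, as a hedged sketch: the experiment id, the `hot_path` function name, and the commented-out benchmark command are illustrative placeholders, not part of this protocol.

```shell
# Hypothetical sketch of bundling KEEP evidence for the reviewer.
# "exp-042"-style ids, "hot_path", and the benchmark command are
# placeholders; substitute your project's own.
bundle_evidence() {
  local evidence_dir=".codeflash/evidence/$1"
  mkdir -p "$evidence_dir"

  # The diff: exactly what changed in the working tree.
  git diff > "$evidence_dir/change.diff"

  # The benchmark numbers: replace with your real benchmark command, e.g.
  # $RUNNER bench --filter hot_path > "$evidence_dir/benchmark.txt"

  # The call-site list for the touched function (name is illustrative).
  grep -rn "hot_path(" src/ > "$evidence_dir/call-sites.txt" || true

  echo "evidence for $1 ready in $evidence_dir"
}
```

Running `bundle_evidence exp-042` inside the repository leaves a directory the reviewer can audit without re-deriving anything.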
An experiment that cannot survive the audit should not be escorted around it.

## Stuck State Recovery

If 5+ consecutive discards occur, apply the recovery protocol in `${CLAUDE_PLUGIN_ROOT}/references/shared/experiment-loop-base.md` before giving up. Use `${CLAUDE_PLUGIN_ROOT}/references/shared/failure-modes.md` if the problem looks like workflow breakdown rather than a target-level plateau.

## I/O Ceiling Detection

After a CPU optimization plateau, check whether the remaining wall-clock time is dominated by I/O:

1. Run a profile with wall-clock timing (see the language-specific protocol for the tool).
2. Compare CPU time to wall-clock time. If wall-clock >> CPU time, the gap is I/O wait.
3. If >50% of wall-clock is I/O, declare an I/O ceiling:

   ```
   [io-ceiling] >X% of wall-clock is I/O (network/disk). Further CPU optimization is futile.
   Recommendations: async I/O, request batching, HTTP/2, connection pooling, or caching.
   ```

4. Record the finding in HANDOFF.md and stop CPU experiments. Suggest the user consider async-domain optimization or architectural changes.

## Key Files

All session state lives in `.codeflash/`:

- **`.codeflash/results.tsv`** — Experiment log. Read at startup, append after each experiment.
- **`.codeflash/HANDOFF.md`** — Session state. Read at startup, update after each keep/discard.
- **`.codeflash/conventions.md`** — Maintainer preferences. Read at startup; update when changes are rejected.
- **`.codeflash/setup.md`** — Runner, language version, test commands, available tools. Written by the setup agent.

## Session Resume

1. Read `.codeflash/HANDOFF.md`, `.codeflash/results.tsv`, `.codeflash/conventions.md`.
2. Continue the experiment loop from the strongest remaining measured target.

Under AUTONOMOUS MODE, do not ask the user what to work on next; infer it from session state and profiling.

## Session Start — Common Steps

1. **Read setup.** Read `.codeflash/setup.md` for the runner, language version, and test command. Read `.codeflash/conventions.md` if it exists.
   Also check for org-level conventions at `../conventions.md` (project-level overrides org-level). Read `.codeflash/learnings.md` if it exists — it records discoveries from previous sessions that prevent repeating dead ends. Read CLAUDE.md. Use the runner from setup.md everywhere you see `$RUNNER`.
2. **Confirm git mode.** Read `.codeflash/conventions.md` and your launch prompt for any `GIT MODE:` line. If git mode says `NO-BRANCH NO-COMMIT`, stay on the current branch and do not create, switch, or commit anything. Otherwise, the router creates or switches to `codeflash/optimize` before launching you. If you are not already on that branch, you may `git checkout codeflash/optimize` if it exists. Do NOT create branches yourself; report a router setup gap if the branch is missing.
3. **Initialize HANDOFF.md** with environment and discovery.

Domain agents add domain-specific steps after these common steps (e.g., baseline profiling method, benchmark tier definition).

## Constraints (shared)

- **Correctness**: All previously-passing tests must still pass.
- **Simplicity**: Simpler is better. Don't add complexity for marginal gains.
- **Style**: Match existing project conventions. Don't introduce patterns maintainers will reject.

Domain agents add further domain-specific constraints (e.g., performance measurement required for CPU, no new dependencies for memory).

## Research Tools

- **context7**: `mcp__context7__resolve-library-id`, then `mcp__context7__query-docs` for library docs. Use aggressively for API signatures — APIs change across versions.
- **WebFetch**: For specific URLs when context7 doesn't cover a topic.
- **Explore subagents**: For codebase investigation, to keep your context clean.

## Progress Reporting Protocol

When running as a named teammate, send progress messages to the team lead at these milestones. If `SendMessage` is unavailable (not in a team), skip this — the file-based logging is always the source of truth.
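As an illustration of that file-side source of truth, appending one experiment row might look like the sketch below. The column layout is an assumption: this protocol only names the `improvement_pct` and `optimized_metric` fields explicitly.

```shell
# Hypothetical sketch: append one experiment row to the file-based log.
# Header columns other than improvement_pct and optimized_metric are
# assumed, and the exp-007 row values are made up for illustration.
results=".codeflash/results.tsv"
mkdir -p .codeflash
[ -f "$results" ] || printf 'experiment\tstatus\timprovement_pct\toptimized_metric\n' > "$results"
printf 'exp-007\tKEEP\t12.4\t815 ns/op\n' >> "$results"
```

The log survives even when no team lead is listening, which is why it — not `SendMessage` — is authoritative.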
Standard message points (domain-specific content in each agent's prompt):

1. **After baseline profiling**: Summary of profiling results.
2. **After each experiment**: Target, result (KEEP/DISCARD), metrics.
3. **Every 3 experiments**: Periodic progress summary for user relay.
4. **At milestones (every 3-5 keeps)**: Cumulative improvement.
5. **At plateau/completion**: Final summary.
6. **When stuck (5+ consecutive discards)**: What has been tried.
7. **Cross-domain discovery**: Signal to the router — do NOT fix cross-domain issues yourself.
8. **File modification notification**: After each KEEP commit, notify the researcher per modified file: `SendMessage(to: "researcher", summary: "File modified", message: "[modified <file>]")`. This prevents the researcher from sending outdated analysis for code you have already changed.

Also update the shared task list when reaching phase boundaries:

- After baseline: `TaskUpdate("Baseline profiling" → completed)`
- At completion/plateau: `TaskUpdate("Experiment loop" → completed)`

## Research Teammate Integration

A researcher agent ("researcher") may be running alongside you. Use it to reduce your read-think time:

1. **After baseline profiling**, send your ranked target list to the researcher. Skip the top target (you'll work on it immediately) — send targets #2 through #5+.
2. **Before each experiment**, check whether the researcher has sent findings for your current target. If a `[research <target>]` message is available, use it to skip source reading and pattern identification — go straight to the reasoning checklist.
3. **After re-profiling** (new rankings), send updated targets to the researcher so it stays ahead of you.

## Adversarial Review Cadence

Run adversarial review at milestones (every 3-5 KEEPs), not just at session end. Early review catches correctness bugs before they compound.
- **After each milestone (every 3-5 KEEPs):** Review commits since the last milestone.
- **At session end:** Review the full branch diff (this catches interaction bugs between optimizations).

Use: `node "${CLAUDE_PLUGIN_ROOT}/vendor/codex/scripts/codex-companion.mjs" adversarial-review --scope branch --wait`

If a milestone review finds issues, fix them before continuing the experiment loop.

## Pre-Submit Review

**MANDATORY before sending `[complete]`.** After the experiment loop plateaus or stops, run a self-review against the full diff before finalizing. This catches the issues that reviewers consistently flag on performance PRs. Read `${CLAUDE_PLUGIN_ROOT}/references/shared/pre-submit-review.md` for the shared checklist and your language's `pre-submit-review.md` for language-specific checks.

Common critical checks:

1. **Resource ownership:** For every resource release you added — is the object caller-owned? Check all call sites.
2. **Concurrency safety:** Does this code run in a server? Check for shared mutable state and resource lifecycle under concurrent requests.
3. **Correctness vs intent:** Every claim in results.tsv and commit messages must match actual benchmark output.
4. **Quality tradeoffs disclosed:** If you traded one metric for another, quantify both sides in the commit message.
5. **Tests exercise production paths:** If the optimized code is reached through a framework-specific path in production, tests must go through that same path.

If you find issues, fix them, re-run tests, and update results.tsv. Note findings in HANDOFF.md under "Pre-submit review findings". Only send `[complete]` after all checks pass. Domain agents add domain-specific checks beyond these common ones.

## PR Strategy

One PR per independent optimization. Same function → one PR. Different files → separate PRs.

**Do NOT open PRs yourself** unless the user explicitly asks. Prepare the branch, push, and tell the user it's ready.
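For whichever agent holds branch and push authority (domain agents themselves must not do this, per the Git Operations Boundary above), the prepare-but-don't-open flow might be sketched as follows; the branch name and remote are assumptions.

```shell
# Hypothetical sketch: publish a PR branch WITHOUT opening the PR.
# The branch name is illustrative; use your domain's prefix.
prepare_pr_branch() {
  branch="$1"                    # e.g. mem/reduce-buffer-copies
  git checkout -b "$branch"
  git push -u origin "$branch"   # publish only; PR creation stays with the reviewer
  echo "branch $branch pushed; ready for review, no PR opened"
}
```

The final `echo` is the "tell the user it's ready" step: the human (or reviewer agent) decides whether a PR is actually opened.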
Domain prefixes:

| Domain | Branch prefix | PR title prefix |
|--------|---------------|-----------------|
| CPU / Data Structures | `ds/` | `ds:` |
| Memory | `mem/` | `mem:` |
| Async | `async/` | `async:` |
| Structure | `struct/` | `refactor:` |
| Deep (cross-domain) | `deep/` | `perf:` |

See `${CLAUDE_PLUGIN_ROOT}/references/shared/pr-preparation.md` for the full PR workflow.