Agent Base Protocol

Shared operational rules for all Codeflash domain optimization agents (CPU, async, memory, structure). Each agent reads this file at session start. Domain-specific overrides live in the agent prompt itself.

Context Management

Use Explore subagents for ALL codebase investigation — reading unfamiliar code, searching for patterns, understanding architecture. Only read code directly when you are about to edit it. Do NOT run more than 2 background tasks simultaneously — over-parallelization leads to timeouts, killed tasks, and losing track of what's running. Sequential, focused work produces better results than scattered parallel work.

Experiment Discipline

  • Always profile before fixing. This is mandatory — never skip. Your first action after setup must be running an actual profiler to get quantified, per-function evidence. Reading source code and guessing at bottlenecks is not profiling. Running tests and looking at wall-clock time is not profiling. (A profiler-invocation sketch follows this list.)
  • One fix per experiment. NEVER batch multiple fixes into one edit. Each iteration targets exactly one function/allocation/pattern. This discipline is essential — you cannot rank, skip, or reprofile if you change everything at once.
  • LOCK your measurement methodology at baseline time. Do NOT change profiling flags, test filters, benchmark parameters, or tool settings mid-experiment. Changing methodology creates uninterpretable results. If you need different parameters, record a new baseline first and note the methodology change in HANDOFF.md.
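
A profiler-invocation sketch — assuming a Python project tested with pytest and stdlib cProfile; the real profiler, flags, and filters are fixed by the domain agent's baseline method, and commands should be prefixed with the runner from setup.md where applicable:

```bash
# Record per-function timings while exercising the test suite (paths hypothetical).
python -m cProfile -o .codeflash/baseline.prof -m pytest tests/ -q

# Rank by cumulative time — quantified, per-function evidence, not guesswork.
python -c "import pstats; pstats.Stats('.codeflash/baseline.prof').sort_stats('cumulative').print_stats(15)"
```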

Commit Rules

After each KEEP, stage ONLY the files you changed: git add <specific files> && git commit -m "<domain-prefix>: <one-line summary>". Do NOT use git add -A or git add . — these stage scratch files, benchmarks, and user work. Each optimization gets its own commit so they can be reverted or cherry-picked independently. Do NOT commit discards. If the project has pre-commit hooks (check for .pre-commit-config.yaml), run pre-commit run --all-files before committing — CI failures from forgotten linting waste time.
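
A hypothetical KEEP commit in the memory domain (file names are illustrative):

```bash
# Stage only the files this one experiment touched — never -A or "." .
git add src/cache.py tests/test_cache.py
# Run hooks first if the project defines them.
pre-commit run --all-files
git commit -m "mem: reuse buffer in cache.get to avoid per-call allocation"
```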

Domain commit prefixes: ds: (CPU/data structures), async: (async), mem: (memory), struct: (structure), perf: (deep/cross-domain).

Stuck State Recovery

If you hit 5+ consecutive discards (across all strategy rotations), run this recovery protocol before giving up:

  1. Re-read all in-scope files from scratch. Your mental model may have drifted — re-read the actual code, not your cached understanding.
  2. Re-read the full results log (.codeflash/results.tsv). Look for patterns: which files/functions appeared in successful experiments (focus there), which techniques worked (try variants on new targets), which approaches failed repeatedly (avoid them). (A log-mining sketch follows this list.)
  3. Re-read the original goal. Has the focus drifted from what the user asked for?
  4. Try combining 2-3 previously successful changes that might compound.
  5. Try the opposite of what hasn't worked. If fine-grained optimizations keep failing, try a coarser architectural change. If local changes keep failing, try a cross-function refactor.
  6. Check git history for hints: git log --oneline -20 --stat — do successful commits cluster in specific files or patterns?
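
A log-mining sketch for step 2 — this assumes a hypothetical column layout (field 3 = result, field 2 = target); check the real header with head -1 .codeflash/results.tsv first:

```bash
# Count which targets appear in successful (KEEP) experiments, most frequent first.
awk -F'\t' '$3 == "KEEP" {print $2}' .codeflash/results.tsv | sort | uniq -c | sort -rn
```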

If recovery still produces no improvement after 3 more experiments, stop and report with a summary of what was tried and why the codebase appears to be at its optimization floor for this domain.

Key Files

All session state lives in .codeflash/:

  • .codeflash/results.tsv — Experiment log. Read at startup, append after each experiment.
  • .codeflash/HANDOFF.md — Session state. Read at startup, update after each keep/discard. (A skeleton is sketched after this list.)
  • .codeflash/conventions.md — Maintainer preferences. Read at startup. Update when changes rejected.
  • .codeflash/setup.md — Runner, Python version, test commands, available tools. Written by setup agent.
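
The exact HANDOFF.md layout is up to the domain agent; as a purely hypothetical skeleton:

```
# HANDOFF — <domain> session
Environment: runner=<from setup.md>, python=<version>, branch=codeflash/optimize
Baseline: <profiling method + locked flags>
Current target: <function / file>
Keeps: <n> | Consecutive discards: <n>
Pre-submit review findings: <none yet>
```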

Session Resume

  1. Read .codeflash/HANDOFF.md, .codeflash/results.tsv, .codeflash/conventions.md.
  2. Confirm with user what to work on next.
  3. Continue the experiment loop.

Session Start — Common Steps

  1. Read setup. Read .codeflash/setup.md for the runner, Python version, and test command. Read .codeflash/conventions.md if it exists. Also check for org-level conventions at ../conventions.md (project-level overrides org-level). Read .codeflash/learnings.md if it exists — these are discoveries from previous sessions that prevent repeating dead ends. Read CLAUDE.md. Use the runner from setup.md everywhere you see $RUNNER.
  2. Create or switch to the optimization branch. git checkout -b codeflash/optimize (or git checkout codeflash/optimize if it already exists). All optimizations stack as commits on this single branch. (An idempotent one-liner follows this list.)
  3. Initialize HANDOFF.md with environment and discovery.
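
An idempotent form of step 2 — checkout succeeds if the branch already exists, otherwise the fallback creates it:

```bash
git checkout codeflash/optimize 2>/dev/null || git checkout -b codeflash/optimize
```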

Domain agents add domain-specific steps after these common steps (e.g., baseline profiling method, benchmark tier definition).

Constraints (shared)

  • Correctness: All previously-passing tests must still pass.
  • Simplicity: Simpler is better. Don't add complexity for marginal gains.
  • Style: Match existing project conventions. Don't introduce patterns maintainers will reject.

Domain agents add additional domain-specific constraints (e.g., performance measurement required for CPU, no new dependencies for memory).

Research Tools

context7: mcp__context7__resolve-library-id then mcp__context7__query-docs for library docs. Use it aggressively for API signatures — APIs change across versions.

WebFetch: For specific URLs when context7 doesn't cover a topic.

Explore subagents: For codebase investigation to keep your context clean.

Progress Reporting Protocol

When running as a named teammate, send progress messages to the team lead at these milestones. If SendMessage is unavailable (not in a team), skip this — the file-based logging is always the source of truth.

Standard message points (domain-specific content in each agent's prompt):

  1. After baseline profiling: Summary of profiling results
  2. After each experiment: Target, result (KEEP/DISCARD), metrics. (An example message follows this list.)
  3. Every 3 experiments: Periodic progress summary for user relay
  4. At milestones (every 3-5 keeps): Cumulative improvement
  5. At plateau/completion: Final summary
  6. When stuck (5+ consecutive discards): What's been tried
  7. Cross-domain discovery: Signal to router — do NOT fix cross-domain issues yourself
  8. File modification notification: After each KEEP commit, notify researcher per modified file: SendMessage(to: "researcher", summary: "File modified", message: "[modified <file-path>]"). This prevents the researcher from sending outdated analysis for code you've already changed.
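
For example, a per-experiment message (point 2) might look like the following, reusing the SendMessage shape from point 8 — the recipient name, metrics, and target are all illustrative:

```
SendMessage(to: "team-lead",
            summary: "Experiment 7: KEEP",
            message: "Target: parse_rows() in src/loader.py. Result: KEEP, 1.32x (410ms → 310ms, median of 5 runs). Committed.")
```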

Also update the shared task list when reaching phase boundaries:

  • After baseline: TaskUpdate("Baseline profiling" → completed)
  • At completion/plateau: TaskUpdate("Experiment loop" → completed)

Research Teammate Integration

A researcher agent ("researcher") may be running alongside you. Use it to reduce your read-think time:

  1. After baseline profiling, send your ranked target list to the researcher. Skip the top target (you'll work on it immediately) — send targets #2 through #5+. (A sketch follows this list.)
  2. Before each experiment, check if the researcher has sent findings for your current target. If a [research <function_name>] message is available, use it to skip source reading and pattern identification — go straight to the reasoning checklist.
  3. After re-profiling (new rankings), send updated targets to the researcher so it stays ahead of you.
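
A hypothetical hand-off for step 1 — function and file names are invented for illustration:

```
SendMessage(to: "researcher",
            summary: "Ranked targets #2-#5",
            message: "Please research in order: merge_spans (src/spans.py), Index.lookup (src/index.py), to_json (src/serialize.py), dedupe (src/util.py)")
```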

Pre-Submit Review

MANDATORY before sending [complete]. After the experiment loop plateaus or stops, run a self-review against the full diff before finalizing. This catches the issues that reviewers consistently flag on performance PRs.

Read ${CLAUDE_PLUGIN_ROOT}/references/shared/pre-submit-review.md for the full checklist. Common critical checks:

  1. Resource ownership: For every del/close() you added — is the object caller-owned? Grep for all call sites (a sweep sketch follows this list). If a caller uses the object after your function returns, you have a use-after-free bug.
  2. Concurrency safety: Does this code run in a web server? Check for shared mutable state and resource lifecycle under concurrent requests.
  3. Correctness vs intent: Every claim in results.tsv and commit messages must match actual benchmark output.
  4. Quality tradeoffs disclosed: If you traded one metric for another, quantify both sides in the commit message.
  5. Tests exercise production paths: If the optimized code is reached via monkey-patch, factory, or feature flag in production, tests must go through that same path.
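
A call-site sweep sketch for check 1 — the object name conn is hypothetical:

```bash
# Every reference to the object you now close/del. If any hit runs after
# your function returns, the caller still owns the object — revert the change.
grep -rn "conn" --include="*.py" src/ tests/
```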

If you find issues, fix them, re-run tests, and update results.tsv. Note findings in HANDOFF.md under "Pre-submit review findings". Only send [complete] after all checks pass.

Domain agents add domain-specific checks beyond these common ones.

PR Strategy

One PR per independent optimization. Multiple commits touching the same function → one PR. Optimizations in different files → separate PRs.

Do NOT open PRs yourself unless the user explicitly asks. Prepare the branch, push it, and tell the user it's ready.
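
One way to carve a single optimization off the stacked codeflash/optimize branch into its own PR branch — a sketch assuming the default branch is main; branch and commit names are hypothetical:

```bash
git checkout -b mem/reuse-cache-buffers main
git cherry-pick <sha-of-the-mem-commit>
git push -u origin mem/reuse-cache-buffers
# Then tell the user the branch is ready — do not open the PR yourself.
```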

Domain prefixes:

| Domain | Branch prefix | PR title prefix |
|---|---|---|
| CPU / Data Structures | ds/ | ds: |
| Memory | mem/ | mem: |
| Async | async/ | async: |
| Structure | struct/ | refactor: |
| Deep (cross-domain) | deep/ | perf: |

See ${CLAUDE_PLUGIN_ROOT}/references/shared/pr-preparation.md for the full PR workflow.