codeflash-agent/skills/codeflash-optimize/SKILL.md
Kevin Turcios d1f34cf794
feat: git memory, guard command, stuck recovery, batched setup (#2)
* feat: git memory, guard command, stuck recovery, batched setup

- Add git history review as step 1 of every experiment loop iteration
  (read git log + diff to learn from past experiments and detect patterns)
- Add guard command as formal regression safety net (step 10) with
  revert-and-rework protocol (max 2 attempts before discard)
- Add stuck state recovery protocol (5+ consecutive discards triggers
  re-read of all files, results log analysis, goal re-check, combination
  of past successes, and opposite-strategy attempts)
- Add batched setup questions rule to orchestrator (max 4 questions per
  message, never one-at-a-time across round-trips)
- Update decision tree to include guard check before keep/discard
- Update orchestrator workflow to configure guard during session init

Inspired by patterns from uditgoenka/autoresearch.

* feat: add git history, guard, and config audit steps to cpu/memory/structure agents

Align experiment loops in all domain agents with async agent.
Each now includes step 1 (review git history), guard command
(revert+rework on failure), and config audit after KEEP with
domain-specific guidance.

* feat: add CI plugin validation workflow (#3)

Uses claude-code-action with plugin-dev plugin to validate plugin
structure, agent consistency, eval manifests, and skills on every PR.
Includes @claude mention support for interactive fixes.

* fix: correct plugin marketplace name for CI validation

plugin-dev is in claude-plugins-official, not claude-code-plugins.
Also adds plugin_marketplaces URL for discovery.

* fix: expand allowed tools for validation workflow

Add gh pr comment, gh api, cat, python3, jq to allowed tools so
Claude can post PR summary comments and subagents can function.

* fix: enable track_progress and show_full_output for debugging

* fix: remove colons from Bash glob patterns in validate allowedTools

The gh command patterns used colons (e.g. Bash(gh pr diff:*)) which
are treated as literal characters, so they never matched actual
commands like `gh pr diff 2 --name-only`. This caused 1 permission
denial per CI run and prevented the summary comment from posting.

* fix: fail CI job when validation finds issues

Add verdict step that writes PASS/FAIL to a file, with a follow-up
workflow step that exits 1 on FAIL. Previously validation reported
issues in comments but the job always succeeded.

* fix: remove double-commit contradiction in async agent and shared base

Step 4/5/6 (Implement) said "and commit" but the commit-after-KEEP
step later says "Do NOT commit discards." This meant discarded
experiments were still committed. CPU/memory/structure agents already
had it right — only commit at the KEEP step.

* fix: treat validation warnings as blocking failures

Warnings were previously non-blocking — the verdict step only
checked for "issues that need fixing." Now any warning also
triggers FAIL.

* fix: address plugin-validator warnings

- Declare context7 MCP server in plugin.json (domain agents use it)
- Change codeflash-setup color from green to red (collision with router)
- Add memory: project and example block to codeflash-setup frontmatter

* feat: add eval regression testing (Layer 3)

- baseline-scores.json: checked-in expected scores + min thresholds
  for ranking, memory-hard, memory-misdirection
- check-regression.sh: orchestrator that runs evals, scores them, and
  compares to baselines (exits 1 if any score < min)
- eval-regression.yml: on-demand CI workflow (workflow_dispatch) with
  Bedrock OIDC, artifact upload, and job summary table

* fix: address remaining plugin-validator warnings

- Add memory: project to codeflash-memory.md (matches other agents)
- Add allowed-tools to memray-profiling skill
- Fix blank line in codeflash-setup.md frontmatter
- Add git log -20 --stat to all 4 domain agents' git history step
- Add step numbering note to async reference experiment-loop.md

* feat: add deterministic scoring for profiler and ranking criteria

Add session-text-based auto-scoring that overrides LLM grades for
mechanically verifiable criteria:
- used_memory_profiler: grep for memray/tracemalloc Bash commands
- profiled_iteratively: count distinct profiling runs (1=1pt, 2+=full)
- built_ranked_list_with_impact_pct: detect cProfile + ranking output

These anchor 2-4 points per eval deterministically, reducing LLM
variance. Baseline thresholds tightened from min=6 to min=7.

* fix: parse verdict from PR comment instead of temp file

Claude writes "Verdict: FAIL/PASS" in the PR comment but doesn't
execute the python3 file-write command. The check step now reads
the claude[bot] comment via gh api and greps for the verdict line.

* fix: address all remaining validator findings for PR #2

- Remove non-standard fields (repository, license, keywords) from plugin.json
- Pin context7 MCP to @2.1.4 instead of @latest
- Add context7 fallback note in router agent
- Remove unused Read from codeflash-optimize allowed-tools
- Make AskUserQuestion usage explicit in skill body
- Add tracemalloc to memray-profiling trigger phrases
- List all reference files in memray-profiling skill

* fix: wire stuck state recovery into domain agents, fix skill allowed-tools

- Add 5+ consecutive discard stuck recovery protocol to all 4 domain
  agents (cpu, memory, async, structure) inline experiment loops,
  matching the shared base definition
- Add Read and Bash to codeflash-optimize skill allowed-tools so it
  can inspect session state before delegating
- Add Write to memray-profiling skill allowed-tools so it can create
  profiling harness scripts

* fix: sync versions, add missing tools to router and skills

- Sync marketplace.json metadata version to 0.1.0 to match plugin.json
- Add Write and Edit to codeflash.md router tools (needed for
  writing .codeflash/conventions.md during setup)
- Remove Bash from codeflash-optimize skill (least-privilege: all
  execution is delegated via Agent)
- Add Grep and Glob to memray-profiling skill (needed for finding
  test files, capture outputs, and config files)

* fix: restore plugin.json fields, revert marketplace version, relax validator verdict

- Restore repository, license, keywords in plugin.json (accidentally
  removed when mcpServers was added)
- Revert marketplace.json metadata.version to 1.0.0 (collection
  version, distinct from individual plugin version 0.1.0)
- Change validator verdict rule: only FAIL on major issues, not
  warnings — prevents the LLM validator from blocking on subjective
  minor findings each run

2026-03-27 07:43:14 -05:00

1.4 KiB

name: codeflash-optimize
description: This skill should be used when the user asks to "optimize my code", "start an optimization session", "resume optimization", "check optimization status", "make this faster", "reduce memory usage", "fix slow functions", or "run performance experiments". Covers CPU, async, memory, and codebase structure optimization.
argument-hint: [start|resume|status]
allowed-tools: Agent, AskUserQuestion, Read

Optimization session launcher.

For start (or no arguments)

Before launching the agent, use the AskUserQuestion tool to ask: "Before I start optimizing, is there anything I should know? For example: areas to avoid, known constraints, things you've already tried, or specific files to focus on. Or just say 'go' to proceed."

Wait for the user's response. Then use the Agent tool to launch the codeflash agent with run_in_background: true. Include the user's original request AND their answer in the prompt. Do not add any other instructions — the agent has its own workflow.

For resume

Use the Agent tool to launch the codeflash agent with run_in_background: true. Pass resume and the user's request as the prompt.

For status

Use the Agent tool to launch the codeflash agent. Pass status as the prompt. Do NOT run in background — wait for the result and show it to the user.
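The three modes above can be summarized as a small decision table. The Python sketch below is illustrative only: the skill itself is prose instructions, and `AgentLaunch`, `plan_launch`, and the exact prompt layout are hypothetical names and choices made for this example, not part of the plugin or of any Agent tool API. It only models which prompt is passed and whether the launch runs in the background.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentLaunch:
    """Hypothetical record of one Agent tool launch (not a real API type)."""
    agent: str
    prompt: str
    run_in_background: bool


def plan_launch(mode: str, user_request: str, user_answer: str = "") -> AgentLaunch:
    """Map a skill argument to the launch the instructions describe.

    Assumption: an empty or missing argument is treated as "start".
    """
    mode = (mode or "start").strip().lower()
    if mode == "start":
        # Original request AND the answer to the setup question, nothing else.
        prompt = f"{user_request}\n\n{user_answer}"
        return AgentLaunch("codeflash", prompt, run_in_background=True)
    if mode == "resume":
        # "resume" plus the user's request, launched in the background.
        return AgentLaunch("codeflash", f"resume {user_request}", run_in_background=True)
    if mode == "status":
        # Foreground: wait for the result so it can be shown to the user.
        return AgentLaunch("codeflash", "status", run_in_background=False)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(plan_launch("", "make this faster", "avoid the vendored code"))
    print(plan_launch("resume", "continue the memory work"))
    print(plan_launch("status", ""))
```

Only `status` runs in the foreground, because its result must be shown to the user immediately; `start` and `resume` hand off to long-running sessions.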