* feat: git memory, guard command, stuck recovery, batched setup - Add git history review as step 1 of every experiment loop iteration (read git log + diff to learn from past experiments and detect patterns) - Add guard command as formal regression safety net (step 10) with revert-and-rework protocol (max 2 attempts before discard) - Add stuck state recovery protocol (5+ consecutive discards triggers re-read of all files, results log analysis, goal re-check, combination of past successes, and opposite-strategy attempts) - Add batched setup questions rule to orchestrator (max 4 questions per message, never one-at-a-time across round-trips) - Update decision tree to include guard check before keep/discard - Update orchestrator workflow to configure guard during session init Inspired by patterns from uditgoenka/autoresearch. * feat: git memory, guard command, stuck recovery, batched setup - Add git history review as step 1 of every experiment loop iteration (read git log + diff to learn from past experiments and detect patterns) - Add guard command as formal regression safety net (step 10) with revert-and-rework protocol (max 2 attempts before discard) - Add stuck state recovery protocol (5+ consecutive discards triggers re-read of all files, results log analysis, goal re-check, combination of past successes, and opposite-strategy attempts) - Add batched setup questions rule to orchestrator (max 4 questions per message, never one-at-a-time across round-trips) - Update decision tree to include guard check before keep/discard - Update orchestrator workflow to configure guard during session init Inspired by patterns from uditgoenka/autoresearch. * feat: add git history, guard, and config audit steps to cpu/memory/structure agents Align experiment loops in all domain agents with async agent. Each now includes step 1 (review git history), guard command (revert+rework on failure), and config audit after KEEP with domain-specific guidance. * feat: add git history, guard, and config audit steps to cpu/memory/structure agents Align experiment loops in all domain agents with async agent. Each now includes step 1 (review git history), guard command (revert+rework on failure), and config audit after KEEP with domain-specific guidance. * feat: add CI plugin validation workflow (#3) Uses claude-code-action with plugin-dev plugin to validate plugin structure, agent consistency, eval manifests, and skills on every PR. Includes @claude mention support for interactive fixes. * fix: correct plugin marketplace name for CI validation plugin-dev is in claude-plugins-official, not claude-code-plugins. Also adds plugin_marketplaces URL for discovery. * fix: expand allowed tools for validation workflow Add gh pr comment, gh api, cat, python3, jq to allowed tools so Claude can post PR summary comments and subagents can function. * fix: enable track_progress and show_full_output for debugging * fix: remove colons from Bash glob patterns in validate allowedTools The gh command patterns used colons (e.g. Bash(gh pr diff:*)) which are treated as literal characters, so they never matched actual commands like `gh pr diff 2 --name-only`. This caused 1 permission denial per CI run and prevented the summary comment from posting. * fix: fail CI job when validation finds issues Add verdict step that writes PASS/FAIL to a file, with a follow-up workflow step that exits 1 on FAIL. Previously validation reported issues in comments but the job always succeeded. * fix: remove double-commit contradiction in async agent and shared base Step 4/5/6 (Implement) said "and commit" but the commit-after-KEEP step later says "Do NOT commit discards." This meant discarded experiments were still committed. CPU/memory/structure agents already had it right — only commit at the KEEP step. * fix: remove double-commit contradiction in async agent and shared base Step 4/5/6 (Implement) said "and commit" but the commit-after-KEEP step later says "Do NOT commit discards." This meant discarded experiments were still committed. CPU/memory/structure agents already had it right — only commit at the KEEP step. * fix: treat validation warnings as blocking failures Warnings were previously non-blocking — the verdict step only checked for "issues that need fixing." Now any warning also triggers FAIL. * fix: address plugin-validator warnings - Declare context7 MCP server in plugin.json (domain agents use it) - Change codeflash-setup color from green to red (collision with router) - Add memory: project and example block to codeflash-setup frontmatter * fix: address plugin-validator warnings - Declare context7 MCP server in plugin.json (domain agents use it) - Change codeflash-setup color from green to red (collision with router) - Add memory: project and example block to codeflash-setup frontmatter * feat: add eval regression testing (Layer 3) - baseline-scores.json: checked-in expected scores + min thresholds for ranking, memory-hard, memory-misdirection - check-regression.sh: orchestrator that runs evals, scores them, and compares to baselines (exits 1 if any score < min) - eval-regression.yml: on-demand CI workflow (workflow_dispatch) with Bedrock OIDC, artifact upload, and job summary table * feat: add eval regression testing (Layer 3) - baseline-scores.json: checked-in expected scores + min thresholds for ranking, memory-hard, memory-misdirection - check-regression.sh: orchestrator that runs evals, scores them, and compares to baselines (exits 1 if any score < min) - eval-regression.yml: on-demand CI workflow (workflow_dispatch) with Bedrock OIDC, artifact upload, and job summary table * fix: address remaining plugin-validator warnings - Add memory: project to codeflash-memory.md (matches other agents) - Add allowed-tools to memray-profiling skill - Fix blank line in codeflash-setup.md frontmatter - Add git log -20 --stat to all 4 domain agents' git history step - Add step numbering note to async reference experiment-loop.md * fix: address remaining plugin-validator warnings - Add memory: project to codeflash-memory.md (matches other agents) - Add allowed-tools to memray-profiling skill - Fix blank line in codeflash-setup.md frontmatter - Add git log -20 --stat to all 4 domain agents' git history step - Add step numbering note to async reference experiment-loop.md * feat: add deterministic scoring for profiler and ranking criteria Add session-text-based auto-scoring that overrides LLM grades for mechanically verifiable criteria: - used_memory_profiler: grep for memray/tracemalloc Bash commands - profiled_iteratively: count distinct profiling runs (1=1pt, 2+=full) - built_ranked_list_with_impact_pct: detect cProfile + ranking output These anchor 2-4 points per eval deterministically, reducing LLM variance. Baseline thresholds tightened from min=6 to min=7. * feat: add deterministic scoring for profiler and ranking criteria Add session-text-based auto-scoring that overrides LLM grades for mechanically verifiable criteria: - used_memory_profiler: grep for memray/tracemalloc Bash commands - profiled_iteratively: count distinct profiling runs (1=1pt, 2+=full) - built_ranked_list_with_impact_pct: detect cProfile + ranking output These anchor 2-4 points per eval deterministically, reducing LLM variance. Baseline thresholds tightened from min=6 to min=7. * fix: parse verdict from PR comment instead of temp file Claude writes "Verdict: FAIL/PASS" in the PR comment but doesn't execute the python3 file-write command. The check step now reads the claude[bot] comment via gh api and greps for the verdict line. * fix: address all remaining validator findings for PR #2 - Remove non-standard fields (repository, license, keywords) from plugin.json - Pin context7 MCP to @2.1.4 instead of @latest - Add context7 fallback note in router agent - Remove unused Read from codeflash-optimize allowed-tools - Make AskUserQuestion usage explicit in skill body - Add tracemalloc to memray-profiling trigger phrases - List all reference files in memray-profiling skill * fix: address all remaining validator findings for PR #2 - Remove non-standard fields (repository, license, keywords) from plugin.json - Pin context7 MCP to @2.1.4 instead of @latest - Add context7 fallback note in router agent - Remove unused Read from codeflash-optimize allowed-tools - Make AskUserQuestion usage explicit in skill body - Add tracemalloc to memray-profiling trigger phrases - List all reference files in memray-profiling skill * fix: wire stuck state recovery into domain agents, fix skill allowed-tools - Add 5+ consecutive discard stuck recovery protocol to all 4 domain agents (cpu, memory, async, structure) inline experiment loops, matching the shared base definition - Add Read and Bash to codeflash-optimize skill allowed-tools so it can inspect session state before delegating - Add Write to memray-profiling skill allowed-tools so it can create profiling harness scripts * fix: wire stuck state recovery into domain agents, fix skill allowed-tools - Add 5+ consecutive discard stuck recovery protocol to all 4 domain agents (cpu, memory, async, structure) inline experiment loops, matching the shared base definition - Add Read and Bash to codeflash-optimize skill allowed-tools so it can inspect session state before delegating - Add Write to memray-profiling skill allowed-tools so it can create profiling harness scripts * fix: sync versions, add missing tools to router and skills - Sync marketplace.json metadata version to 0.1.0 to match plugin.json - Add Write and Edit to codeflash.md router tools (needed for writing .codeflash/conventions.md during setup) - Remove Bash from codeflash-optimize skill (least-privilege: all execution is delegated via Agent) - Add Grep and Glob to memray-profiling skill (needed for finding test files, capture outputs, and config files) * fix: sync versions, add missing tools to router and skills - Sync marketplace.json metadata version to 0.1.0 to match plugin.json - Add Write and Edit to codeflash.md router tools (needed for writing .codeflash/conventions.md during setup) - Remove Bash from codeflash-optimize skill (least-privilege: all execution is delegated via Agent) - Add Grep and Glob to memray-profiling skill (needed for finding test files, capture outputs, and config files) * fix: restore plugin.json fields, revert marketplace version, relax validator verdict - Restore repository, license, keywords in plugin.json (accidentally removed when mcpServers was added) - Revert marketplace.json metadata.version to 1.0.0 (collection version, distinct from individual plugin version 0.1.0) - Change validator verdict rule: only FAIL on major issues, not warnings — prevents the LLM validator from blocking on subjective minor findings each run * fix: restore plugin.json fields, revert marketplace version, relax validator verdict - Restore repository, license, keywords in plugin.json (accidentally removed when mcpServers was added) - Revert marketplace.json metadata.version to 1.0.0 (collection version, distinct from individual plugin version 0.1.0) - Change validator verdict rule: only FAIL on major issues, not warnings — prevents the LLM validator from blocking on subjective minor findings each run
4.2 KiB
| name | description | model | color | memory | tools | |||||
|---|---|---|---|---|---|---|---|---|---|---|
| codeflash-setup | Project setup agent for codeflash optimization sessions. Detects package manager, installs the project, installs profiling tools (memray), and writes .codeflash/setup.md with the discovered environment. Called automatically before domain agents start fresh sessions. <example> Context: Router agent starts a fresh optimization session user: "Set up the project environment for optimization" assistant: "I'll launch codeflash-setup to detect the environment and install profiling tools." </example> | sonnet | red | project |
|
You are a project setup agent. Your job is to detect the project environment, install dependencies, install profiling tools, and write a setup file that domain agents will read.
Steps
1. Detect package manager
Check for these files in order (first match wins):
| File | Manager | Runner | Install cmd |
|---|---|---|---|
uv.lock or uv in pyproject.toml [tool] |
uv | uv run |
uv sync |
poetry.lock |
poetry | poetry run |
poetry install |
pdm.lock |
pdm | pdm run |
pdm install |
Pipfile.lock |
pipenv | pipenv run |
pipenv install |
pyproject.toml (no specific tool) |
pip | python |
pip install -e . |
setup.py or setup.cfg |
pip | python |
pip install -e . |
requirements.txt |
pip | python |
pip install -r requirements.txt |
# Quick detection
ls -la pyproject.toml poetry.lock uv.lock pdm.lock Pipfile.lock setup.py setup.cfg requirements.txt 2>/dev/null
If pyproject.toml exists, check its build system and [tool] section:
grep -E '^\[tool\.(uv|poetry|pdm)\]|^\[build-system\]' pyproject.toml 2>/dev/null
2. Detect Python version
$RUNNER python --version 2>&1
Also check pyproject.toml for requires-python constraint.
3. Detect test runner
Check for pytest configuration:
# Check for pytest in pyproject.toml, setup.cfg, or pytest.ini
grep -l 'pytest\|tool.pytest' pyproject.toml setup.cfg pytest.ini tox.ini 2>/dev/null
Determine the test command: $RUNNER -m pytest (default) or $RUNNER -m unittest if no pytest found.
4. Install the project
Run the install command from step 1 exactly as shown. Do NOT add --frozen, --no-sync, or --locked flags — these prevent adding new dependencies like memray. If it fails, report the error — do not guess.
5. Install profiling tools
Install memray as a dev dependency:
| Manager | Command |
|---|---|
| uv | uv add --dev memray |
| poetry | poetry add --group dev memray |
| pdm | pdm add -dG dev memray |
| pip | pip install memray |
If memray installation fails (e.g., unsupported platform, missing compiler), note it in setup.md but don't fail — tracemalloc (stdlib) is always available.
Verify memray works:
$RUNNER -c "import memray; print('memray', memray.__version__)"
6. Commit dependency changes
If steps 4 or 5 modified any files, commit only the dependency-related files:
git add pyproject.toml uv.lock poetry.lock pdm.lock Pipfile.lock requirements.txt setup.py setup.cfg 2>/dev/null
git diff --cached --quiet || git commit -m "Install project deps and profiling tools"
Only add files that actually exist. Do NOT use git add -A — it could stage unrelated user work. If nothing changed, skip this step.
7. Write .codeflash/setup.md
Create the .codeflash/ directory if needed, then write:
# Project Setup
- **Package manager**: <manager>
- **Runner**: `<runner command>`
- **Python**: <version>
- **Install command**: `<install cmd>`
- **Test command**: `<runner> -m pytest`
- **Profiling tools**: tracemalloc (stdlib), memray <version or "not available">
- **Project root**: <absolute path>
8. Print summary
Print a short summary for the parent agent:
[setup] Runner: uv run | Python: 3.12.1 | Profiling: tracemalloc, memray 1.14.0
Rules
- Do NOT read source code — only configuration files.
- Do NOT modify any project code.
- If the project is already installed (imports work), skip reinstall but still detect the runner and write setup.md.
- Keep it fast — this is a setup step, not an investigation.