# codeflash-agent

Monorepo for the Codeflash optimization platform: Python packages, Claude Code plugin, and services.

## Case Studies

Active case study data lives in `.codeflash/{teammember}/{org}/{project}/` (status, bench scripts, raw data, VM infra). Summaries are built out of `.codeflash/` into `case-studies/{org}/{project}/`.

Active case studies in `.codeflash/krrt7/`:

- `microsoft/typeagent`
- `unstructured/core-product`
- `netflix/metaflow`
- `coveragepy/coveragepy`
- `textualize/rich`
- `python/pip`
- `odoo`

### Directory conventions

Target repos live in `~/Desktop/work/{org}_org/{project}`:

- `microsoft_org/typeagent`
- `unstructured_org/core-product`
- `netflix_org/metaflow`
- `coveragepy_org/coveragepy`

### Optimization flow

1. **Make changes** in the target repo on a `perf/` branch
2. **Run tests locally** to verify nothing breaks
3. **Commit and push** to the fork
4. **Benchmark on the VM** via `ssh -A azureuser@ "cd ~/ && git fetch origin && ..."`
5. **Record results** in `.codeflash/{teammember}/{org}/{project}/data/results.tsv`
6. **Update status.md** in `.codeflash/{teammember}/{org}/{project}/`
7. **Open a PR** on the fork with VM benchmark numbers

### VM access

VMs use SSH agent forwarding -- always connect with `ssh -A`:

| Project | VM IP | Size | Resource group |
|---|---|---|---|
| core-product | 40.65.91.158 | Standard_E4s_v5 | core-product-BENCH-RG |
| typeagent | 40.65.81.123 | Standard_D2s_v5 | typeagent-BENCH-RG |

If SSH times out, check:

1. VM is running: `az vm start --resource-group --name `
2. NSG IP is current: update `AllowSSHFromMyIP` source address in the Azure portal or via `az network nsg rule update`

### PR strategy

- **Individual PRs** on the fork (`KRRT7/`) -- one per optimization on a `perf/` branch. Each is self-contained with its own benchmark numbers.
- **Stacked draft PR** (optional) on the fork (`--base main --head optimization`) -- accumulates all optimizations, shows cumulative gain.
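
The path conventions described under Case Studies, Directory conventions, and Optimization flow can be sketched in Python. This is an illustrative helper only, not code from this repo: the function names are hypothetical, and it assumes every case study follows the `{org}/{project}` pattern (a single-segment entry such as `odoo` is not covered).

```python
from pathlib import Path


def case_study_dir(teammember: str, org: str, project: str) -> Path:
    """Active case study data: .codeflash/{teammember}/{org}/{project}/"""
    return Path(".codeflash") / teammember / org / project


def results_file(teammember: str, org: str, project: str) -> Path:
    """Benchmark results recorded in step 5 of the optimization flow."""
    return case_study_dir(teammember, org, project) / "data" / "results.tsv"


def summary_dir(org: str, project: str) -> Path:
    """Built summaries: case-studies/{org}/{project}/"""
    return Path("case-studies") / org / project


def target_repo_dir(org: str, project: str) -> Path:
    """Local checkout: ~/Desktop/work/{org}_org/{project}"""
    return Path.home() / "Desktop" / "work" / f"{org}_org" / project


if __name__ == "__main__":
    print(case_study_dir("krrt7", "microsoft", "typeagent"))
    print(results_file("krrt7", "microsoft", "typeagent"))
    print(summary_dir("microsoft", "typeagent"))
    print(target_repo_dir("microsoft", "typeagent"))
```

All paths except the target repo are relative to the monorepo root, so the helper is only meaningful when run from there.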
### Benchmarking

- **`codeflash compare`** for internal benchmarks (fork PRs) -- worktree-isolated, per-function breakdown, structured markdown. Does NOT handle import time yet -- use hyperfine for that.
- **hyperfine** for upstream PRs and import time measurements -- portable, no codeflash dependency for maintainers to install.
- **Keep the VM running** during optimization sessions -- don't deallocate between benchmarks
- **Cloud-init must use ASCII only** -- Azure CLI chokes on non-ASCII (em dashes, etc.)

### Runner convention

Use `$RUNNER` in docs and scripts to refer to the Python runner. The value depends on context:

| Context | `$RUNNER` value | Why |
|---|---|---|
| VM benchmark scripts | `.venv/bin/python` | Accuracy -- uv run adds ~50% overhead and 2.5x variance |
| Upstream PR reproducers | `uv run python` | Portability -- matches how the target team works |
| Setup / verify steps | `uv run python` | Measurement accuracy doesn't matter |

## Layout

- **`packages/`** -- UV workspace with Python packages (core, python, mcp, lsp, github-app)
- **`plugin/`** -- Claude Code plugin (language-agnostic base + language overlays under `plugin/languages/`)
- **`plugin/languages/python/`** -- Python-specific plugin overlay (domain agents, skills, references)
- **`plugin/languages/go/`** -- Go-specific plugin overlay (domain agents, skills, references)
- **`plugin/languages/javascript/`** -- JavaScript-specific plugin overlay (domain agents, skills, references)
- **`plugin/vendor/codex/`** -- Vendored OpenAI Codex runtime
- **`evals/`** -- Eval templates and real-repo scenarios

## Build

```bash
make build    # Assemble plugin for all languages → dist-python/, dist-go/, dist-javascript/
make clean    # Remove all dist-*/
```

## Packages (UV workspace)

```bash
uv sync                      # Install all packages + dev deps
prek run --all-files         # Lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v   # Test all packages
```

Package-specific conventions (attrs patterns, type annotations, testing) are in `packages/.claude/rules/` and load automatically when editing package source.

## Plugin Development

The plugin is split for composition:

- `plugin/` has language-agnostic agents, hooks, and shared references
- `plugin/languages/python/` has Python domain agents, skills, and references
- `plugin/languages/go/` has Go domain agents, skills, and references
- `plugin/languages/javascript/` has JavaScript domain agents, skills, and references
- `make build` discovers all languages under `plugin/languages/` and builds each into `dist-/`

Agent files use `${CLAUDE_PLUGIN_ROOT}` for references. When editing agents, be aware that paths differ between source (`plugin/languages//references/`) and assembled (`references/`).
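
To make the source-versus-assembled path difference concrete, here is a minimal sketch of how a language overlay's reference path might map to its assembled location. It is not the build logic from the Makefile: the function name is hypothetical, `guide.md` is a made-up file name, and it assumes the build keeps the layout below `references/` unchanged.

```python
from pathlib import PurePosixPath

# Source overlays live under plugin/languages/; the next path segment is the language.
SOURCE_PREFIX = ("plugin", "languages")


def assembled_reference_path(source_path: str) -> PurePosixPath:
    """Map a language overlay reference path to its assumed assembled path.

    Assumption: the language directory prefix is dropped and everything
    from references/ downward is kept as-is.
    """
    parts = PurePosixPath(source_path).parts
    if len(parts) < 4 or parts[:2] != SOURCE_PREFIX or parts[3] != "references":
        raise ValueError(f"not a language overlay reference path: {source_path}")
    return PurePosixPath(*parts[3:])


if __name__ == "__main__":
    # guide.md is a hypothetical file used only for illustration.
    print(assembled_reference_path("plugin/languages/python/references/guide.md"))
```

Under that assumption, an agent file would refer to the assembled form relative to `${CLAUDE_PLUGIN_ROOT}` rather than to the source path.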