codeflash-agent/CLAUDE.md

# codeflash-agent

Monorepo for the Codeflash optimization platform: Python packages, Claude Code plugin, and services.

## Case Studies

Active case study data lives in `.codeflash/{teammember}/{org}/{project}/` (status, bench scripts, raw data, VM infra). Summaries are built from `.codeflash/` into `case-studies/{org}/{project}/`.

Active case studies in `.codeflash/krrt7/`:
- `microsoft/typeagent`
- `unstructured/core-product`
- `netflix/metaflow`
- `coveragepy/coveragepy`
- `textualize/rich`
- `python/pip`
- `odoo`
- `codeflash-ai/ci-audit`

### Directory conventions

Target repos live in `~/Desktop/work/{org}_org/{project}`:
- `microsoft_org/typeagent`
- `unstructured_org/core-product`
- `netflix_org/metaflow`
- `coveragepy_org/coveragepy`
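A minimal sketch of these path conventions as shell helpers. The function names are hypothetical, and so is the member normalization: this assumes the `{teammember}` segment is `git config user.name` lowercased (so `KRRT7` becomes `krrt7`).

```bash
# Hypothetical helpers for the path conventions above.
# Assumption: the member segment is git's user.name, lowercased; the real
# session-start hooks may normalize it differently.
case_study_dir() {
  local org="$1" project="$2" member="${3:-$(git config user.name)}"
  member="$(printf '%s' "$member" | tr '[:upper:]' '[:lower:]')"
  printf '.codeflash/%s/%s/%s\n' "$member" "$org" "$project"
}

# Published summary location (no member segment).
summary_dir() { printf 'case-studies/%s/%s\n' "$1" "$2"; }

# Local checkout of the target repo: ~/Desktop/work/{org}_org/{project}.
target_repo_dir() { printf '%s/Desktop/work/%s_org/%s\n' "$HOME" "$1" "$2"; }
```

Passing the member explicitly (third argument) skips the `git config` lookup.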

### Optimization flow

1. **Make changes** in the target repo on a `perf/<description>` branch
2. **Run tests locally** to verify nothing breaks
3. **Commit and push** to the fork
4. **Benchmark on the VM** via `ssh -A azureuser@<ip> "cd ~/<project> && git fetch origin && ..."`
5. **Record results** in `.codeflash/{teammember}/{org}/{project}/data/results.tsv`
6. **Update status.md** in `.codeflash/{teammember}/{org}/{project}/`
7. **Open a PR** on the fork with VM benchmark numbers
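Step 5 is easy to script. A minimal sketch, assuming a TSV with the hypothetical columns below and speedup computed as baseline time / optimized time; if `results.tsv` already has a header, match its columns instead. The function name is hypothetical.

```bash
# Hypothetical helper for step 5: append one benchmark row to results.tsv.
# Column layout is an assumption, not the project's actual schema.
record_result() {
  local tsv="$1" branch="$2" bench="$3" baseline="$4" optimized="$5" speedup
  # speedup = baseline / optimized, formatted to two decimals
  speedup="$(awk -v b="$baseline" -v o="$optimized" 'BEGIN { printf "%.2f", b / o }')"
  if [ ! -s "$tsv" ]; then
    # First row for this project: create data/ and write a header.
    mkdir -p "$(dirname "$tsv")"
    printf 'date\tbranch\tbenchmark\tbaseline_s\toptimized_s\tspeedup\n' > "$tsv"
  fi
  printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%d)" "$branch" "$bench" "$baseline" "$optimized" "$speedup" >> "$tsv"
}
```

Usage: `record_result .codeflash/krrt7/microsoft/typeagent/data/results.tsv perf/<description> <benchmark> <baseline_s> <optimized_s>`.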

### VM access

VMs use SSH agent forwarding -- always connect with `ssh -A`:

| Project | VM IP | Size | Resource group |
|---|---|---|---|
| core-product | 40.65.91.158 | Standard_E4s_v5 | core-product-BENCH-RG |
| typeagent | 40.65.81.123 | Standard_D2s_v5 | typeagent-BENCH-RG |

If SSH times out, check:
1. VM is running: `az vm start --resource-group <RG> --name <vm>`
2. The NSG rule matches your current IP: update the source address of the `AllowSSHFromMyIP` rule in the Azure portal or via `az network nsg rule update`
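Both checks as one sketch. `<vm>` and `<nsg>` stand for names not listed here, and the function name is hypothetical; the resource groups and the `AllowSSHFromMyIP` rule name come from above. Set `DRY_RUN=1` to print the `az` commands instead of running them.

```bash
# Print the command when DRY_RUN is set, otherwise execute it.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical helper: bring a benchmark VM back into SSH reach.
fix_vm_ssh() {
  local rg="$1" vm="$2" nsg="$3" my_ip="$4"
  # 1. Make sure the VM is running.
  run az vm start --resource-group "$rg" --name "$vm"
  # 2. Point the SSH allow rule at the caller's current public IP.
  run az network nsg rule update --resource-group "$rg" --nsg-name "$nsg" \
    --name AllowSSHFromMyIP --source-address-prefixes "$my_ip"
}
```

Usage: `fix_vm_ssh typeagent-BENCH-RG <vm> <nsg> "$(curl -s https://api.ipify.org)"`, then retry `ssh -A azureuser@40.65.81.123`.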

### PR strategy

- **Individual PRs** on the fork (`KRRT7/<repo>`) -- one per optimization on a `perf/<description>` branch. Each is self-contained with its own benchmark numbers.
- **Stacked draft PR** (optional) on the fork (`--base main --head optimization`) -- accumulates all optimizations, shows cumulative gain.
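Both PR shapes map onto `gh pr create`. A minimal sketch, assuming the GitHub CLI is installed and authenticated and the PR body (with VM benchmark numbers) is already in a markdown file; the function name and PR titles are hypothetical. Set `DRY_RUN=1` to print the command instead of running it.

```bash
# Hypothetical wrapper: open an individual perf/ PR or the stacked draft PR.
open_fork_pr() {
  local repo="$1" head="$2" body_file="$3"
  if [ "$head" = optimization ]; then
    set -- --draft --title "Stacked optimizations"   # cumulative draft PR
  else
    set -- --title "${head#perf/}"                   # one optimization per PR
  fi
  set -- gh pr create --repo "KRRT7/$repo" --base main --head "$head" \
    --body-file "$body_file" "$@"
  if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}
```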

### Benchmarking

- **`codeflash compare`** for internal benchmarks (fork PRs) -- worktree-isolated, per-function breakdown, structured markdown. Does NOT handle import time yet -- use hyperfine for that.
- **hyperfine** for upstream PRs and import time measurements -- portable, no codeflash dependency for maintainers to install.
- **Keep the VM running** during optimization sessions -- don't deallocate between benchmarks
- **Cloud-init must use ASCII only** -- Azure CLI chokes on non-ASCII (em dashes, etc.)
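The ASCII-only rule for cloud-init is easy to enforce before provisioning. A minimal sketch (the function name is hypothetical): `tr` deletes every byte in the ASCII range, so anything left over is non-ASCII, and `grep` in the C locale then shows the offending lines.

```bash
# Hypothetical check: fail if a file contains any non-ASCII byte.
assert_ascii() {
  local file="$1" bad
  # Count bytes outside 0x00-0x7F.
  bad="$(LC_ALL=C tr -d '\000-\177' < "$file" | wc -c | tr -d ' ')"
  if [ "$bad" -ne 0 ]; then
    echo "$file: $bad non-ASCII byte(s):" >&2
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$file" >&2
    return 1
  fi
}
```

Run it on the cloud-init file before passing it to `az vm create --custom-data`. Import-time measurements go through hyperfine instead, e.g. `hyperfine --warmup 3 "$RUNNER -c 'import <module>'"` with `<module>` a placeholder.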

### Runner convention

Use `$RUNNER` in docs and scripts to refer to the Python runner. The value depends on context:

| Context | `$RUNNER` value | Why |
|---|---|---|
| VM benchmark scripts | `.venv/bin/python` | Accuracy -- uv run adds ~50% overhead and 2.5x variance |
| Upstream PR reproducers | `uv run python` | Portability -- matches how the target team works |
| Setup / verify steps | `uv run python` | Measurement accuracy doesn't matter |
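A minimal sketch of resolving `$RUNNER` from the table; the context labels (`vm-bench`, `upstream`, `setup`) and the function name are hypothetical.

```bash
# Hypothetical helper: map a context label to the runner from the table above.
runner_for() {
  case "$1" in
    vm-bench) echo ".venv/bin/python" ;;       # accuracy: no uv run overhead
    upstream | setup) echo "uv run python" ;;  # portability / convenience
    *) echo "unknown context: $1" >&2; return 1 ;;
  esac
}
```

Usage: `RUNNER="$(runner_for vm-bench)"`, then expand it unquoted (`$RUNNER bench.py`, with `bench.py` a placeholder) so `uv run python` splits into a command and its arguments; a quoted `"$RUNNER"` would look for a single executable named `uv run python`.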

## Layout

- **`packages/`** — UV workspace with Python packages (core, python, api, mcp, lsp, github-app)
- **`packages/codeflash-api/`** — FastAPI AI service (replaces Django aiservice in codeflash-internal). All optimization, repair, refinement, testgen, and ranking endpoints. Must be thoroughly unit tested with edge case coverage.
- **`plugin/`** — Claude Code plugin (language-agnostic base + language overlays under `plugin/languages/`)
- **`plugin/languages/python/`** — Python-specific plugin overlay (domain agents, skills, references)
- **`plugin/languages/go/`** — Go-specific plugin overlay (domain agents, skills, references)
- **`plugin/languages/javascript/`** — JavaScript-specific plugin overlay (domain agents, skills, references)
- **`plugin/vendor/codex/`** — Vendored OpenAI Codex runtime
- **`evals/`** — Eval templates and real-repo scenarios

## Build

```bash
make build          # Assemble plugin for all languages → dist-python/, dist-go/, dist-javascript/
make clean          # Remove all dist-*/
```

## Packages (UV workspace)

```bash
uv sync                          # Install all packages + dev deps
prek run --all-files             # Lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v      # Test all packages
```

Package-specific conventions (attrs patterns, type annotations, testing) are in `packages/.claude/rules/` and load automatically when editing package source. The API package has its own rules in `packages/codeflash-api/.claude/rules/`.

## AI Service (codeflash-api)

`packages/codeflash-api/` is a ground-up FastAPI rewrite of the Django aiservice from `codeflash-internal/django/aiservice/`. The Django version is the reference implementation — port logic faithfully, don't reimplement from scratch.

Key design decisions:
- FastAPI with async throughout
- Pydantic v2 only for request/response schemas at the API boundary; attrs for all internal domain models, as in the rest of the repo
- No Django ORM — use async SQLAlchemy or raw asyncpg for Postgres
- Middleware chain as FastAPI dependencies: auth → rate limit → usage tracking
- Every module must have comprehensive unit tests, including error paths and edge cases
- LLM provider abstraction layer (Azure OpenAI + Anthropic Bedrock behind a common interface)

## Plugin Development

The plugin is split for composition:
- `plugin/` has language-agnostic agents, hooks, and shared references
- `plugin/languages/python/` has Python domain agents, skills, and references
- `plugin/languages/go/` has Go domain agents, skills, and references
- `plugin/languages/javascript/` has JavaScript domain agents, skills, and references
- `make build` discovers all languages under `plugin/languages/` and builds each into `dist-<lang>/`
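The discovery step of `make build` can be sketched as a glob over `plugin/languages/`. This only lists the `dist-<lang>/` targets; how the Makefile merges the base plugin with each overlay isn't described here, and the function name is hypothetical.

```bash
# Hypothetical helper: list the dist-<lang>/ targets, one per directory
# under <root>/languages/.
build_targets() {
  local root="${1:-plugin}" dir
  for dir in "$root"/languages/*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    echo "dist-$(basename "$dir")"
  done
}
```

Adding a new directory under `plugin/languages/` is therefore enough to get a new build target.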

Agent files use `${CLAUDE_PLUGIN_ROOT}` for references. When editing agents, be aware that paths differ between source (`plugin/languages/<lang>/references/`) and assembled (`references/`).
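A minimal sketch of the source-to-assembled mapping, assuming `${CLAUDE_PLUGIN_ROOT}` points at the root of a `dist-<lang>/` build and that files from the base `plugin/` keep their relative paths there. The helper name and the example file names are hypothetical.

```bash
# Hypothetical helper: print the ${CLAUDE_PLUGIN_ROOT}-relative path an agent
# should use for a file given by its source-tree path.
assembled_path() {
  local rest
  case "$1" in
    plugin/languages/*/*)
      rest="${1#plugin/languages/}"                         # <lang>/references/...
      printf '%s\n' "\${CLAUDE_PLUGIN_ROOT}/${rest#*/}" ;;  # drop the <lang>/ segment
    plugin/*)
      printf '%s\n' "\${CLAUDE_PLUGIN_ROOT}/${1#plugin/}" ;; # base files (assumed layout)
    *)
      echo "not under plugin/: $1" >&2; return 1 ;;
  esac
}
```

For example, `plugin/languages/python/references/profiling.md` becomes `${CLAUDE_PLUGIN_ROOT}/references/profiling.md`.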