mirror of
https://github.com/codeflash-ai/codeflash-agent.git
synced 2026-05-04 18:25:19 +00:00
121 lines
5.9 KiB
Markdown
# codeflash-agent

Monorepo for the Codeflash optimization platform: Python packages, Claude Code plugin, and services.

## Case Studies

Active case study data lives in `.codeflash/{teammember}/{org}/{project}/` (status, bench scripts, raw data, VM infra). Summaries are built from `.codeflash/` into `case-studies/{org}/{project}/`.

Active case studies in `.codeflash/krrt7/`:
- `microsoft/typeagent`
- `unstructured/core-product`
- `netflix/metaflow`
- `coveragepy/coveragepy`
- `textualize/rich`
- `python/pip`
- `odoo`
- `codeflash-ai/ci-audit`

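The two path conventions above can be sketched as small shell helpers. This is a minimal illustration, not code from the repo: the function names `case_study_data_dir` and `case_study_summary_dir` are hypothetical, and only the path templates come from the text.

```bash
# Sketch: resolve case-study paths from the conventions described above.
# Function names are hypothetical; only the path templates come from the README.

# Working data for one case study: .codeflash/{teammember}/{org}/{project}
case_study_data_dir() {
  local teammember="$1" org="$2" project="$3"
  printf '.codeflash/%s/%s/%s\n' "$teammember" "$org" "$project"
}

# Built summary for the same case study: case-studies/{org}/{project}
case_study_summary_dir() {
  local org="$1" project="$2"
  printf 'case-studies/%s/%s\n' "$org" "$project"
}

case_study_data_dir krrt7 microsoft typeagent
case_study_summary_dir microsoft typeagent
```

Note that the summary path drops the `{teammember}` segment, so two people working on the same project would map to the same summary directory.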
### Directory conventions

Target repos live in `~/Desktop/work/{org}_org/{project}`:
- `microsoft_org/typeagent`
- `unstructured_org/core-product`
- `netflix_org/metaflow`
- `coveragepy_org/coveragepy`

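As a quick illustration of the `{org}_org/{project}` layout, a helper along these lines could compute the checkout path. The name `target_repo_dir` is an assumption made for this sketch.

```bash
# Hypothetical helper: local checkout path for a target repo,
# following ~/Desktop/work/{org}_org/{project}.
target_repo_dir() {
  local org="$1" project="$2"
  printf '%s/Desktop/work/%s_org/%s\n' "$HOME" "$org" "$project"
}

# Print the path for one of the repos listed above.
target_repo_dir microsoft typeagent
```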
### Optimization flow

1. **Make changes** in the target repo on a `perf/<description>` branch
2. **Run tests locally** to verify nothing breaks
3. **Commit and push** to the fork
4. **Benchmark on the VM** via `ssh -A azureuser@<ip> "cd ~/<project> && git fetch origin && ..."`
5. **Record results** in `.codeflash/{teammember}/{org}/{project}/data/results.tsv`
6. **Update status.md** in `.codeflash/{teammember}/{org}/{project}/`
7. **Open a PR** on the fork with VM benchmark numbers

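Step 5 could be scripted roughly as follows. This is only a sketch: the helper name `record_result` and the column layout (timestamp, branch, benchmark, seconds) are assumptions, since the actual schema of `results.tsv` is not specified here.

```bash
# Sketch of step 5: append one row to results.tsv.
# ASSUMPTION: the columns (timestamp, branch, benchmark, seconds) are
# illustrative; the real results.tsv may use a different layout.
record_result() {
  local results_file="$1" branch="$2" benchmark="$3" seconds="$4"
  mkdir -p "$(dirname "$results_file")"
  # Write a header row only when the file is missing or empty.
  if [ ! -s "$results_file" ]; then
    printf 'timestamp\tbranch\tbenchmark\tseconds\n' > "$results_file"
  fi
  # Append the measurement with a UTC timestamp.
  printf '%s\t%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$branch" "$benchmark" "$seconds" >> "$results_file"
}

# Usage (placeholders left as-is):
#   record_result .codeflash/{teammember}/{org}/{project}/data/results.tsv \
#     'perf/<description>' '<benchmark>' '<seconds>'
```

Appending rather than rewriting keeps earlier measurements intact, which makes it easier to compare runs across branches later.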
### VM access

VMs use SSH agent forwarding -- always connect with `ssh -A`:

| Project | VM IP | Size | Resource group |
|---|---|---|---|
| core-product | 40.65.91.158 | Standard_E4s_v5 | core-product-BENCH-RG |
| typeagent | 40.65.81.123 | Standard_D2s_v5 | typeagent-BENCH-RG |

If SSH times out, check:
1. VM is running; if not, start it with `az vm start --resource-group <RG> --name <vm>`
2. NSG IP is current: update the `AllowSSHFromMyIP` source address in the Azure portal or via `az network nsg rule update`

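The two recovery checks can be written out as concrete commands. The sketch below only prints them so they can be reviewed before running; the helper name `ssh_recovery_cmds` is hypothetical, and the resource group, VM name, NSG name, and IP are caller-supplied placeholders. Only the rule name `AllowSSHFromMyIP` comes from the text above.

```bash
# Sketch: print (do not run) the recovery commands for an SSH timeout.
# ASSUMPTION: the NSG name and current IP must be supplied by the caller;
# they are not given in this README.
ssh_recovery_cmds() {
  local rg="$1" vm="$2" nsg="$3" my_ip="$4"
  # 1. Start the VM in case it was stopped or deallocated.
  printf 'az vm start --resource-group %s --name %s\n' "$rg" "$vm"
  # 2. Point the SSH allow rule at the current public IP.
  printf 'az network nsg rule update --resource-group %s --nsg-name %s --name AllowSSHFromMyIP --source-address-prefixes %s\n' \
    "$rg" "$nsg" "$my_ip"
}

ssh_recovery_cmds '<RG>' '<vm>' '<nsg>' '<my-ip>'
```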
### PR strategy

- **Individual PRs** on the fork (`KRRT7/<repo>`) -- one per optimization on a `perf/<description>` branch. Each is self-contained with its own benchmark numbers.
- **Stacked draft PR** (optional) on the fork (`--base main --head optimization`) -- accumulates all optimizations, shows cumulative gain.

### Benchmarking

- **`codeflash compare`** for internal benchmarks (fork PRs) -- worktree-isolated, per-function breakdown, structured markdown. Does NOT handle import time yet -- use hyperfine for that.
- **hyperfine** for upstream PRs and import time measurements -- portable, no codeflash dependency for maintainers to install.
- **Keep the VM running** during optimization sessions -- don't deallocate between benchmarks
- **Cloud-init must use ASCII only** -- Azure CLI chokes on non-ASCII (em dashes, etc.)

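The ASCII-only rule for cloud-init files is easy to check before handing a file to the Azure CLI. The following is one possible check, with `is_ascii_only` as a hypothetical helper name; it strips every 7-bit ASCII byte and fails if anything is left.

```bash
# Sketch: succeed only if a file contains no non-ASCII bytes.
# is_ascii_only is a hypothetical helper, not part of the repo.
is_ascii_only() {
  local file="$1" non_ascii
  # Delete all 7-bit ASCII bytes; whatever remains is non-ASCII.
  non_ascii=$(LC_ALL=C tr -d '\000-\177' < "$file" | wc -c)
  [ "$non_ascii" -eq 0 ]
}

# Usage: guard a deploy step on the check.
#   is_ascii_only cloud-init.yaml || echo "non-ASCII bytes found" >&2
```

Forcing `LC_ALL=C` makes `tr` operate on raw bytes, so multi-byte characters such as em dashes are caught regardless of the caller's locale.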
### Runner convention

Use `$RUNNER` in docs and scripts to refer to the Python runner. The value depends on context:

| Context | `$RUNNER` value | Why |
|---|---|---|
| VM benchmark scripts | `.venv/bin/python` | Accuracy -- `uv run` adds ~50% overhead and 2.5x variance |
| Upstream PR reproducers | `uv run python` | Portability -- matches how the target team works |
| Setup / verify steps | `uv run python` | Measurement accuracy doesn't matter |

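One way to apply the table in a script is to select `$RUNNER` from a context label. This is a sketch under assumptions: the labels `vm`, `upstream`, and `setup`, the helper name `runner_for`, and the script name `bench.py` are all made up for illustration.

```bash
# Sketch: pick $RUNNER from the context, following the table above.
# ASSUMPTION: the context labels (vm, upstream, setup) are hypothetical.
runner_for() {
  case "$1" in
    vm)             printf '%s\n' '.venv/bin/python' ;;  # accuracy
    upstream|setup) printf '%s\n' 'uv run python' ;;     # portability
    *)              printf 'unknown context: %s\n' "$1" >&2; return 1 ;;
  esac
}

RUNNER="$(runner_for vm)"
# Leave $RUNNER unquoted when invoking it: "uv run python" is three words
# and relies on the shell's word splitting.
echo "would run: $RUNNER bench.py"   # bench.py is a placeholder script name
```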
## Layout

- **`packages/`** -- UV workspace with Python packages (core, python, api, mcp, lsp, github-app)
- **`packages/codeflash-api/`** -- FastAPI AI service (replaces Django aiservice in codeflash-internal). All optimization, repair, refinement, testgen, and ranking endpoints. Must be thoroughly unit tested with edge case coverage.
- **`plugin/`** -- Claude Code plugin (language-agnostic base + language overlays under `plugin/languages/`)
- **`plugin/languages/python/`** -- Python-specific plugin overlay (domain agents, skills, references)
- **`plugin/languages/go/`** -- Go-specific plugin overlay (domain agents, skills, references)
- **`plugin/languages/javascript/`** -- JavaScript-specific plugin overlay (domain agents, skills, references)
- **`plugin/vendor/codex/`** -- Vendored OpenAI Codex runtime
- **`evals/`** -- Eval templates and real-repo scenarios

## Build

```bash
make build    # Assemble plugin for all languages → dist-python/, dist-go/, dist-javascript/
make clean    # Remove all dist-*/
```

## Packages (UV workspace)

```bash
uv sync                        # Install all packages + dev deps
prek run --all-files           # Lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v     # Test all packages
```

Package-specific conventions (attrs patterns, type annotations, testing) are in `packages/.claude/rules/` and load automatically when editing package source. The API package has its own rules in `packages/codeflash-api/.claude/rules/`.

## AI Service (codeflash-api)

`packages/codeflash-api/` is a ground-up FastAPI rewrite of the Django aiservice from `codeflash-internal/django/aiservice/`. The Django version is the reference implementation: port logic faithfully, don't reimplement from scratch.

Key design decisions:
- FastAPI with async throughout
- Pydantic v2 for request/response schemas only (API boundary). attrs for all internal domain models, same as the rest of the repo
- No Django ORM: use async SQLAlchemy or raw asyncpg for Postgres
- Middleware chain as FastAPI dependencies: auth → rate limit → usage tracking
- Every module must have comprehensive unit tests, including error paths and edge cases
- LLM provider abstraction layer (Azure OpenAI + Anthropic Bedrock behind a common interface)

## Plugin Development

The plugin is split for composition:
- `plugin/` has language-agnostic agents, hooks, and shared references
- `plugin/languages/python/` has Python domain agents, skills, and references
- `plugin/languages/go/` has Go domain agents, skills, and references
- `plugin/languages/javascript/` has JavaScript domain agents, skills, and references
- `make build` discovers all languages under `plugin/languages/` and builds each into `dist-<lang>/`

Agent files use `${CLAUDE_PLUGIN_ROOT}` for references. When editing agents, be aware that paths differ between source (`plugin/languages/<lang>/references/`) and assembled (`references/`).

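The language discovery described above might look roughly like this in shell. It is a sketch of the described behavior only, not the repo's actual Makefile logic, and `list_dist_dirs` is a hypothetical name.

```bash
# Sketch: list the dist directory each language overlay would build into.
# ASSUMPTION: this mirrors the README's description; the real Makefile
# may discover languages differently.
list_dist_dirs() {
  local root="${1:-plugin/languages}" dir
  for dir in "$root"/*/; do
    # With no overlays the glob stays literal, so skip non-directories.
    [ -d "$dir" ] || continue
    printf 'dist-%s\n' "$(basename "$dir")"
  done
}

list_dist_dirs plugin/languages
```

Because each subdirectory maps to its own `dist-<lang>/`, adding a new overlay directory would be picked up without editing a language list.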