# codeflash-agent
Monorepo for the Codeflash optimization platform: Python packages, Claude Code plugin, and services.
## Case Studies
Active case study data lives in `.codeflash/{teammember}/{org}/{project}/` (status, bench scripts, raw data, VM infra). Summaries are built out of `.codeflash/` into `case-studies/{org}/{project}/`.
Active case studies in `.codeflash/krrt7/`:

- microsoft/typeagent
- unstructured/core-product
- netflix/metaflow
- coveragepy/coveragepy
- textualize/rich
- python/pip
- odoo
- codeflash-ai/ci-audit
## Directory conventions
Target repos live in `~/Desktop/work/{org}_org/{project}`:

- microsoft_org/typeagent
- unstructured_org/core-product
- netflix_org/metaflow
- coveragepy_org/coveragepy
## Optimization flow
- Make changes in the target repo on a `perf/<description>` branch
- Run tests locally to verify nothing breaks
- Commit and push to the fork
- Benchmark on the VM via `ssh -A azureuser@<ip> "cd ~/<project> && git fetch origin && ..."`
- Record results in `.codeflash/{teammember}/{org}/{project}/data/results.tsv`
- Update `status.md` in `.codeflash/{teammember}/{org}/{project}/`
- Open a PR on the fork with VM benchmark numbers
## VM access
VMs use SSH agent forwarding -- always connect with `ssh -A`:
| Project | VM IP | Size | Resource group |
|---|---|---|---|
| core-product | 40.65.91.158 | Standard_E4s_v5 | core-product-BENCH-RG |
| typeagent | 40.65.81.123 | Standard_D2s_v5 | typeagent-BENCH-RG |
If SSH times out, check:

- VM is running: `az vm start --resource-group <RG> --name <vm>`
- NSG IP is current: update the `AllowSSHFromMyIP` source address in the Azure portal or via `az network nsg rule update`
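The NSG refresh can be scripted. A sketch, assuming `curl` is available and your current public IP is what the rule should allow; `ifconfig.me` is one of several "what is my IP" services and is an illustrative choice, not a project requirement.

```shell
# Sketch: point the AllowSSHFromMyIP rule at the current public IP.
refresh_ssh_rule() {
  local rg="$1" nsg="$2"
  local my_ip
  my_ip="$(curl -fsS ifconfig.me)"
  az network nsg rule update \
    --resource-group "$rg" \
    --nsg-name "$nsg" \
    --name AllowSSHFromMyIP \
    --source-address-prefixes "${my_ip}/32"
}

# Usage: refresh_ssh_rule core-product-BENCH-RG <nsg-name>
```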
## PR strategy
- Individual PRs on the fork (`KRRT7/<repo>`) -- one per optimization, each on a `perf/<description>` branch. Each is self-contained with its own benchmark numbers.
- Stacked draft PR (optional) on the fork (`--base main --head optimization`) -- accumulates all optimizations and shows the cumulative gain.
## Benchmarking
- `codeflash compare` for internal benchmarks (fork PRs) -- worktree-isolated, per-function breakdown, structured markdown. Does NOT handle import time yet -- use hyperfine for that.
- hyperfine for upstream PRs and import-time measurements -- portable, with no codeflash dependency for maintainers to install.
- Keep the VM running during optimization sessions -- don't deallocate between benchmarks
- Cloud-init must use ASCII only -- Azure CLI chokes on non-ASCII (em dashes, etc.)
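A sketch of the hyperfine import-time measurement mentioned above. The package name `rich` and output file name are illustrative; `--warmup`, `--runs`, and `--export-markdown` are standard hyperfine flags.

```shell
# Sketch: measure cold import time of a package with hyperfine.
# --warmup discards cache-warming runs; --runs fixes the sample count.
measure_import_time() {
  local pkg="$1"
  hyperfine --warmup 3 --runs 20 \
    --export-markdown "import-${pkg}.md" \
    "python -c 'import ${pkg}'"
}

# Usage: measure_import_time rich
```

Run it once on the baseline branch and once on the perf branch, then paste both markdown tables into the PR.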
## Runner convention
Use `$RUNNER` in docs and scripts to refer to the Python runner. The value depends on context:
| Context | `$RUNNER` value | Why |
|---|---|---|
| VM benchmark scripts | `.venv/bin/python` | Accuracy -- `uv run` adds ~50% overhead and 2.5x variance |
| Upstream PR reproducers | `uv run python` | Portability -- matches how the target team works |
| Setup / verify steps | `uv run python` | Measurement accuracy doesn't matter |
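A bench script can honor the convention with an environment default. This is a minimal sketch: the default shown is the VM context, and `bench.py` is a hypothetical entry point.

```shell
# Sketch: resolve $RUNNER with the VM-context default; callers override
# via the environment, e.g. RUNNER="uv run python" for upstream reproducers.
RUNNER="${RUNNER:-.venv/bin/python}"
echo "using runner: $RUNNER"
# $RUNNER bench.py   # bench.py is a placeholder benchmark entry point
```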
## Layout
- `packages/` -- UV workspace with Python packages (core, python, api, mcp, lsp, github-app)
- `packages/codeflash-api/` -- FastAPI AI service (replaces the Django aiservice in codeflash-internal). All optimization, repair, refinement, testgen, and ranking endpoints. Must be thoroughly unit tested with edge-case coverage.
- `plugin/` -- Claude Code plugin (language-agnostic base + language overlays under `plugin/languages/`)
- `plugin/languages/python/` -- Python-specific plugin overlay (domain agents, skills, references)
- `plugin/languages/go/` -- Go-specific plugin overlay (domain agents, skills, references)
- `plugin/languages/javascript/` -- JavaScript-specific plugin overlay (domain agents, skills, references)
- `plugin/vendor/codex/` -- vendored OpenAI Codex runtime
- `evals/` -- eval templates and real-repo scenarios
## Build
```shell
make build   # Assemble plugin for all languages → dist-python/, dist-go/, dist-javascript/
make clean   # Remove all dist-*/
```
## Packages (UV workspace)
```shell
uv sync                     # Install all packages + dev deps
prek run --all-files        # Lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v  # Test all packages
```
Package-specific conventions (attrs patterns, type annotations, testing) are in `packages/.claude/rules/` and load automatically when editing package source. The API package has its own rules in `packages/codeflash-api/.claude/rules/`.
## AI Service (codeflash-api)
`packages/codeflash-api/` is a ground-up FastAPI rewrite of the Django aiservice from `codeflash-internal/django/aiservice/`. The Django version is the reference implementation -- port logic faithfully, don't reimplement from scratch.
Key design decisions:
- FastAPI with async throughout
- Pydantic v2 for request/response schemas only (API boundary). attrs for all internal domain models — same as the rest of the repo
- No Django ORM — use async SQLAlchemy or raw asyncpg for Postgres
- Middleware chain as FastAPI dependencies: auth → rate limit → usage tracking
- Every module must have comprehensive unit tests, including error paths and edge cases
- LLM provider abstraction layer (Azure OpenAI + Anthropic Bedrock behind a common interface)
## Plugin Development
The plugin is split for composition:
- `plugin/` has language-agnostic agents, hooks, and shared references
- `plugin/languages/python/` has Python domain agents, skills, and references
- `plugin/languages/go/` has Go domain agents, skills, and references
- `plugin/languages/javascript/` has JavaScript domain agents, skills, and references
- `make build` discovers all languages under `plugin/languages/` and builds each into `dist-<lang>/`
Agent files use `${CLAUDE_PLUGIN_ROOT}` for references. When editing agents, be aware that paths differ between source (`plugin/languages/<lang>/references/`) and assembled (`references/`) layouts.