# codeflash-agent
Monorepo for the Codeflash optimization platform: Python packages, Claude Code plugin, and services.
## Case Studies
Active case study data lives in `.codeflash/{teammember}/{org}/{project}/` (status, bench scripts, raw data, VM infra). Summaries are built out of `.codeflash/` into `case-studies/{org}/{project}/`.
Active case studies in `.codeflash/krrt7/`:

- microsoft/typeagent
- unstructured/core-product
- netflix/metaflow
- coveragepy/coveragepy
- textualize/rich
- python/pip
- odoo
- codeflash-ai/ci-audit
## Directory conventions
Target repos live in `~/Desktop/work/{org}_org/{project}`:

- microsoft_org/typeagent
- unstructured_org/core-product
- netflix_org/metaflow
- coveragepy_org/coveragepy
## Optimization flow
- Make changes in the target repo on a `perf/<description>` branch
- Run tests locally to verify nothing breaks
- Commit and push to the fork
- Benchmark on the VM via `ssh -A azureuser@<ip> "cd ~/<project> && git fetch origin && ..."`
- Record results in `.codeflash/{teammember}/{org}/{project}/data/results.tsv`
- Update `status.md` in `.codeflash/{teammember}/{org}/{project}/`
- Open a PR on the fork with VM benchmark numbers
## VM access
VMs use SSH agent forwarding -- always connect with `ssh -A`:
| Project | VM IP | Size | Resource group |
|---|---|---|---|
| core-product | 40.65.91.158 | Standard_E4s_v5 | core-product-BENCH-RG |
| typeagent | 40.65.81.123 | Standard_D2s_v5 | typeagent-BENCH-RG |
If SSH times out, check:

- VM is running: `az vm start --resource-group <RG> --name <vm>`
- NSG IP is current: update the `AllowSSHFromMyIP` source address in the Azure portal or via `az network nsg rule update`
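The NSG refresh can be scripted. A sketch, assuming `curl` is available and your current public IP is what the rule should allow; `ifconfig.me` is one of several "what is my IP" services and is an illustrative choice, not a project requirement.

```shell
# Sketch: point the AllowSSHFromMyIP rule at the current public IP.
refresh_ssh_rule() {
  local rg="$1" nsg="$2"
  local my_ip
  my_ip="$(curl -fsS ifconfig.me)"
  az network nsg rule update \
    --resource-group "$rg" \
    --nsg-name "$nsg" \
    --name AllowSSHFromMyIP \
    --source-address-prefixes "${my_ip}/32"
}

# Usage: refresh_ssh_rule core-product-BENCH-RG <nsg-name>
```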
## PR strategy
- Individual PRs on the fork (`KRRT7/<repo>`) -- one per optimization, each on a `perf/<description>` branch. Each is self-contained with its own benchmark numbers.
- Stacked draft PR (optional) on the fork (`--base main --head optimization`) -- accumulates all optimizations and shows the cumulative gain.
## Benchmarking
- `codeflash compare` for internal benchmarks (fork PRs) -- worktree-isolated, per-function breakdown, structured markdown. Does NOT handle import time yet -- use hyperfine for that.
- hyperfine for upstream PRs and import-time measurements -- portable, with no codeflash dependency for maintainers to install.
- Keep the VM running during optimization sessions -- don't deallocate between benchmarks
- Cloud-init must use ASCII only -- Azure CLI chokes on non-ASCII (em dashes, etc.)
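A sketch of the hyperfine import-time measurement mentioned above. The package name `rich` and output file name are illustrative; `--warmup`, `--runs`, and `--export-markdown` are standard hyperfine flags.

```shell
# Sketch: measure cold import time of a package with hyperfine.
# --warmup discards cache-warming runs; --runs fixes the sample count.
measure_import_time() {
  local pkg="$1"
  hyperfine --warmup 3 --runs 20 \
    --export-markdown "import-${pkg}.md" \
    "python -c 'import ${pkg}'"
}

# Usage: measure_import_time rich
```

Run it once on the baseline branch and once on the perf branch, then paste both markdown tables into the PR.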
## Runner convention
Use `$RUNNER` in docs and scripts to refer to the Python runner. The value depends on context:
| Context | `$RUNNER` value | Why |
|---|---|---|
| VM benchmark scripts | `.venv/bin/python` | Accuracy -- `uv run` adds ~50% overhead and 2.5x variance |
| Upstream PR reproducers | `uv run python` | Portability -- matches how the target team works |
| Setup / verify steps | `uv run python` | Measurement accuracy doesn't matter |
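A bench script can honor the convention with an environment default. This is a minimal sketch: the default shown is the VM context, and `bench.py` is a hypothetical entry point.

```shell
# Sketch: resolve $RUNNER with the VM-context default; callers override
# via the environment, e.g. RUNNER="uv run python" for upstream reproducers.
RUNNER="${RUNNER:-.venv/bin/python}"
echo "using runner: $RUNNER"
# $RUNNER bench.py   # bench.py is a placeholder benchmark entry point
```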
## Layout
- `packages/` -- UV workspace with Python packages (core, python, api, mcp, lsp, github-app)
- `packages/codeflash-api/` -- FastAPI AI service (replaces the Django aiservice in codeflash-internal). All optimization, repair, refinement, testgen, and ranking endpoints. Must be thoroughly unit tested with edge-case coverage.
- `plugin/` -- Claude Code plugin (language-agnostic base + language overlays under `plugin/languages/`)
- `plugin/languages/python/` -- Python-specific plugin overlay (domain agents, skills, references)
- `plugin/languages/go/` -- Go-specific plugin overlay (domain agents, skills, references)
- `plugin/languages/javascript/` -- JavaScript-specific plugin overlay (domain agents, skills, references)
- `plugin/vendor/codex/` -- vendored OpenAI Codex runtime
- `evals/` -- eval templates and real-repo scenarios
## Build
```shell
make build   # Assemble plugin for all languages → dist-python/, dist-go/, dist-javascript/
make clean   # Remove all dist-*/
```
## Packages (UV workspace)
```shell
uv sync                     # Install all packages + dev deps
prek run --all-files        # Lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v  # Test all packages
```
Package-specific conventions (attrs patterns, type annotations, testing) are in `packages/.claude/rules/` and load automatically when editing package source. The API package has its own rules in `packages/codeflash-api/.claude/rules/`.
## AI Service (codeflash-api)
`packages/codeflash-api/` is a ground-up FastAPI rewrite of the Django aiservice from `codeflash-internal/django/aiservice/`. The Django version is the reference implementation -- port logic faithfully, don't reimplement from scratch.
Key design decisions:
- FastAPI with async throughout
- Pydantic v2 for request/response schemas only (API boundary). attrs for all internal domain models — same as the rest of the repo
- No Django ORM — use async SQLAlchemy or raw asyncpg for Postgres
- Middleware chain as FastAPI dependencies: auth → rate limit → usage tracking
- Every module must have comprehensive unit tests, including error paths and edge cases
- LLM provider abstraction layer (Azure OpenAI + Anthropic Bedrock behind a common interface)
## Plugin Development
The plugin is split for composition:
- `plugin/` has language-agnostic agents, hooks, and shared references
- `plugin/languages/python/` has Python domain agents, skills, and references
- `plugin/languages/go/` has Go domain agents, skills, and references
- `plugin/languages/javascript/` has JavaScript domain agents, skills, and references
- `make build` discovers all languages under `plugin/languages/` and builds each into `dist-<lang>/`
Agent files use `${CLAUDE_PLUGIN_ROOT}` for references. When editing agents, be aware that paths differ between source (`plugin/languages/<lang>/references/`) and assembled (`references/`) layouts.