# codeflash-agent
Autonomous performance optimization platform. Profiles code, implements optimizations, benchmarks before and after, and iterates until gains plateau.
**What it's achieved on real projects:**
| Project | Result | Details |
|---|---|---|
| Rich | 2x Console import (79ms → 34ms) | [summary](case-studies/textualize/rich/summary.md) |
| pip | 7x `--version` (138ms → 20ms), 1.81x resolver | [summary](case-studies/python/pip/summary.md) |
| typeagent-py | 2.6x query path, 1.16x import + indexing | [summary](case-studies/microsoft/typeagent/summary.md) |
| core-product | 14.6% latency, 2.1 GB memory savings | [summary](case-studies/unstructured/core-product/summary.md) |
| metaflow | 7-18x artifact compression (lz4 vs gzip) | [summary](case-studies/netflix/metaflow/summary.md) |
## Domains
| Domain | When to use |
|--------|-------------|
| **CPU** | CPU time, O(n²) loops, wrong containers, algorithmic complexity |
| **Memory** | Peak memory, OOM, memory leaks, RSS reduction |
| **Async** | Concurrency, event loop blocking, sequential awaits, throughput/latency |
| **Structure** | Import time, circular deps, module reorganization for performance |
| **Deep** | Cross-domain optimization — profiles all domains and iterates until plateau |
The agent auto-detects which domain(s) apply based on your request.
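
As an illustration of the kind of issue the **Async** domain targets, here is a minimal sketch of sequential awaits versus concurrent ones. `fetch_part` is a hypothetical stand-in for any independent I/O-bound call; nothing here is taken from the agent's own code.

```python
import asyncio
import time


async def fetch_part(delay: float) -> float:
    """Hypothetical stand-in for one I/O-bound call (e.g. an HTTP request)."""
    await asyncio.sleep(delay)
    return delay


async def process_sequential(delays: list[float]) -> list[float]:
    # Each await finishes before the next starts: total time is about sum(delays).
    return [await fetch_part(d) for d in delays]


async def process_concurrent(delays: list[float]) -> list[float]:
    # Independent calls are scheduled together: total time is about max(delays).
    return list(await asyncio.gather(*(fetch_part(d) for d in delays)))


def timed(coro_fn, delays: list[float]) -> tuple[list[float], float]:
    """Run one variant to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(delays))
    return result, time.perf_counter() - start
```

Calling `timed(process_sequential, delays)` and `timed(process_concurrent, delays)` with the same input should return identical results, with the concurrent variant finishing in roughly the time of the slowest single call. This only helps when the calls are truly independent.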
## Install
Build the plugin first, then launch Claude with it:
```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
make build-plugin # assembles plugin into dist/ — must run before launching
claude --dangerously-skip-permissions --effort max --plugin-dir ./dist/
```
## Your first optimization
Just run:
```
> /codeflash-optimize start
```
If you know where the problem lies, describe it in natural language instead:
```
> Our /process endpoint takes 5s but individual calls should only take 500ms each
> process_records is too slow, it's doing O(n²) lookups
```
Other commands:
```
> /codeflash-optimize scan # quick cross-domain diagnosis (no changes)
> /codeflash-optimize status # check progress
> /codeflash-optimize resume # continue from where you left off
> /codeflash-optimize review # review current changes or a PR
```
Codeflash will profile, analyze, implement fixes one at a time, re-profile after each, and stop when gains plateau. Session state persists in `HANDOFF.md` and `results.tsv` so you can resume across conversations.
## For contributors
### Dev setup
```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
uv sync # install all packages + dev deps
prek run --all-files # lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v # test all packages
```
### Plugin development
```bash
make build-plugin # assemble plugin → dist/ (base + python overlay + vendor)
make clean # remove dist/
```
The plugin is self-contained under `plugin/`:
- `plugin/` — language-agnostic agents, hooks, shared references
- `plugin/languages/python/` — Python domain agents, skills, references
- `plugin/languages/javascript/` — JavaScript domain agents, skills, references
- `make build-plugin` assembles base + language overlay into `dist/` (default: `LANG=python`)
## Optimization patterns
Distilled from 122 pip commits + 2 Rich PRs. Ordered by typical impact.
| Tier | Category | Examples | Typical impact |
|---|---|---|---|
| 1 | **Startup / Import** | Fast-path early exit, import deferral, `TYPE_CHECKING` guards, dead import removal | 2-100x for startup paths |
| 2 | **Architecture** | `@dataclass` → `__slots__`, lazy loading, speculative prefetch, conditional rebuild, caching | 10-60% on hot paths |
| 3 | **Micro** | Identity shortcuts (`is` before `==`), bypass public API internally, hoist to module level, `__slots__` on hot classes | 1.1-1.8x per call |
| 4 | **I/O** | Replace slow serializers, connection pooling, parallel I/O | 2-5x for I/O-bound ops |
**Anti-patterns to avoid:** caching with low hit rate, premature `__slots__`, over-deferring imports in one-time paths, optimizing cold paths.
Full pattern catalog with examples: [docs/codeflash-agent-dogfooding.md](docs/codeflash-agent-dogfooding.md#patterns-that-worked)
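
To make the Tier 1 patterns concrete, here is a minimal sketch combining a fast-path early exit, a `TYPE_CHECKING` guard, and a deferred import. The `main` and `total` functions and the `demo 1.0` string are hypothetical, and `decimal` only stands in for a genuinely expensive dependency; this is not code from pip or Rich.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by type checkers, so it adds no runtime import cost.
    from decimal import Decimal


def main(argv: list[str]) -> str:
    # Fast path: answer trivial requests before touching anything heavy.
    if "--version" in argv:
        return "demo 1.0"
    return str(total(argv))


def total(values: list[str]) -> "Decimal":
    # Deferred import: the cost is paid only when this code path actually runs.
    from decimal import Decimal

    return sum((Decimal(v) for v in values), Decimal(0))
```

With this shape, `main(["--version"])` returns without ever executing the deferred import. The listed anti-pattern still applies: deferring an import on a path that always runs just moves the cost somewhere else.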
## Methodology
### Profiling toolkit
| Tool | Purpose | When to use |
|---|---|---|
| `python -X importtime` | Import cost breakdown | First step for any CLI tool |
| `hyperfine` | E2E command timing with statistics | Before/after validation |
| `cProfile` / `py-spy` | Function-level CPU profiling | Finding hot functions |
| `timeit` | Micro-benchmarks for specific functions | Validating micro-opts |
| `memray` / `tracemalloc` | Memory profiling | Allocation-heavy paths |
| `objgraph` | Object count tracking | Finding redundant allocations |
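
For function-level CPU profiling, a minimal sketch using the standard-library `cProfile` and `pstats` modules could look like the following. `normalize` and `expensive_clean` are hypothetical workloads invented for the example, and `profile_call` is an illustrative helper rather than part of the agent.

```python
import cProfile
import io
import pstats


def expensive_clean(row: str) -> str:
    # Hypothetical per-row work that ends up dominating the profile.
    return "".join(sorted(row.lower()))


def normalize(rows: list[str]) -> list[str]:
    return [expensive_clean(r) for r in rows]


def profile_call(func, *args) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
    return buffer.getvalue()
```

Printing `profile_call(normalize, rows)` for a representative input lists the functions with the highest cumulative time, which is a reasonable starting point before reading the code behind each entry.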
### Workflow
```
1. Profile → identify top-N bottlenecks
2. For each bottleneck:
   a. Read the actual code (don't guess from profiler shapes)
   b. Implement the smallest change that addresses it
   c. Micro-benchmark before/after
   d. Run full test suite
   e. E2E benchmark
3. Commit with clear perf: prefix and numbers
4. Repeat until plateau
```
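
Steps 2b and 2c of the workflow can be sketched as follows, assuming a hypothetical bottleneck that does list membership checks inside a loop. The function names and the `compare` helper are made up for illustration; the point is to check that both versions agree before trusting any timing.

```python
import timeit


def count_known_before(records: list[int], known: list[int]) -> int:
    # Baseline: `in` on a list scans it, so the whole loop is O(n * m).
    return sum(1 for r in records if r in known)


def count_known_after(records: list[int], known: list[int]) -> int:
    # Smallest change: build a set once, then each lookup is O(1) on average.
    known_set = set(known)
    return sum(1 for r in records if r in known_set)


def compare(
    records: list[int], known: list[int], repeat: int = 5, number: int = 3
) -> tuple[float, float]:
    """Return the best (before, after) timings in seconds for `number` calls."""
    # Guard against "fast but wrong": both versions must agree before timing.
    assert count_known_before(records, known) == count_known_after(records, known)
    before = min(
        timeit.repeat(lambda: count_known_before(records, known), repeat=repeat, number=number)
    )
    after = min(
        timeit.repeat(lambda: count_known_after(records, known), repeat=repeat, number=number)
    )
    return before, after
```

Taking the minimum over several repeats reduces noise from other processes. A micro-benchmark like this only supports the change; the full test suite and the E2E benchmark still decide whether it ships.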
### Environment requirements
- Non-burstable VM (e.g., Azure Standard_D2s_v5) for consistent CPU
- Multiple Python versions (3.12, 3.13 minimum — behavior differs)
- `hyperfine --warmup 5 --min-runs 30` for statistical rigor
- All tests passing before AND after every change
Full methodology details: [docs/codeflash-agent-dogfooding.md](docs/codeflash-agent-dogfooding.md#methodology)
## Workspace convention
Each target organization gets its own `<org>_org/` directory containing all repos for that org:
```
~/Desktop/work/
├── cf_org/                        # Codeflash
│   ├── codeflash-agent/           # this monorepo
│   ├── codeflash/                 # core engine
│   ├── codeflash-internal/        # backend service
│   └── ...
├── unstructured_org/              # Unstructured.io
│   ├── unstructured/              # open source library
│   ├── core-product/              # main product
│   ├── unstructured-inference/    # ML inference
│   └── ...
├── microsoft_org/                 # Microsoft
│   └── typeagent/                 # typeagent-py (Structured RAG)
├── roboflow_org/                  # Roboflow
│   └── supervision/
└── <org>_org/                     # new target org
    └── <repo>/
```
When starting work on a new org: create `<org>_org/`, clone all relevant repos under it, and keep non-repo files out of the org directory.
## Repo structure
```
packages/
  codeflash-core/             # shared foundation (models, AI client, telemetry, git)
  codeflash-python/           # Python language CLI, extends core
  codeflash-mcp/              # MCP server (stub)
  codeflash-lsp/              # LSP server (stub)
services/
  github-app/                 # GitHub App integration (FastAPI)

plugin/                       # Claude Code plugin (self-contained, multi-language)
  languages/python/           # Python domain agents, skills, references
  languages/javascript/       # JavaScript domain agents, skills, references

.codeflash/                   # active optimization data (teammember/org/project)
  krrt7/textualize/rich/      # 2x Rich import speedup
  krrt7/python/pip/           # 7x pip --version, 1.81x resolver
  krrt7/microsoft/typeagent/  # Structured RAG optimization
  <member>/<org>/<project>/   # new optimization targets

case-studies/                 # summaries built from .codeflash/
scripts/                      # scaffold scripts
docs/                         # internal guides
evals/                        # eval templates & real-repo scenarios
```
## Adding an optimization target
When you optimize a new project, scaffold it in `.codeflash/` and build summaries into `case-studies/`.
### 1. Set up local workspace
Each org gets an `<org>_org/` directory under `~/Desktop/work/`. Clone from your fork and add the upstream remote:
```bash
mkdir -p ~/Desktop/work/<org>_org
git clone https://github.com/KRRT7/<repo>.git ~/Desktop/work/<org>_org/<project>
cd ~/Desktop/work/<org>_org/<project>
git remote add upstream https://github.com/<org>/<repo>.git
```
### 2. Scaffold the project
```bash
# Single project:
make bootstrap ORG=roboflow PROJECTS=supervision
# Multiple projects under one org:
make bootstrap ORG=unstructured PROJECTS="unstructured unstructured-inference core-product"
```
This creates:
```
.codeflash/<member>/<org>/<project>/
├── README.md             # results, what changed, methodology (from template)
├── bench/                # add your benchmark scripts here
├── data/                 # save raw benchmark data here
└── infra/
    ├── cloud-init.yaml   # VM provisioning (fill in remaining placeholders)
    └── vm-manage.sh      # VM lifecycle: create, start, stop, ssh, bench, destroy
```
### 3. Fill in the placeholders
The scaffold substitutes `<PROJECT>` automatically. You still need to fill in:
| Placeholder | Where | What to fill in |
|---|---|---|
| `<REPO_URL>` | `infra/cloud-init.yaml` | Your fork's clone URL |
| `<SETUP_COMMANDS>` | `infra/cloud-init.yaml` | Toolchain install + build (language-specific) |
| `<BENCH_COMMAND>` | `infra/cloud-init.yaml` | The command to benchmark |
| `<VERIFY_COMMAND>` | `infra/cloud-init.yaml` | Smoke test after setup |
The cloud-init template includes examples for Python, Rust, Go, Node.js, and Java.
### VM lifecycle
Each project gets a `vm-manage.sh` for the benchmark VM:
```bash
cd .codeflash/<member>/<org>/<project>
bash infra/vm-manage.sh create # provision VM with cloud-init
bash infra/vm-manage.sh bench main # run benchmarks on a branch
bash infra/vm-manage.sh ssh # SSH into VM
bash infra/vm-manage.sh stop # deallocate (stops billing)
bash infra/vm-manage.sh destroy # delete everything
```
### Examples
Use the existing projects as templates:
- [Rich](.codeflash/krrt7/textualize/rich/) — focused scope, 2 PRs, import + runtime micro-opts
- [pip](.codeflash/krrt7/python/pip/) — large scope, 122 commits across 8 categories