# codeflash-agent

Autonomous performance optimization platform. Profiles code, implements optimizations, benchmarks before and after, and iterates until plateau.

**What it's achieved on real projects:**

| Project | Result | Details |
|---|---|---|
| Rich | 2x Console import (79ms → 34ms) | [summary](case-studies/textualize/rich/summary.md) |
| pip | 7x `--version` (138ms → 20ms), 1.81x resolver | [summary](case-studies/python/pip/summary.md) |
| typeagent-py | 2.6x query path, 1.16x import + indexing | [summary](case-studies/microsoft/typeagent/summary.md) |
| core-product | 14.6% latency, 2.1 GB memory savings | [summary](case-studies/unstructured/core-product/summary.md) |
| metaflow | 7-18x artifact compression (lz4 vs gzip) | [summary](case-studies/netflix/metaflow/summary.md) |

## Domains

| Domain | When to use |
|--------|-------------|
| **CPU** | CPU time, O(n²) loops, wrong containers, algorithmic complexity |
| **Memory** | Peak memory, OOM, memory leaks, RSS reduction |
| **Async** | Concurrency, event loop blocking, sequential awaits, throughput/latency |
| **Structure** | Import time, circular deps, module reorganization for performance |
| **Deep** | Cross-domain optimization: profiles all domains and iterates until plateau |

The agent auto-detects which domain(s) apply based on your request.

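To make the CPU row concrete, here is a minimal sketch of a "wrong container" fix: a membership test against a list inside a loop is quadratic, and building a set first makes each lookup O(1) on average. The function names and data are hypothetical and are not taken from any of the case studies.

```python
def find_known_slow(records: list[str], known: list[str]) -> list[str]:
    # `in` on a list scans it, so this is O(len(records) * len(known)).
    return [r for r in records if r in known]


def find_known_fast(records: list[str], known: list[str]) -> list[str]:
    # Build the set once; each membership test is then O(1) on average.
    known_set = set(known)
    return [r for r in records if r in known_set]


# Hypothetical sample data, only to show both versions agree.
records = ["a", "b", "c", "d"]
known = ["d", "b", "x"]
print(find_known_fast(records, known))
```

Both functions return the same result in the same order; only the lookup cost changes.
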
## Install

Build the plugin first, then launch Claude with it:

```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
make build-plugin    # assembles plugin into dist/; must run before launching
claude --dangerously-skip-permissions --effort max --plugin-dir ./dist/
```

## Your first optimization

Just run:

```
> /codeflash-optimize start
```

If you know where the problem lies, describe it in natural language instead:

```
> Our /process endpoint takes 5s but individual calls should only take 500ms each
> process_records is too slow, it's doing O(n²) lookups
```

Other commands:

```
> /codeflash-optimize scan      # quick cross-domain diagnosis (no changes)
> /codeflash-optimize status    # check progress
> /codeflash-optimize resume    # continue from where you left off
> /codeflash-optimize review    # review current changes or a PR
```

Codeflash will profile, analyze, implement fixes one at a time, re-profile after each, and stop when gains plateau. Session state persists in `HANDOFF.md` and `results.tsv` so you can resume across conversations.

## For contributors

### Dev setup

```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
uv sync                        # install all packages + dev deps
prek run --all-files           # lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v     # test all packages
```

### Plugin development

```bash
make build-plugin    # assemble plugin → dist/ (base + python overlay + vendor)
make clean           # remove dist/
```

The plugin is self-contained under `plugin/`:
- `plugin/`: language-agnostic agents, hooks, shared references
- `plugin/languages/python/`: Python domain agents, skills, references
- `plugin/languages/javascript/`: JavaScript domain agents, skills, references
- `make build-plugin` assembles base + language overlay into `dist/` (default: `LANG=python`)

## Optimization patterns

Distilled from 122 pip commits and 2 Rich PRs, ordered by typical impact.

| Tier | Category | Examples | Typical impact |
|---|---|---|---|
| 1 | **Startup / Import** | Fast-path early exit, import deferral, `TYPE_CHECKING` guards, dead import removal | 2-100x for startup paths |
| 2 | **Architecture** | `@dataclass` → `__slots__`, lazy loading, speculative prefetch, conditional rebuild, caching | 10-60% on hot paths |
| 3 | **Micro** | Identity shortcuts (`is` before `==`), bypass public API internally, hoist to module level, `__slots__` on hot classes | 1.1-1.8x per call |
| 4 | **I/O** | Replace slow serializers, connection pooling, parallel I/O | 2-5x for I/O-bound ops |

**Anti-patterns to avoid:** caching with low hit rate, premature `__slots__`, over-deferring imports in one-time paths, optimizing cold paths.

Full pattern catalog with examples: [docs/codeflash-agent-dogfooding.md](docs/codeflash-agent-dogfooding.md#patterns-that-worked)

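As a rough illustration of the Tier 1 patterns, the sketch below combines a fast-path early exit, a deferred import, and a `TYPE_CHECKING` guard. It is not code from pip or Rich: `decimal` stands in for a heavy dependency, and `mytool`, `main`, and `parse_amount` are hypothetical names.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers; never imported at runtime.
    from decimal import Decimal


def parse_amount(text: str) -> "Decimal":
    # Deferred import: the module loads on first call, not at startup.
    from decimal import Decimal

    return Decimal(text)


def main(argv: list[str]) -> int:
    # Fast-path early exit: answer --version before touching anything heavy.
    if argv == ["--version"]:
        print("mytool 1.0")  # hypothetical tool name and version
        return 0
    print(parse_amount(argv[0]))
    return 0


main(["--version"])
```

The string annotation `"Decimal"` keeps the return type checkable without importing the module at definition time. Whether deferral pays off depends on how often the slow path runs, which is why over-deferring in one-time paths is listed as an anti-pattern.
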
## Methodology

### Profiling toolkit

| Tool | Purpose | When to use |
|---|---|---|
| `python -X importtime` | Import cost breakdown | First step for any CLI tool |
| `hyperfine` | E2E command timing with statistics | Before/after validation |
| `cProfile` / `py-spy` | Function-level CPU profiling | Finding hot functions |
| `timeit` | Micro-benchmarks for specific functions | Validating micro-opts |
| `memray` / `tracemalloc` | Memory profiling | Allocation-heavy paths |
| `objgraph` | Object count tracking | Finding redundant allocations |

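For the two standard-library tools in the table, a micro-benchmark can be sketched as follows. The helper names (`time_call`, `peak_memory`) and the sample workload are assumptions made for illustration; they are not part of the plugin.

```python
import timeit
import tracemalloc


def build_list(n: int) -> list[int]:
    # Hypothetical workload standing in for the function under test.
    return [i * i for i in range(n)]


def time_call(func, *args, repeat: int = 5, number: int = 100) -> float:
    """Best per-call time in seconds; taking the min of repeats reduces noise."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


def peak_memory(func, *args) -> int:
    """Peak bytes allocated through Python's allocator while func runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


print(f"{time_call(build_list, 10_000) * 1e6:.1f} us per call")
print(f"{peak_memory(build_list, 10_000) / 1024:.0f} KiB peak")
```

Run the same helpers before and after a change and compare the numbers. `tracemalloc` only sees allocations made through Python's allocator, so memory held by native extensions needs a tool such as `memray`.
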
### Workflow

```
1. Profile → identify top-N bottlenecks
2. For each bottleneck:
   a. Read the actual code (don't guess from profiler shapes)
   b. Implement the smallest change that addresses it
   c. Micro-benchmark before/after
   d. Run full test suite
   e. E2E benchmark
3. Commit with clear perf: prefix and numbers
4. Repeat until plateau
```

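The loop above can be sketched in a few lines. This is a simplified model, not the agent's implementation: the `measure` callback, the `(name, apply, revert)` tuples, and the 2% `min_gain` threshold are all assumptions, and it treats the first fix that gains less than the threshold as the plateau.

```python
from typing import Callable

# (name, apply, revert): hypothetical shape for one candidate fix.
Fix = tuple[str, Callable[[], None], Callable[[], None]]


def optimize_until_plateau(
    measure: Callable[[], float],
    fixes: list[Fix],
    min_gain: float = 0.02,  # assumed threshold, as a fraction of the baseline
) -> list[str]:
    """Try fixes one at a time, keep those that help, stop at a plateau."""
    kept: list[str] = []
    baseline = measure()
    for name, apply, revert in fixes:
        apply()
        after = measure()
        gain = (baseline - after) / baseline
        if gain >= min_gain:
            # Record the numbers the commit message should carry.
            kept.append(f"perf: {name} ({baseline:.3f}s -> {after:.3f}s)")
            baseline = after
        else:
            revert()
            break  # gains have flattened out
    return kept


# Simulated timings: each fix subtracts a fixed amount from a fake runtime.
state = {"t": 1.0}


def make_fix(name: str, delta: float) -> Fix:
    def apply() -> None:
        state["t"] -= delta

    def revert() -> None:
        state["t"] += delta

    return (name, apply, revert)


fixes = [make_fix("defer import", 0.5), make_fix("tiny tweak", 0.001)]
for line in optimize_until_plateau(lambda: state["t"], fixes):
    print(line)
```

In real use, `measure` would wrap the E2E benchmark and each fix would be followed by a full test run before it is kept.
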
### Environment requirements

- Non-burstable VM (e.g., Azure Standard_D2s_v5) for consistent CPU performance
- Multiple Python versions (3.12 and 3.13 at minimum, since behavior differs between them)
- `hyperfine --warmup 5 --min-runs 30` for statistical rigor
- All tests passing before AND after every change

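A small wrapper can keep the `hyperfine` settings consistent across runs. The sketch below builds the command line and summarizes hyperfine's JSON export; the helper names and the `bench.json` path are hypothetical, and the sample JSON is made-up data, not a measurement.

```python
import json
import shlex


def hyperfine_command(command: str, export_path: str = "bench.json") -> list[str]:
    """Build a hyperfine invocation with the warmup and min-runs settings."""
    return [
        "hyperfine",
        "--warmup", "5",
        "--min-runs", "30",
        "--export-json", export_path,
        command,
    ]


def summarize(export_json: str) -> str:
    """Format mean and standard deviation from hyperfine's JSON export."""
    result = json.loads(export_json)["results"][0]
    mean_ms = result["mean"] * 1000
    stddev_ms = result["stddev"] * 1000
    return f"{result['command']}: {mean_ms:.1f} ms +/- {stddev_ms:.1f} ms"


print(shlex.join(hyperfine_command("pip --version")))

# Made-up sample in the shape of hyperfine's export, for illustration only.
sample = '{"results": [{"command": "demo", "mean": 0.1, "stddev": 0.005}]}'
print(summarize(sample))
```

To run it for real, pass the list from `hyperfine_command` to `subprocess.run` and feed the contents of the export file to `summarize`.
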
Full methodology details: [docs/codeflash-agent-dogfooding.md](docs/codeflash-agent-dogfooding.md#methodology)

## Workspace convention

Each target organization gets its own `<org>_org/` directory containing all repos for that org:

```
~/Desktop/work/
├── cf_org/                          # Codeflash
│   ├── codeflash-agent/             # this monorepo
│   ├── codeflash/                   # core engine
│   ├── codeflash-internal/          # backend service
│   └── ...
├── unstructured_org/                # Unstructured.io
│   ├── unstructured/                # open source library
│   ├── core-product/                # main product
│   ├── unstructured-inference/      # ML inference
│   └── ...
├── microsoft_org/                   # Microsoft
│   └── typeagent/                   # typeagent-py (Structured RAG)
├── roboflow_org/                    # Roboflow
│   └── supervision/
└── <org>_org/                       # new target org
    └── <repo>/
```

When starting work on a new org: create `<org>_org/`, clone all relevant repos under it, and keep non-repo files out of the org directory.

## Repo structure

```
packages/
  codeflash-core/             # shared foundation (models, AI client, telemetry, git)
  codeflash-python/           # Python language CLI, extends core
  codeflash-mcp/              # MCP server (stub)
  codeflash-lsp/              # LSP server (stub)

services/
  github-app/                 # GitHub App integration (FastAPI)

plugin/                       # Claude Code plugin (self-contained, multi-language)
  languages/python/           # Python domain agents, skills, references
  languages/javascript/       # JavaScript domain agents, skills, references

.codeflash/                   # active optimization data (teammember/org/project)
  krrt7/textualize/rich/      # 2x Rich import speedup
  krrt7/python/pip/           # 7x pip --version, 1.81x resolver
  krrt7/microsoft/typeagent/  # Structured RAG optimization
  <member>/<org>/<project>/   # new optimization targets

case-studies/                 # summaries built from .codeflash/
scripts/                      # scaffold scripts
docs/                         # internal guides
evals/                        # eval templates & real-repo scenarios
```

## Adding an optimization target

When you optimize a new project, scaffold it in `.codeflash/` and build summaries into `case-studies/`.

### 1. Set up local workspace

Each org gets a `<org>_org/` directory under `work/`. Clone from your fork, add the upstream remote:

```bash
mkdir -p ~/Desktop/work/<org>_org
git clone https://github.com/KRRT7/<repo>.git ~/Desktop/work/<org>_org/<project>
cd ~/Desktop/work/<org>_org/<project>
git remote add upstream https://github.com/<org>/<repo>.git
```

### 2. Scaffold the project

```bash
# Single project:
make bootstrap ORG=roboflow PROJECTS=supervision

# Multiple projects under one org:
make bootstrap ORG=unstructured PROJECTS="unstructured unstructured-inference core-product"
```

This creates:

```
.codeflash/<member>/<org>/<project>/
├── README.md              # results, what changed, methodology (from template)
├── bench/                 # add your benchmark scripts here
├── data/                  # save raw benchmark data here
└── infra/
    ├── cloud-init.yaml    # VM provisioning (fill in remaining placeholders)
    └── vm-manage.sh       # VM lifecycle: create, start, stop, ssh, bench, destroy
```

### 3. Fill in the placeholders

The scaffold substitutes `<PROJECT>` automatically. You still need to fill in:

| Placeholder | Where | What to fill in |
|---|---|---|
| `<REPO_URL>` | `infra/cloud-init.yaml` | Your fork's clone URL |
| `<SETUP_COMMANDS>` | `infra/cloud-init.yaml` | Toolchain install + build (language-specific) |
| `<BENCH_COMMAND>` | `infra/cloud-init.yaml` | The command to benchmark |
| `<VERIFY_COMMAND>` | `infra/cloud-init.yaml` | Smoke test after setup |

The cloud-init template includes examples for Python, Rust, Go, Node.js, and Java.

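Before provisioning a VM, it can help to check that none of the placeholders listed above are left in the file. The following is a minimal sketch of such a check; the helper is hypothetical and not part of the scaffold, and the sample text is invented, so the real template's layout may differ.

```python
# Placeholders from the table above; the scaffold fills in <PROJECT> itself.
PLACEHOLDERS = (
    "<REPO_URL>",
    "<SETUP_COMMANDS>",
    "<BENCH_COMMAND>",
    "<VERIFY_COMMAND>",
)


def unfilled_placeholders(text: str) -> list[str]:
    """Return the placeholders that still appear in the given file contents."""
    return [p for p in PLACEHOLDERS if p in text]


# Invented sample, not the real cloud-init template.
sample = """\
runcmd:
  - git clone https://github.com/example/fork.git
  - <SETUP_COMMANDS>
  - <VERIFY_COMMAND>
"""
print(unfilled_placeholders(sample))
```

Against a real project, read the file with `pathlib.Path("infra/cloud-init.yaml").read_text()` and pass the result to the function.
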
### VM lifecycle

Each project gets a `vm-manage.sh` for the benchmark VM:

```bash
cd .codeflash/<member>/<org>/<project>
bash infra/vm-manage.sh create      # provision VM with cloud-init
bash infra/vm-manage.sh bench main  # run benchmarks on a branch
bash infra/vm-manage.sh ssh         # SSH into VM
bash infra/vm-manage.sh stop        # deallocate (stops billing)
bash infra/vm-manage.sh destroy     # delete everything
```

### Examples

Use the existing projects as templates:
- [Rich](.codeflash/krrt7/textualize/rich/): focused scope, 2 PRs, import + runtime micro-opts
- [pip](.codeflash/krrt7/python/pip/): large scope, 122 commits across 8 categories