# codeflash-agent

Autonomous performance optimization platform. Profiles code, implements optimizations, benchmarks before and after, and iterates until plateau.

**What it's achieved on real projects:**

| Project | Result | Details |
|---|---|---|
| Rich | 2x Console import (79ms → 34ms) | [summary](case-studies/textualize/rich/summary.md) |
| pip | 7x `--version` (138ms → 20ms), 1.81x resolver | [summary](case-studies/python/pip/summary.md) |
| typeagent-py | 2.6x query path, 1.16x import + indexing | [summary](case-studies/microsoft/typeagent/summary.md) |
| core-product | 14.6% latency reduction, 2.1 GB memory savings | [summary](case-studies/unstructured/core-product/summary.md) |
| metaflow | 7-18x artifact compression (lz4 vs gzip) | [summary](case-studies/netflix/metaflow/summary.md) |

## Domains

| Domain | When to use |
|--------|-------------|
| **CPU** | CPU time, O(n²) loops, wrong containers, algorithmic complexity |
| **Memory** | Peak memory, OOM, memory leaks, RSS reduction |
| **Async** | Concurrency, event loop blocking, sequential awaits, throughput/latency |
| **Structure** | Import time, circular deps, module reorganization for performance |
| **Deep** | Cross-domain optimization — profiles all domains and iterates until plateau |

The agent auto-detects which domain(s) apply based on your request.
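
As an illustration of what the **Async** domain targets, here is a minimal sketch of the "sequential awaits" pattern and its concurrent rewrite. The `fetch` coroutine and its 50 ms delay are hypothetical stand-ins for real I/O. This is not the agent's own code.

```python
import asyncio
import time


async def fetch(item: int, delay: float = 0.05) -> int:
    """Hypothetical stand-in for an I/O-bound call such as an HTTP request."""
    await asyncio.sleep(delay)
    return item * 2


async def process_sequential(items: list[int]) -> list[int]:
    # Each await completes before the next call starts,
    # so total time grows linearly with len(items).
    return [await fetch(i) for i in items]


async def process_concurrent(items: list[int]) -> list[int]:
    # Independent calls are scheduled together; total time is
    # roughly the duration of the slowest single call.
    return list(await asyncio.gather(*(fetch(i) for i in items)))


def timed(coro_fn, items):
    """Run a coroutine function and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(items))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    items = list(range(10))
    seq_result, seq_s = timed(process_sequential, items)
    con_result, con_s = timed(process_concurrent, items)
    assert seq_result == con_result  # same output, different wall time
    print(f"sequential: {seq_s:.3f}s  concurrent: {con_s:.3f}s")
```

`asyncio.gather` only helps when the calls are independent. If one call needs the result of another, the sequential version is the correct one.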

## Install

Build the plugin first, then launch Claude with it:

```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
make build-plugin  # assembles plugin into dist/ — must run before launching
claude --dangerously-skip-permissions --effort max --plugin-dir ./dist/
```

## Your first optimization

Just run:

```
> /codeflash-optimize start
```

If you know where the problem lies, describe it in natural language instead:

```
> Our /process endpoint takes 5s but individual calls should only take 500ms each
> process_records is too slow, it's doing O(n²) lookups
```

Other commands:

```
> /codeflash-optimize scan     # quick cross-domain diagnosis (no changes)
> /codeflash-optimize status   # check progress
> /codeflash-optimize resume   # continue from where you left off
> /codeflash-optimize review   # review current changes or a PR
```

Codeflash will profile, analyze, implement fixes one at a time, re-profile after each, and stop when gains plateau. Session state persists in `HANDOFF.md` and `results.tsv` so you can resume across conversations.

## For contributors

### Dev setup

```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
uv sync                          # install all packages + dev deps
prek run --all-files             # lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v      # test all packages
```

### Plugin development

```bash
make build-plugin    # assemble plugin → dist/ (base + python overlay + vendor)
make clean           # remove dist/
```

The plugin is self-contained under `plugin/`:
- `plugin/` — language-agnostic agents, hooks, shared references
- `plugin/languages/python/` — Python domain agents, skills, references
- `plugin/languages/javascript/` — JavaScript domain agents, skills, references
- `make build-plugin` assembles base + language overlay into `dist/` (default: `LANG=python`)

## Optimization patterns

Distilled from 122 pip commits + 2 Rich PRs. Ordered by typical impact.

| Tier | Category | Examples | Typical impact |
|---|---|---|---|
| 1 | **Startup / Import** | Fast-path early exit, import deferral, `TYPE_CHECKING` guards, dead import removal | 2-100x for startup paths |
| 2 | **Architecture** | `@dataclass` → `__slots__`, lazy loading, speculative prefetch, conditional rebuild, caching | 10-60% on hot paths |
| 3 | **Micro** | Identity shortcuts (`is` before `==`), bypass public API internally, hoist to module level, `__slots__` on hot classes | 1.1-1.8x per call |
| 4 | **I/O** | Replace slow serializers, connection pooling, parallel I/O | 2-5x for I/O-bound ops |

**Anti-patterns to avoid:** caching with low hit rate, premature `__slots__`, over-deferring imports in one-time paths, optimizing cold paths.
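
A minimal sketch of three Tier 1 patterns together: fast-path early exit, import deferral, and a `TYPE_CHECKING` guard. The `main`, `parse_price`, and `VERSION` names are hypothetical, and `decimal` stands in for any module that is expensive to import.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by type checkers; never imported at runtime.
    from decimal import Decimal

VERSION = "1.0.0"  # hypothetical version string


def parse_price(text: str) -> "Decimal":
    # Deferred import: the cost is paid on the first call,
    # not when this module is imported.
    from decimal import Decimal

    return Decimal(text)


def main(argv: list[str]) -> str:
    # Fast-path early exit: --version returns before any heavy import.
    if "--version" in argv:
        return VERSION
    return str(parse_price(argv[0]))


if __name__ == "__main__":
    print(main(["--version"]))
    print(main(["1.50"]))
```

Whether a deferral pays off depends on how often the slow path runs, so it is worth confirming with `python -X importtime` before and after.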

Full pattern catalog with examples: [docs/codeflash-agent-dogfooding.md](docs/codeflash-agent-dogfooding.md#patterns-that-worked)
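
For the Tier 3 row, a small sketch of an identity shortcut in `__eq__` combined with `__slots__`. The `Version` class is hypothetical. Per the anti-patterns above, this is only worth applying to classes that profiling shows are hot.

```python
class Version:
    # __slots__ removes the per-instance __dict__, which cuts memory
    # and speeds up attribute access on frequently created objects.
    __slots__ = ("major", "minor")

    def __init__(self, major: int, minor: int) -> None:
        self.major = major
        self.minor = minor

    def __eq__(self, other: object) -> bool:
        # Identity shortcut: skip the field comparison when both
        # names refer to the same object.
        if other is self:
            return True
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)

    def __hash__(self) -> int:
        # Defining __eq__ disables the default hash, so restore one
        # that is consistent with equality.
        return hash((self.major, self.minor))
```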

## Methodology

### Profiling toolkit

| Tool | Purpose | When to use |
|---|---|---|
| `python -X importtime` | Import cost breakdown | First step for any CLI tool |
| `hyperfine` | E2E command timing with statistics | Before/after validation |
| `cProfile` / `py-spy` | Function-level CPU profiling | Finding hot functions |
| `timeit` | Micro-benchmarks for specific functions | Validating micro-opts |
| `memray` / `tracemalloc` | Memory profiling | Allocation-heavy paths |
| `objgraph` | Object count tracking | Finding redundant allocations |
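
The two standard-library tools in the table can be driven from a short script. The sketch below assumes a hypothetical `build_index` workload and wraps `cProfile` and `tracemalloc` in helper functions. The helper names are illustrative and not part of this repo.

```python
import cProfile
import io
import pstats
import tracemalloc


def build_index(n: int) -> dict[int, str]:
    """Hypothetical workload standing in for a hot function."""
    return {i: str(i) * 10 for i in range(n)}


def cpu_profile(func, *args) -> str:
    """Profile one call and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()


def peak_memory(func, *args) -> int:
    """Return the peak number of bytes Python allocated during one call."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print(cpu_profile(build_index, 100_000))
    print(f"peak: {peak_memory(build_index, 100_000) / 1e6:.1f} MB")
```

`tracemalloc` only sees allocations made through Python's allocator, so native-extension memory needs `memray` or an RSS measurement instead.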

### Workflow

```
1. Profile → identify top-N bottlenecks
2. For each bottleneck:
   a. Read the actual code (don't guess from profiler shapes)
   b. Implement the smallest change that addresses it
   c. Micro-benchmark before/after
   d. Run full test suite
   e. E2E benchmark
3. Commit with a "perf:" prefix and the before/after numbers
4. Repeat until plateau
```
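
Step 2c can be sketched with `timeit`. The example below assumes a hypothetical O(n²) membership test and its set-based replacement. The function names and input sizes are illustrative only.

```python
import timeit


def find_known_slow(records: list[int], known: list[int]) -> list[int]:
    # O(n*m): every `in` check scans the whole list.
    return [r for r in records if r in known]


def find_known_fast(records: list[int], known: list[int]) -> list[int]:
    # O(n+m): build the set once, then each lookup is O(1) on average.
    known_set = set(known)
    return [r for r in records if r in known_set]


def best_of(func, *args, repeat: int = 5, number: int = 3) -> float:
    """Best per-call time in seconds; taking the min filters out noise."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    records = list(range(1000))
    known = list(range(0, 2000, 2))
    # Check that behavior is unchanged before comparing speed.
    assert find_known_slow(records, known) == find_known_fast(records, known)
    before = best_of(find_known_slow, records, known)
    after = best_of(find_known_fast, records, known)
    print(f"before: {before * 1e3:.2f} ms  after: {after * 1e3:.2f} ms  "
          f"speedup: {before / after:.1f}x")
```

A micro-benchmark like this validates the change in isolation. The E2E benchmark in step 2e is still needed to confirm the gain matters for the whole command.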

### Environment requirements

- Non-burstable VM (e.g., Azure Standard_D2s_v5) for consistent CPU
- Multiple Python versions (3.12 and 3.13 at minimum; behavior differs between them)
- `hyperfine --warmup 5 --min-runs 30` for statistical rigor
- All tests passing before AND after every change
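
One way to script the `hyperfine` settings above is to build the command in Python and compare the JSON exports from a before run and an after run. This is a sketch: the helper names and file paths are hypothetical, and it assumes `hyperfine` is on `PATH` when the command is actually executed.

```python
import json


def hyperfine_cmd(command: str, export_path: str,
                  warmup: int = 5, min_runs: int = 30) -> list[str]:
    """Build a hyperfine invocation that also exports results as JSON."""
    return [
        "hyperfine",
        "--warmup", str(warmup),
        "--min-runs", str(min_runs),
        "--export-json", export_path,
        command,
    ]


def speedup(before_json: str, after_json: str) -> float:
    """Ratio of mean wall times (seconds) from two hyperfine JSON exports."""
    before = json.loads(before_json)["results"][0]["mean"]
    after = json.loads(after_json)["results"][0]["mean"]
    return before / after


if __name__ == "__main__":
    # Pass the list to subprocess.run(..., check=True) to execute it.
    print(" ".join(hyperfine_cmd("pip --version", "before.json")))
```

Run the same command on the baseline and optimized branches, then pass the contents of both JSON files to `speedup`.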

Full methodology details: [docs/codeflash-agent-dogfooding.md](docs/codeflash-agent-dogfooding.md#methodology)

## Workspace convention

Each target organization gets its own `<org>_org/` directory containing all repos for that org:

```
~/Desktop/work/
├── cf_org/                    # Codeflash
│   ├── codeflash-agent/       # this monorepo
│   ├── codeflash/             # core engine
│   ├── codeflash-internal/    # backend service
│   └── ...
├── unstructured_org/          # Unstructured.io
│   ├── unstructured/          # open source library
│   ├── core-product/          # main product
│   ├── unstructured-inference/ # ML inference
│   └── ...
├── microsoft_org/             # Microsoft
│   └── typeagent/             # typeagent-py (Structured RAG)
├── roboflow_org/              # Roboflow
│   └── supervision/
└── <org>_org/                 # new target org
    └── <repo>/
```

When starting work on a new org: create `<org>_org/`, clone all relevant repos under it, and keep non-repo files out of the org directory.

## Repo structure

```
packages/
  codeflash-core/              # shared foundation (models, AI client, telemetry, git)
  codeflash-python/            # Python language CLI — extends core
  codeflash-mcp/               # MCP server (stub)
  codeflash-lsp/               # LSP server (stub)

services/
  github-app/                  # GitHub App integration (FastAPI)

plugin/                        # Claude Code plugin (self-contained, multi-language)
  languages/python/            # Python domain agents, skills, references
  languages/javascript/        # JavaScript domain agents, skills, references

.codeflash/                    # active optimization data (<member>/<org>/<project>)
  krrt7/textualize/rich/       # 2x Rich import speedup
  krrt7/python/pip/            # 7x pip --version, 1.81x resolver
  krrt7/microsoft/typeagent/   # Structured RAG optimization
  <member>/<org>/<project>/    # new optimization targets

case-studies/                  # summaries built from .codeflash/
scripts/                       # scaffold scripts
docs/                          # internal guides
evals/                         # eval templates & real-repo scenarios
```

## Adding an optimization target

When you optimize a new project, scaffold it in `.codeflash/` and build summaries into `case-studies/`.

### 1. Set up local workspace

Each org gets an `<org>_org/` directory under `~/Desktop/work/`. Clone your fork, then add the upstream remote:

```bash
mkdir -p ~/Desktop/work/<org>_org
git clone https://github.com/<your-github-user>/<repo>.git ~/Desktop/work/<org>_org/<project>
cd ~/Desktop/work/<org>_org/<project>
git remote add upstream https://github.com/<org>/<repo>.git
```

### 2. Scaffold the project

```bash
# Single project:
make bootstrap ORG=roboflow PROJECTS=supervision

# Multiple projects under one org:
make bootstrap ORG=unstructured PROJECTS="unstructured unstructured-inference core-product"
```

This creates:

```
.codeflash/<member>/<org>/<project>/
├── README.md              # results, what changed, methodology (from template)
├── bench/                 # add your benchmark scripts here
├── data/                  # save raw benchmark data here
└── infra/
    ├── cloud-init.yaml    # VM provisioning (fill in remaining placeholders)
    └── vm-manage.sh       # VM lifecycle: create, start, stop, ssh, bench, destroy
```

### 3. Fill in the placeholders

The scaffold substitutes `<PROJECT>` automatically. You still need to fill in:

| Placeholder | Where | What to fill in |
|---|---|---|
| `<REPO_URL>` | `infra/cloud-init.yaml` | Your fork's clone URL |
| `<SETUP_COMMANDS>` | `infra/cloud-init.yaml` | Toolchain install + build (language-specific) |
| `<BENCH_COMMAND>` | `infra/cloud-init.yaml` | The command to benchmark |
| `<VERIFY_COMMAND>` | `infra/cloud-init.yaml` | Smoke test after setup |

The cloud-init template includes examples for Python, Rust, Go, Node.js, and Java.
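
Before provisioning a VM, it can help to confirm that no placeholder was missed. The check below is a hypothetical helper, not part of the scaffold. It scans text for the four placeholder names from the table above.

```python
from pathlib import Path

# Placeholder names taken from the table above.
PLACEHOLDERS = ("<REPO_URL>", "<SETUP_COMMANDS>", "<BENCH_COMMAND>", "<VERIFY_COMMAND>")


def unfilled_placeholders(text: str) -> list[tuple[int, str]]:
    """Return (line_number, placeholder) pairs still present in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name in PLACEHOLDERS:
            if name in line:
                hits.append((lineno, name))
    return hits


if __name__ == "__main__":
    path = Path("infra/cloud-init.yaml")
    if path.exists():
        for lineno, name in unfilled_placeholders(path.read_text()):
            print(f"{path}:{lineno}: {name} is not filled in")
```

Run it from the project directory under `.codeflash/`. No output means all four placeholders have been replaced.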

### VM lifecycle

Each project gets a `vm-manage.sh` for the benchmark VM:

```bash
cd .codeflash/<member>/<org>/<project>
bash infra/vm-manage.sh create    # provision VM with cloud-init
bash infra/vm-manage.sh bench main  # run benchmarks on a branch
bash infra/vm-manage.sh ssh       # SSH into VM
bash infra/vm-manage.sh stop      # deallocate (stops billing)
bash infra/vm-manage.sh destroy   # delete everything
```

### Examples

Use the existing projects as templates:
- [Rich](.codeflash/krrt7/textualize/rich/) — focused scope, 2 PRs, import + runtime micro-opts
- [pip](.codeflash/krrt7/python/pip/) — large scope, 122 commits across 8 categories