# codeflash-agent

Autonomous performance optimization platform. Profiles code, implements optimizations, benchmarks before and after, and iterates until gains plateau.

**What it's achieved on real projects:**

| Project | Result | Details |
|---|---|---|
| Rich | 2x Console import (79ms → 34ms) | [summary](case-studies/textualize/rich/summary.md) |
| pip | 7x `--version` (138ms → 20ms), 1.81x resolver | [summary](case-studies/python/pip/summary.md) |
| typeagent-py | 2.6x query path, 1.16x import + indexing | [summary](case-studies/microsoft/typeagent/summary.md) |
| core-product | 14.6% latency, 2.1 GB memory savings | [summary](case-studies/unstructured/core-product/summary.md) |
| metaflow | 7-18x artifact compression (lz4 vs gzip) | [summary](case-studies/netflix/metaflow/summary.md) |

## Domains

| Domain | When to use |
|--------|-------------|
| **CPU** | CPU time, O(n²) loops, wrong containers, algorithmic complexity |
| **Memory** | Peak memory, OOM, memory leaks, RSS reduction |
| **Async** | Concurrency, event loop blocking, sequential awaits, throughput/latency |
| **Structure** | Import time, circular deps, module reorganization for performance |
| **Deep** | Cross-domain optimization — profiles all domains and iterates until plateau |

The agent auto-detects which domain(s) apply based on your request.
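The table maps symptoms to domains. When you are not sure which applies, a quick one-off `cProfile` pass (plain stdlib, not part of the agent) shows where time goes. A minimal sketch with a deliberately quadratic function standing in for your hot path:

```python
import cProfile
import io
import pstats


def slow_lookup(items, queries):
    # O(n²): list membership scans the whole list for every query —
    # the "wrong containers" smell from the CPU domain row above
    return [q for q in queries if q in items]


profiler = cProfile.Profile()
profiler.enable()
slow_lookup(list(range(2000)), list(range(0, 4000, 2)))
profiler.disable()

# Print the top functions by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

If the top entries are your own functions, start with the CPU domain; if time sits in imports or allocation, Structure or Memory is the better entry point.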
## Install

Build the plugin first, then launch Claude with it:

```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
make build-plugin   # assembles plugin into dist/ — must run before launching
claude --dangerously-skip-permissions --effort max --plugin-dir ./dist/
```

## Your first optimization

Just run:

```
> /codeflash-optimize start
```

If you know where the problem lies, describe it in natural language instead:

```
> Our /process endpoint takes 5s but individual calls should only take 500ms each
> process_records is too slow, it's doing O(n²) lookups
```

Other commands:

```
> /codeflash-optimize scan     # quick cross-domain diagnosis (no changes)
> /codeflash-optimize status   # check progress
> /codeflash-optimize resume   # continue from where you left off
> /codeflash-optimize review   # review current changes or a PR
```

Codeflash will profile, analyze, implement fixes one at a time, re-profile after each, and stop when gains plateau. Session state persists in `HANDOFF.md` and `results.tsv`, so you can resume across conversations.

## For contributors

### Dev setup

```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
uv sync                      # install all packages + dev deps
prek run --all-files         # lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v   # test all packages
```

### Plugin development

```bash
make build-plugin   # assemble plugin → dist/ (base + python overlay + vendor)
make clean          # remove dist/
```

The plugin is self-contained under `plugin/`:

- `plugin/` — language-agnostic agents, hooks, shared references
- `plugin/languages/python/` — Python domain agents, skills, references
- `plugin/languages/javascript/` — JavaScript domain agents, skills, references
- `make build-plugin` assembles base + language overlay into `dist/` (default: `LANG=python`)

## Optimization patterns

Distilled from 122 pip commits + 2 Rich PRs. Ordered by typical impact.
| Tier | Category | Examples | Typical impact |
|---|---|---|---|
| 1 | **Startup / Import** | Fast-path early exit, import deferral, `TYPE_CHECKING` guards, dead import removal | 2-100x for startup paths |
| 2 | **Architecture** | `@dataclass` → `__slots__`, lazy loading, speculative prefetch, conditional rebuild, caching | 10-60% on hot paths |
| 3 | **Micro** | Identity shortcuts (`is` before `==`), bypass public API internally, hoist to module level, `__slots__` on hot classes | 1.1-1.8x per call |
| 4 | **I/O** | Replace slow serializers, connection pooling, parallel I/O | 2-5x for I/O-bound ops |

**Anti-patterns to avoid:** caching with a low hit rate, premature `__slots__`, over-deferring imports in one-time paths, optimizing cold paths.

Full pattern catalog with examples: [docs/codeflash-agent-dogfooding.md](docs/codeflash-agent-dogfooding.md#patterns-that-worked)

## Methodology

### Profiling toolkit

| Tool | Purpose | When to use |
|---|---|---|
| `python -X importtime` | Import cost breakdown | First step for any CLI tool |
| `hyperfine` | E2E command timing with statistics | Before/after validation |
| `cProfile` / `py-spy` | Function-level CPU profiling | Finding hot functions |
| `timeit` | Micro-benchmarks for specific functions | Validating micro-opts |
| `memray` / `tracemalloc` | Memory profiling | Allocation-heavy paths |
| `objgraph` | Object count tracking | Finding redundant allocations |

### Workflow

```
1. Profile → identify top-N bottlenecks
2. For each bottleneck:
   a. Read the actual code (don't guess from profiler shapes)
   b. Implement the smallest change that addresses it
   c. Micro-benchmark before/after
   d. Run full test suite
   e. E2E benchmark
3. Commit with a clear perf: prefix and numbers
4. Repeat until plateau
```

### Environment requirements

- Non-burstable VM (e.g., Azure Standard_D2s_v5) for consistent CPU
- Multiple Python versions (3.12, 3.13 minimum — behavior differs)
- `hyperfine --warmup 5 --min-runs 30` for statistical rigor
- All tests passing before AND after every change

Full methodology details: [docs/codeflash-agent-dogfooding.md](docs/codeflash-agent-dogfooding.md#methodology)

## Workspace convention

Each target organization gets its own `_org/` directory containing all repos for that org:

```
~/Desktop/work/
├── cf_org/                      # Codeflash
│   ├── codeflash-agent/         # this monorepo
│   ├── codeflash/               # core engine
│   ├── codeflash-internal/      # backend service
│   └── ...
├── unstructured_org/            # Unstructured.io
│   ├── unstructured/            # open source library
│   ├── core-product/            # main product
│   ├── unstructured-inference/  # ML inference
│   └── ...
├── microsoft_org/               # Microsoft
│   └── typeagent/               # typeagent-py (Structured RAG)
├── roboflow_org/                # Roboflow
│   └── supervision/
└── _org/                        # new target org
    └── /
```

When starting work on a new org: create `_org/`, clone all relevant repos under it, and keep non-repo files out of the org directory.
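Returning to the workflow above: step 2c (micro-benchmark before/after) can be as small as a paired `timeit` run that keeps both versions side by side until the numbers are in. A minimal sketch; the list-to-set container swap is an illustrative fix, not taken from the case studies:

```python
import timeit

items = list(range(5_000))
queries = list(range(0, 10_000, 2))

def before():
    # O(n²): membership test scans the list for every query
    return [q for q in queries if q in items]

def after():
    # Hoist the list into a set once, then O(1) membership tests
    item_set = set(items)
    return [q for q in queries if q in item_set]

# Correctness first: both versions must agree before timing means anything
assert before() == after()

t_before = timeit.timeit(before, number=5)
t_after = timeit.timeit(after, number=5)
print(f"before: {t_before:.4f}s  after: {t_after:.4f}s  ({t_before / t_after:.1f}x)")
```

The printed ratio is what goes into the `perf:` commit message alongside the E2E numbers.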
## Repo structure

```
packages/
  codeflash-core/        # shared foundation (models, AI client, telemetry, git)
  codeflash-python/      # Python language CLI — extends core
  codeflash-mcp/         # MCP server (stub)
  codeflash-lsp/         # LSP server (stub)
services/
  github-app/            # GitHub App integration (FastAPI)
plugin/                  # Claude Code plugin (self-contained, multi-language)
  languages/python/      # Python domain agents, skills, references
  languages/javascript/  # JavaScript domain agents, skills, references
.codeflash/              # active optimization data (teammember/org/project)
  krrt7/textualize/rich/      # 2x Rich import speedup
  krrt7/python/pip/           # 7x pip --version, 1.81x resolver
  krrt7/microsoft/typeagent/  # Structured RAG optimization
  ///                         # new optimization targets
case-studies/            # summaries built from .codeflash/
scripts/                 # scaffold scripts
docs/                    # internal guides
evals/                   # eval templates & real-repo scenarios
```

## Adding an optimization target

When you optimize a new project, scaffold it in `.codeflash/` and build summaries into `case-studies/`.

### 1. Set up local workspace

Each org gets a `_org/` directory under `work/`. Clone from your fork, add the upstream remote:

```bash
mkdir -p ~/Desktop/work/_org
git clone https://github.com/KRRT7/.git ~/Desktop/work/_org/
cd ~/Desktop/work/_org/
git remote add upstream https://github.com//.git
```

### 2. Scaffold the project

```bash
# Single project:
make bootstrap ORG=roboflow PROJECTS=supervision

# Multiple projects under one org:
make bootstrap ORG=unstructured PROJECTS="unstructured unstructured-inference core-product"
```

This creates:

```
.codeflash////
├── README.md   # results, what changed, methodology (from template)
├── bench/      # add your benchmark scripts here
├── data/       # save raw benchmark data here
└── infra/
    ├── cloud-init.yaml  # VM provisioning (fill in remaining placeholders)
    └── vm-manage.sh     # VM lifecycle: create, start, stop, ssh, bench, destroy
```

### 3. Fill in the placeholders

The scaffold substitutes `` automatically.
You still need to fill in:

| Placeholder | Where | What to fill in |
|---|---|---|
| `` | `infra/cloud-init.yaml` | Your fork's clone URL |
| `` | `infra/cloud-init.yaml` | Toolchain install + build (language-specific) |
| `` | `infra/cloud-init.yaml` | The command to benchmark |
| `` | `infra/cloud-init.yaml` | Smoke test after setup |

The cloud-init template includes examples for Python, Rust, Go, Node.js, and Java.

### VM lifecycle

Each project gets a `vm-manage.sh` for the benchmark VM:

```bash
cd .codeflash///
bash infra/vm-manage.sh create      # provision VM with cloud-init
bash infra/vm-manage.sh bench main  # run benchmarks on a branch
bash infra/vm-manage.sh ssh         # SSH into VM
bash infra/vm-manage.sh stop        # deallocate (stops billing)
bash infra/vm-manage.sh destroy     # delete everything
```

### Examples

Use the existing projects as templates:

- [Rich](.codeflash/krrt7/textualize/rich/) — focused scope, 2 PRs, import + runtime micro-opts
- [pip](.codeflash/krrt7/python/pip/) — large scope, 122 commits across 8 categories
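The scaffolded `bench/` directory starts empty. A first benchmark script can be as small as timing a cold import in a fresh interpreter, which avoids the warm-`sys.modules` trap of timing `import` in-process. A minimal sketch; the target module `json` is a stand-in, so point it at the package you are optimizing:

```python
import statistics
import subprocess
import sys
import time

MODULE = "json"  # stand-in: replace with the module whose import cost you track


def cold_import_seconds(module: str) -> float:
    """Time one cold import in a fresh interpreter process."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start


samples = [cold_import_seconds(MODULE) for _ in range(5)]
print(f"{MODULE}: median {statistics.median(samples) * 1000:.1f} ms over {len(samples)} runs")
```

Note the number includes interpreter startup, so compare medians across branches rather than reading it as absolute import cost; `python -X importtime` gives the per-module breakdown.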