# codeflash-agent

Autonomous performance optimization platform. Profiles code, implements optimizations, benchmarks before and after, and iterates until gains plateau.
What it's achieved on real projects:
| Project | Result | Details |
|---|---|---|
| Rich | 2x Console import (79ms → 34ms) | summary |
| pip | 7x --version (138ms → 20ms), 1.81x resolver | summary |
| typeagent-py | 2.6x query path, 1.16x import + indexing | summary |
| core-product | 14.6% latency, 2.1 GB memory savings | summary |
| metaflow | 7-18x artifact compression (lz4 vs gzip) | summary |
## Domains
| Domain | When to use |
|---|---|
| CPU | CPU time, O(n²) loops, wrong containers, algorithmic complexity |
| Memory | Peak memory, OOM, memory leaks, RSS reduction |
| Async | Concurrency, event loop blocking, sequential awaits, throughput/latency |
| Structure | Import time, circular deps, module reorganization for performance |
| Deep | Cross-domain optimization — profiles all domains and iterates until plateau |
The agent auto-detects which domain(s) apply based on your request.
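As an illustration of the CPU domain's "wrong containers" entry, here is a minimal, hypothetical example (not taken from any of the projects above) of the classic list-to-set membership fix:

```python
# Hypothetical example of the "wrong containers" fix: membership tests
# against a list are O(n) each, so the loop below is O(n^2) overall.
def dedupe_slow(records):
    seen = []
    out = []
    for r in records:
        if r not in seen:  # linear scan of `seen` on every iteration
            seen.append(r)
            out.append(r)
    return out


# Same behavior, but `seen` is a set: each membership test is O(1),
# making the whole loop O(n).
def dedupe_fast(records):
    seen = set()
    out = []
    for r in records:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out
```

Both versions preserve input order and return identical results, which is what makes the swap safe to apply and easy to verify with the existing test suite.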
## Install

Build the plugin first, then launch Claude with it:

```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
make build-plugin  # assembles plugin into dist/ — must run before launching
claude --dangerously-skip-permissions --effort max --plugin-dir ./dist/
```
## Your first optimization

Just run:

```
> /codeflash-optimize start
```

If you know where the problem lies, describe it in natural language instead:

```
> Our /process endpoint takes 5s but individual calls should only take 500ms each
> process_records is too slow, it's doing O(n²) lookups
```

Other commands:

```
> /codeflash-optimize scan    # quick cross-domain diagnosis (no changes)
> /codeflash-optimize status  # check progress
> /codeflash-optimize resume  # continue from where you left off
> /codeflash-optimize review  # review current changes or a PR
```
Codeflash will profile, analyze, implement fixes one at a time, re-profile after each, and stop when gains plateau. Session state persists in HANDOFF.md and results.tsv so you can resume across conversations.
## For contributors

### Dev setup

```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
uv sync                      # install all packages + dev deps
prek run --all-files         # lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v   # test all packages
```
### Plugin development

```bash
make build-plugin  # assemble plugin → dist/ (base + python overlay + vendor)
make clean         # remove dist/
```

The plugin is self-contained under `plugin/`:

- `plugin/` — language-agnostic agents, hooks, shared references
- `plugin/languages/python/` — Python domain agents, skills, references
- `plugin/languages/javascript/` — JavaScript domain agents, skills, references
- `make build-plugin` assembles base + language overlay into `dist/` (default: `LANG=python`)
## Optimization patterns

Distilled from 122 pip commits + 2 Rich PRs. Ordered by typical impact.

| Tier | Category | Examples | Typical impact |
|---|---|---|---|
| 1 | Startup / Import | Fast-path early exit, import deferral, `TYPE_CHECKING` guards, dead import removal | 2-100x for startup paths |
| 2 | Architecture | `@dataclass` → `__slots__`, lazy loading, speculative prefetch, conditional rebuild, caching | 10-60% on hot paths |
| 3 | Micro | Identity shortcuts (`is` before `==`), bypass public API internally, hoist to module level, `__slots__` on hot classes | 1.1-1.8x per call |
| 4 | I/O | Replace slow serializers, connection pooling, parallel I/O | 2-5x for I/O-bound ops |

Anti-patterns to avoid: caching with low hit rate, premature `__slots__`, over-deferring imports in one-time paths, optimizing cold paths.
Full pattern catalog with examples: docs/codeflash-agent-dogfooding.md
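For a sense of the Tier 1 patterns, here is a sketch of an import deferral plus a `TYPE_CHECKING` guard. It uses stdlib names for illustration; it is not code from the catalog:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # TYPE_CHECKING guard: seen by mypy/pyright for annotations,
    # never imported at runtime, so it adds zero startup cost.
    from statistics import NormalDist


def mean_of(values: list[float]) -> float:
    # Import deferral: the cost of importing `statistics` is paid on
    # the first call instead of at module import time. Deferring genuinely
    # heavy imports this way is what drives the 2-100x startup wins.
    import statistics

    return statistics.mean(values)
```

The trade-off: deferral only helps when the import is expensive and the function is not on every code path; over-deferring in one-time paths is one of the anti-patterns listed above.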
## Methodology

### Profiling toolkit

| Tool | Purpose | When to use |
|---|---|---|
| `python -X importtime` | Import cost breakdown | First step for any CLI tool |
| `hyperfine` | E2E command timing with statistics | Before/after validation |
| `cProfile` / `py-spy` | Function-level CPU profiling | Finding hot functions |
| `timeit` | Micro-benchmarks for specific functions | Validating micro-opts |
| `memray` / `tracemalloc` | Memory profiling | Allocation-heavy paths |
| `objgraph` | Object count tracking | Finding redundant allocations |
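As a sketch of the `timeit` row, a before/after micro-benchmark for a container swap might look like this (the workload is illustrative):

```python
import timeit

setup = "data = list(range(1000)); needle = 999"

# Before: membership test against a list (linear scan).
slow = timeit.timeit("needle in data", setup=setup, number=10_000)

# After: the same test against a set (hash lookup).
fast = timeit.timeit("needle in s", setup=setup + "; s = set(data)", number=10_000)

print(f"list: {slow:.4f}s  set: {fast:.4f}s  speedup: {slow / fast:.1f}x")
```

Note that the `setup` string runs outside the timed region, so only the membership test itself is measured.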
### Workflow

1. Profile → identify top-N bottlenecks
2. For each bottleneck:
   a. Read the actual code (don't guess from profiler shapes)
   b. Implement the smallest change that addresses it
   c. Micro-benchmark before/after
   d. Run full test suite
   e. E2E benchmark
3. Commit with a clear `perf:` prefix and numbers
4. Repeat until plateau
### Environment requirements

- Non-burstable VM (e.g., Azure Standard_D2s_v5) for consistent CPU
- Multiple Python versions (3.12, 3.13 minimum — behavior differs)
- `hyperfine --warmup 5 --min-runs 30` for statistical rigor
- All tests passing before AND after every change
Full methodology details: docs/codeflash-agent-dogfooding.md
## Workspace convention

Each target organization gets its own `<org>_org/` directory containing all repos for that org:

```
~/Desktop/work/
├── cf_org/                     # Codeflash
│   ├── codeflash-agent/        # this monorepo
│   ├── codeflash/              # core engine
│   ├── codeflash-internal/     # backend service
│   └── ...
├── unstructured_org/           # Unstructured.io
│   ├── unstructured/           # open source library
│   ├── core-product/           # main product
│   ├── unstructured-inference/ # ML inference
│   └── ...
├── microsoft_org/              # Microsoft
│   └── typeagent/              # typeagent-py (Structured RAG)
├── roboflow_org/               # Roboflow
│   └── supervision/
└── <org>_org/                  # new target org
    └── <repo>/
```

When starting work on a new org: create `<org>_org/`, clone all relevant repos under it, and keep non-repo files out of the org directory.
## Repo structure

```
packages/
  codeflash-core/              # shared foundation (models, AI client, telemetry, git)
  codeflash-python/            # Python language CLI — extends core
  codeflash-mcp/               # MCP server (stub)
  codeflash-lsp/               # LSP server (stub)
services/
  github-app/                  # GitHub App integration (FastAPI)
plugin/                        # Claude Code plugin (self-contained, multi-language)
  languages/python/            # Python domain agents, skills, references
  languages/javascript/        # JavaScript domain agents, skills, references
.codeflash/                    # active optimization data (teammember/org/project)
  krrt7/textualize/rich/       # 2x Rich import speedup
  krrt7/python/pip/            # 7x pip --version, 1.81x resolver
  krrt7/microsoft/typeagent/   # Structured RAG optimization
  <member>/<org>/<project>/    # new optimization targets
case-studies/                  # summaries built from .codeflash/
scripts/                       # scaffold scripts
docs/                          # internal guides
evals/                         # eval templates & real-repo scenarios
```
## Adding an optimization target

When you optimize a new project, scaffold it in `.codeflash/` and build summaries into `case-studies/`.

### 1. Set up local workspace

Each org gets a `<org>_org/` directory under `work/`. Clone from your fork, then add the upstream remote:

```bash
mkdir -p ~/Desktop/work/<org>_org
git clone https://github.com/KRRT7/<repo>.git ~/Desktop/work/<org>_org/<project>
cd ~/Desktop/work/<org>_org/<project>
git remote add upstream https://github.com/<org>/<repo>.git
```
### 2. Scaffold the project

```bash
# Single project:
make bootstrap ORG=roboflow PROJECTS=supervision

# Multiple projects under one org:
make bootstrap ORG=unstructured PROJECTS="unstructured unstructured-inference core-product"
```

This creates:

```
.codeflash/<member>/<org>/<project>/
├── README.md           # results, what changed, methodology (from template)
├── bench/              # add your benchmark scripts here
├── data/               # save raw benchmark data here
└── infra/
    ├── cloud-init.yaml # VM provisioning (fill in remaining placeholders)
    └── vm-manage.sh    # VM lifecycle: create, start, stop, ssh, bench, destroy
```
### 3. Fill in the placeholders

The scaffold substitutes `<PROJECT>` automatically. You still need to fill in:

| Placeholder | Where | What to fill in |
|---|---|---|
| `<REPO_URL>` | `infra/cloud-init.yaml` | Your fork's clone URL |
| `<SETUP_COMMANDS>` | `infra/cloud-init.yaml` | Toolchain install + build (language-specific) |
| `<BENCH_COMMAND>` | `infra/cloud-init.yaml` | The command to benchmark |
| `<VERIFY_COMMAND>` | `infra/cloud-init.yaml` | Smoke test after setup |

The cloud-init template includes examples for Python, Rust, Go, Node.js, and Java.
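To make the placeholders concrete, a hypothetical Python fill-in could look like the sketch below. The repo URL, paths, and commands are illustrative, not the template's actual contents:

```yaml
#cloud-config
# Hypothetical placeholder values for a Python project.
package_update: true
packages: [git, python3-venv]
runcmd:
  # <REPO_URL>: your fork's clone URL
  - git clone https://github.com/KRRT7/supervision.git /opt/target
  # <SETUP_COMMANDS>: toolchain install + build
  - python3 -m venv /opt/venv
  - /opt/venv/bin/pip install -e /opt/target
  # <VERIFY_COMMAND>: smoke test after setup
  - /opt/venv/bin/python -c "import supervision"
  # <BENCH_COMMAND>: the command to benchmark
  - hyperfine --warmup 5 --min-runs 30 "/opt/venv/bin/python -c 'import supervision'"
```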
### VM lifecycle

Each project gets a `vm-manage.sh` for the benchmark VM:

```bash
cd .codeflash/<member>/<org>/<project>
bash infra/vm-manage.sh create      # provision VM with cloud-init
bash infra/vm-manage.sh bench main  # run benchmarks on a branch
bash infra/vm-manage.sh ssh         # SSH into VM
bash infra/vm-manage.sh stop        # deallocate (stops billing)
bash infra/vm-manage.sh destroy     # delete everything
```
## Examples
Use the existing projects as templates: