# codeflash-agent
Autonomous performance optimization platform. Profiles code, implements optimizations, benchmarks before and after, and iterates until plateau.
What it's achieved on real projects:
| Project | Result | Details |
|---|---|---|
| Rich | 2x Console import (79ms → 34ms) | summary |
| pip | 7x --version (138ms → 20ms), 1.81x resolver | summary |
| typeagent-py | 2.6x query path, 1.16x import + indexing | summary |
| core-product | 14.6% latency, 2.1 GB memory savings | summary |
| metaflow | 7-18x artifact compression (lz4 vs gzip) | summary |
## Domains
| Domain | When to use |
|---|---|
| CPU | CPU time, O(n²) loops, wrong containers, algorithmic complexity |
| Memory | Peak memory, OOM, memory leaks, RSS reduction |
| Async | Concurrency, event loop blocking, sequential awaits, throughput/latency |
| Structure | Import time, circular deps, module reorganization for performance |
| Deep | Cross-domain optimization — profiles all domains and iterates until plateau |
The agent auto-detects which domain(s) apply based on your request.
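The "wrong containers" entry in the CPU row is among the most common findings in practice. As a minimal before/after sketch (hypothetical code, not from any target repo):

```python
def dedupe_slow(items):
    """Order-preserving dedupe with a list: O(n) scan per element -> O(n^2) overall."""
    seen = []
    out = []
    for x in items:
        if x not in seen:   # linear scan of `seen` on every iteration
            seen.append(x)
            out.append(x)
    return out


def dedupe_fast(items):
    """Same result, but membership checks hit a set: O(1) per lookup."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:   # hash lookup instead of a scan
            seen.add(x)
            out.append(x)
    return out
```

Both return identical output; only the container behind the membership test changes, which is exactly the kind of fix the CPU domain targets.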
## Install
Build the plugin first, then launch Claude with it:
```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
make build-plugin   # assembles plugin into dist/ — must run before launching
claude --dangerously-skip-permissions --effort max --plugin-dir ./dist/
```
## Your first optimization
Just run:
```
> /codeflash-optimize start
```
If you know where the problem lies, describe it in natural language instead:
```
> Our /process endpoint takes 5s but individual calls should only take 500ms each
> process_records is too slow, it's doing O(n²) lookups
```
Other commands:
```
> /codeflash-optimize scan     # quick cross-domain diagnosis (no changes)
> /codeflash-optimize status   # check progress
> /codeflash-optimize resume   # continue from where you left off
> /codeflash-optimize review   # review current changes or a PR
```
Codeflash will profile, analyze, implement fixes one at a time, re-profile after each, and stop when gains plateau. Session state persists in HANDOFF.md and results.tsv so you can resume across conversations.
## For contributors
### Dev setup
```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
uv sync                      # install all packages + dev deps
prek run --all-files         # lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v   # test all packages
```
### Plugin development
```bash
make build-plugin   # assemble plugin → dist/ (base + python overlay + vendor)
make clean          # remove dist/
```
The plugin is self-contained under plugin/:
- `plugin/` — language-agnostic agents, hooks, shared references
- `plugin/languages/python/` — Python domain agents, skills, references
- `plugin/languages/javascript/` — JavaScript domain agents, skills, references
- `make build-plugin` assembles base + language overlay into `dist/` (default: `LANG=python`)
## Optimization patterns
Distilled from 122 pip commits + 2 Rich PRs. Ordered by typical impact.
| Tier | Category | Examples | Typical impact |
|---|---|---|---|
| 1 | Startup / Import | Fast-path early exit, import deferral, `TYPE_CHECKING` guards, dead import removal | 2-100x for startup paths |
| 2 | Architecture | `@dataclass` → `__slots__`, lazy loading, speculative prefetch, conditional rebuild, caching | 10-60% on hot paths |
| 3 | Micro | Identity shortcuts (`is` before `==`), bypass public API internally, hoist to module level, `__slots__` on hot classes | 1.1-1.8x per call |
| 4 | I/O | Replace slow serializers, connection pooling, parallel I/O | 2-5x for I/O-bound ops |
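The Tier 1 "import deferral" pattern can be sketched in a few lines. This is an illustrative module, not code from this repo; `json` stands in for any dependency that is expensive to import:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Visible to type checkers only; costs nothing at runtime
    import json


def render(data: dict) -> str:
    # Deferred import: the cost is paid on first call, not at CLI startup.
    import json
    return json.dumps(data, sort_keys=True)


print(render({"b": 1, "a": 2}))  # {"a": 2, "b": 1}
```

Modules that import this one no longer pay for `json` at import time, which is where the 2-100x startup wins in the table come from when the deferred dependency is genuinely heavy.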
Anti-patterns to avoid: caching with low hit rate, premature `__slots__`, over-deferring imports in one-time paths, optimizing cold paths.
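The first anti-pattern is easy to measure. In this hedged illustration (hypothetical function, not from any target repo), an `lru_cache` on a function whose arguments rarely repeat adds hashing and bookkeeping overhead without ever paying off:

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def parse_line(line: str) -> tuple:
    """Split a CSV-ish line; cached, but see the hit rate below."""
    return tuple(line.split(","))


# Mostly-unique inputs: every call is a cache miss.
for i in range(2000):
    parse_line(f"row-{i},a,b")

info = parse_line.cache_info()
print(info.hits, info.misses)  # hits == 0 here: every input was unique
```

When `cache_info()` shows a near-zero hit rate like this, removing the cache is itself the optimization.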
Full pattern catalog with examples: docs/codeflash-agent-dogfooding.md
## Methodology
### Profiling toolkit
| Tool | Purpose | When to use |
|---|---|---|
| `python -X importtime` | Import cost breakdown | First step for any CLI tool |
| `hyperfine` | E2E command timing with statistics | Before/after validation |
| `cProfile` / `py-spy` | Function-level CPU profiling | Finding hot functions |
| `timeit` | Micro-benchmarks for specific functions | Validating micro-opts |
| `memray` / `tracemalloc` | Memory profiling | Allocation-heavy paths |
| `objgraph` | Object count tracking | Finding redundant allocations |
### Workflow
1. Profile → identify top-N bottlenecks
2. For each bottleneck:
   a. Read the actual code (don't guess from profiler shapes)
   b. Implement the smallest change that addresses it
   c. Micro-benchmark before/after
   d. Run full test suite
   e. E2E benchmark
3. Commit with a clear `perf:` prefix and numbers
4. Repeat until plateau
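Step 2c can be sketched with stdlib `timeit`. The two variants here are illustrative (a classic quadratic list rebuild vs. `append`), not code from any target repo; correctness is asserted before speed is compared, mirroring the workflow above:

```python
import timeit


def before(n=1000):
    result = []
    for i in range(n):
        result = result + [i]   # quadratic: copies the whole list each iteration
    return result


def after(n=1000):
    result = []
    for i in range(n):
        result.append(i)        # amortized O(1) per element
    return result


assert before() == after()      # correctness first, then speed
t_before = timeit.timeit(before, number=50)
t_after = timeit.timeit(after, number=50)
print(f"{t_before / t_after:.1f}x speedup")  # typically several-fold on this shape
```

Only after the micro-benchmark confirms a win does the change move on to the full test suite and E2E benchmark.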
### Environment requirements
- Non-burstable VM (e.g., Azure Standard_D2s_v5) for consistent CPU
- Multiple Python versions (3.12, 3.13 minimum — behavior differs)
- `hyperfine --warmup 5 --min-runs 30` for statistical rigor
- All tests passing before AND after every change
Full methodology details: docs/codeflash-agent-dogfooding.md
## Workspace convention
Each target organization gets its own `<org>_org/` directory containing all repos for that org:
```
~/Desktop/work/
├── cf_org/                      # Codeflash
│   ├── codeflash-agent/         # this monorepo
│   ├── codeflash/               # core engine
│   ├── codeflash-internal/     # backend service
│   └── ...
├── unstructured_org/            # Unstructured.io
│   ├── unstructured/            # open source library
│   ├── core-product/            # main product
│   ├── unstructured-inference/  # ML inference
│   └── ...
├── microsoft_org/               # Microsoft
│   └── typeagent/               # typeagent-py (Structured RAG)
├── roboflow_org/                # Roboflow
│   └── supervision/
└── <org>_org/                   # new target org
    └── <repo>/
```
When starting work on a new org: create `<org>_org/`, clone all relevant repos under it, and keep non-repo files out of the org directory.
## Repo structure
```
packages/
  codeflash-core/              # shared foundation (models, AI client, telemetry, git)
  codeflash-python/            # Python language CLI — extends core
  codeflash-mcp/               # MCP server (stub)
  codeflash-lsp/               # LSP server (stub)
services/
  github-app/                  # GitHub App integration (FastAPI)
plugin/                        # Claude Code plugin (self-contained, multi-language)
  languages/python/            # Python domain agents, skills, references
  languages/javascript/        # JavaScript domain agents, skills, references
.codeflash/                    # active optimization data (teammember/org/project)
  krrt7/textualize/rich/       # 2x Rich import speedup
  krrt7/python/pip/            # 7x pip --version, 1.81x resolver
  krrt7/microsoft/typeagent/   # Structured RAG optimization
  <member>/<org>/<project>/    # new optimization targets
case-studies/                  # summaries built from .codeflash/
scripts/                       # scaffold scripts
docs/                          # internal guides
evals/                         # eval templates & real-repo scenarios
```
## Adding an optimization target
When you optimize a new project, scaffold it in .codeflash/ and build summaries into case-studies/.
### 1. Set up local workspace
Each org gets an `<org>_org/` directory under `work/`. Clone from your fork, then add the upstream remote:
```bash
mkdir -p ~/Desktop/work/<org>_org
git clone https://github.com/KRRT7/<repo>.git ~/Desktop/work/<org>_org/<project>
cd ~/Desktop/work/<org>_org/<project>
git remote add upstream https://github.com/<org>/<repo>.git
```
### 2. Scaffold the project
```bash
# Single project:
make bootstrap ORG=roboflow PROJECTS=supervision

# Multiple projects under one org:
make bootstrap ORG=unstructured PROJECTS="unstructured unstructured-inference core-product"
```
This creates:
```
.codeflash/<member>/<org>/<project>/
├── README.md            # results, what changed, methodology (from template)
├── bench/               # add your benchmark scripts here
├── data/                # save raw benchmark data here
└── infra/
    ├── cloud-init.yaml  # VM provisioning (fill in remaining placeholders)
    └── vm-manage.sh     # VM lifecycle: create, start, stop, ssh, bench, destroy
```
### 3. Fill in the placeholders
The scaffold substitutes `<PROJECT>` automatically. You still need to fill in:
| Placeholder | Where | What to fill in |
|---|---|---|
| `<REPO_URL>` | `infra/cloud-init.yaml` | Your fork's clone URL |
| `<SETUP_COMMANDS>` | `infra/cloud-init.yaml` | Toolchain install + build (language-specific) |
| `<BENCH_COMMAND>` | `infra/cloud-init.yaml` | The command to benchmark |
| `<VERIFY_COMMAND>` | `infra/cloud-init.yaml` | Smoke test after setup |
The cloud-init template includes examples for Python, Rust, Go, Node.js, and Java.
### VM lifecycle
Each project gets a `vm-manage.sh` for the benchmark VM:
```bash
cd .codeflash/<member>/<org>/<project>
bash infra/vm-manage.sh create       # provision VM with cloud-init
bash infra/vm-manage.sh bench main   # run benchmarks on a branch
bash infra/vm-manage.sh ssh          # SSH into VM
bash infra/vm-manage.sh stop         # deallocate (stops billing)
bash infra/vm-manage.sh destroy      # delete everything
```
### Examples
Use the existing projects as templates: