# codeflash-agent

Autonomous performance optimization platform. Profiles code, implements optimizations, benchmarks before and after, and iterates until gains plateau.
What it's achieved on real projects:
| Project | Result | Details |
|---|---|---|
| Rich | 2x Console import (79ms → 34ms) | summary |
| pip | 7x --version (138ms → 20ms), 1.81x resolver | summary |
| typeagent-py | 2.6x query path, 1.16x import + indexing | summary |
| core-product | 14.6% latency, 2.1 GB memory savings | summary |
| metaflow | 7-18x artifact compression (lz4 vs gzip) | summary |
## Domains
| Domain | When to use |
|---|---|
| CPU | CPU time, O(n²) loops, wrong containers, algorithmic complexity |
| Memory | Peak memory, OOM, memory leaks, RSS reduction |
| Async | Concurrency, event loop blocking, sequential awaits, throughput/latency |
| Structure | Import time, circular deps, module reorganization for performance |
| Deep | Cross-domain optimization — profiles all domains and iterates until plateau |
The agent auto-detects which domain(s) apply based on your request.
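As an illustration of the CPU domain's "wrong containers" entry, here is a minimal, hypothetical example (not taken from any of the projects above) of the classic list-to-set membership fix:

```python
# Hypothetical example of the "wrong containers" fix: membership tests
# against a list are O(n) each, so the loop below is O(n^2) overall.
def dedupe_slow(records):
    seen = []
    out = []
    for r in records:
        if r not in seen:  # linear scan of `seen` on every iteration
            seen.append(r)
            out.append(r)
    return out


# Same behavior, but `seen` is a set: each membership test is O(1),
# making the whole loop O(n).
def dedupe_fast(records):
    seen = set()
    out = []
    for r in records:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out
```

Both versions preserve input order and return identical results, which is what makes the swap safe to apply and easy to verify with the existing test suite.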
## Install

Build the plugin first, then launch Claude with it:

```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
make build-plugin  # assembles plugin into dist/ — must run before launching
claude --dangerously-skip-permissions --effort max --plugin-dir ./dist/
```
## Your first optimization

Just run:

```
> /codeflash-optimize start
```

If you know where the problem lies, describe it in natural language instead:

```
> Our /process endpoint takes 5s but individual calls should only take 500ms each
> process_records is too slow, it's doing O(n²) lookups
```

Other commands:

```
> /codeflash-optimize scan    # quick cross-domain diagnosis (no changes)
> /codeflash-optimize status  # check progress
> /codeflash-optimize resume  # continue from where you left off
> /codeflash-optimize review  # review current changes or a PR
```
Codeflash will profile, analyze, implement fixes one at a time, re-profile after each, and stop when gains plateau. Session state persists in HANDOFF.md and results.tsv so you can resume across conversations.
## For contributors

### Dev setup

```bash
git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
uv sync                      # install all packages + dev deps
prek run --all-files         # lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v   # test all packages
```
### Plugin development

```bash
make build-plugin  # assemble plugin → dist/ (base + python overlay + vendor)
make clean         # remove dist/
```

The plugin is self-contained under `plugin/`:

- `plugin/` — language-agnostic agents, hooks, shared references
- `plugin/languages/python/` — Python domain agents, skills, references
- `plugin/languages/javascript/` — JavaScript domain agents, skills, references
- `make build-plugin` assembles base + language overlay into `dist/` (default: `LANG=python`)
## Optimization patterns

Distilled from 122 pip commits + 2 Rich PRs. Ordered by typical impact.

| Tier | Category | Examples | Typical impact |
|---|---|---|---|
| 1 | Startup / Import | Fast-path early exit, import deferral, `TYPE_CHECKING` guards, dead import removal | 2-100x for startup paths |
| 2 | Architecture | `@dataclass` → `__slots__`, lazy loading, speculative prefetch, conditional rebuild, caching | 10-60% on hot paths |
| 3 | Micro | Identity shortcuts (`is` before `==`), bypass public API internally, hoist to module level, `__slots__` on hot classes | 1.1-1.8x per call |
| 4 | I/O | Replace slow serializers, connection pooling, parallel I/O | 2-5x for I/O-bound ops |

Anti-patterns to avoid: caching with low hit rate, premature `__slots__`, over-deferring imports in one-time paths, optimizing cold paths.
Full pattern catalog with examples: docs/codeflash-agent-dogfooding.md
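For a sense of the Tier 1 patterns, here is a sketch of an import deferral plus a `TYPE_CHECKING` guard. It uses stdlib names for illustration; it is not code from the catalog:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # TYPE_CHECKING guard: seen by mypy/pyright for annotations,
    # never imported at runtime, so it adds zero startup cost.
    from statistics import NormalDist


def mean_of(values: list[float]) -> float:
    # Import deferral: the cost of importing `statistics` is paid on
    # the first call instead of at module import time. Deferring genuinely
    # heavy imports this way is what drives the 2-100x startup wins.
    import statistics

    return statistics.mean(values)
```

The trade-off: deferral only helps when the import is expensive and the function is not on every code path; over-deferring in one-time paths is one of the anti-patterns listed above.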
## Methodology

### Profiling toolkit

| Tool | Purpose | When to use |
|---|---|---|
| `python -X importtime` | Import cost breakdown | First step for any CLI tool |
| `hyperfine` | E2E command timing with statistics | Before/after validation |
| `cProfile` / `py-spy` | Function-level CPU profiling | Finding hot functions |
| `timeit` | Micro-benchmarks for specific functions | Validating micro-opts |
| `memray` / `tracemalloc` | Memory profiling | Allocation-heavy paths |
| `objgraph` | Object count tracking | Finding redundant allocations |
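As a sketch of the `timeit` row, a before/after micro-benchmark for a container swap might look like this (the workload is illustrative):

```python
import timeit

setup = "data = list(range(1000)); needle = 999"

# Before: membership test against a list (linear scan).
slow = timeit.timeit("needle in data", setup=setup, number=10_000)

# After: the same test against a set (hash lookup).
fast = timeit.timeit("needle in s", setup=setup + "; s = set(data)", number=10_000)

print(f"list: {slow:.4f}s  set: {fast:.4f}s  speedup: {slow / fast:.1f}x")
```

Note that the `setup` string runs outside the timed region, so only the membership test itself is measured.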
### Workflow

1. Profile → identify top-N bottlenecks
2. For each bottleneck:
   a. Read the actual code (don't guess from profiler shapes)
   b. Implement the smallest change that addresses it
   c. Micro-benchmark before/after
   d. Run full test suite
   e. E2E benchmark
3. Commit with a clear `perf:` prefix and numbers
4. Repeat until plateau
### Environment requirements

- Non-burstable VM (e.g., Azure Standard_D2s_v5) for consistent CPU
- Multiple Python versions (3.12, 3.13 minimum — behavior differs)
- `hyperfine --warmup 5 --min-runs 30` for statistical rigor
- All tests passing before AND after every change
Full methodology details: docs/codeflash-agent-dogfooding.md
## Workspace convention

Each target organization gets its own `<org>_org/` directory containing all repos for that org:

```
~/Desktop/work/
├── cf_org/                     # Codeflash
│   ├── codeflash-agent/        # this monorepo
│   ├── codeflash/              # core engine
│   ├── codeflash-internal/     # backend service
│   └── ...
├── unstructured_org/           # Unstructured.io
│   ├── unstructured/           # open source library
│   ├── core-product/           # main product
│   ├── unstructured-inference/ # ML inference
│   └── ...
├── microsoft_org/              # Microsoft
│   └── typeagent/              # typeagent-py (Structured RAG)
├── roboflow_org/               # Roboflow
│   └── supervision/
└── <org>_org/                  # new target org
    └── <repo>/
```

When starting work on a new org: create `<org>_org/`, clone all relevant repos under it, and keep non-repo files out of the org directory.
## Repo structure

```
packages/
  codeflash-core/              # shared foundation (models, AI client, telemetry, git)
  codeflash-python/            # Python language CLI — extends core
  codeflash-mcp/               # MCP server (stub)
  codeflash-lsp/               # LSP server (stub)
services/
  github-app/                  # GitHub App integration (FastAPI)
plugin/                        # Claude Code plugin (self-contained, multi-language)
  languages/python/            # Python domain agents, skills, references
  languages/javascript/        # JavaScript domain agents, skills, references
.codeflash/                    # active optimization data (teammember/org/project)
  krrt7/textualize/rich/       # 2x Rich import speedup
  krrt7/python/pip/            # 7x pip --version, 1.81x resolver
  krrt7/microsoft/typeagent/   # Structured RAG optimization
  <member>/<org>/<project>/    # new optimization targets
case-studies/                  # summaries built from .codeflash/
scripts/                       # scaffold scripts
docs/                          # internal guides
evals/                         # eval templates & real-repo scenarios
```
## Adding an optimization target

When you optimize a new project, scaffold it in `.codeflash/` and build summaries into `case-studies/`.

### 1. Set up local workspace

Each org gets a `<org>_org/` directory under `work/`. Clone from your fork, then add the upstream remote:

```bash
mkdir -p ~/Desktop/work/<org>_org
git clone https://github.com/KRRT7/<repo>.git ~/Desktop/work/<org>_org/<project>
cd ~/Desktop/work/<org>_org/<project>
git remote add upstream https://github.com/<org>/<repo>.git
```
### 2. Scaffold the project

```bash
# Single project:
make bootstrap ORG=roboflow PROJECTS=supervision

# Multiple projects under one org:
make bootstrap ORG=unstructured PROJECTS="unstructured unstructured-inference core-product"
```

This creates:

```
.codeflash/<member>/<org>/<project>/
├── README.md           # results, what changed, methodology (from template)
├── bench/              # add your benchmark scripts here
├── data/               # save raw benchmark data here
└── infra/
    ├── cloud-init.yaml # VM provisioning (fill in remaining placeholders)
    └── vm-manage.sh    # VM lifecycle: create, start, stop, ssh, bench, destroy
```
### 3. Fill in the placeholders

The scaffold substitutes `<PROJECT>` automatically. You still need to fill in:

| Placeholder | Where | What to fill in |
|---|---|---|
| `<REPO_URL>` | `infra/cloud-init.yaml` | Your fork's clone URL |
| `<SETUP_COMMANDS>` | `infra/cloud-init.yaml` | Toolchain install + build (language-specific) |
| `<BENCH_COMMAND>` | `infra/cloud-init.yaml` | The command to benchmark |
| `<VERIFY_COMMAND>` | `infra/cloud-init.yaml` | Smoke test after setup |

The cloud-init template includes examples for Python, Rust, Go, Node.js, and Java.
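To make the placeholders concrete, a hypothetical Python fill-in could look like the sketch below. The repo URL, paths, and commands are illustrative, not the template's actual contents:

```yaml
#cloud-config
# Hypothetical placeholder values for a Python project.
package_update: true
packages: [git, python3-venv]
runcmd:
  # <REPO_URL>: your fork's clone URL
  - git clone https://github.com/KRRT7/supervision.git /opt/target
  # <SETUP_COMMANDS>: toolchain install + build
  - python3 -m venv /opt/venv
  - /opt/venv/bin/pip install -e /opt/target
  # <VERIFY_COMMAND>: smoke test after setup
  - /opt/venv/bin/python -c "import supervision"
  # <BENCH_COMMAND>: the command to benchmark
  - hyperfine --warmup 5 --min-runs 30 "/opt/venv/bin/python -c 'import supervision'"
```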
### VM lifecycle

Each project gets a `vm-manage.sh` for the benchmark VM:

```bash
cd .codeflash/<member>/<org>/<project>
bash infra/vm-manage.sh create      # provision VM with cloud-init
bash infra/vm-manage.sh bench main  # run benchmarks on a branch
bash infra/vm-manage.sh ssh         # SSH into VM
bash infra/vm-manage.sh stop        # deallocate (stops billing)
bash infra/vm-manage.sh destroy     # delete everything
```
## Examples
Use the existing projects as templates: