codeflash-agent

Autonomous performance optimization platform. Profiles code, implements optimizations, benchmarks before and after, and iterates until plateau.

What it's achieved on real projects:

| Project | Result | Details |
|---------|--------|---------|
| Rich | 2x Console import (79ms → 34ms) | summary |
| pip | 7x --version (138ms → 20ms), 1.81x resolver | summary |
| typeagent-py | 2.6x query path, 1.16x import + indexing | summary |
| core-product | 14.6% latency reduction, 2.1 GB memory savings | summary |
| metaflow | 7-18x artifact compression (lz4 vs gzip) | summary |

Domains

| Domain | When to use |
|--------|-------------|
| CPU | CPU time, O(n²) loops, wrong containers, algorithmic complexity |
| Memory | Peak memory, OOM, memory leaks, RSS reduction |
| Async | Concurrency, event loop blocking, sequential awaits, throughput/latency |
| Structure | Import time, circular deps, module reorganization for performance |
| Deep | Cross-domain optimization — profiles all domains and iterates until plateau |

The agent auto-detects which domain(s) apply based on your request.
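
To make the Async row concrete, here is a minimal sketch of the sequential-await pattern and its fix; fetch_user and fetch_orders are hypothetical stand-ins for independent I/O calls:

import asyncio

async def fetch_user(uid):
    await asyncio.sleep(0.1)          # stand-in for a network call
    return {"id": uid}

async def fetch_orders(uid):
    await asyncio.sleep(0.1)
    return [{"user": uid, "total": 42}]

async def handler_slow(uid):
    user = await fetch_user(uid)      # 0.1s
    orders = await fetch_orders(uid)  # +0.1s; waits for the first call to finish
    return user, orders

async def handler_fast(uid):
    # Independent awaits can run concurrently; latency drops to the slowest call.
    return await asyncio.gather(fetch_user(uid), fetch_orders(uid))

asyncio.run(handler_fast(1))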

Install

Build the plugin first, then launch Claude with it:

git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
make build-plugin  # assembles plugin into dist/ — must run before launching
claude --dangerously-skip-permissions --effort max --plugin-dir ./dist/

Your first optimization

Just run:

> /codeflash-optimize start

If you know where the problem lies, describe it in natural language instead:

> Our /process endpoint takes 5s but individual calls should only take 500ms each
> process_records is too slow, it's doing O(n²) lookups

Other commands:

> /codeflash-optimize scan     # quick cross-domain diagnosis (no changes)
> /codeflash-optimize status   # check progress
> /codeflash-optimize resume   # continue from where you left off
> /codeflash-optimize review   # review current changes or a PR

Codeflash will profile, analyze, implement fixes one at a time, re-profile after each, and stop when gains plateau. Session state persists in HANDOFF.md and results.tsv so you can resume across conversations.

For contributors

Dev setup

git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
uv sync                          # install all packages + dev deps
prek run --all-files             # lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v      # test all packages

Plugin development

make build-plugin    # assemble plugin → dist/ (base + python overlay + vendor)
make clean           # remove dist/

The plugin is self-contained under plugin/:

  • plugin/ — language-agnostic agents, hooks, shared references
  • plugin/languages/python/ — Python domain agents, skills, references
  • plugin/languages/javascript/ — JavaScript domain agents, skills, references
  • make build-plugin assembles base + language overlay into dist/ (default: LANG=python)

Optimization patterns

Distilled from 122 pip commits + 2 Rich PRs. Ordered by typical impact.

| Tier | Category | Examples | Typical impact |
|------|----------|----------|----------------|
| 1 | Startup / Import | Fast-path early exit, import deferral, TYPE_CHECKING guards, dead import removal | 2-100x for startup paths |
| 2 | Architecture | @dataclass + __slots__, lazy loading, speculative prefetch, conditional rebuild, caching | 10-60% on hot paths |
| 3 | Micro | Identity shortcuts (is before ==), bypass public API internally, hoist to module level, __slots__ on hot classes | 1.1-1.8x per call |
| 4 | I/O | Replace slow serializers, connection pooling, parallel I/O | 2-5x for I/O-bound ops |
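
As a sketch of the Tier 1 patterns, here is what import deferral plus a TYPE_CHECKING guard can look like; pandas and export_report are hypothetical stand-ins for any expensive import needed only on a rare path:

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Visible to type checkers only; never imported at runtime.
    import pandas as pd

def export_report(rows) -> "pd.DataFrame":
    # Import deferral: the pandas import cost is paid only when this
    # path actually runs, not at module import time.
    import pandas as pd
    return pd.DataFrame(rows)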

Anti-patterns to avoid: caching with low hit rate, premature __slots__, over-deferring imports in one-time paths, optimizing cold paths.
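
A companion sketch of the Tier 3 __slots__ pattern, kept small because the anti-pattern list above warns against applying it prematurely; Span is a hypothetical class assumed to be allocated on a hot path:

class Span:
    # __slots__ drops the per-instance __dict__, shrinking each object and
    # speeding up attribute access. Reserve it for classes that profiling
    # shows are created in large numbers on hot paths.
    __slots__ = ("start", "end")

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end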

Full pattern catalog with examples: docs/codeflash-agent-dogfooding.md

Methodology

Profiling toolkit

| Tool | Purpose | When to use |
|------|---------|-------------|
| python -X importtime | Import cost breakdown | First step for any CLI tool |
| hyperfine | E2E command timing with statistics | Before/after validation |
| cProfile / py-spy | Function-level CPU profiling | Finding hot functions |
| timeit | Micro-benchmarks for specific functions | Validating micro-opts |
| memray / tracemalloc | Memory profiling | Allocation-heavy paths |
| objgraph | Object count tracking | Finding redundant allocations |
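
As a sketch of how timeit fits in, here is a before/after micro-benchmark of a hypothetical container swap (the list-versus-set membership test stands in for whatever micro-opt is being validated):

import timeit

items_list = list(range(10_000))
items_set = set(items_list)

# Time the old and new implementations under identical conditions.
before = timeit.timeit(lambda: 9_999 in items_list, number=10_000)
after = timeit.timeit(lambda: 9_999 in items_set, number=10_000)
print(f"list: {before:.4f}s  set: {after:.4f}s  speedup: {before / after:.1f}x")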

Workflow

1. Profile → identify top-N bottlenecks
2. For each bottleneck:
   a. Read the actual code (don't guess from the shape of the profiler output)
   b. Implement the smallest change that addresses it
   c. Micro-benchmark before/after
   d. Run full test suite
   e. E2E benchmark
3. Commit with a clear perf: prefix and the before/after numbers
4. Repeat until plateau

Environment requirements

  • Non-burstable VM (e.g., Azure Standard_D2s_v5) for consistent CPU
  • Multiple Python versions (3.12, 3.13 minimum — behavior differs)
  • hyperfine --warmup 5 --min-runs 30 for statistical rigor
  • All tests passing before AND after every change

Full methodology details: docs/codeflash-agent-dogfooding.md

Workspace convention

Each target organization gets its own <org>_org/ directory containing all repos for that org:

~/Desktop/work/
├── cf_org/                    # Codeflash
│   ├── codeflash-agent/       # this monorepo
│   ├── codeflash/             # core engine
│   ├── codeflash-internal/    # backend service
│   └── ...
├── unstructured_org/          # Unstructured.io
│   ├── unstructured/          # open source library
│   ├── core-product/          # main product
│   ├── unstructured-inference/ # ML inference
│   └── ...
├── microsoft_org/             # Microsoft
│   └── typeagent/             # typeagent-py (Structured RAG)
├── roboflow_org/              # Roboflow
│   └── supervision/
└── <org>_org/                 # new target org
    └── <repo>/

When starting work on a new org: create <org>_org/, clone all relevant repos under it, and keep non-repo files out of the org directory.

Repo structure

packages/
  codeflash-core/              # shared foundation (models, AI client, telemetry, git)
  codeflash-python/            # Python language CLI — extends core
  codeflash-mcp/               # MCP server (stub)
  codeflash-lsp/               # LSP server (stub)

services/
  github-app/                  # GitHub App integration (FastAPI)

plugin/                        # Claude Code plugin (self-contained, multi-language)
  languages/python/            # Python domain agents, skills, references
  languages/javascript/        # JavaScript domain agents, skills, references

.codeflash/                    # active optimization data (<member>/<org>/<project>)
  krrt7/textualize/rich/       # 2x Rich import speedup
  krrt7/python/pip/            # 7x pip --version, 1.81x resolver
  krrt7/microsoft/typeagent/   # Structured RAG optimization
  <member>/<org>/<project>/    # new optimization targets

case-studies/                  # summaries built from .codeflash/
scripts/                       # scaffold scripts
docs/                          # internal guides
evals/                         # eval templates & real-repo scenarios

Adding an optimization target

When you optimize a new project, scaffold it in .codeflash/ and build summaries into case-studies/.

1. Set up local workspace

Each org gets an <org>_org/ directory under work/. Clone from your fork and add the upstream remote:

mkdir -p ~/Desktop/work/<org>_org
git clone https://github.com/KRRT7/<repo>.git ~/Desktop/work/<org>_org/<project>
cd ~/Desktop/work/<org>_org/<project>
git remote add upstream https://github.com/<org>/<repo>.git

2. Scaffold the project

# Single project:
make bootstrap ORG=roboflow PROJECTS=supervision

# Multiple projects under one org:
make bootstrap ORG=unstructured PROJECTS="unstructured unstructured-inference core-product"

This creates:

.codeflash/<member>/<org>/<project>/
├── README.md              # results, what changed, methodology (from template)
├── bench/                 # add your benchmark scripts here
├── data/                  # save raw benchmark data here
└── infra/
    ├── cloud-init.yaml    # VM provisioning (fill in remaining placeholders)
    └── vm-manage.sh       # VM lifecycle: create, start, stop, ssh, bench, destroy

3. Fill in the placeholders

The scaffold substitutes <PROJECT> automatically. You still need to fill in:

| Placeholder | Where | What to fill in |
|-------------|-------|-----------------|
| <REPO_URL> | infra/cloud-init.yaml | Your fork's clone URL |
| <SETUP_COMMANDS> | infra/cloud-init.yaml | Toolchain install + build (language-specific) |
| <BENCH_COMMAND> | infra/cloud-init.yaml | The command to benchmark |
| <VERIFY_COMMAND> | infra/cloud-init.yaml | Smoke test after setup |

The cloud-init template includes examples for Python, Rust, Go, Node.js, and Java.

VM lifecycle

Each project gets a vm-manage.sh for the benchmark VM:

cd .codeflash/<member>/<org>/<project>
bash infra/vm-manage.sh create    # provision VM with cloud-init
bash infra/vm-manage.sh bench main  # run benchmarks on a branch
bash infra/vm-manage.sh ssh       # SSH into VM
bash infra/vm-manage.sh stop      # deallocate (stops billing)
bash infra/vm-manage.sh destroy   # delete everything

Examples

Use the existing projects as templates:

  • Rich — focused scope, 2 PRs, import + runtime micro-opts
  • pip — large scope, 122 commits across 8 categories