
codeflash-agent

Autonomous performance optimization platform. Profiles code, implements optimizations, benchmarks before and after, and iterates until plateau.

What it's achieved on real projects:

Project       Result                                        Details
Rich          2x Console import (79ms → 34ms)               summary
pip           7x --version (138ms → 20ms), 1.81x resolver   summary
typeagent-py  2.6x query path, 1.16x import + indexing      summary
core-product  14.6% latency, 2.1 GB memory savings          summary
metaflow      7-18x artifact compression (lz4 vs gzip)      summary

Domains

Domain     When to use
CPU        CPU time, O(n²) loops, wrong containers, algorithmic complexity
Memory     Peak memory, OOM, memory leaks, RSS reduction
Async      Concurrency, event loop blocking, sequential awaits, throughput/latency
Structure  Import time, circular deps, module reorganization for performance
Deep       Cross-domain optimization — profiles all domains and iterates until plateau

The agent auto-detects which domain(s) apply based on your request.

Install

Build the plugin first, then launch Claude with it:

git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
make build-plugin  # assembles plugin into dist/ — must run before launching
claude --dangerously-skip-permissions --effort max --plugin-dir ./dist/

Your first optimization

Just run:

> /codeflash-optimize start

If you know where the problem lies, describe it in natural language instead:

> Our /process endpoint takes 5s but individual calls should only take 500ms each
> process_records is too slow, it's doing O(n²) lookups

Other commands:

> /codeflash-optimize scan     # quick cross-domain diagnosis (no changes)
> /codeflash-optimize status   # check progress
> /codeflash-optimize resume   # continue from where you left off
> /codeflash-optimize review   # review current changes or a PR

Codeflash will profile, analyze, implement fixes one at a time, re-profile after each, and stop when gains plateau. Session state persists in HANDOFF.md and results.tsv so you can resume across conversations.

For contributors

Dev setup

git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
uv sync                          # install all packages + dev deps
prek run --all-files             # lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v       # test all packages

Plugin development

make build-plugin    # assemble plugin → dist/ (base + python overlay + vendor)
make clean           # remove dist/

The plugin is self-contained under plugin/:

  • plugin/ — language-agnostic agents, hooks, shared references
  • plugin/languages/python/ — Python domain agents, skills, references
  • plugin/languages/javascript/ — JavaScript domain agents, skills, references
  • make build-plugin assembles base + language overlay into dist/ (default: LANG=python)

Optimization patterns

Distilled from 122 pip commits + 2 Rich PRs. Ordered by typical impact.

Tier 1: Startup / Import (typical impact 2-100x for startup paths)
  Fast-path early exit, import deferral, TYPE_CHECKING guards, dead import removal
Tier 2: Architecture (10-60% on hot paths)
  @dataclass + __slots__, lazy loading, speculative prefetch, conditional rebuild, caching
Tier 3: Micro (1.1-1.8x per call)
  Identity shortcuts (is before ==), bypass public API internally, hoist to module level, __slots__ on hot classes
Tier 4: I/O (2-5x for I/O-bound ops)
  Replace slow serializers, connection pooling, parallel I/O

Anti-patterns to avoid: caching with low hit rate, premature __slots__, over-deferring imports in one-time paths, optimizing cold paths.

Full pattern catalog with examples: docs/codeflash-agent-dogfooding.md

Methodology

Profiling toolkit

Tool                  Purpose                                  When to use
python -X importtime  Import cost breakdown                    First step for any CLI tool
hyperfine             E2E command timing with statistics       Before/after validation
cProfile / py-spy     Function-level CPU profiling             Finding hot functions
timeit                Micro-benchmarks for specific functions  Validating micro-opts
memray / tracemalloc  Memory profiling                         Allocation-heavy paths
objgraph              Object count tracking                    Finding redundant allocations
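As a minimal illustration of the memory row, tracemalloc (stdlib) can attribute allocations to source lines. The workload here is a hypothetical allocation-heavy loop, not a path from this repo:

```python
import tracemalloc

tracemalloc.start()

# Hypothetical allocation-heavy path: build 50k short strings.
data = [f"row-{i}" for i in range(50_000)]

current, peak = tracemalloc.get_traced_memory()
top = tracemalloc.take_snapshot().statistics("lineno")[0]
frame = top.traceback[-1]
tracemalloc.stop()

print(f"peak: {peak / 1024:.0f} KiB, "
      f"biggest allocator: {frame.filename}:{frame.lineno} "
      f"({top.size / 1024:.0f} KiB)")
```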

Workflow

1. Profile → identify top-N bottlenecks
2. For each bottleneck:
   a. Read the actual code (don't guess from profiler shapes)
   b. Implement the smallest change that addresses it
   c. Micro-benchmark before/after
   d. Run full test suite
   e. E2E benchmark
3. Commit with clear perf: prefix and numbers
4. Repeat until plateau
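Step 2c in miniature: a before/after timeit comparison. The list-to-set container swap is a generic example of the kind of change being validated, not a change from this repo:

```python
import timeit

SETUP = "items = list(range(5_000)); probe = list(range(0, 10_000, 7))"

# Before: membership tests against a list, O(n) per lookup.
before = timeit.timeit("[x in items for x in probe]", setup=SETUP, number=10)

# After: the same lookups against a set, O(1) per lookup.
after = timeit.timeit("[x in s for x in probe]",
                      setup=SETUP + "; s = set(items)", number=10)

print(f"list: {before:.4f}s  set: {after:.4f}s  speedup: {before / after:.0f}x")
```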

Environment requirements

  • Non-burstable VM (e.g., Azure Standard_D2s_v5) for consistent CPU
  • Multiple Python versions (3.12, 3.13 minimum — behavior differs)
  • hyperfine --warmup 5 --min-runs 30 for statistical rigor
  • All tests passing before AND after every change

Full methodology details: docs/codeflash-agent-dogfooding.md

Workspace convention

Each target organization gets its own <org>_org/ directory containing all repos for that org:

~/Desktop/work/
├── cf_org/                    # Codeflash
│   ├── codeflash-agent/       # this monorepo
│   ├── codeflash/             # core engine
│   ├── codeflash-internal/    # backend service
│   └── ...
├── unstructured_org/          # Unstructured.io
│   ├── unstructured/          # open source library
│   ├── core-product/          # main product
│   ├── unstructured-inference/ # ML inference
│   └── ...
├── microsoft_org/             # Microsoft
│   └── typeagent/             # typeagent-py (Structured RAG)
├── roboflow_org/              # Roboflow
│   └── supervision/
└── <org>_org/                 # new target org
    └── <repo>/

When starting work on a new org: create <org>_org/, clone all relevant repos under it, and keep non-repo files out of the org directory.

Repo structure

packages/
  codeflash-core/              # shared foundation (models, AI client, telemetry, git)
  codeflash-python/            # Python language CLI — extends core
  codeflash-mcp/               # MCP server (stub)
  codeflash-lsp/               # LSP server (stub)

services/
  github-app/                  # GitHub App integration (FastAPI)

plugin/                        # Claude Code plugin (self-contained, multi-language)
  languages/python/            # Python domain agents, skills, references
  languages/javascript/        # JavaScript domain agents, skills, references

.codeflash/                  # active optimization data (teammember/org/project)
  krrt7/textualize/rich/       # 2x Rich import speedup
  krrt7/python/pip/            # 7x pip --version, 1.81x resolver
  krrt7/microsoft/typeagent/   # Structured RAG optimization
  <member>/<org>/<project>/    # new optimization targets

case-studies/                  # summaries built from .codeflash/
scripts/                       # scaffold scripts
docs/                          # internal guides
evals/                         # eval templates & real-repo scenarios

Adding an optimization target

When you optimize a new project, scaffold it in .codeflash/ and build summaries into case-studies/.

1. Set up local workspace

Each org gets a <org>_org/ directory under work/. Clone from your fork, add the upstream remote:

mkdir -p ~/Desktop/work/<org>_org
git clone https://github.com/KRRT7/<repo>.git ~/Desktop/work/<org>_org/<project>
cd ~/Desktop/work/<org>_org/<project>
git remote add upstream https://github.com/<org>/<repo>.git

2. Scaffold the project

# Single project:
make bootstrap ORG=roboflow PROJECTS=supervision

# Multiple projects under one org:
make bootstrap ORG=unstructured PROJECTS="unstructured unstructured-inference core-product"

This creates:

.codeflash/<member>/<org>/<project>/
├── README.md              # results, what changed, methodology (from template)
├── bench/                 # add your benchmark scripts here
├── data/                  # save raw benchmark data here
└── infra/
    ├── cloud-init.yaml    # VM provisioning (fill in remaining placeholders)
    └── vm-manage.sh       # VM lifecycle: create, start, stop, ssh, bench, destroy

3. Fill in the placeholders

The scaffold substitutes <PROJECT> automatically. You still need to fill in:

Placeholder       Where                  What to fill in
<REPO_URL>        infra/cloud-init.yaml  Your fork's clone URL
<SETUP_COMMANDS>  infra/cloud-init.yaml  Toolchain install + build (language-specific)
<BENCH_COMMAND>   infra/cloud-init.yaml  The command to benchmark
<VERIFY_COMMAND>  infra/cloud-init.yaml  Smoke test after setup

The cloud-init template includes examples for Python, Rust, Go, Node.js, and Java.

VM lifecycle

Each project gets a vm-manage.sh for the benchmark VM:

cd .codeflash/<member>/<org>/<project>
bash infra/vm-manage.sh create    # provision VM with cloud-init
bash infra/vm-manage.sh bench main  # run benchmarks on a branch
bash infra/vm-manage.sh ssh       # SSH into VM
bash infra/vm-manage.sh stop      # deallocate (stops billing)
bash infra/vm-manage.sh destroy   # delete everything

Examples

Use the existing projects as templates:

  • Rich — focused scope, 2 PRs, import + runtime micro-opts
  • pip — large scope, 122 commits across 8 categories