
codeflash-agent

Autonomous performance optimization platform. Profiles code, implements optimizations, benchmarks before and after, and iterates until plateau.

What it's achieved on real projects:

Project       Result                                        Details
Rich          2x Console import (79ms → 34ms)               summary
pip           7x --version (138ms → 20ms), 1.81x resolver   summary
typeagent-py  2.6x query path, 1.16x import + indexing      summary
core-product  14.6% latency, 2.1 GB memory savings          summary
metaflow      7-18x artifact compression (lz4 vs gzip)      summary

Domains

Domain     When to use
CPU        CPU time, O(n²) loops, wrong containers, algorithmic complexity
Memory     Peak memory, OOM, memory leaks, RSS reduction
Async      Concurrency, event loop blocking, sequential awaits, throughput/latency
Structure  Import time, circular deps, module reorganization for performance
Deep       Cross-domain optimization — profiles all domains and iterates until plateau

The agent auto-detects which domain(s) apply based on your request.

Install

Build the plugin first, then launch Claude with it:

git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
make build-plugin  # assembles plugin into dist/ — must run before launching
claude --dangerously-skip-permissions --effort max --plugin-dir ./dist/

Your first optimization

Just run:

> /codeflash-optimize start

If you know where the problem lies, describe it in natural language instead:

> Our /process endpoint takes 5s but individual calls should only take 500ms each
> process_records is too slow, it's doing O(n²) lookups

Other commands:

> /codeflash-optimize scan     # quick cross-domain diagnosis (no changes)
> /codeflash-optimize status   # check progress
> /codeflash-optimize resume   # continue from where you left off
> /codeflash-optimize review   # review current changes or a PR

Codeflash will profile, analyze, implement fixes one at a time, re-profile after each, and stop when gains plateau. Session state persists in HANDOFF.md and results.tsv so you can resume across conversations.

For contributors

Dev setup

git clone https://github.com/codeflash-ai/codeflash-agent.git
cd codeflash-agent
uv sync                          # install all packages + dev deps
prek run --all-files             # lint: ruff check, ruff format, interrogate, mypy
uv run pytest packages/ -v       # test all packages

Plugin development

make build-plugin    # assemble plugin → dist/ (base + python overlay + vendor)
make clean           # remove dist/

The plugin is self-contained under plugin/:

  • plugin/ — language-agnostic agents, hooks, shared references
  • plugin/languages/python/ — Python domain agents, skills, references
  • plugin/languages/javascript/ — JavaScript domain agents, skills, references
  • make build-plugin assembles base + language overlay into dist/ (default: LANG=python)

Optimization patterns

Distilled from 122 pip commits + 2 Rich PRs. Ordered by typical impact.

Tier 1: Startup / Import (typical impact 2-100x for startup paths)
  Fast-path early exit, import deferral, TYPE_CHECKING guards, dead import removal
Tier 2: Architecture (10-60% on hot paths)
  @dataclass + __slots__, lazy loading, speculative prefetch, conditional rebuild, caching
Tier 3: Micro (1.1-1.8x per call)
  Identity shortcuts (is before ==), bypass public API internally, hoist to module level, __slots__ on hot classes
Tier 4: I/O (2-5x for I/O-bound ops)
  Replace slow serializers, connection pooling, parallel I/O

Anti-patterns to avoid: caching with low hit rate, premature __slots__, over-deferring imports in one-time paths, optimizing cold paths.

Full pattern catalog with examples: docs/codeflash-agent-dogfooding.md

Methodology

Profiling toolkit

Tool                  Purpose                                  When to use
python -X importtime  Import cost breakdown                    First step for any CLI tool
hyperfine             E2E command timing with statistics       Before/after validation
cProfile / py-spy     Function-level CPU profiling             Finding hot functions
timeit                Micro-benchmarks for specific functions  Validating micro-opts
memray / tracemalloc  Memory profiling                         Allocation-heavy paths
objgraph              Object count tracking                    Finding redundant allocations
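As a minimal illustration of the memory row, tracemalloc (stdlib) can attribute allocations to source lines. The workload here is a hypothetical allocation-heavy loop, not a path from this repo:

```python
import tracemalloc

tracemalloc.start()

# Hypothetical allocation-heavy path: build 50k short strings.
data = [f"row-{i}" for i in range(50_000)]

current, peak = tracemalloc.get_traced_memory()
top = tracemalloc.take_snapshot().statistics("lineno")[0]
frame = top.traceback[-1]
tracemalloc.stop()

print(f"peak: {peak / 1024:.0f} KiB, "
      f"biggest allocator: {frame.filename}:{frame.lineno} "
      f"({top.size / 1024:.0f} KiB)")
```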

Workflow

1. Profile → identify top-N bottlenecks
2. For each bottleneck:
   a. Read the actual code (don't guess from profiler shapes)
   b. Implement the smallest change that addresses it
   c. Micro-benchmark before/after
   d. Run full test suite
   e. E2E benchmark
3. Commit with clear perf: prefix and numbers
4. Repeat until plateau
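Step 2c in miniature: a before/after timeit comparison. The list-to-set container swap is a generic example of the kind of change being validated, not a change from this repo:

```python
import timeit

SETUP = "items = list(range(5_000)); probe = list(range(0, 10_000, 7))"

# Before: membership tests against a list, O(n) per lookup.
before = timeit.timeit("[x in items for x in probe]", setup=SETUP, number=10)

# After: the same lookups against a set, O(1) per lookup.
after = timeit.timeit("[x in s for x in probe]",
                      setup=SETUP + "; s = set(items)", number=10)

print(f"list: {before:.4f}s  set: {after:.4f}s  speedup: {before / after:.0f}x")
```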

Environment requirements

  • Non-burstable VM (e.g., Azure Standard_D2s_v5) for consistent CPU
  • Multiple Python versions (3.12, 3.13 minimum — behavior differs)
  • hyperfine --warmup 5 --min-runs 30 for statistical rigor
  • All tests passing before AND after every change

Full methodology details: docs/codeflash-agent-dogfooding.md

Workspace convention

Each target organization gets its own <org>_org/ directory containing all repos for that org:

~/Desktop/work/
├── cf_org/                    # Codeflash
│   ├── codeflash-agent/       # this monorepo
│   ├── codeflash/             # core engine
│   ├── codeflash-internal/    # backend service
│   └── ...
├── unstructured_org/          # Unstructured.io
│   ├── unstructured/          # open source library
│   ├── core-product/          # main product
│   ├── unstructured-inference/ # ML inference
│   └── ...
├── microsoft_org/             # Microsoft
│   └── typeagent/             # typeagent-py (Structured RAG)
├── roboflow_org/              # Roboflow
│   └── supervision/
└── <org>_org/                 # new target org
    └── <repo>/

When starting work on a new org: create <org>_org/, clone all relevant repos under it, and keep non-repo files out of the org directory.

Repo structure

packages/
  codeflash-core/              # shared foundation (models, AI client, telemetry, git)
  codeflash-python/            # Python language CLI — extends core
  codeflash-mcp/               # MCP server (stub)
  codeflash-lsp/               # LSP server (stub)

services/
  github-app/                  # GitHub App integration (FastAPI)

plugin/                        # Claude Code plugin (self-contained, multi-language)
  languages/python/            # Python domain agents, skills, references
  languages/javascript/        # JavaScript domain agents, skills, references

.codeflash/                  # active optimization data (teammember/org/project)
  krrt7/textualize/rich/       # 2x Rich import speedup
  krrt7/python/pip/            # 7x pip --version, 1.81x resolver
  krrt7/microsoft/typeagent/   # Structured RAG optimization
  <member>/<org>/<project>/    # new optimization targets

case-studies/                  # summaries built from .codeflash/
scripts/                       # scaffold scripts
docs/                          # internal guides
evals/                         # eval templates & real-repo scenarios

Adding an optimization target

When you optimize a new project, scaffold it in .codeflash/ and build summaries into case-studies/.

1. Set up local workspace

Each org gets a <org>_org/ directory under work/. Clone from your fork, add the upstream remote:

mkdir -p ~/Desktop/work/<org>_org
git clone https://github.com/KRRT7/<repo>.git ~/Desktop/work/<org>_org/<project>
cd ~/Desktop/work/<org>_org/<project>
git remote add upstream https://github.com/<org>/<repo>.git

2. Scaffold the project

# Single project:
make bootstrap ORG=roboflow PROJECTS=supervision

# Multiple projects under one org:
make bootstrap ORG=unstructured PROJECTS="unstructured unstructured-inference core-product"

This creates:

.codeflash/<member>/<org>/<project>/
├── README.md              # results, what changed, methodology (from template)
├── bench/                 # add your benchmark scripts here
├── data/                  # save raw benchmark data here
└── infra/
    ├── cloud-init.yaml    # VM provisioning (fill in remaining placeholders)
    └── vm-manage.sh       # VM lifecycle: create, start, stop, ssh, bench, destroy

3. Fill in the placeholders

The scaffold substitutes <PROJECT> automatically. You still need to fill in:

Placeholder       Where                  What to fill in
<REPO_URL>        infra/cloud-init.yaml  Your fork's clone URL
<SETUP_COMMANDS>  infra/cloud-init.yaml  Toolchain install + build (language-specific)
<BENCH_COMMAND>   infra/cloud-init.yaml  The command to benchmark
<VERIFY_COMMAND>  infra/cloud-init.yaml  Smoke test after setup

The cloud-init template includes examples for Python, Rust, Go, Node.js, and Java.

VM lifecycle

Each project gets a vm-manage.sh for the benchmark VM:

cd .codeflash/<member>/<org>/<project>
bash infra/vm-manage.sh create    # provision VM with cloud-init
bash infra/vm-manage.sh bench main  # run benchmarks on a branch
bash infra/vm-manage.sh ssh       # SSH into VM
bash infra/vm-manage.sh stop      # deallocate (stops billing)
bash infra/vm-manage.sh destroy   # delete everything

Examples

Use the existing projects as templates:

  • Rich — focused scope, 2 PRs, import + runtime micro-opts
  • pip — large scope, 122 commits across 8 categories