codeflash-internal/experiments/universe-optimize/program.md


Universe Optimize: Automated Optimization Outbound Program

Overview

This program automates the full pipeline: discover high-value open-source projects, fork them, spin up Azure VMs, run Codeflash optimizations via Claude Code, collect results, and draft personalized outreach emails with real optimization proof points. The orchestrator runs locally and manages everything end-to-end.


Architecture

Local Machine (Orchestrator)
  |
  |-- orchestrator.py  (Python script that manages the full pipeline)
  |       |
  |       |-- Azure SDK: provisions/destroys VMs
  |       |-- GitHub API: forks repos to codeflash-ai org
  |       |-- SSH (paramiko): executes commands on VMs
  |       |-- Results DB: SQLite tracking all projects/VMs/results
  |       |-- Email drafter: generates personalized emails from results
  |
  +-- VM 1 (Azure Ubuntu, 4 CPU / 16GB RAM)
  |     |-- Claude Code (--dangerously-skip-permissions, --plugin-dir=codeflash-agent)
  |     |-- Forked repo cloned
  |     |-- optimization.md injected as CLAUDE.md instructions
  |     |-- Results written to .codeflash/results.tsv + summary.json
  |
  +-- VM 2 ...
  +-- VM N ...

Phase 0: Project Discovery (Manual for POC, Automated Later)

POC: Hand-pick 2 projects

For the proof of concept, manually select 2 projects that are:

  • Popular (>1k stars), actively maintained
  • Python, JavaScript/TypeScript, or Java
  • Have a test suite (pytest, jest, JUnit) so optimizations can be verified
  • Backed by a company or used by companies where we can identify an engineering leader to email
  • Performance-sensitive domain (data processing, web frameworks, databases, ML infra, etc.)

Store project metadata in projects.json:

[
  {
    "id": "project-001",
    "repo": "org/repo-name",
    "language": "python",
    "stars": 5200,
    "description": "...",
    "company": "Company Name",
    "domain": "data processing",
    "target_contact": {
      "name": "First Last",
      "title": "CTO",
      "email": "...",
      "linkedin": "..."
    },
    "why_selected": "High star count, performance-critical data pipeline, active maintenance, company has 50-200 employees"
  }
]
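Before provisioning anything, the orchestrator can fail fast on malformed entries. A minimal loader sketch (`REQUIRED_FIELDS` reflects the example above, not a frozen schema):

```python
import json

REQUIRED_FIELDS = {"id", "repo", "language", "company", "target_contact"}

def load_projects(path="projects.json"):
    """Load projects.json and fail fast on entries missing required fields."""
    with open(path) as f:
        projects = json.load(f)
    for p in projects:
        missing = REQUIRED_FIELDS - p.keys()
        if missing:
            raise ValueError(f"{p.get('id', '?')}: missing fields {sorted(missing)}")
        if "/" not in p["repo"]:
            raise ValueError(f"{p['id']}: repo must be 'org/name', got {p['repo']!r}")
    return projects
```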

At Scale (Post-POC)

Use GitHub API to search for repos matching criteria:

  • stars:>1000 language:python (repeat for JS/TS, Java)
  • Filter by: has CI, recent commits, has test suite, identifiable company/maintainer
  • Cross-reference with Sumble/Apollo APIs (as in existing outbound program) to find the right engineering contact
  • Target 1000 projects total across Python, JS/TS, Java

Phase 1: Infrastructure Provisioning

What the Orchestrator Does

The orchestrator (orchestrator.py) is a Python script that manages the full lifecycle. It uses:

  • Azure SDK (azure-mgmt-compute, azure-identity) to provision and destroy VMs
  • Paramiko for SSH command execution on VMs
  • GitHub API (PyGithub) for forking repos
  • SQLite for tracking state across runs

VM Provisioning

For each project in projects.json:

  1. Fork the repo to the codeflash-ai GitHub org:

    github.get_repo("org/repo").create_fork(organization="codeflash-ai")
    
  2. Provision an Azure VM:

    • Image: Ubuntu 24.04 LTS
    • Size: Standard_D4s_v5 (4 vCPU, 16 GB RAM)
    • Region: East US (or nearest to minimize latency)
    • Disk: 128 GB SSD
    • NSG: SSH only (port 22), locked to orchestrator's IP
    • Tags: {"project": "universe-optimize", "repo": "org/repo-name", "id": "project-001"}
  3. Bootstrap the VM via SSH (single setup script):

    #!/bin/bash
    set -euo pipefail
    
    # System deps
    sudo apt-get update && sudo apt-get install -y git curl build-essential python3-dev
    
    # Install language runtimes as needed
    # Python: install uv
    curl -LsSf https://astral.sh/uv/install.sh | sh
    source $HOME/.local/bin/env
    
    # Node (for JS/TS projects)
    curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
    sudo apt-get install -y nodejs
    
    # Java (for Java projects)
    sudo apt-get install -y openjdk-21-jdk maven gradle
    
    # Install Claude Code CLI
    curl -fsSL https://claude.ai/install.sh | bash
    
    # Persist required env vars in ~/.bashrc. The real secret values are
    # injected by the orchestrator over SSH at provision time -- never
    # hardcode them in this script.
    {
      echo 'export CODEFLASH_API_KEY=<injected-by-orchestrator>'
      echo 'export CLAUDE_CODE_USE_BEDROCK=1'
      echo 'export AWS_REGION=us-east-1'
      echo 'export AWS_BEARER_TOKEN_BEDROCK=<injected-by-orchestrator>'
      echo 'export LC_ALL=en_US.UTF-8'
    } >> ~/.bashrc
    
    source ~/.bashrc
    
    # Copy codeflash-agent plugin
    mkdir -p ~/codeflash-agent
    # (orchestrator SCPs the dist/ directory here)
    
    # Clone the forked repo
    git clone https://github.com/codeflash-ai/<repo-name>.git ~/project
    cd ~/project
    
    # Inject optimization.md as the project's CLAUDE.md
    cp ~/optimization.md ~/project/CLAUDE.md
    
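Step 2's VM creation reduces to a parameters dict handed to `azure-mgmt-compute`'s `virtual_machines.begin_create_or_update()`. A hedged sketch -- the image URN fields (`Canonical`/`ubuntu-24_04-lts`/`server`), the `azureuser` admin account, and the NIC ID are assumptions; the NIC and NSG would be created separately with `azure-mgmt-network`:

```python
def build_vm_parameters(project_id, repo, nic_id, admin_ssh_key):
    """Build the parameters dict for virtual_machines.begin_create_or_update()."""
    return {
        "location": "eastus",
        "tags": {"project": "universe-optimize", "repo": repo, "id": project_id},
        "hardware_profile": {"vm_size": "Standard_D4s_v5"},
        "storage_profile": {
            # Ubuntu 24.04 LTS marketplace image (URN fields are an assumption)
            "image_reference": {
                "publisher": "Canonical",
                "offer": "ubuntu-24_04-lts",
                "sku": "server",
                "version": "latest",
            },
            "os_disk": {
                "create_option": "FromImage",
                "disk_size_gb": 128,
                "managed_disk": {"storage_account_type": "Premium_LRS"},
            },
        },
        "os_profile": {
            "computer_name": project_id,
            "admin_username": "azureuser",
            "linux_configuration": {
                "disable_password_authentication": True,
                "ssh": {"public_keys": [{
                    "path": "/home/azureuser/.ssh/authorized_keys",
                    "key_data": admin_ssh_key,
                }]},
            },
        },
        "network_profile": {"network_interfaces": [{"id": nic_id}]},
    }

# Usage (sketch):
# compute_client.virtual_machines.begin_create_or_update(
#     "universe-optimize-rg", project_id,
#     build_vm_parameters(project_id, repo, nic_id, ssh_key)).result()
```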

State Tracking

The orchestrator maintains a SQLite DB (universe_optimize.db):

CREATE TABLE projects (
    id TEXT PRIMARY KEY,
    repo TEXT NOT NULL,
    language TEXT NOT NULL,
    fork_url TEXT,
    company TEXT,
    contact_name TEXT,
    contact_email TEXT,
    contact_title TEXT,
    status TEXT DEFAULT 'pending',  -- pending, provisioning, running, completed, failed, destroyed
    vm_id TEXT,
    vm_ip TEXT,
    created_at TIMESTAMP,
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    optimization_branch TEXT,
    num_optimizations INTEGER DEFAULT 0,
    best_speedup TEXT,
    summary_json TEXT,
    email_draft_path TEXT
);
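A couple of thin helpers over that schema keep state handling in one place. A sketch using the stdlib `sqlite3` module (`DB_PATH` is an assumption):

```python
import sqlite3

DB_PATH = "universe_optimize.db"

def get_db(path=DB_PATH):
    """Open the state DB, creating the projects table on first use."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS projects (
        id TEXT PRIMARY KEY, repo TEXT NOT NULL, language TEXT NOT NULL,
        fork_url TEXT, company TEXT, contact_name TEXT, contact_email TEXT,
        contact_title TEXT, status TEXT DEFAULT 'pending', vm_id TEXT, vm_ip TEXT,
        created_at TIMESTAMP, started_at TIMESTAMP, completed_at TIMESTAMP,
        optimization_branch TEXT, num_optimizations INTEGER DEFAULT 0,
        best_speedup TEXT, summary_json TEXT, email_draft_path TEXT)""")
    return db

def set_status(db, project_id, status):
    """Advance a project's lifecycle status (pending -> ... -> destroyed)."""
    db.execute("UPDATE projects SET status = ? WHERE id = ?", (status, project_id))
    db.commit()
```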

Phase 2: Running the Optimization

optimization.md (Injected as CLAUDE.md on Each VM)

This file tells Claude Code exactly what to do when it starts. It is placed as CLAUDE.md in the project root before Claude is launched.

# Codeflash Optimization Run

You are running an automated optimization session on this open-source project.
Your goal is to find as many provably-faster code implementations as possible
and stack them as commits on a single branch.

## Your Mission

1. Understand the project: read the README, project structure, and key source files.
2. Set up the project: install dependencies, verify tests pass.
3. Run `/codeflash-optimize start` and when asked for context, respond with "go".
4. Let the optimization agent work. It will:
   - Profile the codebase (CPU, memory, GC)
   - Identify bottleneck functions
   - Implement optimizations one at a time
   - Verify each with tests
   - Commit each successful optimization
5. When the optimization agent completes (or plateaus), collect results.

## After Optimization Completes

Write a file `~/results/summary.json` with the following structure:

```json
{
  "repo": "<org/repo>",
  "language": "<python|javascript|java>",
  "branch": "<optimization branch name>",
  "total_experiments": <N>,
  "total_keeps": <N>,
  "total_discards": <N>,
  "optimizations": [
    {
      "commit": "<sha>",
      "function": "<function_name>",
      "file": "<file_path>",
      "description": "<what was optimized>",
      "cpu_speedup": "<e.g. 2.3x faster>",
      "memory_reduction": "<e.g. -50 MiB>",
      "technique": "<e.g. replaced list with set, eliminated deepcopy>"
    }
  ],
  "headline_stats": {
    "best_single_speedup": "<e.g. 5x faster>",
    "best_function": "<function_name>",
    "total_cpu_improvement_pct": <number>,
    "total_memory_saved_mb": <number>
  },
  "pr_ready_commits": <N>,
  "status": "completed|plateaued|failed",
  "error": "<if failed, why>"
}
```

Also copy .codeflash/results.tsv and .codeflash/HANDOFF.md to ~/results/.

Important

  • Work fully autonomously. Do not ask questions -- make reasonable decisions.
  • If tests fail during setup, note the pre-existing failures and work around them.
  • If the project cannot be set up (missing deps, private packages), write summary.json with status "failed" and an error message, then stop.
  • After optimization completes, push the optimization branch to the remote fork: git push origin codeflash/optimize.
  • Time limit: aim to complete within 8 hours. If still running after 8 hours, wrap up, write summary.json with whatever results you have, and stop.

### Launching Claude Code

The orchestrator SSHs into each VM and runs:

```bash
cd ~/project && claude \
  --dangerously-skip-permissions \
  --plugin-dir ~/codeflash-agent/dist \
  --model opus \
  --max-turns 400 \
  --print \
  "Read the CLAUDE.md file and follow its instructions exactly." \
  2>&1 | tee ~/results/claude_output.log
```

Key flags:

  • --dangerously-skip-permissions: no human approval needed
  • --plugin-dir: loads the codeflash-agent plugin with /codeflash-optimize skill
  • --print: non-interactive mode, outputs to stdout
  • --max-turns 400: generous turn limit for thorough optimization
  • Output captured to log file for debugging
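Since a session can run for hours, the orchestrator should detach the process rather than hold an SSH channel open for the whole run. One way to build that command string (a sketch -- the flags mirror the invocation above, but the `nohup`-and-background approach with log redirection instead of `tee` is an assumption):

```python
def build_launch_command(max_turns=400, model="opus"):
    """Build the detached Claude Code launch command to run over SSH."""
    claude = (
        "claude --dangerously-skip-permissions "
        "--plugin-dir ~/codeflash-agent/dist "
        f"--model {model} --max-turns {max_turns} --print "
        "'Read the CLAUDE.md file and follow its instructions exactly.'"
    )
    # nohup + & so the run survives the SSH session closing;
    # output goes straight to the log file instead of through tee
    return (f"cd ~/project && mkdir -p ~/results && "
            f"nohup {claude} > ~/results/claude_output.log 2>&1 &")
```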

Monitoring

The orchestrator polls each VM periodically (every 10 minutes):

def check_vm_status(vm_ip):
    """Check if Claude is still running and if results are ready."""
    # Check if the claude process is still running; the [c] bracket trick
    # stops pgrep -f from matching its own command line in the SSH session
    is_running = ssh_exec(vm_ip, "pgrep -f '[c]laude' > /dev/null && echo running || echo done")

    # Check if summary.json exists (optimization complete)
    has_results = ssh_exec(vm_ip, "test -f ~/results/summary.json && echo yes || echo no")

    # Check elapsed time
    elapsed = ssh_exec(vm_ip, "stat -c %Y ~/project/CLAUDE.md | xargs -I{} expr $(date +%s) - {}")

    return {
        "is_running": is_running.strip() == "running",
        "has_results": has_results.strip() == "yes",
        "elapsed_seconds": int(elapsed.strip())
    }

If a VM has been running >9 hours with no results (the 8-hour budget plus an hour of grace), the orchestrator:

  1. SSHs in and sends SIGTERM to claude
  2. Waits 60s for graceful shutdown
  3. Checks if partial results exist in .codeflash/results.tsv
  4. Marks project as failed or completed (with partial results)

Phase 3: Results Collection

Once Claude completes on a VM:

  1. SCP the results from the VM to local:

    scp_download(vm_ip, "~/results/summary.json", f"results/{project_id}/summary.json")
    scp_download(vm_ip, "~/results/results.tsv", f"results/{project_id}/results.tsv")
    scp_download(vm_ip, "~/results/HANDOFF.md", f"results/{project_id}/HANDOFF.md")
    scp_download(vm_ip, "~/results/claude_output.log", f"results/{project_id}/claude_output.log")
    
  2. Verify the branch was pushed. The VM Claude pushes the branch as part of its workflow. Verify:

    ssh vm "cd ~/project && git log --oneline origin/codeflash/optimize -5"
    

    If not pushed (e.g. Claude failed before that step), push from the VM before destroying it:

    ssh vm "cd ~/project && git push origin codeflash/optimize"
    
  3. Update the DB with results from summary.json.

  4. Destroy the VM to stop burning money:

    azure_client.virtual_machines.begin_delete(resource_group, vm_name)
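The per-file downloads in step 1 can be driven from a single mapping, so adding a result file later means touching one list. A sketch (`scp_download` is the helper used above):

```python
RESULT_FILES = ["summary.json", "results.tsv", "HANDOFF.md", "claude_output.log"]

def collection_plan(project_id, remote_dir="~/results"):
    """Return (remote_path, local_path) pairs for one project's result files."""
    return [(f"{remote_dir}/{name}", f"results/{project_id}/{name}")
            for name in RESULT_FILES]

# Usage (sketch):
# for remote, local in collection_plan("project-001"):
#     scp_download(vm_ip, remote, local)
```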
    

Results Directory Structure

results/
  project-001/
    summary.json          # Full optimization results (from VM)
    results.tsv           # Per-experiment log (from VM)
    HANDOFF.md            # Session state (from VM)
    claude_output.log     # Full Claude transcript (for debugging)
    context.json          # Template variables (built by orchestrator from summary + project data)
    emails/
      email_1_proof.md    # Rendered email (regenerated on template edit)
      email_2_followup.md
      email_3_risk.md
  project-002/
    ...

Phase 4: Email Drafting

The email system separates data from templates. All optimization results and contact info are stored locally as structured data (context.json). Templates are standalone files with {placeholder} variables. You can edit a template and re-render all emails with one command.

Data Layer: context.json

After results are collected, the orchestrator builds context.json for each project by merging summary.json (optimization results from the VM) with projects.json (contact/company info). This is the single source of truth for all email variables.

{
  "first_name": "Gil",
  "full_name": "Gil Tene",
  "title": "CTO",
  "company_name": "Azul",
  "repo": "azul/zulu-openjdk",
  "repo_name": "zulu-openjdk",
  "fork_url": "https://github.com/codeflash-ai/zulu-openjdk",
  "branch_url": "https://github.com/codeflash-ai/zulu-openjdk/tree/codeflash/optimize",
  "num_optimizations": 12,
  "best_function": "parseClassFile",
  "best_speedup": "5x faster",
  "best_description": "replaced linear scan with hash lookup in class file parser",
  "second_best_function": "resolveMethod",
  "second_best_technique": "eliminated redundant deepcopy in method resolution",
  "total_cpu_improvement_pct": 34,
  "total_memory_saved_mb": 120,
  "optimizations_summary": "12 merge-ready commits including 5x faster class file parsing, 3x faster method resolution, and 40% memory reduction in bytecode verification",
  "calendly_link": "https://calendly.com/codeflash-saurabh/30min"
}

Template Layer: Editable Email Templates

Templates live in email_templates/ and use {variable_name} placeholders. Edit these any time, then re-render.

email_templates/email_1_proof.md

Subject: I created {num_optimizations} PRs that speed up {repo_name}

Hi {first_name},

I'm Saurabh, ex-CMU and Meta, and CEO and Founder of Codeflash.

We work with companies like Unstructured.io and HuggingFace who all face a
growing challenge: as AI coding tools generate more of the codebase, performance
regressions slip in faster than teams can catch them.

Codeflash is the performance layer that sits on top of your AI coding workflow.
It finds provably faster implementations for your existing code and ensures
every new PR ships optimized.

I ran Codeflash on a fork of {repo} and created {num_optimizations} merge-ready
commits that significantly speed up several crucial components -- {best_function}
now runs {best_speedup}! {optimizations_summary}

Unstructured.io used Codeflash across their entire infrastructure and cut
compute costs by 50%.

I'd love to walk you through the results and show you how much more free
performance is hiding across your full codebase. Would sometime this week work?

{calendly_link}

Thanks,
Saurabh
Founder, Codeflash.ai

email_templates/email_2_followup.md

Subject: Re: I created {num_optimizations} PRs that speed up {repo_name}

Hi {first_name},

Wanted to follow up -- did your team get a chance to look at the optimizations?

You can see all {num_optimizations} commits here:
{branch_url}

If those didn't hit the right area of your codebase, I have an open offer:
share any performance benchmark your team cares about, and I'll run Codeflash
against it and send you the results. No commitment, just proof.

{calendly_link}

Saurabh
Founder, codeflash.ai

email_templates/email_3_risk.md

Subject: Re: I created {num_optimizations} PRs that speed up {repo_name}

Hi {first_name},

With AI coding tools writing more of the code, performance regressions are
showing up faster and quieter than before. By the time they surface, it's
production issues and fire drills.

Unstructured.io plugged Codeflash into their workflow and cut compute costs
by 50% -- and now every PR is automatically checked before it merges.

Happy to show you what that looks like for {company_name}.

{calendly_link}

Saurabh
Founder, codeflash.ai

Rendering: Template + Data = Emails

def build_context(project_id):
    """Build context.json from summary.json + projects.json. Idempotent."""
    project = load_project(project_id)
    summary = json.load(open(f"results/{project_id}/summary.json"))

    if summary.get("status") == "failed" or summary.get("total_keeps", 0) == 0:
        return None

    best = summary["headline_stats"]
    opts = summary["optimizations"]

    context = {
        "first_name": project["target_contact"]["name"].split()[0],
        "full_name": project["target_contact"]["name"],
        "title": project["target_contact"]["title"],
        "company_name": project["company"],
        "repo": project["repo"],
        "repo_name": project["repo"].split("/")[1],
        "fork_url": f"https://github.com/codeflash-ai/{project['repo'].split('/')[1]}",
        "branch_url": f"https://github.com/codeflash-ai/{project['repo'].split('/')[1]}/tree/codeflash/optimize",
        "num_optimizations": summary["total_keeps"],
        "best_function": best["best_function"],
        "best_speedup": best["best_single_speedup"],
        "best_description": opts[0]["description"] if opts else "",
        "second_best_function": opts[1]["function"] if len(opts) > 1 else "",
        "second_best_technique": opts[1]["technique"] if len(opts) > 1 else "",
        "total_cpu_improvement_pct": best.get("total_cpu_improvement_pct", 0),
        "total_memory_saved_mb": best.get("total_memory_saved_mb", 0),
        "optimizations_summary": build_summary_sentence(opts),
        "calendly_link": "https://calendly.com/codeflash-saurabh/30min",
    }

    write_json(f"results/{project_id}/context.json", context)
    return context


def render_emails(project_id):
    """Render all email templates for a project. Re-run after template edits."""
    context = json.load(open(f"results/{project_id}/context.json"))
    os.makedirs(f"results/{project_id}/emails", exist_ok=True)

    for template_file in sorted(glob("email_templates/email_*.md")):
        template = open(template_file).read()
        rendered = template.format(**context)
        out_name = os.path.basename(template_file)
        write_file(f"results/{project_id}/emails/{out_name}", rendered)


def render_all_emails():
    """Re-render emails for ALL completed projects. Use after editing a template."""
    for project_id in get_completed_project_ids():
        render_emails(project_id)
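`build_summary_sentence`, referenced in `build_context` but not defined there, can be a simple join over the top kept optimizations. A sketch:

```python
def build_summary_sentence(opts, top_n=3):
    """Turn the optimization list into the one-line summary used in emails."""
    if not opts:
        return ""
    highlights = [f"{o['cpu_speedup']} {o['function']}" for o in opts[:top_n]]
    lead = f"{len(opts)} merge-ready commits"
    if len(highlights) == 1:
        return f"{lead} including {highlights[0]}"
    return f"{lead} including {', '.join(highlights[:-1])}, and {highlights[-1]}"
```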

Orchestrator Email Commands

# Build context + render emails for one project (after results collected)
python orchestrator.py email <project-id>

# Re-render ALL project emails after you edit a template
python orchestrator.py email --rerender-all

# Preview rendered emails in terminal
python orchestrator.py email <project-id> --show

# Show raw context data (for debugging or manual override)
python orchestrator.py email <project-id> --show-context

Email Editing Workflow

  1. Edit the template (affects all future projects):

    • Edit email_templates/email_1_proof.md (or email_2, email_3)
    • Run python orchestrator.py email --rerender-all
    • All project emails are regenerated
  2. Override data for one project (e.g. fix a name):

    • Edit results/<id>/context.json directly
    • Run python orchestrator.py email <project-id>
    • Only that project's emails are regenerated
  3. One-off edit for a single email (e.g. add a personal note):

    • Edit results/<id>/emails/email_1_proof.md directly
    • This is the final rendered copy -- it won't be overwritten unless you explicitly re-render

Phase 5: Review, Present Results & Send

This phase is intentionally manual. The orchestrator gives you all the data and rendered emails; you decide what ships.

Results Presentation

# Dashboard: all projects at a glance
python orchestrator.py status

ID           Repo                    Status      Opts  Best         Email
project-001  pallets/flask           completed   12    5x faster    DRAFT READY
project-002  encode/httpx            completed   8     3x faster    DRAFT READY
project-003  tiangolo/fastapi        failed      0     --           --

# Deep dive into one project
python orchestrator.py results <project-id>

Project: pallets/flask
Status: completed
Branch: https://github.com/codeflash-ai/flask/tree/codeflash/optimize
Experiments: 18 total (12 kept, 6 discarded)

Top optimizations:
  1. parse_rule()        5.0x faster   replaced regex with string split
  2. match_request()     3.2x faster   eliminated redundant dict copy
  3. send_file()         2.1x faster   switched to sendfile() syscall
  4. url_for()           1.8x faster   cached reverse route lookup
  ...

Headline: 34% total CPU improvement, 120 MiB memory saved

Contact: Armin Ronacher (Creator) -- armin@palletsprojects.com
Emails: results/project-001/emails/

Review Checklist

For each project:

  1. Review the optimizations: Open the fork on GitHub (link from orchestrator.py results <id>). Scan the commits. Are they legit?

  2. Review the emails: orchestrator.py email <id> --show. Check numbers, names, tone.

  3. Edit if needed (see Email Editing Workflow above).

  4. Send: Copy from results/<id>/emails/ into your email client or Apollo. Mark as sent:

    python orchestrator.py mark-sent <project-id>
    

POC Plan: 2 Projects

Step 1: Select Projects

Pick 2 projects. Suggested criteria for POC:

  • One Python project, one Java or JS/TS project (to prove multi-language)
  • Both should have >2k stars and active test suites
  • Companies behind them should be in the 20-500 employee range
  • The engineering contact should be identifiable via LinkedIn/Apollo

Step 2: Build the Orchestrator (MVP)

For the POC, the orchestrator can be simplified:

experiments/universe-optimize/
  orchestrator.py          # Main script
  optimization.md          # Template injected as CLAUDE.md on VMs
  projects.json            # The 2 POC projects
  bootstrap.sh             # VM setup script
  email_templates/
    email_1_proof.md
    email_2_followup.md
    email_3_risk.md
  results/                 # Collected results (gitignored)

MVP orchestrator commands:

  • python orchestrator.py provision <project-id> -- Create VM, fork repo, bootstrap
  • python orchestrator.py run <project-id> -- SSH in and launch Claude Code
  • python orchestrator.py status [project-id] -- Dashboard view of all projects, or status of one
  • python orchestrator.py collect <project-id> -- SCP results, push branch, destroy VM
  • python orchestrator.py email <project-id> -- Generate email draft from results

Step 3: Run & Iterate

  1. Provision both VMs in parallel
  2. Launch Claude Code on both
  3. Monitor (check every 10 min)
  4. Collect results when done
  5. Review optimizations on GitHub
  6. Review and edit email drafts
  7. Send emails manually
  8. Destroy VMs

Step 4: Evaluate POC

After the 2 projects complete, assess:

  • Quality: Were the optimizations real and meaningful? Would they impress an engineering leader?
  • Reliability: Did Claude complete successfully, or did it get stuck/fail?
  • Cost: What was the Azure + Anthropic API cost per project?
  • Time: How long did each optimization run take?
  • Email quality: Did the auto-generated emails need heavy editing?

Use these findings to decide whether to scale to 10, then 100, then 1000 projects.


Scaling Considerations (Post-POC)

Parallelism

  • POC: 2 VMs at a time
  • Scale: batch in groups of 10-20 VMs to manage cost and API rate limits
  • Use Azure VMSS (Virtual Machine Scale Sets) for easier provisioning at scale

Cost Control

  • Auto-destroy VMs after 24 hours regardless of status
  • Estimated per-project cost: ~$2-5 Azure compute + $5-20 Anthropic API = $7-25/project
  • At 1000 projects: $7k-25k total

Project Discovery Automation

  • GitHub API search + filtering pipeline
  • Sumble API for company enrichment (as in existing outbound program)
  • Apollo API for contact discovery (as in existing outbound program)
  • Auto-populate projects.json with enriched data

Results Quality Gate

  • Auto-skip email generation if <3 optimizations found
  • Auto-skip if no optimization exceeds 1.5x speedup (not impressive enough)
  • Flag projects where Claude failed for manual investigation
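Both gate rules reduce to one predicate once the `<N>x faster` strings from summary.json are parsed. A sketch:

```python
import re

def parse_speedup(text):
    """Extract the multiplier from strings like '5x faster' or '2.3x faster'."""
    m = re.search(r"([\d.]+)\s*x", text or "")
    return float(m.group(1)) if m else 0.0

def passes_quality_gate(summary, min_opts=3, min_speedup=1.5):
    """Decide whether results are impressive enough to draft an email."""
    if summary.get("status") == "failed":
        return False
    if summary.get("total_keeps", 0) < min_opts:
        return False
    best = summary.get("headline_stats", {}).get("best_single_speedup", "")
    return parse_speedup(best) > min_speedup
```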

Email Pipeline Integration

  • Instead of manual send, integrate with Apollo sequences
  • Auto-create drafts in Apollo, linked to the contact list
  • Still require manual approval before sending

File Inventory

  • program.md -- This document: the full program specification
  • plan.md -- Original high-level plan (kept for reference)
  • orchestrator.py -- Main orchestrator script
  • optimization.md -- Template injected as CLAUDE.md on each VM
  • bootstrap.sh -- VM setup script (copied to VM and executed)
  • projects.json -- Project list with metadata and contacts
  • universe_optimize.db -- SQLite state database
  • email_templates/*.md -- Email templates with placeholders
  • results/<id>/ -- Per-project results directory

Open Questions

  1. API key management: How to securely inject the Anthropic API key onto VMs? Options: Azure Key Vault, environment variable via SSH, or baked into VM image. -> Env vars via SSH
  2. GitHub auth on VMs: Need a GitHub PAT or deploy key on each VM for cloning and pushing. Use a dedicated bot account? -> PAT
  3. Claude Code model: Use Opus for maximum quality, or Sonnet for cost savings? POC should use Opus; scale run could use Sonnet with Opus fallback for stuck sessions.
  4. Java/JS optimization: The codeflash-agent plugin is Python-focused today. For Java/JS projects, Claude will need to optimize without the plugin -- just using its own profiling/optimization knowledge. This may produce lower-quality results. Should we restrict the POC to Python only? -> Yes, restrict the POC to Python.
  5. Rate limits: Anthropic API rate limits at scale. May need multiple API keys or request a rate limit increase.