Universe Optimize: Automated Optimization Outbound Program
Overview
This program automates the full pipeline: discover high-value open-source projects, fork them, spin up Azure VMs, run Codeflash optimizations via Claude Code, collect results, and draft personalized outreach emails with real optimization proof points. The orchestrator runs locally and manages everything end-to-end.
Architecture
```
Local Machine (Orchestrator)
 |
 |-- orchestrator.py (Python script that manages the full pipeline)
 |    |
 |    |-- Azure SDK: provisions/destroys VMs
 |    |-- GitHub API: forks repos to codeflash-ai org
 |    |-- SSH (paramiko): executes commands on VMs
 |    |-- Results DB: SQLite tracking all projects/VMs/results
 |    |-- Email drafter: generates personalized emails from results
 |
 +-- VM 1 (Azure Ubuntu, 4 CPU / 16 GB RAM)
 |    |-- Claude Code (--dangerously-skip-permissions, --plugin-dir=codeflash-agent)
 |    |-- Forked repo cloned
 |    |-- optimization.md injected as CLAUDE.md instructions
 |    |-- Results written to .codeflash/results.tsv + summary.json
 |
 +-- VM 2 ...
 +-- VM N ...
```
Phase 0: Project Discovery (Manual for POC, Automated Later)
POC: Hand-pick 2 projects
For the proof of concept, manually select 2 projects that are:
- Popular (>1k stars), actively maintained
- Python, JavaScript/TypeScript, or Java
- Have a test suite (pytest, jest, JUnit) so optimizations can be verified
- Backed by a company or used by companies where we can identify an engineering leader to email
- Performance-sensitive domain (data processing, web frameworks, databases, ML infra, etc.)
Store project metadata in projects.json:
```json
[
  {
    "id": "project-001",
    "repo": "org/repo-name",
    "language": "python",
    "stars": 5200,
    "description": "...",
    "company": "Company Name",
    "domain": "data processing",
    "target_contact": {
      "name": "First Last",
      "title": "CTO",
      "email": "...",
      "linkedin": "..."
    },
    "why_selected": "High star count, performance-critical data pipeline, active maintenance, company has 50-200 employees"
  }
]
```
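Before provisioning anything, it is worth validating that each entry carries the fields the later phases depend on. A minimal loader sketch, assuming the schema above (`validate_project` and the accepted-language set are assumptions, not existing code):

```python
import json

REQUIRED_FIELDS = {"id", "repo", "language", "company", "target_contact"}

def validate_project(entry):
    """Return a list of problems with one projects.json entry (empty = OK)."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - entry.keys())]
    contact = entry.get("target_contact", {})
    if not contact.get("name"):
        problems.append("target_contact.name is required (emails use the first name)")
    if entry.get("language") not in ("python", "javascript", "typescript", "java"):
        problems.append(f"unsupported language: {entry.get('language')}")
    return problems

def load_projects(path="projects.json"):
    """Load and validate projects.json; raise on the first invalid entry."""
    with open(path) as f:
        projects = json.load(f)
    for entry in projects:
        problems = validate_project(entry)
        if problems:
            raise ValueError(f"{entry.get('id', '<no id>')}: {problems}")
    return projects
```

Failing fast here is cheaper than discovering a missing contact name after an 8-hour optimization run.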
At Scale (Post-POC)
Use the GitHub API to search for repos matching the criteria:
- Query: `stars:>1000 language:python` (repeat for JS/TS and Java)
- Filter by: has CI, recent commits, has a test suite, identifiable company/maintainer
- Cross-reference with Sumble/Apollo APIs (as in existing outbound program) to find the right engineering contact
- Target 1000 projects total across Python, JS/TS, Java
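The post-search filtering step above can be sketched as a pure function over repo metadata. The field names (`has_ci`, `has_tests`, `days_since_commit`, `company`) are assumptions for an enrichment step that would populate them from CI config, test directories, and Sumble/Apollo:

```python
def passes_filters(repo):
    """Apply the post-search filters to one enriched repo-metadata dict."""
    return (
        repo.get("stars", 0) > 1000
        and repo.get("language", "").lower() in {"python", "javascript", "typescript", "java"}
        and repo.get("has_ci", False)
        and repo.get("has_tests", False)
        and repo.get("days_since_commit", 9999) <= 90   # "recent commits"
        and bool(repo.get("company"))                    # identifiable company/maintainer
    )

def select_candidates(repos, limit=1000):
    """Rank passing repos by stars and cap at the target project count."""
    passing = [r for r in repos if passes_filters(r)]
    return sorted(passing, key=lambda r: r["stars"], reverse=True)[:limit]
```

Ranking by stars is one plausible ordering; ranking by domain fit or company size may serve outreach better.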
Phase 1: Infrastructure Provisioning
What the Orchestrator Does
The orchestrator (orchestrator.py) is a Python script that manages the full lifecycle. It uses:
- Azure SDK (`azure-mgmt-compute`, `azure-identity`) to provision and destroy VMs
- Paramiko for SSH command execution on VMs
- GitHub API (`PyGithub`) for forking repos
- SQLite for tracking state across runs
VM Provisioning
For each project in projects.json:
1. Fork the repo to the `codeflash-ai` GitHub org:
   `github.get_repo("org/repo").create_fork(organization="codeflash-ai")`

2. Provision an Azure VM:
   - Image: Ubuntu 24.04 LTS
   - Size: `Standard_D4s_v5` (4 vCPU, 16 GB RAM)
   - Region: East US (or nearest to minimize latency)
   - Disk: 128 GB SSD
   - NSG: SSH only (port 22), locked to the orchestrator's IP
   - Tags: `{"project": "universe-optimize", "repo": "org/repo-name", "id": "project-001"}`

3. Bootstrap the VM via SSH (single setup script):

   ```bash
   #!/bin/bash
   set -euo pipefail

   # System deps
   sudo apt-get update && sudo apt-get install -y git curl build-essential python3-dev

   # Install language runtimes as needed
   # Python: install uv
   curl -LsSf https://astral.sh/uv/install.sh | sh
   source $HOME/.local/bin/env

   # Node (for JS/TS projects)
   curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
   sudo apt-get install -y nodejs

   # Java (for Java projects)
   sudo apt-get install -y openjdk-21-jdk maven gradle

   # Install Claude Code CLI
   curl -fsSL https://claude.ai/install.sh | bash

   # Set up the API keys in .bashrc (values redacted here -- the orchestrator injects the real ones)
   cat >> ~/.bashrc <<'EOF'
   export CODEFLASH_API_KEY=<codeflash-api-key>
   export CLAUDE_CODE_USE_BEDROCK=1
   export AWS_REGION=us-east-1
   export AWS_BEARER_TOKEN_BEDROCK=<bedrock-bearer-token>
   export LC_ALL=en_US.UTF-8
   EOF
   source ~/.bashrc

   # Copy codeflash-agent plugin
   mkdir -p ~/codeflash-agent
   # (orchestrator SCPs the dist/ directory here)

   # Clone the forked repo
   git clone https://github.com/codeflash-ai/<repo-name>.git ~/project
   cd ~/project

   # Inject optimization.md as the project's CLAUDE.md
   cp ~/optimization.md ~/project/CLAUDE.md
   ```
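The orchestrator's side of this step is just "push files, run script." A minimal sketch using the system `ssh`/`scp` binaries rather than paramiko (the key path, `azureuser` login, and helper names are assumptions):

```python
import subprocess

SSH_OPTS = ["-o", "StrictHostKeyChecking=accept-new", "-i", "~/.ssh/universe_optimize"]

def scp_cmd(local_path, vm_ip, remote_path):
    """Build the scp command that pushes a local file onto the VM."""
    return ["scp", *SSH_OPTS, local_path, f"azureuser@{vm_ip}:{remote_path}"]

def ssh_cmd(vm_ip, remote_command):
    """Build the ssh command that runs one command on the VM."""
    return ["ssh", *SSH_OPTS, f"azureuser@{vm_ip}", remote_command]

def bootstrap_vm(vm_ip):
    """Push bootstrap.sh + optimization.md, then execute the bootstrap."""
    for local, remote in [("bootstrap.sh", "~/bootstrap.sh"),
                          ("optimization.md", "~/optimization.md")]:
        subprocess.run(scp_cmd(local, vm_ip, remote), check=True)
    subprocess.run(ssh_cmd(vm_ip, "bash ~/bootstrap.sh"), check=True)
```

Keeping command construction separate from execution makes the SSH layer easy to unit-test without a live VM.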
State Tracking
The orchestrator maintains a SQLite DB (universe_optimize.db):
```sql
CREATE TABLE projects (
    id TEXT PRIMARY KEY,
    repo TEXT NOT NULL,
    language TEXT NOT NULL,
    fork_url TEXT,
    company TEXT,
    contact_name TEXT,
    contact_email TEXT,
    contact_title TEXT,
    status TEXT DEFAULT 'pending',  -- pending, provisioning, running, completed, failed, destroyed
    vm_id TEXT,
    vm_ip TEXT,
    created_at TIMESTAMP,
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    optimization_branch TEXT,
    num_optimizations INTEGER DEFAULT 0,
    best_speedup TEXT,
    summary_json TEXT,
    email_draft_path TEXT
);
```
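A sketch of how the orchestrator might open this DB and record status transitions, stamping the matching timestamp columns (the helper names are assumptions; the schema is the one above):

```python
import sqlite3

VALID_STATUSES = {"pending", "provisioning", "running", "completed", "failed", "destroyed"}

def open_db(path="universe_optimize.db"):
    """Open the state DB and ensure the projects table exists."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS projects (
        id TEXT PRIMARY KEY, repo TEXT NOT NULL, language TEXT NOT NULL,
        fork_url TEXT, company TEXT, contact_name TEXT, contact_email TEXT,
        contact_title TEXT, status TEXT DEFAULT 'pending', vm_id TEXT, vm_ip TEXT,
        created_at TIMESTAMP, started_at TIMESTAMP, completed_at TIMESTAMP,
        optimization_branch TEXT, num_optimizations INTEGER DEFAULT 0,
        best_speedup TEXT, summary_json TEXT, email_draft_path TEXT)""")
    return db

def set_status(db, project_id, status):
    """Record a status transition, stamping started_at/completed_at when relevant."""
    assert status in VALID_STATUSES, f"unknown status: {status}"
    column = {"running": "started_at", "completed": "completed_at"}.get(status)
    stamp = f", {column} = CURRENT_TIMESTAMP" if column else ""
    db.execute(f"UPDATE projects SET status = ?{stamp} WHERE id = ?", (status, project_id))
    db.commit()
```

Because the DB is the single source of truth across runs, every phase transition should go through one helper like this rather than ad-hoc UPDATEs.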
Phase 2: Running the Optimization
optimization.md (Injected as CLAUDE.md on Each VM)
This file tells Claude Code exactly what to do when it starts. It is placed as CLAUDE.md in the project root before Claude is launched.
# Codeflash Optimization Run
You are running an automated optimization session on this open-source project.
Your goal is to find as many provably-faster code implementations as possible
and stack them as commits on a single branch.
## Your Mission
1. Understand the project: read the README, project structure, and key source files.
2. Set up the project: install dependencies, verify tests pass.
3. Run `/codeflash-optimize start` and when asked for context, respond with "go".
4. Let the optimization agent work. It will:
- Profile the codebase (CPU, memory, GC)
- Identify bottleneck functions
- Implement optimizations one at a time
- Verify each with tests
- Commit each successful optimization
5. When the optimization agent completes (or plateaus), collect results.
## After Optimization Completes
Write a file `~/results/summary.json` with the following structure:
```json
{
"repo": "<org/repo>",
"language": "<python|javascript|java>",
"branch": "<optimization branch name>",
"total_experiments": <N>,
"total_keeps": <N>,
"total_discards": <N>,
"optimizations": [
{
"commit": "<sha>",
"function": "<function_name>",
"file": "<file_path>",
"description": "<what was optimized>",
"cpu_speedup": "<e.g. 2.3x faster>",
"memory_reduction": "<e.g. -50 MiB>",
"technique": "<e.g. replaced list with set, eliminated deepcopy>"
}
],
"headline_stats": {
"best_single_speedup": "<e.g. 5x faster>",
"best_function": "<function_name>",
"total_cpu_improvement_pct": <number>,
"total_memory_saved_mb": <number>
},
"pr_ready_commits": <N>,
"status": "completed|plateaued|failed",
"error": "<if failed, why>"
}
```

Also copy `.codeflash/results.tsv` and `.codeflash/HANDOFF.md` to `~/results/`.

## Important
- Work fully autonomously. Do not ask questions -- make reasonable decisions.
- If tests fail during setup, note the pre-existing failures and work around them.
- If the project cannot be set up (missing deps, private packages), write summary.json with status "failed" and an error message, then stop.
- After optimization completes, push the optimization branch to the remote fork:
  `git push origin codeflash/optimize`.
- Time limit: aim to complete within 8 hours. If still running after 8 hours, wrap up, write summary.json with whatever results you have, and stop.
### Launching Claude Code
The orchestrator SSHs into each VM and runs:
```bash
cd ~/project && claude \
--dangerously-skip-permissions \
--plugin-dir ~/codeflash-agent/dist \
--model opus \
--max-turns 400 \
--print \
"Read the CLAUDE.md file and follow its instructions exactly." \
2>&1 | tee ~/results/claude_output.log
```

Key flags:
- `--dangerously-skip-permissions`: no human approval needed
- `--plugin-dir`: loads the codeflash-agent plugin with the `/codeflash-optimize` skill
- `--print`: non-interactive mode, outputs to stdout
- `--max-turns 400`: generous turn limit for thorough optimization
- Output captured to a log file for debugging
Monitoring
The orchestrator polls each VM periodically (every 10 minutes):
```python
def check_vm_status(vm_ip):
    """Check if Claude is still running and if results are ready."""
    # Check if the claude process is still running
    is_running = ssh_exec(vm_ip, "pgrep -f 'claude' > /dev/null && echo running || echo done")
    # Check if summary.json exists (optimization complete)
    has_results = ssh_exec(vm_ip, "test -f ~/results/summary.json && echo yes || echo no")
    # Elapsed time since CLAUDE.md was injected (proxy for session start)
    elapsed = ssh_exec(vm_ip, "stat -c %Y ~/project/CLAUDE.md | xargs -I{} expr $(date +%s) - {}")
    return {
        "is_running": is_running.strip() == "running",
        "has_results": has_results.strip() == "yes",
        "elapsed_seconds": int(elapsed.strip()),
    }
```
If a VM has been running >5 hours with no results, the orchestrator:
- SSHs in and sends SIGTERM to claude
- Waits 60s for graceful shutdown
- Checks if partial results exist in `.codeflash/results.tsv`
- Marks the project as `failed` or `completed` (with partial results)
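The polling loop's decision logic can be sketched as a pure function over one `check_vm_status()` result (the action names are assumptions; the 5-hour threshold is from this section):

```python
STALL_SECONDS = 5 * 3600  # kill sessions with no results after 5 hours

def next_action(status):
    """Map one check_vm_status() result to the orchestrator's next step."""
    if status["has_results"]:
        return "collect"        # summary.json exists: pull results, destroy VM
    if not status["is_running"]:
        return "mark_failed"    # claude exited without writing results
    if status["elapsed_seconds"] > STALL_SECONDS:
        return "terminate"      # SIGTERM, wait 60s, salvage partial results
    return "wait"               # still optimizing: poll again in 10 minutes
```

Keeping this decision pure (no SSH inside) lets the stall handling be tested without a VM.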
Phase 3: Results Collection
Once Claude completes on a VM:
1. SCP the results from the VM to local:

   ```python
   scp_download(vm_ip, "~/results/summary.json", f"results/{project_id}/summary.json")
   scp_download(vm_ip, "~/results/results.tsv", f"results/{project_id}/results.tsv")
   scp_download(vm_ip, "~/results/HANDOFF.md", f"results/{project_id}/HANDOFF.md")
   scp_download(vm_ip, "~/results/claude_output.log", f"results/{project_id}/claude_output.log")
   ```

2. Verify the branch was pushed. The VM Claude pushes the branch as part of its workflow. Verify:
   `ssh vm "cd ~/project && git log --oneline origin/codeflash/optimize -5"`
   If not pushed (e.g. Claude failed before that step), push from the VM before destroying it:
   `ssh vm "cd ~/project && git push origin codeflash/optimize"`

3. Update the DB with results from summary.json.

4. Destroy the VM to stop burning money:
   `azure_client.virtual_machines.begin_delete(resource_group, vm_name)`
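Step 3 can be sketched as a pure mapping from the summary.json payload to the projects-table columns (the column names match the SQLite schema; the mapping itself is an assumption):

```python
import json

def summary_to_db_fields(summary):
    """Flatten a summary.json payload into projects-table column values."""
    headline = summary.get("headline_stats", {})
    return {
        "optimization_branch": summary.get("branch"),
        "num_optimizations": summary.get("total_keeps", 0),
        "best_speedup": headline.get("best_single_speedup"),
        "summary_json": json.dumps(summary),
        # "plateaued" runs still carry usable results, so treat them as completed
        "status": "failed" if summary.get("status") == "failed" else "completed",
    }
```

Storing the raw summary in `summary_json` means the email phase can rebuild context.json later without re-touching the (destroyed) VM.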
Results Directory Structure
```
results/
  project-001/
    summary.json         # Full optimization results (from VM)
    results.tsv          # Per-experiment log (from VM)
    HANDOFF.md           # Session state (from VM)
    claude_output.log    # Full Claude transcript (for debugging)
    context.json         # Template variables (built by orchestrator from summary + project data)
    emails/
      email_1_proof.md     # Rendered email (regenerated on template edit)
      email_2_followup.md
      email_3_risk.md
  project-002/
    ...
```
Phase 4: Email Drafting
The email system separates data from templates. All optimization results and contact info are stored locally as structured data (context.json). Templates are standalone files with {placeholder} variables. You can edit a template and re-render all emails with one command.
Data Layer: context.json
After results are collected, the orchestrator builds context.json for each project by merging summary.json (optimization results from the VM) with projects.json (contact/company info). This is the single source of truth for all email variables.
```json
{
  "first_name": "Gil",
  "full_name": "Gil Tene",
  "title": "CTO",
  "company_name": "Azul",
  "repo": "azul/zulu-openjdk",
  "repo_name": "zulu-openjdk",
  "fork_url": "https://github.com/codeflash-ai/zulu-openjdk",
  "branch_url": "https://github.com/codeflash-ai/zulu-openjdk/tree/codeflash/optimize",
  "num_optimizations": 12,
  "best_function": "parseClassFile",
  "best_speedup": "5x faster",
  "best_description": "replaced linear scan with hash lookup in class file parser",
  "second_best_function": "resolveMethod",
  "second_best_technique": "eliminated redundant deepcopy in method resolution",
  "total_cpu_improvement_pct": 34,
  "total_memory_saved_mb": 120,
  "optimizations_summary": "12 merge-ready commits including 5x faster class file parsing, 3x faster method resolution, and 40% memory reduction in bytecode verification",
  "calendly_link": "https://calendly.com/codeflash-saurabh/30min"
}
```
Template Layer: Editable Email Templates
Templates live in email_templates/ and use {variable_name} placeholders. Edit these any time, then re-render.
email_templates/email_1_proof.md
Subject: I created {num_optimizations} PRs that speed up {repo_name}
Hi {first_name},
I'm Saurabh, ex-CMU and Meta, and CEO and Founder of Codeflash.
We work with companies like Unstructured.io and HuggingFace who all face a
growing challenge: as AI coding tools generate more of the codebase, performance
regressions slip in faster than teams can catch them.
Codeflash is the performance layer that sits on top of your AI coding workflow.
It finds provably faster implementations for your existing code and ensures
every new PR ships optimized.
I ran Codeflash on a fork of {repo} and created {num_optimizations} merge-ready
commits that significantly speed up several crucial components -- {best_function}
now runs {best_speedup}! {optimizations_summary}
Unstructured.io used Codeflash across their entire infrastructure and cut
compute costs by 50%.
I'd love to walk you through the results and show you how much more free
performance is hiding across your full codebase. Would sometime this week work?
{calendly_link}
Thanks,
Saurabh
Founder, Codeflash.ai
email_templates/email_2_followup.md
Subject: Re: I created {num_optimizations} PRs that speed up {repo_name}
Hi {first_name},
Wanted to follow up -- did your team get a chance to look at the optimizations?
You can see all {num_optimizations} commits here:
{branch_url}
If those didn't hit the right area of your codebase, I have an open offer:
share any performance benchmark your team cares about, and I'll run Codeflash
against it and send you the results. No commitment, just proof.
{calendly_link}
Saurabh
Founder, codeflash.ai
email_templates/email_3_risk.md
Subject: Re: I created {num_optimizations} PRs that speed up {repo_name}
Hi {first_name},
With AI coding tools writing more of the code, performance regressions are
showing up faster and quieter than before. By the time they surface, it's
production issues and fire drills.
Unstructured.io plugged Codeflash into their workflow and cut compute costs
by 50% -- and now every PR is automatically checked before it merges.
Happy to show you what that looks like for {company_name}.
{calendly_link}
Saurabh
Founder, codeflash.ai
Rendering: Template + Data = Emails
```python
import json
import os
from glob import glob

def build_context(project_id):
    """Build context.json from summary.json + projects.json. Idempotent."""
    project = load_project(project_id)
    with open(f"results/{project_id}/summary.json") as f:
        summary = json.load(f)
    if summary["status"] == "failed" or summary["total_keeps"] == 0:
        return None
    best = summary["headline_stats"]
    opts = summary["optimizations"]
    repo_name = project["repo"].split("/")[1]
    context = {
        "first_name": project["target_contact"]["name"].split()[0],
        "full_name": project["target_contact"]["name"],
        "title": project["target_contact"]["title"],
        "company_name": project["company"],
        "repo": project["repo"],
        "repo_name": repo_name,
        "fork_url": f"https://github.com/codeflash-ai/{repo_name}",
        "branch_url": f"https://github.com/codeflash-ai/{repo_name}/tree/codeflash/optimize",
        "num_optimizations": summary["total_keeps"],
        "best_function": best["best_function"],
        "best_speedup": best["best_single_speedup"],
        "best_description": opts[0]["description"] if opts else "",
        "second_best_function": opts[1]["function"] if len(opts) > 1 else "",
        "second_best_technique": opts[1]["technique"] if len(opts) > 1 else "",
        "total_cpu_improvement_pct": best.get("total_cpu_improvement_pct", 0),
        "total_memory_saved_mb": best.get("total_memory_saved_mb", 0),
        "optimizations_summary": build_summary_sentence(opts),
        "calendly_link": "https://calendly.com/codeflash-saurabh/30min",
    }
    write_json(f"results/{project_id}/context.json", context)
    return context

def render_emails(project_id):
    """Render all email templates for a project. Re-run after template edits."""
    with open(f"results/{project_id}/context.json") as f:
        context = json.load(f)
    os.makedirs(f"results/{project_id}/emails", exist_ok=True)
    for template_file in sorted(glob("email_templates/email_*.md")):
        with open(template_file) as f:
            template = f.read()
        rendered = template.format(**context)
        out_name = os.path.basename(template_file)
        write_file(f"results/{project_id}/emails/{out_name}", rendered)

def render_all_emails():
    """Re-render emails for ALL completed projects. Use after editing a template."""
    for project_id in get_completed_project_ids():
        render_emails(project_id)
```
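One sharp edge: `template.format(**context)` raises `KeyError` the moment a template references a variable that a particular project's context.json lacks (e.g. a failed enrichment field). A tolerant variant, sketched here as optional hardening, leaves unresolved placeholders visible for manual review instead of crashing the batch:

```python
class _KeepMissing(dict):
    """Leave unresolved {placeholders} in the output instead of raising."""
    def __missing__(self, key):
        return "{" + key + "}"

def render_template(template, context):
    """Render a {placeholder} template, surviving missing context keys."""
    return template.format_map(_KeepMissing(context))
```

A leftover `{second_best_function}` in a draft is an obvious flag during the manual review phase, whereas a KeyError halfway through `--rerender-all` is not.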
Orchestrator Email Commands
```bash
# Build context + render emails for one project (after results collected)
python orchestrator.py email <project-id>

# Re-render ALL project emails after you edit a template
python orchestrator.py email --rerender-all

# Preview rendered emails in terminal
python orchestrator.py email <project-id> --show

# Show raw context data (for debugging or manual override)
python orchestrator.py email <project-id> --show-context
```
Email Editing Workflow
1. Edit the template (affects all future projects):
   - Edit `email_templates/email_1_proof.md` (or email_2, email_3)
   - Run `python orchestrator.py email --rerender-all`
   - All project emails are regenerated

2. Override data for one project (e.g. fix a name):
   - Edit `results/<id>/context.json` directly
   - Run `python orchestrator.py email <project-id>`
   - Only that project's emails are regenerated

3. One-off edit for a single email (e.g. add a personal note):
   - Edit `results/<id>/emails/email_1_proof.md` directly
   - This is the final rendered copy -- it won't be overwritten unless you explicitly re-render
Phase 5: Review, Present Results & Send
This phase is intentionally manual. The orchestrator gives you all the data and rendered emails; you decide what ships.
Results Presentation
```bash
# Dashboard: all projects at a glance
python orchestrator.py status
```

```
ID           Repo               Status     Opts  Best       Email
project-001  pallets/flask      completed  12    5x faster  DRAFT READY
project-002  encode/httpx       completed  8     3x faster  DRAFT READY
project-003  tiangolo/fastapi   failed     0     --         --
```

```bash
# Deep dive into one project
python orchestrator.py results <project-id>
```

```
Project:     pallets/flask
Status:      completed
Branch:      https://github.com/codeflash-ai/flask/tree/codeflash/optimize
Experiments: 18 total (12 kept, 6 discarded)

Top optimizations:
  1. parse_rule()      5.0x faster   replaced regex with string split
  2. match_request()   3.2x faster   eliminated redundant dict copy
  3. send_file()       2.1x faster   switched to sendfile() syscall
  4. url_for()         1.8x faster   cached reverse route lookup
  ...

Headline: 34% total CPU improvement, 120 MiB memory saved
Contact:  Armin Ronacher (Creator) -- armin@palletsprojects.com
Emails:   results/project-001/emails/
```
Review Checklist
For each project:

1. Review the optimizations: open the fork on GitHub (link from `orchestrator.py results <id>`). Scan the commits. Are they legit?
2. Review the emails: `orchestrator.py email <id> --show`. Check numbers, names, tone.
3. Edit if needed (see Email Editing Workflow above).
4. Send: copy from `results/<id>/emails/` into your email client or Apollo. Mark as sent: `python orchestrator.py mark-sent <project-id>`
POC Plan: 2 Projects
Step 1: Select Projects
Pick 2 projects. Suggested criteria for POC:
- One Python project, one Java or JS/TS project (to prove multi-language)
- Both should have >2k stars and active test suites
- Companies behind them should be in the 20-500 employee range
- The engineering contact should be identifiable via LinkedIn/Apollo
Step 2: Build the Orchestrator (MVP)
For the POC, the orchestrator can be simplified:
```
experiments/universe-optimize/
  orchestrator.py      # Main script
  optimization.md      # Template injected as CLAUDE.md on VMs
  projects.json        # The 2 POC projects
  bootstrap.sh         # VM setup script
  email_templates/
    email_1_proof.md
    email_2_followup.md
    email_3_risk.md
  results/             # Collected results (gitignored)
```
MVP orchestrator commands:
- `python orchestrator.py provision <project-id>` -- Create VM, fork repo, bootstrap
- `python orchestrator.py run <project-id>` -- SSH in and launch Claude Code
- `python orchestrator.py status [project-id]` -- Check status of all or one project
- `python orchestrator.py collect <project-id>` -- SCP results, push branch, destroy VM
- `python orchestrator.py email <project-id>` -- Generate email draft from results
- `python orchestrator.py status-all` -- Dashboard view
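This command surface can be wired up with a thin argparse dispatcher; a sketch, assuming the handler functions live elsewhere in orchestrator.py:

```python
import argparse

COMMANDS = ["provision", "run", "status", "collect", "email",
            "status-all", "mark-sent", "results"]

def parse_args(argv=None):
    """Parse `python orchestrator.py <command> [project-id] [flags]`."""
    parser = argparse.ArgumentParser(prog="orchestrator.py")
    parser.add_argument("command", choices=COMMANDS)
    parser.add_argument("project_id", nargs="?", help="project id, e.g. project-001")
    parser.add_argument("--show", action="store_true", help="preview rendered emails")
    parser.add_argument("--show-context", action="store_true", help="dump context.json")
    parser.add_argument("--rerender-all", action="store_true", help="re-render all emails")
    return parser.parse_args(argv)
```

A `main()` would then dispatch on `args.command`; keeping parsing separate makes each subcommand testable in isolation.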
Step 3: Run & Iterate
- Provision both VMs in parallel
- Launch Claude Code on both
- Monitor (check every 10 min)
- Collect results when done
- Review optimizations on GitHub
- Review and edit email drafts
- Send emails manually
- Destroy VMs
Step 4: Evaluate POC
After the 2 projects complete, assess:
- Quality: Were the optimizations real and meaningful? Would they impress an engineering leader?
- Reliability: Did Claude complete successfully, or did it get stuck/fail?
- Cost: What was the Azure + Anthropic API cost per project?
- Time: How long did each optimization run take?
- Email quality: Did the auto-generated emails need heavy editing?
Use these findings to decide whether to scale to 10, then 100, then 1000 projects.
Scaling Considerations (Post-POC)
Parallelism
- POC: 2 VMs at a time
- Scale: batch in groups of 10-20 VMs to manage cost and API rate limits
- Use Azure VMSS (Virtual Machine Scale Sets) for easier provisioning at scale
Cost Control
- Auto-destroy VMs after 24 hours regardless of status
- Estimated per-project cost: ~$2-5 Azure compute + $5-20 Anthropic API = $7-25/project
- At 1000 projects: $7k-25k total
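Since the per-project cost compounds linearly, batch totals are easy to sanity-check (the figures are the rough estimates above, not measured costs):

```python
def batch_cost_range(n_projects, azure_range=(2, 5), anthropic_range=(5, 20)):
    """Return (low, high) total USD for a batch, from per-project USD ranges."""
    low = n_projects * (azure_range[0] + anthropic_range[0])
    high = n_projects * (azure_range[1] + anthropic_range[1])
    return low, high
```

Re-run this after the POC with measured per-project costs before committing to the 1000-project run.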
Project Discovery Automation
- GitHub API search + filtering pipeline
- Sumble API for company enrichment (as in existing outbound program)
- Apollo API for contact discovery (as in existing outbound program)
- Auto-populate projects.json with enriched data
Results Quality Gate
- Auto-skip email generation if <3 optimizations found
- Auto-skip if no optimization exceeds 1.5x speedup (not impressive enough)
- Flag projects where Claude failed for manual investigation
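The gate above can be sketched as a check over summary.json, parsing speedup strings like "5x faster" (the string format is assumed from the schema in Phase 2):

```python
import re

MIN_OPTIMIZATIONS = 3
MIN_BEST_SPEEDUP = 1.5

def parse_speedup(text):
    """Extract the multiplier from strings like '5x faster' or '2.3x faster'."""
    match = re.search(r"([\d.]+)\s*x", text or "")
    return float(match.group(1)) if match else 0.0

def passes_quality_gate(summary):
    """Decide whether a completed run is impressive enough to email about."""
    if summary.get("status") == "failed":
        return False  # flag for manual investigation instead
    best = parse_speedup(summary.get("headline_stats", {}).get("best_single_speedup"))
    return summary.get("total_keeps", 0) >= MIN_OPTIMIZATIONS and best > MIN_BEST_SPEEDUP
```

Projects that fail the gate keep their results on disk; they just never reach the email-drafting phase.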
Email Pipeline Integration
- Instead of manual send, integrate with Apollo sequences
- Auto-create drafts in Apollo, linked to the contact list
- Still require manual approval before sending
File Inventory
| File | Purpose |
|---|---|
| `program.md` | This document -- the full program specification |
| `plan.md` | Original high-level plan (kept for reference) |
| `orchestrator.py` | Main orchestrator script |
| `optimization.md` | Template injected as CLAUDE.md on each VM |
| `bootstrap.sh` | VM setup script (copied to VM and executed) |
| `projects.json` | Project list with metadata and contacts |
| `universe_optimize.db` | SQLite state database |
| `email_templates/*.md` | Email templates with placeholders |
| `results/<id>/` | Per-project results directory |
Open Questions
- API key management: How to securely inject the Anthropic API key onto VMs? Options: Azure Key Vault, environment variable via SSH, or baked into VM image. -> Env vars via SSH
- GitHub auth on VMs: Need a GitHub PAT or deploy key on each VM for cloning and pushing. Use a dedicated bot account? -> PAT
- Claude Code model: Use Opus for maximum quality, or Sonnet for cost savings? POC should use Opus; scale run could use Sonnet with Opus fallback for stuck sessions.
- Java/JS optimization: The codeflash-agent plugin is Python-focused today. For Java/JS projects, Claude would need to optimize without the plugin -- relying on its own profiling/optimization knowledge -- which may produce lower-quality results. Should we restrict the POC to Python only? -> Yes, restrict the POC to Python.
- Rate limits: Anthropic API rate limits at scale. May need multiple API keys or request a rate limit increase.