# Optimizer Factory for Codeflash — EC2-backed

## What this is
A minimal pipeline to run Codeflash optimizations across many Python repositories using on-demand EC2 instances. Configure a CSV, launch from the UI, stream logs directly from the instance, and approve results in Codeflash Staging.
## Prerequisites

- AWS account with permissions for EC2 and IAM
- AWS CLI installed and configured (`aws configure` with an IAM user/role)
- GitHub Personal Access Token (classic) with `public_repo` scope
- Codeflash API key from `app.codeflash.ai`
- Python 3.10+ and `venv`
## Project Structure

- `config/` - Repository configuration and analysis results
  - `repos.csv` - List of repositories to process with their configuration
  - `analysis/` - LLM analysis results for each repository
  - `jobs.json` - Maps repository URLs to EC2 instance IDs for job tracking
  - `analysis_jobs.json` - Tracks analysis job status and results
  - `custom_run_settings.json` - Configuration schema for custom optimization runs
- `scripts/` - Core optimization and utility scripts
  - `run_optimization.sh` - Main optimization script that forks/clones repos, detects roots, and runs Codeflash
  - `detect_roots.py` - Simple heuristics for detecting module and test root directories
  - `llm_setup_helper.py` - LLM-powered setup assistant for fixing repository dependencies (Deprecated - now using Claude Code CLI directly in `run_optimization.sh`)
  - `entrypoint.sh` - Container entrypoint for dependency installation and optimization
- `server/` - Web interface and API
  - `app.py` - Flask API serving the static UI and EC2 job management
  - `analyzer.py` - Anthropic-powered analyzer to extract per-repository environment configuration
  - `static/` - Plain HTML/CSS/JS UI to manage repositories and jobs
- `tools/` - Local dependencies and utilities
  - `requirements.txt` - Python dependencies for the server (Flask, boto3, paramiko, anthropic)
- `env.example` - Environment template for EC2 configuration and API keys
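The actual logic in `scripts/detect_roots.py` is not reproduced here, but the kind of "simple heuristics" it describes can be sketched as follows. The function name `guess_roots` and the specific checks are illustrative assumptions, not the repo's real implementation:

```python
from pathlib import Path

# Illustrative sketch of module/tests root detection heuristics.
# The real scripts/detect_roots.py may use different rules.
COMMON_TEST_DIRS = ("tests", "test")

def guess_roots(repo_dir: str) -> tuple[str, str]:
    """Guess (module_root, tests_root) for a checked-out repository."""
    repo = Path(repo_dir)
    repo_name = repo.name.replace("-", "_")

    # Module root: prefer a src/ layout, then a top-level package
    # named after the repository.
    module_root = repo_name
    if (repo / "src" / repo_name / "__init__.py").exists():
        module_root = f"src/{repo_name}"
    elif (repo / repo_name / "__init__.py").exists():
        module_root = repo_name

    # Tests root: first common test directory that exists.
    tests_root = "tests"
    for cand in COMMON_TEST_DIRS:
        if (repo / cand).is_dir():
            tests_root = cand
            break
    return module_root, tests_root
```

When a repo doesn't match these patterns, the CSV's `auto` value can simply fall back to defaults like the ones returned here.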
## Features

### Standard Optimization Runs
- Repository Management: Configure and manage multiple repositories through a CSV file
- EC2 Instance Management: Launch and manage EC2 instances for optimization jobs
- Log Streaming: Real-time log streaming from EC2 instances
- S3 Log Archiving: Automatic log archiving to S3 for permanent storage
- Job Monitoring: Track optimization progress and status
- Auto-termination: Configurable automatic instance termination after job completion
### Custom Run Functionality
- Dynamic Configuration: Configure Codeflash optimizations with custom parameters through a web interface
- Multiple Optimization Modes:
  - Single Function: Optimize individual Python functions
  - Trace & Optimize: End-to-end workflow optimization with execution tracing
  - Optimize All: Comprehensive codebase optimization
- Flexible Settings: Override default module roots, test configurations, and optimization flags
- Custom pyproject.toml: Specify custom locations for Codeflash configuration files
- Real-time Validation: Validate configurations before execution
- Comprehensive Logging: Detailed logging and monitoring for custom runs
For detailed information about the Custom Run functionality, see `docs/CUSTOM_RUN.md`.
## Setup and Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd optimization-factory
   ```

2. Create a virtual environment:

   ```bash
   python3 -m venv venv
   ```

3. Activate the virtual environment:

   ```bash
   source venv/bin/activate
   ```

4. Install dependencies:

   ```bash
   pip install -r tools/requirements.txt
   ```

5. Configure environment:
   - Copy `env.example` to `.env`
   - Fill in your actual values for AWS configuration, API keys, and SSH key path
## Running the Application

From the project root directory, run:

```bash
python -m server.app
```

Then open the web interface at http://localhost:5000.
## Step-by-step setup

- Configure AWS
  - Copy `env.example` to `.env` and fill it in (AWS region, EC2 key pair, security group, AMI ID, SSH key path).
  - Important for WSL users: see the SSH Key Configuration section below for proper setup.
- Install dependencies
  - `pip install -r tools/requirements.txt`
- Provide tokens locally
  - Set the env vars `CODEFLASH_API_KEY` and `GITHUB_TOKEN` in your shell or `.env`.
- Ensure networking
  - The security group must allow outbound HTTPS, and inbound SSH from your IP if you want direct access. The instance will reach GitHub and PyPI over the internet.
- Configure repositories to process
  - Edit `config/repos.csv` and add rows:

    ```csv
    repo_url,module_root,tests_root,resource_tier
    https://github.com/psf/requests,requests,tests,small
    https://github.com/pallets/flask,src/flask,tests,medium
    https://github.com/numpy/numpy,numpy,numpy/tests,large
    https://github.com/user/small-util,auto,auto,small
    ```

  - Columns:
    - `repo_url` — upstream repository URL
    - `module_root`, `tests_root` — path or `auto` to auto-detect
    - `resource_tier` — `small|medium|large` (selects the job definition)
- Run jobs
  - Install local deps: `pip install -r tools/requirements.txt`
  - Start the server: `python -m server.app`
  - Open the UI: http://localhost:5000
  - From the UI you can:
    - Add/update/delete repos (edits `config/repos.csv`)
    - Run optimization for a repo, or run all (each launches a dedicated EC2 instance)
    - Check job status (instance state and exit code)
    - View logs (tail of `/var/log/codeflash-optimization.log` on the instance)
    - Analyze a repo via LLM (Anthropic) and apply the proposed config to the CSV
- Monitor and review
  - EC2 console: see instances launching/terminating
  - UI logs panel: streams the remote log file
  - Codeflash Staging: approve optimizations
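Rows in `config/repos.csv` can be sanity-checked before launching jobs. The sketch below is illustrative: the column names come from the README above, but the specific validation rules (and whether the server validates at all) are assumptions:

```python
import csv
import io

VALID_TIERS = {"small", "medium", "large"}

def validate_repos_csv(text: str) -> list[dict]:
    """Parse repos.csv content and reject obviously bad rows.

    A sketch of checks a launcher could apply; the real server/app.py
    may validate differently (or not at all).
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        if not row["repo_url"].startswith("https://github.com/"):
            raise ValueError(f"not a GitHub URL: {row['repo_url']}")
        if row["resource_tier"] not in VALID_TIERS:
            raise ValueError(f"unknown resource_tier: {row['resource_tier']}")
        # module_root / tests_root may be a path or the literal 'auto',
        # so they are not checked here.
    return rows
```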
## Retries and tuning

- If a job fails with OOM, change the `resource_tier` in `config/repos.csv` to a larger tier and re-run the launcher.
- For more automation (e.g., automatic tier escalation), consider adding AWS Step Functions later.
## How it works (under the hood)

- The server launches an EC2 instance per job and waits for SSH.
- It uploads `scripts/run_optimization.sh` and `scripts/detect_roots.py`, exports env vars with analyzer hints, and starts the optimization.
- The job writes logs to `/var/log/codeflash-optimization.log`; the server tails this file.
- A background watcher terminates the instance after completion.
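The log-tailing step above works over SSH in the real server (via paramiko); the core "tail the last N lines" behavior can be sketched locally like this, with the function name being an assumption rather than the server's actual API:

```python
from collections import deque

def tail_lines(path: str, n: int = 100) -> list[str]:
    """Return the last n lines of a log file.

    Local sketch of the tail behavior; the real server reads the remote
    /var/log/codeflash-optimization.log over SSH. A bounded deque keeps
    memory at O(n) even for large logs.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return list(deque(f, maxlen=n))
```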
## Notes

- Ensure the Codeflash GitHub App is installed for your account/org so forks are covered.
- Provide a sufficiently large EC2 instance size; the default is `c7i.2xlarge`, but adjust as needed.
## LLM-powered Repo Analysis (optional)

- Purpose: Suggest per-repo configuration (module root, tests root, resource tier) and optional safe setup commands.
- Requirements:
  - `ANTHROPIC_API_KEY` set in the environment for the server
  - `pip install -r tools/requirements.txt` (includes `anthropic`, `jsonschema`)
- How it works:
  - UI → Analyze (🧠) calls `/api/analyze_repo` and shows results once ready
  - Results are stored as `config/analysis/<org>-<repo>.json`
  - You can selectively apply `module_root`, `tests_root`, and `resource_tier` to the CSV
  - On job submit, if analysis exists, the backend passes sanitized overrides to the container via env:
    - `SYSTEM_PACKAGES`: allowlisted apt packages
    - `PRE_INSTALL_CMDS`, `INSTALL_CMDS`, `POST_INSTALL_CMDS`: safe, filtered commands joined with `&&`
    - Non-secret env vars, if provided
  - `scripts/entrypoint.sh` executes these overrides before running the default detection path
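The override assembly described above might look like the sketch below. The env var names (`SYSTEM_PACKAGES`, `*_INSTALL_CMDS`) come from the README; the input dict keys and the joining logic are assumptions about the backend, not its actual code:

```python
def build_override_env(analysis: dict) -> dict[str, str]:
    """Assemble container env overrides from a (hypothetical) analysis result.

    Commands in each phase are joined with '&&' so a failure in one
    command stops the rest of that phase.
    """
    env: dict[str, str] = {}
    if analysis.get("system_packages"):
        env["SYSTEM_PACKAGES"] = " ".join(analysis["system_packages"])
    for env_key, analysis_key in [
        ("PRE_INSTALL_CMDS", "pre_install"),
        ("INSTALL_CMDS", "install"),
        ("POST_INSTALL_CMDS", "post_install"),
    ]:
        cmds = analysis.get(analysis_key) or []
        if cmds:
            env[env_key] = " && ".join(cmds)
    return env
```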
## SSH Key Configuration

Critical for WSL users: if you're running this on Windows Subsystem for Linux (WSL), you must configure your SSH key properly to avoid permission errors.

1. Copy the SSH key to the WSL filesystem:

   ```bash
   # Copy your SSH key from Windows to your WSL home directory
   cp /mnt/c/path/to/your/key.pem ~/.ssh/your_key_name.pem
   ```

2. Set correct permissions:

   ```bash
   # Set restrictive permissions (required by SSH)
   chmod 600 ~/.ssh/your_key_name.pem
   ```

3. Update the `.env` file:

   ```bash
   # Use the WSL path, not the Windows path
   SSH_KEY_PATH=~/.ssh/your_key_name.pem
   ```

Why this is necessary: SSH requires strict file permissions (600) for private keys. Windows file permissions don't translate correctly to WSL, causing "Permissions are too open" errors. Copying the key to the WSL filesystem and setting permissions with `chmod` ensures SSH can read the key properly.
Troubleshooting SSH issues:

- "Permissions are too open": ensure the key is in the WSL filesystem (`~/.ssh/`), not the Windows filesystem (`/mnt/c/`)
- "No such file or directory": verify the path in `.env` matches the actual key location
- "Permission denied": check that `chmod 600` was applied successfully with `ls -la ~/.ssh/`
## Security and safety

- Commands from the LLM are pared down via an allowlist; risky patterns are dropped.
- Only non-secret env vars are passed through; secrets stay in AWS Secrets Manager.
- If analysis is unavailable, the system falls back to the current heuristic detection.
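The allowlist-plus-risky-pattern filtering described above can be sketched as follows. Both the allowed command set and the risky-pattern list here are made-up examples; the real lists live in the server/analyzer code:

```python
import shlex

# Hypothetical lists; the actual allowlist lives in the server code.
ALLOWED_COMMANDS = {"pip", "pip3", "apt-get", "python", "python3"}
RISKY_PATTERNS = ("rm -rf", "curl", "wget", "sudo", "|", ">", "$(")

def filter_commands(cmds: list[str]) -> list[str]:
    """Keep only commands whose first word is allowlisted
    and that contain no risky pattern."""
    safe = []
    for cmd in cmds:
        parts = shlex.split(cmd)
        if not parts or parts[0] not in ALLOWED_COMMANDS:
            continue
        if any(p in cmd for p in RISKY_PATTERNS):
            continue
        safe.append(cmd)
    return safe
```

Dropping rather than escaping anything suspicious keeps the failure mode safe: at worst, a legitimate setup command is skipped and the run falls back to heuristic detection.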