# Azure VM Setup for Benchmarking

## VM Spec
| Setting | Value |
|---|---|
| Name | rich-bench |
| Resource group | RICH-BENCH-RG |
| Region | westus2 |
| Size | Standard_D2s_v5 (2 vCPU, 8 GB RAM, non-burstable) |
| OS | Ubuntu 24.04 LTS |
| Image | Canonical:ubuntu-24_04-lts:server:latest |
Non-burstable is critical: burstable VMs (B-series) accumulate and spend CPU credits, so their CPU performance varies over time and makes benchmarks unreliable.
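A benchmark script can refuse to run on a burstable size. The following is a minimal sketch and is not part of the original setup. The hypothetical `is_burstable` helper uses a simple prefix check that treats any size starting with `Standard_B` as burstable. The hypothetical `current_vm_size` helper reads the size from the Azure Instance Metadata Service, which is only reachable from inside an Azure VM. The `api-version` value is an assumption.

```bash
# Hypothetical guard helpers (not part of the original bench scripts).

# Succeeds (exit 0) if the given Azure size name is a burstable B-series size.
is_burstable() {
  case "$1" in
    Standard_B*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints this VM's size via the Azure Instance Metadata Service (IMDS).
# Only works from inside an Azure VM; the api-version is an assumption.
current_vm_size() {
  curl -s -H "Metadata: true" \
    "http://169.254.169.254/metadata/instance/compute/vmSize?api-version=2021-02-01&format=text"
}
```

On the VM, a script could start with `if is_burstable "$(current_vm_size)"; then echo "refusing to benchmark on a burstable VM" >&2; exit 1; fi`.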
## Provisioning

```bash
# Create resource group
az group create --name RICH-BENCH-RG --location westus2

# Create VM
az vm create \
  --resource-group RICH-BENCH-RG \
  --name rich-bench \
  --image Canonical:ubuntu-24_04-lts:server:latest \
  --size Standard_D2s_v5 \
  --admin-username azureuser \
  --generate-ssh-keys \
  --custom-data cloud-init.yaml
```
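The post-provisioning verification step needs the VM's public IP (`<ip>`). The following minimal sketch fetches it with the Azure CLI, using the resource group and VM name from the spec table. `vm_ip` is a hypothetical helper name. `az vm show` populates `publicIps` only when `--show-details` is passed.

```bash
# Hypothetical helper: print the public IP of the benchmark VM.
# --show-details is required for the publicIps field to be filled in.
vm_ip() {
  az vm show --show-details \
    --resource-group RICH-BENCH-RG \
    --name rich-bench \
    --query publicIps \
    --output tsv
}
```

For example: `ssh azureuser@"$(vm_ip)"`.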
## Cloud-init

The full cloud-init config is in `cloud-init.yaml`. It installs:

- System packages: `git`, `build-essential`, `curl`
- uv: `curl -LsSf https://astral.sh/uv/install.sh | sh`
- Python 3.12 + 3.13: `uv python install 3.12 3.13`
- hyperfine: from GitHub releases (latest)
- Rich clone: `git clone https://github.com/Textualize/rich /home/azureuser/rich`
- Venvs: `.venv` (3.12) and `venv313` (3.13) with Rich in editable mode
- Bench scripts: copied to `/home/azureuser/bench/`
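The venv step might look roughly like the following. This is a sketch under assumptions: the real `cloud-init.yaml` is not reproduced here, the paths are taken from the directory layout, and `make_venvs` is a hypothetical name. The sketch uses `uv venv --python` to select the interpreter and `uv pip install --python ... -e` to install Rich in editable mode into each venv. It should be run as `azureuser`.

```bash
# Hypothetical sketch of the venv step; the actual cloud-init may differ.
make_venvs() {
  local home=/home/azureuser
  local rich="$home/rich"

  # Python 3.12 venv inside the Rich clone, 3.13 venv next to it.
  uv venv --python 3.12 "$rich/.venv"
  uv venv --python 3.13 "$home/venv313"

  # Editable installs: both venvs import Rich straight from the checkout,
  # so switching branches does not require a reinstall.
  uv pip install --python "$rich/.venv/bin/python" -e "$rich"
  uv pip install --python "$home/venv313/bin/python" -e "$rich"
}
```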
## Post-provisioning verification

```bash
ssh azureuser@<ip>

# Check tools
python3.12 --version
python3.13 --version
hyperfine --version

# Check Rich (version read from the installed package metadata)
cd ~/rich && git status
~/rich/.venv/bin/python -c "from importlib.metadata import version; print(version('rich'))"

# Run baseline
bash ~/bench/bench_import.sh
# Verify low stddev (should be < 2 ms for import benchmarks)
```
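The stddev check can be automated. This sketch assumes the benchmark was exported with hyperfine's `--export-json` option. That export has a `results` array whose entries carry `mean` and `stddev` in seconds, and `stddev` is `null` when there was only one run. `check_stddev` is a hypothetical helper. Its 2 ms default mirrors the threshold in the verification step.

```bash
# Hypothetical helper: fail if any command in a hyperfine JSON export
# has a standard deviation at or above the threshold (ms, default 2).
check_stddev() {
  local json="$1" max_ms="${2:-2}"
  python3 - "$json" "$max_ms" <<'EOF'
import json
import sys

path, max_ms = sys.argv[1], float(sys.argv[2])
with open(path) as f:
    results = json.load(f)["results"]

ok = True
for r in results:
    # hyperfine reports times in seconds; stddev is null for a single run.
    sd = r.get("stddev")
    if sd is None:
        print(f"{r['command']}: no stddev (fewer than 2 runs)")
        ok = False
        continue
    sd_ms = sd * 1000
    passed = sd_ms < max_ms
    status = "ok" if passed else "TOO NOISY"
    print(f"{r['command']}: mean {r['mean'] * 1000:.2f} ms, stddev {sd_ms:.2f} ms [{status}]")
    ok = ok and passed

sys.exit(0 if ok else 1)
EOF
}
```

For example, after a run exported with `--export-json ~/results/import.json`, call `check_stddev ~/results/import.json`. A nonzero exit status means the run was too noisy to trust.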
## Directory layout on VM

```text
/home/azureuser/
├── rich/                      # Rich repo clone (editable install)
│   ├── .venv/                 # Python 3.12 venv
│   └── ...
├── venv313/                   # Python 3.13 venv
├── bench/
│   ├── bench_import.sh        # Overall import time
│   ├── bench_module.sh        # Per-module imports
│   ├── bench_e2e.sh           # A/B branch comparison
│   ├── bench_compare.sh       # Generic branch comparison
│   ├── bench_importtime.py    # -X importtime parser
│   ├── bench_runtime.py       # PR #12 runtime benchmarks
│   ├── bench_runtime2.py      # PR #13 runtime benchmarks
│   ├── bench_text.py          # Text hot-path benchmarks
│   └── test_all_impls.sh      # Multi-version test runner
└── results/                   # Benchmark output storage
```
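The bench scripts themselves are not reproduced here. As an illustration only, a script in the spirit of `bench_import.sh` might wrap hyperfine as follows. The `bench_import` function name, flag values, and output path are assumptions. `--warmup` runs untimed iterations first so the OS page cache is warm, `--min-runs` enforces a minimum sample size, and `--export-json` saves the statistics under `results/`.

```bash
# Illustrative sketch only; not the contents of the real bench_import.sh.
# Usage: bench_import [python-interpreter] [output.json]
bench_import() {
  local py="${1:-$HOME/rich/.venv/bin/python}"
  local out="${2:-$HOME/results/import.json}"

  mkdir -p "$(dirname "$out")"
  # Warmup runs are not timed; at least 30 timed runs feed mean/stddev.
  hyperfine \
    --warmup 5 \
    --min-runs 30 \
    --export-json "$out" \
    "$py -c 'import rich'"
}
```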
## Why this setup

- Dedicated VM eliminates background process noise from a developer laptop
- Non-burstable size gives a consistent CPU allocation, with no credit-based throttling as on B-series VMs
- Two Python versions because `typing` imports `re` on 3.12 but not on 3.13, which affects the `re` deferral benchmarks
- hyperfine handles warmup, min-runs, and statistical reporting (mean ± stddev)
- Editable install allows quick branch switching without reinstall overhead
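To confirm the `typing`/`re` difference on the VM, you can check whether importing `typing` pulls `re` into `sys.modules`. The following is a minimal sketch. `typing_imports_re` is a hypothetical helper. It first reports whether `re` was already loaded at startup (for example by `site`), because that would mask the effect.

```bash
# Hypothetical helper: report whether `import typing` loads `re`
# in the given interpreter. Prints "already-loaded", "yes", or "no".
typing_imports_re() {
  "$1" -c '
import sys
if "re" in sys.modules:
    print("already-loaded")
else:
    import typing
    print("yes" if "re" in sys.modules else "no")
'
}
```

To compare both interpreters: `for py in python3.12 python3.13; do echo "$py: $(typing_imports_re "$py")"; done`.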