# PR Preparation

After the experiment loop plateaus, prepare upstream PRs for kept optimizations.

## Workflow

### 1. Inventory

Build a table of kept optimizations → target repos → PR status:

```
| # | Optimization | Target repo | PR status    |
|---|--------------|-------------|--------------|
| 1 | description  | repo-name   | needs PR     |
| 2 | description  | repo-name   | PR #N opened |
```

For each optimization without a PR:

1. **Check upstream** — has the code already been changed on `main`? (`gh api repos/ORG/REPO/contents/PATH --jq '.content' | base64 -d | grep ...`)
2. **Check existing PRs** — is there already a PR covering this area? (`gh pr list --repo ORG/REPO --state all --search "relevant keywords"`)
3. **Decide**: create a new PR, fold into an existing PR, or skip.

### 2. Folding into existing PRs

When a new optimization targets the same function/file as an existing open PR, fold it in rather than creating a separate PR:

1. Check out the existing PR branch
2. Apply the additional change
3. Commit with a clear message explaining the addition
4. **Re-run the benchmark** — this is critical. The PR's benchmark data must reflect ALL changes in the PR, not just the original ones.
5. Update the PR description with the new benchmark results
6. Push

### 3. Create pytest-benchmark test

For each optimization going into a PR, create a permanent pytest-benchmark test that lives in the repo. This is different from the disposable micro-benchmark used during the experiment loop — it's a committed test that lets reviewers reproduce results.

Place tests in the project's benchmark directory (e.g. `tests/benchmarks/` or `benchmarks/`). Pattern:

```python
import pytest

@pytest.fixture
def realistic_input():
    """Create input that matches production data sizes."""
    # Use real-world data volumes, not toy examples
    return ...

def test_benchmark_<function_name>(benchmark, realistic_input):
    benchmark(<function_name>, realistic_input)
```

Key points:

- Use realistic input sizes — small inputs produce misleading profiles
- One test per optimized function
- The test name should match the function being benchmarked
- Commit the benchmark test alongside the optimization code change

### 4. Comparative benchmarks

When a PR accumulates multiple changes, run a **multi-variant benchmark** showing each change's incremental contribution:

```
Variant 1: Baseline (upstream main, no changes)
Variant 2: Original PR changes only
Variant 3: Original + new changes (full PR)
```

This lets reviewers understand what each change contributes independently.

#### Benchmark script pattern

Write a self-contained script that:

- Creates realistic test inputs (correct data sizes and volumes)
- Runs each variant under the domain's profiling tool and parses its output
- Supports `--runs N` for repeated measurements and `--report` for chart generation
- Uses `tempfile.TemporaryDirectory()` for all intermediate files

A minimal skeleton of this pattern is sketched after step 6 below.

### 5. PR body structure

Use the fill-in-the-blanks templates in `pr-body-templates.md`. Pick the variant matching your domain (CPU or Memory), fill in the placeholders, and remove sections that don't apply.

### 6. PR description updates

When folding changes into an existing PR, update the **entire** PR body — not just append. The PR body should read as a coherent description of everything in the PR. Specifically update:

- Summary bullets to mention all changes
- Benchmark table/chart with fresh numbers covering all changes
- Changelog entry if the PR includes one

Use `gh pr edit NUMBER --repo ORG/REPO --body "$(cat <<'EOF' ... EOF)"` to replace the body.
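For reference, here is a minimal sketch of the comparative benchmark script described in step 4. Everything named in it is an assumption for illustration: `workload.py`, its `--impl` flag, and plain wall-clock timing stand in for your project's real entry point, the domain's profiling tool, and its parsed output. Only the `--runs`/`--report` flags and the use of `tempfile.TemporaryDirectory()` come from the pattern above.

```python
"""Sketch of a self-contained comparative benchmark script (see step 4).

All commands and file names below are placeholders; substitute the real
entry point, the domain's profiling tool, and its output parsing.
"""
import argparse
import statistics
import subprocess
import sys
import tempfile
import time
from pathlib import Path

# One command per variant; select/check out the matching code before each run.
VARIANTS = {
    "Variant 1: baseline (upstream main)":  [sys.executable, "workload.py", "--impl", "baseline"],
    "Variant 2: original PR changes only":  [sys.executable, "workload.py", "--impl", "original"],
    "Variant 3: original + new (full PR)":  [sys.executable, "workload.py", "--impl", "full"],
}


def make_inputs(workdir: Path) -> Path:
    """Write realistic, production-scale test input (not a toy example)."""
    data = workdir / "input.dat"
    data.write_bytes(b"x" * 50_000_000)  # placeholder payload; use real volumes
    return data


def run_once(cmd: list[str], data: Path) -> float:
    """Run one variant and return a metric.

    Wall-clock time stands in for the real measurement; in practice run the
    command under the domain's profiling tool and parse its output instead.
    """
    start = time.perf_counter()
    subprocess.run([*cmd, str(data)], check=True)
    return time.perf_counter() - start


def main() -> None:
    parser = argparse.ArgumentParser(description="comparative benchmark")
    parser.add_argument("--runs", type=int, default=3, help="repetitions per variant")
    parser.add_argument("--report", action="store_true", help="also emit a chart/report")
    args = parser.parse_args()

    with tempfile.TemporaryDirectory() as tmp:  # all intermediate files live here
        workdir = Path(tmp)
        data = make_inputs(workdir)
        for label, cmd in VARIANTS.items():
            samples = [run_once(cmd, data) for _ in range(args.runs)]
            print(f"{label}: median {statistics.median(samples):.2f}s over {args.runs} runs")

    if args.report:
        print("report generation goes here (e.g. write a chart from the samples)")


if __name__ == "__main__":
    main()
```

Once the placeholders are swapped for real commands, a typical invocation would be `python compare_variants.py --runs 5 --report` (script name is your choice); the printed per-variant numbers feed the PR's benchmark table.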
### 7. Conventions

Each domain agent defines its own branch prefix and PR title prefix. Common rules:

- **Do NOT open PRs yourself** unless the user explicitly asks. Prepare the branch, push it, and tell the user it's ready. Do NOT push branches or create PRs as a "next step" — wait for explicit instruction.
- Keep PR changed files minimal — only the actual code change plus the benchmark test, not ad-hoc scripts or images.
- Benchmark reproduce instructions go inline in a fenced code block in the PR body (see templates).