codeflash-agent/plugin/references/shared/pr-preparation.md
2026-04-03 17:36:50 -05:00


PR Preparation

After the experiment loop plateaus, prepare upstream PRs for kept optimizations.

Workflow

1. Inventory

Build a table of kept optimizations → target repos → PR status:

| # | Optimization | Target repo | PR status |
|---|-------------|-------------|-----------|
| 1 | description | repo-name   | needs PR  |
| 2 | description | repo-name   | PR #N opened |

For each optimization without a PR:

  1. Check upstream — has the code already been changed on main? (`gh api repos/ORG/REPO/contents/PATH --jq '.content' | base64 -d | grep ...`)
  2. Check existing PRs — is there already a PR covering this area? (`gh pr list --repo ORG/REPO --state all --search "relevant keywords"`)
  3. Decide: create a new PR, fold into an existing PR, or skip.

2. Folding into existing PRs

When a new optimization targets the same function/file as an existing open PR, fold it in rather than creating a separate PR:

  1. Check out the existing PR branch
  2. Apply the additional change
  3. Commit with a clear message explaining the addition
  4. Re-run the benchmark — this is critical. The PR's benchmark data must reflect ALL changes in the PR, not just the original ones.
  5. Update the PR description with new benchmark results
  6. Push

3. Create pytest-benchmark test

For each optimization going into a PR, create a permanent pytest-benchmark test that lives in the repo. This is different from the disposable micro-benchmark used during the experiment loop — it's a committed test that lets reviewers reproduce results.

Place tests in the project's benchmark directory (e.g. tests/benchmarks/ or benchmarks/). Pattern:

import pytest

@pytest.fixture
def realistic_input():
    """Create input that matches production data sizes."""
    # Use real-world data volumes, not toy examples
    return ...

def test_benchmark_<function_name>(benchmark, realistic_input):
    benchmark(<target_function>, realistic_input)

Key points:

  • Use realistic input sizes — small inputs produce misleading profiles
  • One test per optimized function
  • The test name should match the function being benchmarked
  • Commit the benchmark test alongside the optimization code change
  • Reviewers can reproduce results with `pytest --benchmark-only`, which runs only the benchmark tests

4. Comparative benchmarks

When a PR accumulates multiple changes, run a multi-variant benchmark showing each change's incremental contribution:

Variant 1: Baseline (upstream main, no changes)
Variant 2: Original PR changes only
Variant 3: Original + new changes (full PR)

This lets reviewers understand what each change contributes independently.
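To report each change's contribution as a single number, a small helper can convert per-variant median runtimes into speedups relative to the baseline; the variant names and timings below are hypothetical:

```python
def incremental_speedups(medians: dict) -> dict:
    """Speedup of each variant relative to the baseline's median runtime.

    `medians` maps variant name -> median seconds and must contain "baseline".
    """
    base = medians["baseline"]
    return {name: base / seconds for name, seconds in medians.items()}

# Hypothetical medians for the three variants above, in seconds:
speedups = incremental_speedups(
    {"baseline": 2.0, "pr-original": 1.25, "pr-full": 0.5}
)
print(speedups)  # {'baseline': 1.0, 'pr-original': 1.6, 'pr-full': 4.0}
```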

Benchmark script pattern

Write a self-contained script that:

  • Creates realistic test inputs (correct data sizes and volumes)
  • Runs each variant under the domain's profiling tool and parses output
  • Supports --runs N for repeated measurements and --report for chart generation
  • Uses tempfile.TemporaryDirectory() for all intermediate files
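The bullets above can be sketched as one Python script. This is a minimal skeleton, assuming `time.perf_counter` as a placeholder for the domain's real profiling tool; all function and variant names here are illustrative:

```python
import argparse
import statistics
import tempfile
import time
from pathlib import Path

def make_realistic_input(tmp_dir: Path, rows: int = 100_000) -> Path:
    """Write an input file at production-like volume (placeholder data)."""
    path = tmp_dir / "input.csv"
    path.write_text("\n".join(f"{i},{i * 2}" for i in range(rows)))
    return path

def variant_baseline(path: Path) -> int:
    """Stand-in for the upstream-main code path."""
    return sum(1 for _ in path.read_text().splitlines())

# Register one entry per variant: baseline, original PR, original + new changes
VARIANTS = {"baseline": variant_baseline}

def run(runs: int) -> dict:
    """Median wall-clock time per variant, in seconds."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:  # all intermediates stay temporary
        data = make_realistic_input(Path(tmp))
        for name, fn in VARIANTS.items():
            times = []
            for _ in range(runs):
                start = time.perf_counter()
                fn(data)
                times.append(time.perf_counter() - start)
            results[name] = statistics.median(times)
    return results

def main() -> None:
    parser = argparse.ArgumentParser(description="Compare benchmark variants")
    parser.add_argument("--runs", type=int, default=5)
    parser.add_argument("--report", action="store_true", help="generate a chart")
    args, _ = parser.parse_known_args()  # tolerate extra argv when embedded
    for name, median in run(args.runs).items():
        print(f"{name}: {median * 1e3:.2f} ms (median of {args.runs} runs)")
    if args.report:
        pass  # hook the project's chart-generation tooling in here

if __name__ == "__main__":
    main()
```

Once the real variants are registered, invoke it as, e.g., `python benchmark_variants.py --runs 10 --report` (filename hypothetical).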

5. PR body structure

Use the fill-in-the-blanks templates in pr-body-templates.md. Pick the variant matching your domain (CPU or Memory), fill in the placeholders, and remove sections that don't apply.

6. PR description updates

When folding changes into an existing PR, update the entire PR body — not just append. The PR body should read as a coherent description of everything in the PR. Specifically update:

  • Summary bullets to mention all changes
  • Benchmark table/chart with fresh numbers covering all changes
  • Changelog entry if the PR includes one

Use `gh pr edit NUMBER --repo ORG/REPO --body "$(cat <<'EOF' ... EOF)"` to replace the body.

7. Conventions

Each domain agent defines its own branch prefix and PR title prefix. Common rules:

  • Do NOT open PRs yourself unless the user explicitly asks. When asked to prepare one, create the branch, push it, and tell the user it's ready; never push branches or create PRs as an unprompted "next step". Wait for explicit instruction.
  • Keep PR changed files minimal — only the actual code change plus the benchmark test, not ad-hoc scripts or images.
  • Benchmark reproduction instructions go inline in the PR body's `<details>` block (see templates).