# PR Preparation
After the experiment loop plateaus, prepare upstream PRs for kept optimizations.
## Workflow
### 1. Inventory
Build a table of kept optimizations → target repos → PR status:
```
| # | Optimization | Target repo | PR status |
|---|-------------|-------------|-----------|
| 1 | description | repo-name | needs PR |
| 2 | description | repo-name | PR #N opened |
```
For each optimization without a PR:
1. **Check upstream** — has the code already been changed on `main`? (`gh api repos/ORG/REPO/contents/PATH --jq '.content' | base64 -d | grep ...`)
2. **Check existing PRs** — is there already a PR covering this area? (`gh pr list --repo ORG/REPO --state all --search "relevant keywords"`)
3. **Decide**: create a new PR, fold into an existing PR, or skip (see the sketch after this list).
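A minimal Python sketch of checks 1 and 2, shelling out to `gh`; the repo, file path, and search keywords below are hypothetical placeholders, and the exact filtering will vary per optimization:
```python
import base64
import json
import subprocess

def gh(*args: str) -> str:
    """Run a gh CLI command and return its stdout."""
    return subprocess.run(
        ["gh", *args], check=True, capture_output=True, text=True
    ).stdout

def upstream_file(repo: str, path: str) -> str:
    """Fetch the current contents of a file on the default branch."""
    content = gh("api", f"repos/{repo}/contents/{path}", "--jq", ".content")
    return base64.b64decode(content).decode()

def matching_prs(repo: str, keywords: str) -> list[dict]:
    """Search PRs in any state whose title or body match the keywords."""
    out = gh("pr", "list", "--repo", repo, "--state", "all",
             "--search", keywords, "--json", "number,title,state")
    return json.loads(out)

# Hypothetical values; substitute the real repo, path, and keywords.
if "old_slow_loop" in upstream_file("ORG/REPO", "src/module.py"):
    print(matching_prs("ORG/REPO", "optimize parser") or "needs PR")
```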
### 2. Folding into existing PRs
When a new optimization targets the same function/file as an existing open PR, fold it in rather than creating a separate PR (a minimal sketch of the sequence follows this list):
1. Check out the existing PR branch
2. Apply the additional change
3. Commit with a clear message explaining the addition
4. **Re-run the benchmark** — this is critical. The PR's benchmark data must reflect ALL changes in the PR, not just the original ones.
5. Update the PR description with new benchmark results
6. Push
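One way to script that sequence, shelling out to `git`/`gh`; the PR number, commit message, and benchmark command are hypothetical:
```python
import subprocess

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

run("gh", "pr", "checkout", "123")  # 1. check out the existing PR branch
# 2. ...apply the additional change to the working tree here...
run("git", "commit", "-am", "Also cache parsed schema in hot path")  # 3
run("pytest", "tests/benchmarks/", "--benchmark-only")  # 4. re-run benchmarks
# 5. update the PR description with the fresh numbers (see section 6 below)
run("git", "push")  # 6
```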
### 3. Create pytest-benchmark test
For each optimization going into a PR, create a permanent pytest-benchmark test that lives in the repo. This is different from the disposable micro-benchmark used during the experiment loop — it's a committed test that lets reviewers reproduce results.
Place tests in the project's benchmark directory (e.g. `tests/benchmarks/` or `benchmarks/`). Pattern:
```python
import pytest

@pytest.fixture
def realistic_input():
    """Create input that matches production data sizes."""
    # Use real-world data volumes, not toy examples
    return ...

def test_benchmark_<function_name>(benchmark, realistic_input):
    benchmark(<target_function>, realistic_input)
```
Key points:
- Use realistic input sizes — small inputs produce misleading profiles
- One test per optimized function
- The test name should match the function being benchmarked
- Commit the benchmark test alongside the optimization code change (a filled-in example follows this list)
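For concreteness, a filled-in instance of the pattern; the module, function name, and data shape here are invented for illustration:
```python
import pytest

from mypackage.parser import parse_records  # hypothetical optimized function

@pytest.fixture
def realistic_input():
    """~100k records, mirroring production batch sizes (hypothetical)."""
    return [{"id": i, "payload": "x" * 256} for i in range(100_000)]

def test_benchmark_parse_records(benchmark, realistic_input):
    benchmark(parse_records, realistic_input)
```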
### 4. Comparative benchmarks
When a PR accumulates multiple changes, run a **multi-variant benchmark** showing each change's incremental contribution:
```
Variant 1: Baseline (upstream main, no changes)
Variant 2: Original PR changes only
Variant 3: Original + new changes (full PR)
```
This lets reviewers understand what each change contributes independently.
#### Benchmark script pattern
Write a self-contained script (skeleton below) that:
- Creates realistic test inputs (correct data sizes and volumes)
- Runs each variant under the domain's profiling tool and parses output
- Supports `--runs N` for repeated measurements and `--report` for chart generation
- Uses `tempfile.TemporaryDirectory()` for all intermediate files
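A skeletal version of such a script, assuming the variants are importable callables and using wall-clock timing as a stand-in for the domain's profiling tool; all names are hypothetical:
```python
import argparse
import statistics
import tempfile
import time
from pathlib import Path

# Hypothetical variants: in practice these would exercise baseline,
# original-PR, and full-PR builds of the code under test.
VARIANTS = {
    "baseline": lambda workdir: sum(range(1_000_000)),
    "original_pr": lambda workdir: sum(range(1_000_000)),
    "full_pr": lambda workdir: sum(range(1_000_000)),
}

def time_variant(fn, runs: int) -> list[float]:
    """Time each run inside a throwaway directory for intermediate files."""
    timings = []
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as workdir:
            start = time.perf_counter()
            fn(Path(workdir))
            timings.append(time.perf_counter() - start)
    return timings

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--runs", type=int, default=5)
    parser.add_argument("--report", action="store_true",
                        help="generate a chart instead of printing")
    args = parser.parse_args()

    results = {name: time_variant(fn, args.runs)
               for name, fn in VARIANTS.items()}
    for name, timings in results.items():
        print(f"{name}: median {statistics.median(timings):.4f}s "
              f"over {args.runs} runs")
    if args.report:
        ...  # e.g. render a bar chart (charting library left unspecified)

if __name__ == "__main__":
    main()
```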
### 5. PR body structure
Use the fill-in-the-blanks templates in `pr-body-templates.md`. Pick the variant matching your domain (CPU or Memory), fill in the placeholders, and remove sections that don't apply.
### 6. PR description updates
When folding changes into an existing PR, update the **entire** PR body — not just append. The PR body should read as a coherent description of everything in the PR. Specifically update:
- Summary bullets to mention all changes
- Benchmark table/chart with fresh numbers covering all changes
- Changelog entry if the PR includes one
Use `gh pr edit NUMBER --repo ORG/REPO --body "$(cat <<'EOF' ... EOF)"` to replace the body, or write the body to a file and pass `--body-file`, as sketched below.
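When the body is regenerated programmatically, writing it to a file and using `gh pr edit --body-file` avoids shell-quoting pitfalls; the PR number, repo, and body text here are hypothetical:
```python
import subprocess
from pathlib import Path

# Regenerate the ENTIRE body so it describes everything now in the PR.
body = """## Summary
- Original change: vectorize the hot loop (hypothetical)
- Added change: cache the parsed schema (hypothetical)

## Benchmarks
(fresh numbers covering all changes go here)
"""

body_path = Path("pr_body.md")
body_path.write_text(body)
subprocess.run(
    ["gh", "pr", "edit", "123", "--repo", "ORG/REPO",
     "--body-file", str(body_path)],
    check=True,
)
```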
### 7. Conventions
Each domain agent defines its own branch prefix and PR title prefix. Common rules:
- **Do NOT open PRs yourself** unless the user explicitly asks. Prepare the branch and tell the user it's ready; do NOT push branches or create PRs as an unprompted "next step". Wait for explicit instruction.
- Keep PR changed files minimal — only the actual code change plus the benchmark test, not ad-hoc scripts or images.
- Benchmark reproduce instructions go inline in the PR body `<details>` block (see templates).