# PR Preparation
After the experiment loop plateaus, prepare upstream PRs for kept optimizations.
## Workflow
### 1. Inventory
Build a table of kept optimizations → target repos → PR status:
```
| # | Optimization | Target repo | PR status |
|---|-------------|-------------|-----------|
| 1 | description | repo-name | needs PR |
| 2 | description | repo-name | PR #N opened |
```
For each optimization without a PR:
1. **Check upstream** — has the code already been changed on `main`? (`gh api repos/ORG/REPO/contents/PATH --jq '.content' | base64 -d | grep ...`)
2. **Check existing PRs** — is there already a PR covering this area? (`gh pr list --repo ORG/REPO --state all --search "relevant keywords"`)
3. **Decide**: create a new PR, fold into an existing PR, or skip (see the sketch after this list).
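A minimal Python sketch of checks 1 and 2, shelling out to `gh`; the repo, file path, and search keywords below are hypothetical placeholders, and the exact filtering will vary per optimization:
```python
import base64
import json
import subprocess

def gh(*args: str) -> str:
    """Run a gh CLI command and return its stdout."""
    return subprocess.run(
        ["gh", *args], check=True, capture_output=True, text=True
    ).stdout

def upstream_file(repo: str, path: str) -> str:
    """Fetch the current contents of a file on the default branch."""
    content = gh("api", f"repos/{repo}/contents/{path}", "--jq", ".content")
    return base64.b64decode(content).decode()

def matching_prs(repo: str, keywords: str) -> list[dict]:
    """Search PRs in any state whose title or body match the keywords."""
    out = gh("pr", "list", "--repo", repo, "--state", "all",
             "--search", keywords, "--json", "number,title,state")
    return json.loads(out)

# Hypothetical values; substitute the real repo, path, and keywords.
if "old_slow_loop" in upstream_file("ORG/REPO", "src/module.py"):
    print(matching_prs("ORG/REPO", "optimize parser") or "needs PR")
```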
### 2. Folding into existing PRs
When a new optimization targets the same function/file as an existing open PR, fold it in rather than creating a separate PR (a minimal sketch of the sequence follows this list):
1. Check out the existing PR branch
2. Apply the additional change
3. Commit with a clear message explaining the addition
4. **Re-run the benchmark** — this is critical. The PR's benchmark data must reflect ALL changes in the PR, not just the original ones.
5. Update the PR description with new benchmark results
6. Push
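One way to script that sequence, shelling out to `git`/`gh`; the PR number, commit message, and benchmark command are hypothetical:
```python
import subprocess

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

run("gh", "pr", "checkout", "123")  # 1. check out the existing PR branch
# 2. ...apply the additional change to the working tree here...
run("git", "commit", "-am", "Also cache parsed schema in hot path")  # 3
run("pytest", "tests/benchmarks/", "--benchmark-only")  # 4. re-run benchmarks
# 5. update the PR description with the fresh numbers (see section 6 below)
run("git", "push")  # 6
```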
### 3. Create pytest-benchmark test
For each optimization going into a PR, create a permanent pytest-benchmark test that lives in the repo. This is different from the disposable micro-benchmark used during the experiment loop — it's a committed test that lets reviewers reproduce results.
Place tests in the project's benchmark directory (e.g. `tests/benchmarks/` or `benchmarks/`). Pattern:
```python
import pytest

@pytest.fixture
def realistic_input():
    """Create input that matches production data sizes."""
    # Use real-world data volumes, not toy examples
    return ...

def test_benchmark_<function_name>(benchmark, realistic_input):
    benchmark(<target_function>, realistic_input)
```
Key points:
- Use realistic input sizes — small inputs produce misleading profiles
- One test per optimized function
- The test name should match the function being benchmarked
- Commit the benchmark test alongside the optimization code change (a filled-in example follows this list)
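For concreteness, a filled-in instance of the pattern; the module, function name, and data shape here are invented for illustration:
```python
import pytest

from mypackage.parser import parse_records  # hypothetical optimized function

@pytest.fixture
def realistic_input():
    """~100k records, mirroring production batch sizes (hypothetical)."""
    return [{"id": i, "payload": "x" * 256} for i in range(100_000)]

def test_benchmark_parse_records(benchmark, realistic_input):
    benchmark(parse_records, realistic_input)
```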
### 4. Comparative benchmarks
When a PR accumulates multiple changes, run a **multi-variant benchmark** showing each change's incremental contribution:
```
Variant 1: Baseline (upstream main, no changes)
Variant 2: Original PR changes only
Variant 3: Original + new changes (full PR)
```
This lets reviewers understand what each change contributes independently.
#### Benchmark script pattern
Write a self-contained script (skeleton below) that:
- Creates realistic test inputs (correct data sizes and volumes)
- Runs each variant under the domain's profiling tool and parses output
- Supports `--runs N` for repeated measurements and `--report` for chart generation
- Uses `tempfile.TemporaryDirectory()` for all intermediate files
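A skeletal version of such a script, assuming the variants are importable callables and using wall-clock timing as a stand-in for the domain's profiling tool; all names are hypothetical:
```python
import argparse
import statistics
import tempfile
import time
from pathlib import Path

# Hypothetical variants: in practice these would exercise baseline,
# original-PR, and full-PR builds of the code under test.
VARIANTS = {
    "baseline": lambda workdir: sum(range(1_000_000)),
    "original_pr": lambda workdir: sum(range(1_000_000)),
    "full_pr": lambda workdir: sum(range(1_000_000)),
}

def time_variant(fn, runs: int) -> list[float]:
    """Time each run inside a throwaway directory for intermediate files."""
    timings = []
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as workdir:
            start = time.perf_counter()
            fn(Path(workdir))
            timings.append(time.perf_counter() - start)
    return timings

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--runs", type=int, default=5)
    parser.add_argument("--report", action="store_true",
                        help="generate a chart instead of printing")
    args = parser.parse_args()

    results = {name: time_variant(fn, args.runs)
               for name, fn in VARIANTS.items()}
    for name, timings in results.items():
        print(f"{name}: median {statistics.median(timings):.4f}s "
              f"over {args.runs} runs")
    if args.report:
        ...  # e.g. render a bar chart (charting library left unspecified)

if __name__ == "__main__":
    main()
```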
### 5. PR body structure
Use the fill-in-the-blanks templates in `pr-body-templates.md`. Pick the variant matching your domain (CPU or Memory), fill in the placeholders, and remove sections that don't apply.
### 6. PR description updates
When folding changes into an existing PR, update the **entire** PR body — not just append. The PR body should read as a coherent description of everything in the PR. Specifically update:
- Summary bullets to mention all changes
- Benchmark table/chart with fresh numbers covering all changes
- Changelog entry if the PR includes one
Use `gh pr edit NUMBER --repo ORG/REPO --body "$(cat <<'EOF' ... EOF)"` to replace the body, or write the body to a file and pass `--body-file`, as sketched below.
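When the body is regenerated programmatically, writing it to a file and using `gh pr edit --body-file` avoids shell-quoting pitfalls; the PR number, repo, and body text here are hypothetical:
```python
import subprocess
from pathlib import Path

# Regenerate the ENTIRE body so it describes everything now in the PR.
body = """## Summary
- Original change: vectorize the hot loop (hypothetical)
- Added change: cache the parsed schema (hypothetical)

## Benchmarks
(fresh numbers covering all changes go here)
"""

body_path = Path("pr_body.md")
body_path.write_text(body)
subprocess.run(
    ["gh", "pr", "edit", "123", "--repo", "ORG/REPO",
     "--body-file", str(body_path)],
    check=True,
)
```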
### 7. Conventions
Each domain agent defines its own branch prefix and PR title prefix. Common rules:
- **Do NOT open PRs yourself** unless the user explicitly asks. Prepare the branch and tell the user it's ready; do NOT push branches or create PRs as an unprompted "next step". Wait for explicit instruction.
- Keep PR changed files minimal — only the actual code change plus the benchmark test, not ad-hoc scripts or images.
- Benchmark reproduce instructions go inline in the PR body `<details>` block (see templates).