# PR Preparation
After the experiment loop plateaus, prepare upstream PRs for kept optimizations.
## Workflow
### 1. Inventory
Build a table of kept optimizations → target repos → PR status:
| # | Optimization | Target repo | PR status |
|---|-------------|-------------|-----------|
| 1 | description | repo-name | needs PR |
| 2 | description | repo-name | PR #N opened |
For each optimization without a PR:

- Check upstream — has the code already been changed on main? (`gh api repos/ORG/REPO/contents/PATH --jq '.content' | base64 -d | grep ...`)
- Check existing PRs — is there already a PR covering this area? (`gh pr list --repo ORG/REPO --state all --search "relevant keywords"`)
- Decide: create a new PR, fold into an existing PR, or skip.
### 2. Folding into existing PRs
When a new optimization targets the same function/file as an existing open PR, fold it in rather than creating a separate PR:
- Check out the existing PR branch
- Apply the additional change
- Commit with a clear message explaining the addition
- Re-run the benchmark — this is critical. The PR's benchmark data must reflect ALL changes in the PR, not just the original ones.
- Update the PR description with new benchmark results
- Push
### 3. Comparative benchmarks
When a PR accumulates multiple changes, run a multi-variant benchmark showing each change's incremental contribution:
- Variant 1: Baseline (upstream main, no changes)
- Variant 2: Original PR changes only
- Variant 3: Original + new changes (full PR)
This lets reviewers understand what each change contributes independently.
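A minimal sketch of a driver for such a comparison, assuming each variant is a git ref and a hypothetical `bench.py` prints a JSON object with a `throughput` field (the ref names, script name, and output format are all illustrative assumptions):

```python
import json
import subprocess

# Variant label -> git ref (illustrative; "pr-branch~1" assumes the
# original change is exactly one commit behind the branch tip).
VARIANTS = {
    "baseline (upstream main)": "upstream/main",
    "original PR changes only": "pr-branch~1",
    "full PR (original + new)": "pr-branch",
}

results = {}
for label, ref in VARIANTS.items():
    # Check out the variant, then run the same benchmark script against it.
    subprocess.run(["git", "checkout", "--quiet", ref], check=True)
    out = subprocess.run(
        ["python", "bench.py", "--runs", "5"],
        check=True, capture_output=True, text=True,
    )
    results[label] = json.loads(out.stdout)["throughput"]

# Report each variant relative to the baseline.
baseline = results["baseline (upstream main)"]
for label, throughput in results.items():
    delta = 100.0 * (throughput / baseline - 1)
    print(f"{label:30s} {throughput:10.1f} req/s ({delta:+.1f}%)")
```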
#### Benchmark script pattern
Write a self-contained script (sketched below) that:

- Creates realistic test inputs (correct data sizes and volumes)
- Runs each variant under the domain's profiling tool and parses output
- Supports `--runs N` for repeated measurements and `--report` for chart generation
- Uses `tempfile.TemporaryDirectory()` for all intermediate files
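A minimal skeleton under those constraints, using wall-clock timing in place of a domain profiling tool; the two workload functions are placeholders for the real code paths under comparison:

```python
"""Self-contained benchmark skeleton; workload bodies are placeholders."""
import argparse
import statistics
import tempfile
import time
from pathlib import Path


def baseline(workdir: Path):
    # Placeholder: build realistic inputs and run the unmodified code path.
    (workdir / "out.bin").write_bytes(b"x" * 10_000_000)


def optimized(workdir: Path):
    # Placeholder: same workload through the optimized code path.
    (workdir / "out.bin").write_bytes(b"x" * 10_000_000)


def measure(fn, runs: int, workdir: Path) -> float:
    # Repeat the measurement and keep the median to dampen noise.
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(workdir)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--runs", type=int, default=5)
    parser.add_argument("--report", action="store_true",
                        help="also generate a comparison chart")
    args = parser.parse_args()

    # All intermediate files live in a temp dir and are cleaned up on exit.
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        results = {name: measure(fn, args.runs, workdir)
                   for name, fn in [("baseline", baseline),
                                    ("optimized", optimized)]}

    for name, seconds in results.items():
        print(f"{name:12s} {seconds:.3f}s (median of {args.runs} runs)")
    if args.report:
        pass  # chart generation goes here; see section 8 for guidelines


if __name__ == "__main__":
    main()
```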
### 4. PR body structure
````markdown
## Summary

<1-3 bullet points describing what changed and why>

## Details

<Technical explanation: what the code does, why the old version was suboptimal,
how the new version improves it, any safety considerations>

## Benchmark

<Chart image or text table with exact numbers>
<Platform/Python version/tool info>

## Test plan

- [x] Test A — PASSED
- [x] Test B — PASSED (no regression)

### Reproduce

<details>
<summary>Benchmark script</summary>

```python
# Full self-contained benchmark script
```

</details>
````
### 5. PR description updates
When folding changes into an existing PR, update the entire PR body — not just append. The PR body should read as a coherent description of everything in the PR. Specifically update:
- Summary bullets to mention all changes
- Benchmark table/chart with fresh numbers covering all changes
- Changelog entry if the PR includes one
Use `gh pr edit NUMBER --repo ORG/REPO --body "$(cat <<'EOF' ... EOF)"` to replace the body.
### 6. Conventions
Each domain agent defines its own branch prefix and PR title prefix. Common rules:
- Do NOT open PRs yourself unless the user explicitly asks. Prepare the branch, push it, tell the user it's ready. Do NOT push branches or create PRs as a "next step" — wait for explicit instruction.
- Keep PR changed files minimal — only the actual code change, not benchmark scripts or images.
- Benchmark scripts go inline in the PR body `<details>` block.
#### Writing quality
Write PR descriptions like a human engineer, not a summarizer:
- Be specific: "Replaces HuggingFace's RTDetrImageProcessor with torchvision transforms to eliminate 110 MiB of duplicate weight loading" — not "Improves memory efficiency of image processing."
- Lead with the technical mechanism, not the benefit. Reviewers want to know WHAT you did, not that it's "an improvement."
- No generic headings like "Summary", "Overview", "Key Changes" unless the PR template requires them. If the change is simple enough for 2 sentences, use 2 sentences.
- Don't over-explain the problem. Assume the reviewer knows the codebase. Explain WHY your approach works, not what the code does line-by-line.
### 7. Chart hosting (if available)
If the project has an image hosting setup (e.g., an orphan branch for assets), use it:
```bash
# Upload
gh api repos/ORG/REPO/contents/images/{name}.png \
  --method PUT \
  -f message="add {name} benchmark chart" \
  -f content="$(base64 -i /path/to/chart.png)" \
  -f branch=assets-branch

# To update an existing image, include the SHA:
SHA=$(gh api repos/ORG/REPO/contents/images/{name}.png -q '.sha' -H "Accept: application/vnd.github.v3+json" --method GET -f ref=assets-branch)
gh api repos/ORG/REPO/contents/images/{name}.png \
  --method PUT \
  -f message="update {name}" \
  -f content="$(base64 -i /path/to/chart.png)" \
  -f branch=assets-branch \
  -f sha="$SHA"

# Reference in PR body:
# ![{name}](https://raw.githubusercontent.com/ORG/REPO/assets-branch/images/{name}.png)
```
Otherwise, describe the results in text tables only.
### 8. Chart generation guidelines
When generating benchmark charts (e.g., with plotly, matplotlib), follow these rules (a sketch follows the list):
- Separate concerns: Use distinct charts for different metrics (throughput vs memory, latency vs RSS). Combined charts are hard to read and require multiple iterations.
- Plain-language axis labels: Use "Peak Memory (MiB)" not "RSS delta". Use "Throughput (req/s)" not "ops".
- Include the baseline: Always show the baseline variant as the first bar/line for comparison.
- Annotate absolute values: Don't just show bars — label each with the actual number.
- Keep it simple: Bar charts for before/after comparisons. Line charts only for scaling tests (varying N). No 3D charts, no unnecessary styling.
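As an illustration of these rules, a minimal matplotlib sketch; the variant names and numbers are placeholders, not real measurements:

```python
import matplotlib.pyplot as plt

variants = ["Baseline", "Original PR", "Full PR"]  # baseline shown first
peak_mib = [512.0, 430.0, 402.0]                   # placeholder values

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(variants, peak_mib)
ax.set_ylabel("Peak Memory (MiB)")   # plain-language axis label
ax.bar_label(bars, fmt="%.0f MiB")   # annotate each bar with its value
fig.tight_layout()
fig.savefig("peak_memory.png", dpi=150)
```

One bar chart, one metric; a second metric (e.g., throughput) would get its own chart rather than a second axis on this one.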