clean up

2026-05-04 18:25:19 +00:00 · 2026-04-30 18:16:01 +03:00 · 2026-04-30 18:16:01 +03:00 · 5ac74354bf
commit 5ac74354bf
parent 6ba2414d66
6 changed files with 51 additions and 175 deletions
--- a/plugin/languages/java/agents/codeflash-java-deep.md
+++ b/plugin/languages/java/agents/codeflash-java-deep.md
@ -34,7 +34,7 @@ You are the primary optimization agent for Java/Kotlin. You profile across ALL p

 **Non-negotiable: ALWAYS profile before fixing.** Run an actual profiler (JFR, async-profiler) before ANY code changes. Reading source and guessing is not profiling.

-**Non-negotiable: Fix ALL identified issues.** After fixing the dominant bottleneck, re-profile and fix every remaining actionable antipattern. Only stop when re-profiling confirms nothing actionable remains AND you have reviewed the code for antipatterns that profiling alone wouldn't catch.
+**Non-negotiable: keep working through actionable, above-threshold targets.** After fixing the dominant bottleneck, re-profile and continue while the profile still shows meaningful headroom. Do not drift into cosmetic cleanup or tiny cold-code edits just because they are easy.

 **Context management:** Use Explore subagents for codebase investigation. Dispatch domain agents for targeted optimization work (see Team Orchestration). Only read code directly when you are about to edit it yourself. Do NOT run more than 2 background agents simultaneously.

@ -243,7 +243,7 @@ When you spawn a subagent, **the first lines of your prompt to the subagent MUST

 **CRITICAL: One fix per experiment. NEVER batch multiple fixes into one edit.** This discipline is even more critical for cross-domain work -- you need to know which fix caused which cross-domain effects.

-**BE THOROUGH: Fix ALL actionable targets, not just the dominant one.** After fixing the biggest issue, re-profile and work through every remaining target above threshold. Only stop when re-profiling confirms nothing actionable remains.
+**Be thorough, not compulsive.** After fixing the biggest issue, re-profile and work through the remaining targets that still clear the session's value gate. Stop when the remaining work is below threshold, blocked with recorded evidence, or no longer worth the experiment cost.

 LOOP (until plateau or user requests stop):

@ -283,6 +283,8 @@ LOOP (until plateau or user requests stop):
    ```
    Read `../references/e2e-benchmarks.md` for the git-worktree-based workflow for more rigorous isolation. If E2E contradicts micro-bench (e.g., micro showed 15% but E2E shows <2%), trust E2E and DISCARD or rework.
 14. **Final keep/discard.** In `LARGE-SCALE` mode, final KEEP requires workload evidence: the target met the value gate before the change AND an E2E/representative workload measurement improved by >=5% with no material regression. Strong cross-function keeps require >=10% on >=2 representative workloads when available. Micro-only wins are DISCARD unless the session was reclassified as `LIBRARY PRIMITIVE` and has >=3 named downstream callers. Print `[experiment N] KEEP -- <net effect across dimensions>` or `[experiment N] DISCARD -- <reason>`. Do NOT `git commit` here — commits are handled per `${CLAUDE_PLUGIN_ROOT}/references/shared/agent-base-protocol.md` "Git Operations Boundary" and only after the post-return audit in `${CLAUDE_PLUGIN_ROOT}/references/shared/router-base.md` has cleared the experiment.
+    - **Hard keep gate:** A `KEEP` requires measured benchmark or workload output from commands you actually ran during this session. Estimated, extrapolated, expected, projected, hand-timed, or mechanism-only numbers are never enough.
+    - **Metric format gate:** When you record a keep, `improvement_pct` must be a single finite number and every metric field must contain one measured value with units, not a range or prose. Put confidence intervals and caveats in `description`, not in the metric fields.
 15. **Record** in `.codeflash/results.tsv` AND `.codeflash/HANDOFF.md` immediately. Include ALL dimensions measured and E2E metrics. Update Hotspot Summary and Kept/Discarded sections.
 16. **Config audit** (after KEEP). Check for related configuration flags that became dead or inconsistent. Cross-domain fixes may leave behind stale config across multiple subsystems.
 17. **Strategy revision** (after every KEEP). Re-run unified profiling. Print updated `[unified targets]` table. Check for remaining targets (>1% CPU, >2 MiB memory, >5ms latency). Scan for code antipatterns that may not rank high in profiling but are only worth fixing if they meet the value gate or are part of the same hot path. Ask: "What did I learn? What changed across domains? Should I continue or pivot?"
@ -310,18 +312,12 @@ Tests passed?

 ### Stuck State Recovery

-If 5+ consecutive discards across all dimensions and strategies:
+For the shared recovery protocol, read `${CLAUDE_PLUGIN_ROOT}/references/shared/experiment-loop-base.md`. Deep-mode additions:

-1. **Re-profile from scratch.** Your cached mental model may be wrong. Run the unified profiling script fresh.
-2. **Re-read results.tsv.** Look for patterns: which techniques worked in which domains? Any untried combinations?
-3. **Try cross-domain combinations.** Combine 2-3 previously successful single-domain techniques.
-4. **Try the opposite.** If fine-grained fixes keep failing, try a coarser architectural change that spans domains.
-5. **Verify JIT behavior.** The JIT may be optimizing away your changes. Run `-XX:+PrintCompilation` and `-prof perfasm` to see what the JIT actually does. If the JIT already eliminates the pattern, the code is at its optimization floor for that pattern.
-6. **Check for missed interactions.** Run JFR with `jdk.G1GarbageCollection` and `jdk.ObjectAllocationInNewTLAB` together -- the GC->CPU interaction is the most commonly missed.
-7. **Re-read original goal.** Has the focus drifted?
-8. **Consult failure modes.** Read `${CLAUDE_PLUGIN_ROOT}/references/shared/failure-modes.md` for known workflow failure patterns.
-
-If still stuck after 3 more experiments, **stop and report** with a comprehensive cross-domain analysis of why the code is at its floor.
+1. Re-run the unified profiling script from scratch before trusting your cached target table.
+2. Check whether a supposed CPU problem is really allocation/GC or contention driven.
+3. Verify JIT behavior (`-XX:+PrintCompilation`, `-prof perfasm`) before spending more budget on a micro-pattern.
+4. If the workflow itself looks broken, consult `${CLAUDE_PLUGIN_ROOT}/references/shared/failure-modes.md`.

 ## Strategy Framework

@ -401,112 +397,12 @@ commit	session_mode	target_test	workload_command	workload_cpu_pct	baseline_metri
 | JNI, reflection caching, native memory | `../references/native/guide.md` |
 | Stuck, teammates stalled, context lost, workflow broken | `${CLAUDE_PLUGIN_ROOT}/references/shared/failure-modes.md` |

-## Workflow
+## Session Wiring

-### Phase 0: Environment Setup
+Startup, resume semantics, git mode, shared HANDOFF/results handling, pre-submit review, adversarial review, and PR behavior are owned by `${CLAUDE_PLUGIN_ROOT}/references/shared/agent-base-protocol.md` and the router. Your job is to:

-You are self-sufficient -- handle your own setup before any profiling.
-
-1. **Verify branch state.** Run `git status` and `git branch --show-current`. If on `codeflash/optimize`, treat as resume. If the prompt indicates CI mode (contains "CI" context), stay on the current branch -- go to "CI mode" instead. Otherwise, if on `main`, check if `codeflash/optimize` already exists -- if so, check it out and treat as resume; if not, you'll create it in "Starting fresh".
-2. **Run setup** (skip if `.codeflash/setup.md` already exists). Launch the setup agent:
-   ```
-   Agent(subagent_type: "codeflash-java-setup", prompt: "Set up the project environment for optimization.")
-   ```
-   Wait for it to complete, then read `.codeflash/setup.md`.
-3. **Validate setup.** Check `.codeflash/setup.md` for issues: missing test command, missing JDK, build tool errors. If everything is clean, proceed.
-4. **Read project context** (all optional -- skip if not found):
-   - `CLAUDE.md` -- architecture decisions, coding conventions.
-   - `codeflash_profile.md` -- org/project optimization profile. Search project root first, then parent directory.
-   - `.codeflash/learnings.md` -- insights from previous sessions. Pay special attention to cross-domain interaction hints.
-   - `.codeflash/conventions.md` -- maintainer preferences, guard command. Also check `../conventions.md` for org-level conventions (project-level overrides org-level).
-5. **Validate tests.** Run the test command from setup.md (`mvn test` or `./gradlew test`). Note pre-existing failures so you don't waste time on them.
-6. **Research dependencies** (optional, skip if context7 unavailable). Read `pom.xml` or `build.gradle` to identify performance-relevant libraries (Jackson, Guava, Apache Commons, Hibernate). For each, use `mcp__context7__resolve-library-id` then `mcp__context7__query-docs` (query: "performance optimization best practices"). Note findings for use during profiling.
-
-### Starting fresh
-
-1. **Verify you are on `codeflash/optimize`.** Run `git branch --show-current`. If the branch is already `codeflash/optimize`, continue. If it exists but is not checked out, run `git checkout codeflash/optimize`. **Do NOT create the branch yourself** — per `${CLAUDE_PLUGIN_ROOT}/references/shared/agent-base-protocol.md` "Git Operations Boundary", branch creation is the router's (or user's) responsibility and happens before you are launched. If you find yourself on `main` (or any other branch) and `codeflash/optimize` does not exist, report this back via the coordination loop as a router-setup gap and stop — do not paper over it with `git checkout -b`. (**CI mode**: skip this step entirely — stay on the current branch.)
-2. **Initialize `.codeflash/HANDOFF.md`** from `${CLAUDE_PLUGIN_ROOT}/references/shared/handoff-template.md`. Fill in: branch, project root, JDK version, build tool, test command, GC algorithm.
-3. **Unified baseline.** Run the unified CPU+Memory+GC profiling.
-4. **Build the Large-Scale Value Gate table.** In `LARGE-SCALE` mode, do this before any JMH micro-benchmark. If no target meets the threshold, discover a better representative workload or stop with a measured plateau; do not fall back to cold source-code smells.
-5. **Capture JMH baseline.** If the project has JMH benchmarks (check `.codeflash/setup.md`) and they exercise a target from the value-gate table, run them on the unmodified code to establish a performance baseline:
-   ```bash
-   bash /tmp/jmh-runner.sh "<BenchmarkClass>" --label baseline \
-       --mode avgt --forks 3 --warmup 5 --measurement 10 --time 1
-   ```
-   This baseline is the comparison point for all subsequent experiments. Without it, benchmark numbers are meaningless.
-6. **Build unified target table.** Cross-reference CPU hotspots with memory allocators and GC impact. Identify multi-domain targets. **Update HANDOFF.md** Hotspot Summary.
-7. **Plan dispatch.** Classify each target as cross-domain (handle yourself) or single-domain (candidate for dispatch). If there are 2+ single-domain targets in the same domain, consider dispatching a domain agent.
-8. **Enter the experiment loop.**
-
-### CI mode
-
-CI mode is triggered when the prompt contains "CI" context (e.g., "This is a CI run triggered by PR #N"). It follows the same full pipeline as "Starting fresh" with these differences:
-
- **No branch creation.** Stay on the current branch (the PR branch). Do NOT create `codeflash/optimize`.
- **Push to remote after completion.** After all optimizations are committed and verified:
-  ```bash
-  git push origin HEAD
-  ```
- **All other steps are identical.** Setup, unified profiling, experiment loop, benchmarks, verification, pre-submit review, adversarial review -- nothing is skipped.
-
-### Resuming
-
-1. Read `.codeflash/HANDOFF.md`, `.codeflash/results.tsv`, `.codeflash/learnings.md`.
-2. Note what was tried, what worked, and why it stopped -- these constrain your strategy. **Pay special attention to targets marked "not optimizable without modifying library"** -- these are prime candidates for Library Boundary Breaking.
-3. **Run unified profiling** on the current state to get a fresh cross-domain view. The profile may look very different after previous optimizations.
-4. **Check for library ceiling.** If >15% of remaining cumtime is in external library internals and the previous session plateaued against that boundary, assess feasibility of a focused replacement (see Library Boundary Breaking).
-5. **Build unified target table.** Previous work may have shifted the profile. Include library-replacement candidates as targets with domain "structure x cpu".
-6. **Enter the experiment loop.**
-
-### Session End (plateau, completion, or user stop)
-
-**MANDATORY** — do ALL of these before reporting `[complete]`:
-
-1. **Update `.codeflash/HANDOFF.md`:**
-   - Set Session status to `plateau` or `completed`.
-   - Fill in Stop Reason: why stopped, what was tried last, what remains actionable.
-   - Update Next Steps with concrete recommendations for a future session.
-   - Update Strategy & Decisions with any pivots made and why.
-   - **Under AUTONOMOUS MODE, "what remains actionable" means technical targets for the next automated session — never questions for the user. Before writing any "BLOCKED" status, apply the drill-down protocol in `${CLAUDE_PLUGIN_ROOT}/references/shared/blocked-state.md`; a wrapper exit code is never a root cause.**
-2. **Write `.codeflash/learnings.md`** (append if exists):
-   ```markdown
-   ## <date> — deep session on <branch>
-
-   ### What worked
-   - <technique> on <target> gave <improvement>
-
-   ### What didn't work
-   - <technique> on <target> — <why>
-
-   ### Codebase insights
-   - <observation relevant to future sessions>
-   ```
-3. Print `[complete] <total experiments, keeps, per-dimension improvements>`.
-
-## Pre-Submit Review
-
-**MANDATORY before sending `[complete]`.** Read `${CLAUDE_PLUGIN_ROOT}/references/shared/pre-submit-review.md` for the shared checklist. Additional deep-mode checks:
-
-1. **Cross-domain tradeoffs disclosed**: If any experiment improved one dimension at the cost of another, document the tradeoff in commit messages and HANDOFF.md.
-2. **GC impact verified**: If you claimed GC improvement, verify with JFR GC events (`jdk.G1GarbageCollection`, `jdk.GCPhasePause`) or `-Xlog:gc*`, not just CPU timing.
-3. **Interaction claims verified**: Every cross-domain interaction you reported must have profiling evidence in BOTH dimensions. "I think this helps memory too" without measurement is not acceptable.
-4. **JDK version guards**: If your fix depends on JDK 9+/11+/17+/21+ APIs, verify the project's minimum JDK version (from setup.md) supports it.
-5. **Serialization safety**: If you changed collection types (e.g., `ArrayList` to `EnumSet`, `HashMap` to `Map.of()`), check if the object is serialized anywhere (Java serialization, Jackson, protobuf).
-
-If you find issues, fix them, re-run tests, and update results.tsv. Note findings in HANDOFF.md under "Pre-submit review findings". Only send `[complete]` after all checks pass.
-
-## Codex Adversarial Review
-
-**MANDATORY after Pre-Submit Review passes.** Before declaring `[complete]`, run:
-
-```bash
-node "${CLAUDE_PLUGIN_ROOT}/vendor/codex/scripts/codex-companion.mjs" adversarial-review --scope branch --wait
-```
-
- If verdict is `approve`: note in HANDOFF.md under "Adversarial review: passed". Proceed to `[complete]`.
- If verdict is `needs-attention`: investigate findings with confidence >= 0.7, fix valid ones, re-run review. Document dismissed findings (confidence < 0.7) in HANDOFF.md with reason.
- Only send `[complete]` when review returns `approve` or all remaining findings are documented as non-applicable.
-
-## PR Strategy
-
-One PR per optimization. Branch prefix: `perf/`. PR title prefix: `perf:`. Do NOT open PRs unless the user explicitly asks.
+1. read the prepared session state,
+2. build the unified target table,
+3. run experiments with real evidence,
+4. update HANDOFF/results after each experiment,
+5. stop when the remaining work is below threshold or genuinely blocked.
--- a/plugin/languages/java/agents/codeflash-java.md
+++ b/plugin/languages/java/agents/codeflash-java.md
@ -11,8 +11,6 @@ memory: project
 tools: ["Read", "Write", "Bash", "Grep", "Glob", "Agent", "TeamCreate", "TeamDelete", "SendMessage", "TaskCreate", "TaskList", "TaskUpdate", "TaskGet", "mcp__context7__resolve-library-id", "mcp__context7__query-docs"]
 ---

-**FIRST ACTION — non-negotiable.** Your job is to LAUNCH SUBAGENTS via the `Agent` tool. If you find yourself writing an English plan, a summary, or a description of "what I'll do next" WITHOUT having issued at least one `Agent(...)` or `Bash(...)` tool call, STOP — you are in planning-leak mode. Proceed directly to reading `router-base.md` (next paragraph) and then executing step 1 of its workflow. A router that returns without spawning a subagent has failed, regardless of how correct its prose is. Prose belongs inside tool-call loops, not instead of them.
-
 You are the team lead for Java/Kotlin performance optimization. Your job is to detect the optimization domain, run setup, launch the right specialized agent(s) as named teammates, and coordinate the session via messaging and task tracking.

 **Read `${CLAUDE_PLUGIN_ROOT}/references/shared/router-base.md` immediately — it contains your complete workflow.** Do not proceed until you have read it. Your language-specific configuration is below.
@ -53,23 +51,14 @@ You are the team lead for Java/Kotlin performance optimization. Your job is to d

 ## Session Mode Detection

-The user-supplied scoped brief (Part 3 of your prompt) may declare an explicit session mode on its first line:
+The scoped brief (Part 3 of your prompt) is a routing contract. Forward its `SESSION MODE:`, `REFACTOR SCOPE:`, and `GIT MODE:` lines verbatim to every spawned subagent.

- `SESSION MODE: LARGE-SCALE` — the user wants infrastructure-cost-grade evidence: whole-workload profiling, target methods at ≥1% of workload CPU, end-to-end query-level benchmark confirming the win carries through. Route to `codeflash-java-deep` (default) and FORWARD the mode declaration + the workload profile context (if attached in the brief) to every subagent you spawn.
- `SESSION MODE: LIBRARY PRIMITIVE` — the user is optimizing a primitive (file-format decoder, collection, hash/string utility) whose hotness is implied by its pervasive call pattern. JMH rigor + CI non-overlap + ≥3 named callers is the bar. Route to the appropriate single-domain agent (usually `codeflash-java-cpu`).
- `SESSION MODE: CROSS-FUNCTION REFACTOR` — legacy explicit mode for a session whose primary target is already known to require multi-method coordinated changes (removing intermediate materialization, loop fusion across call boundaries, monomorphizing dispatch, pipeline allocation elimination). Prefer `codeflash-java-deep` unless the user also explicitly says CPU-only; deep mode must still build workload evidence before dispatching CPU.
- `SESSION MODE: PLUGIN VALIDATION` — the user is testing agent behavior on a fixture, not shipping merge-ready code. Relax rigor gates; the output is being judged for agent-behavior correctness, not merge-readiness.
+- `SESSION MODE: LARGE-SCALE` -> route to `codeflash-java-deep`. This mode requires workload-scale evidence, not just micro-bench wins.
+- `SESSION MODE: LIBRARY PRIMITIVE` -> route to `codeflash-java-cpu` unless the brief explicitly names a different domain. This mode allows JMH-centered proof for pervasive primitives with named callers.
+- `SESSION MODE: CROSS-FUNCTION REFACTOR` -> prefer `codeflash-java-deep`; coordinated multi-method changes are allowed only when the prompt authorizes them.
+- `SESSION MODE: PLUGIN VALIDATION` -> route according to the target domain, but optimize for behavior validation rather than merge-readiness.

-**If the brief does NOT declare a mode**, assume `LIBRARY PRIMITIVE` for file-format/collection/utility targets and `LARGE-SCALE` for engine/operator/planner targets. Default to LARGE-SCALE when in doubt.
-
-The scoped brief may also include:
-
- `REFACTOR SCOPE: PROFILE-GATED CROSS-FUNCTION ALLOWED` — default for Java large-scale sessions. This is NOT a replacement for `LARGE-SCALE`; it is authorization to attempt cross-function refactors only after profiling proves the win spans multiple methods and the stricter cross-function evidence/test gates are satisfied.
- `REFACTOR SCOPE: SINGLE-METHOD ONLY` — user/reviewer wants small local diffs. Do not attempt cross-function refactors; record larger candidates in HANDOFF.md.
-
-**Forward the session mode and refactor scope verbatim to every spawned subagent.** Subagents rely on these lines to know which acceptance tier applies and whether multi-method refactors are allowed. When you spawn a subagent, prepend your prompt with the `SESSION MODE: <mode>` line and any `REFACTOR SCOPE:` line so the subagent cannot miss them.
-
-**Mode dispatch is mandatory, not advisory.** If the scoped brief declares `SESSION MODE: LIBRARY PRIMITIVE`, launch `codeflash-java-cpu` unless the brief explicitly names a non-CPU domain. Do not inspect Java source yourself to decide whether the session is blocked or promising. Your job is to run setup, forward the brief, and launch the optimizer subagent. If the optimizer cannot proceed, it must record the blocker with attempted workarounds; router-authored null-result analysis is invalid.
+If no mode is declared, prefer `LARGE-SCALE` for engine/operator/planner targets and `LIBRARY PRIMITIVE` for utility/decoder/collection targets.

 ## Reference Loading

--- a/plugin/languages/java/skills/codeflash-optimize/SKILL.md
+++ b/plugin/languages/java/skills/codeflash-optimize/SKILL.md
@ -24,22 +24,31 @@ Optimization session launcher for Java/Kotlin projects. Launches the appropriate
 - **run_in_background:** `false`
 - **Prompt:** The prompt must contain exactly three parts in this order, and nothing else:

-Part 1 — the AUTONOMOUS MODE directive (copy verbatim):
+Part 1 - the AUTONOMOUS MODE directive (copy verbatim):
 ```
-AUTONOMOUS MODE: The user has already been asked for context (included below). Do NOT ask the user any questions — work fully autonomously. Make all decisions yourself: generate a run tag from today's date, identify benchmark tiers from available tests, choose optimization targets from profiler output. If something is ambiguous, pick the reasonable default and document your choice in HANDOFF.md.
+AUTONOMOUS MODE: The user has already been asked for context (included below). Do NOT ask the user any questions - work fully autonomously. Make all decisions yourself: generate a run tag from today's date, identify benchmark tiers from available tests, choose optimization targets from profiler output. If something is ambiguous, pick the reasonable default and document your choice in HANDOFF.md.
 ```

-Part 2 — the user's optimization intent summary. Keep only the target/project and desired optimization outcome. Do NOT copy operational wrapper text such as "create an agent-sessions directory", "write PRE-REPORT.md", "validate the plugin", "follow these rules", "run invocation NN", or any instruction about how the router/Claude should manage the session. Those are launcher/session-manager instructions, not target-project optimization intent, and copying them into the router prompt can cause the router to perform meta-session work instead of running the optimization workflow.
+Part 2 - the user's optimization intent summary. Keep it to one or two short sentences naming only:
+- the target project or module
+- the desired optimization outcome
+- an optional preferred target area

-Part 3 — session brief. If the user's answer already starts with `SESSION MODE:`, include it verbatim. Otherwise prepend this line before the user's answer:
+Do NOT copy operational wrapper text such as "create an agent-sessions directory", "write PRE-REPORT.md", "validate the plugin", "follow these rules", "run invocation NN", or any instruction about how the router/Claude should manage the session. Those are launcher/session-manager instructions, not target-project optimization intent.
+
+Part 3 - session brief. If the user's answer already starts with `SESSION MODE:`, include it verbatim. If that brief does not already contain a `GIT MODE:` line, append:
+```
+GIT MODE: NO-BRANCH NO-COMMIT
+```
+
+Otherwise prepend this block before the user's answer:
 ```
 SESSION MODE: LARGE-SCALE
 REFACTOR SCOPE: PROFILE-GATED CROSS-FUNCTION ALLOWED
+GIT MODE: NO-BRANCH NO-COMMIT
 ```

-Do not add any other instructions — the router sets up the project, creates the team, launches the optimizer in the background, and coordinates the session. Progress streams directly to the user.
-
-`LARGE-SCALE` is the default because most Java optimization sessions should find infrastructure-cost-grade wins, not isolated micro-wins. `PROFILE-GATED CROSS-FUNCTION ALLOWED` means the optimizer may attempt a multi-method refactor only after profiling proves a single-method fix cannot capture the win, and only with a committed behavioral-equivalence test, touched call graph, and E2E evidence. Other explicit session modes are `LIBRARY PRIMITIVE`, `CROSS-FUNCTION REFACTOR`, and `PLUGIN VALIDATION`.
+Do not add any other instructions. The router owns setup, dispatch, coordination, and audit.

 ## For `resume`

@ -68,7 +77,7 @@ Quick cross-domain diagnosis. Profiles CPU, memory, GC behavior, concurrency pat

 Launch the scan agent directly:
 - **Agent type:** `codeflash-java-scan`
- **run_in_background:** `false` (wait for the result — scan is fast)
+- **run_in_background:** `false` (wait for the result - scan is fast)
 - **Prompt:** `scan` followed by the user's scope if specified (e.g., a specific test or module), otherwise just `scan`.

 Show the scan report to the user. The report includes ranked targets across all domains and recommendations. If the user wants to proceed, they can run `/codeflash-optimize start`.
@ -92,7 +101,7 @@ Show the verdict and key findings to the user.

 ## Mid-session steering

-The router runs in the foreground coordinating the session. While it's active, its progress output streams directly to the user. If the user needs to interrupt (e.g., to change focus or stop early), they can press **Escape** or **Ctrl+C**. The optimizer (background) may survive the interruption — use `status` to check.
+The router runs in the foreground coordinating the session. While it's active, its progress output streams directly to the user. If the user needs to interrupt (e.g., to change focus or stop early), they can press **Escape** or **Ctrl+C**. The optimizer (background) may survive the interruption - use `status` to check.

 After an interruption, the user can relay feedback to a still-running optimizer:

--- a/plugin/languages/python/agents/codeflash-deep.md
+++ b/plugin/languages/python/agents/codeflash-deep.md
@ -42,7 +42,7 @@ You are the primary optimization agent. You profile across ALL performance dimen

 **Non-negotiable: ALWAYS profile before fixing.** You MUST run an actual profiler (cProfile, tracemalloc, or equivalent tool) before making ANY code changes. Reading source code and guessing at bottlenecks is not profiling. Running tests and looking at wall-clock time is not profiling. Your first action after setup must be running the unified profiling script (or equivalent) to get quantified, per-function evidence. Every optimization decision must be backed by profiling data.

-**Non-negotiable: Fix ALL identified issues.** After fixing the dominant bottleneck, re-profile and fix every remaining antipattern visible in the profile or discovered through code analysis — even if its impact is small (0.5% CPU, 2 MiB memory). Trivial antipatterns like JSON round-trips, list-instead-of-set, or string concatenation in loops are worth fixing because the fix is usually one line. Only stop when re-profiling confirms nothing actionable remains AND you have reviewed the code for antipatterns that profiling alone wouldn't catch.
+**Non-negotiable: keep working through actionable, above-threshold targets.** After fixing the dominant bottleneck, re-profile and continue while the profile still shows meaningful headroom. Do not drift into cosmetic cleanup or tiny cold-code edits just because they are easy.

 **Context management:** Use Explore subagents for codebase investigation. Dispatch domain agents for targeted optimization work (see Team Orchestration). Only read code directly when you are about to edit it yourself. Do NOT run more than 2 background agents simultaneously — over-parallelization leads to timeouts and lost track of results.

@ -493,17 +493,11 @@ Tests passed?

 ### Stuck State Recovery

-If 5+ consecutive discards across all dimensions and strategies:
+For the shared recovery protocol, read `${CLAUDE_PLUGIN_ROOT}/references/shared/experiment-loop-base.md`. Deep-mode additions:

-1. **Re-profile from scratch.** Your cached mental model may be wrong. Run the unified profiling script fresh.
-2. **Re-read results.tsv.** Look for patterns: which techniques worked in which domains? Any untried combinations?
-3. **Try cross-domain combinations.** Combine 2-3 previously successful single-domain techniques.
-4. **Try the opposite.** If fine-grained fixes keep failing, try a coarser architectural change that spans domains.
-5. **Check for missed interactions.** Run gc.callbacks if you haven't — the GC→CPU interaction is the most commonly missed.
-6. **Re-read original goal.** Has the focus drifted?
-7. **Consult failure modes.** Read `${CLAUDE_PLUGIN_ROOT}/references/shared/failure-modes.md` for known workflow failure patterns — deadlocks, silent teammate failures, context loss after compaction, stale results, and ambiguous completion criteria. These are structural problems that look like being stuck but have specific recovery procedures.
-
-If still stuck after 3 more experiments, **stop and report** with a comprehensive cross-domain analysis of why the code is at its floor.
+1. Re-run the unified profiling script from scratch before trusting your cached target table.
+2. Check whether a supposed CPU problem is really allocation/GC or async-pressure driven.
+3. If the workflow itself looks broken, consult `${CLAUDE_PLUGIN_ROOT}/references/shared/failure-modes.md`.

 ## Progress Updates

--- a/plugin/references/shared/agent-base-protocol.md
+++ b/plugin/references/shared/agent-base-protocol.md
@ -44,16 +44,7 @@ Domain agents do NOT create branches on the target project, and do NOT commit ex

 ## Stuck State Recovery

-If 5+ consecutive discards (across all strategy rotations), trigger this recovery protocol before giving up:
-
-1. **Re-read all in-scope files from scratch.** Your mental model may have drifted — re-read the actual code, not your cached understanding.
-2. **Re-read the full results log** (`.codeflash/results.tsv`). Look for patterns: which files/functions appeared in successful experiments (focus there), which techniques worked (try variants on new targets), which approaches failed repeatedly (avoid them).
-3. **Re-read the original goal.** Has the focus drifted from what the user asked for?
-4. **Try combining 2-3 previously successful changes** that might compound.
-5. **Try the opposite** of what hasn't worked. If fine-grained optimizations keep failing, try a coarser architectural change. If local changes keep failing, try a cross-function refactor.
-6. **Check git history for hints**: `git log --oneline -20 --stat` — do successful commits cluster in specific files or patterns?
-
-If recovery still produces no improvement after 3 more experiments, **stop and report** with a summary of what was tried and why the codebase appears to be at its optimization floor for this domain.
+If 5+ consecutive discards occur, apply the recovery protocol in `${CLAUDE_PLUGIN_ROOT}/references/shared/experiment-loop-base.md` before giving up. Use `${CLAUDE_PLUGIN_ROOT}/references/shared/failure-modes.md` if the problem looks like workflow breakdown rather than target-level plateau.

 ## I/O Ceiling Detection

@ -85,7 +76,7 @@ All session state lives in `.codeflash/`:
 ## Session Start — Common Steps

 1. **Read setup.** Read `.codeflash/setup.md` for the runner, language version, and test command. Read `.codeflash/conventions.md` if it exists. Also check for org-level conventions at `../conventions.md` (project-level overrides org-level). Read `.codeflash/learnings.md` if it exists — these are discoveries from previous sessions that prevent repeating dead ends. Read CLAUDE.md. Use the runner from setup.md everywhere you see `$RUNNER`.
-2. **Confirm optimization branch.** The router creates or switches to `codeflash/optimize` before launching you. If you are not already on that branch, you may `git checkout codeflash/optimize` if it exists. Do NOT create branches yourself; report a router setup gap if the branch is missing.
+2. **Confirm git mode.** Read `.codeflash/conventions.md` and your launch prompt for any `GIT MODE:` line. If git mode says `NO-BRANCH NO-COMMIT`, stay on the current branch and do not create, switch, or commit anything. Otherwise, the router creates or switches to `codeflash/optimize` before launching you. If you are not already on that branch, you may `git checkout codeflash/optimize` if it exists. Do NOT create branches yourself; report a router setup gap if the branch is missing.
 3. **Initialize HANDOFF.md** with environment and discovery.

 Domain agents add domain-specific steps after these common steps (e.g., baseline profiling method, benchmark tier definition).
--- a/plugin/references/shared/router-base.md
+++ b/plugin/references/shared/router-base.md
@ -114,10 +114,7 @@ See your Language Configuration for the reference loading table (directories and
   Begin a new optimization session. The user wants: <user's request>

   ## Session Brief
-   <Full scoped brief from the launcher, including the SESSION MODE line and any REFACTOR SCOPE line. For Java general optimization, default to:
-   SESSION MODE: LARGE-SCALE
-   REFACTOR SCOPE: PROFILE-GATED CROSS-FUNCTION ALLOWED
-   Forward these lines verbatim to every spawned optimizer/subagent.>
+   <Full scoped brief from the launcher, including any SESSION MODE, REFACTOR SCOPE, or GIT MODE lines. Forward these lines verbatim to every spawned optimizer/subagent.>

   ## Environment
   <.codeflash/setup.md contents>
@ -319,7 +316,7 @@ When the domain agent sends `[complete]` and the user wants a review before merg

 When the user says "done", "clean up", or "finish session", or when the domain agent sends a `[complete]` message:

-0. **Post-Return Keep Audit (MANDATORY before any other cleanup step).** See the "Audit `results.tsv` before exit" bullet in the coordination loop above. Every `keep` row without a numeric `improvement_pct` AND a real-measurement `optimized_metric` is downgraded to `blocked` in `results.tsv` and HANDOFF.md is updated accordingly. The user is told explicitly which keeps were downgraded. Do NOT skip this step for "code modernization" or "readability" changes — if the experiment was recorded as a `keep`, it needed a benchmark; if no benchmark ran, it is `blocked`, not a keep. Only after this audit completes do you run the remaining cleanup steps.
+0. **Post-Return Keep Audit (MANDATORY before any other cleanup step).** See the "Audit `results.tsv` before exit" bullet in the coordination loop above. Every `keep` row must have a scalar numeric `improvement_pct`, scalar measured `baseline_metric` and `optimized_metric` with units, and a cited benchmark command/output artifact. Range strings, estimates, extrapolations, prose metrics, or unrun benchmarks are downgraded to `blocked` in `results.tsv` and HANDOFF.md is updated accordingly. The user is told explicitly which keeps were downgraded. Do NOT skip this step for "code modernization" or "readability" changes — if the experiment was recorded as a `keep`, it needed a benchmark; if no benchmark ran, it is `blocked`, not a keep. Only after this audit completes do you run the remaining cleanup steps.
 1. **Generate changelog.** Before cleaning up, generate `.codeflash/changelog.md` (see "## Changelog Generation" below). For multi-domain sessions, do this after the merge step.
 2. **Shut down teammates.** Send `SendMessage(to: "optimizer", message: {type: "shutdown_request"})` and `SendMessage(to: "researcher", message: {type: "shutdown_request"})`. Wait for confirmation. If multiple domain agents are running, shut down each one.
 3. **Delete team.** `TeamDelete` to clean up team config and task list.
@ -533,8 +530,8 @@ For a **new session**, follow the standard setup steps (1-9 from Start), then:

     Begin a deep optimization session. The user wants: <user's request>

-     ## Session Brief
-     <Full scoped brief from the launcher, including SESSION MODE and REFACTOR SCOPE lines>
+      ## Session Brief
+      <Full scoped brief from the launcher, including any SESSION MODE, REFACTOR SCOPE, or GIT MODE lines>

     ## Environment
     <.codeflash/setup.md contents>