Commit graph

20 commits

Author SHA1 Message Date
HeshamHM28
45badaf3f0 feat: add Java stop hook to enforce optimization effort (30-attempt cap)
Blocks session exit when the LLM hasn't proven its optimization with
real JMH benchmarks, hasn't tried enough techniques, or has strategies
remaining. Caps at 30 blocks to prevent infinite loops.
2026-04-30 16:59:06 +03:00
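The exit check this commit describes might look roughly like the sketch below. It is an illustration only, not the hook's actual code: the function name `should_block_exit`, its parameters, and the `min_techniques` threshold are hypothetical. Only the three blocking conditions and the 30-block cap come from the commit message.

```python
MAX_BLOCKS = 30  # cap stated in the commit message


def should_block_exit(
    blocks_so_far: int,
    has_jmh_proof: bool,
    techniques_tried: int,
    min_techniques: int,
    strategies_remaining: int,
) -> bool:
    """Return True if session exit should be blocked (hypothetical helper)."""
    # Once the cap is reached, always allow exit so the hook cannot loop forever.
    if blocks_so_far >= MAX_BLOCKS:
        return False
    # Block when any of the three conditions named in the commit message holds.
    return (
        not has_jmh_proof
        or techniques_tried < min_techniques
        or strategies_remaining > 0
    )
```

Checking the cap first is what guarantees termination: after 30 blocks the function returns False regardless of the other inputs.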
HeshamHM28
df5d529882 feat: enforce deep exploration — 10+ attempts per target, iterate past KEEPs
The agent was settling for easy wins and quitting targets after 1-2 failed
attempts. Now enforces:
- Minimum 10 attempts per target before skipping (15+ for high-impact, 20+ for ML)
- After a KEEP, try 3+ more approaches to find the maximum (not just first success)
- 10-category exploration ladder forces fundamentally different techniques each attempt
- Plateau requires 10+ consecutive discards across 5+ categories (was 3 discards)
- Exploration mindset section: first principles thinking, novel ideas, step-function improvements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 23:32:54 +03:00
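The plateau rule above (10+ consecutive discards across 5+ categories) can be sketched as a small predicate. This is an assumed reading of the rule, not code from the repository: the name `is_plateau` and the `(category, outcome)` history format are hypothetical.

```python
MIN_CONSECUTIVE_DISCARDS = 10  # from the commit message
MIN_DISTINCT_CATEGORIES = 5    # from the commit message


def is_plateau(history: list[tuple[str, str]]) -> bool:
    """history holds (category, outcome) pairs, oldest first.

    outcome is "KEEP" or "DISCARD" (hypothetical encoding).
    """
    trailing_categories = []
    # Walk backwards and collect the unbroken run of discards at the end.
    for category, outcome in reversed(history):
        if outcome != "DISCARD":
            break
        trailing_categories.append(category)
    return (
        len(trailing_categories) >= MIN_CONSECUTIVE_DISCARDS
        and len(set(trailing_categories)) >= MIN_DISTINCT_CATEGORIES
    )
```

Only the trailing run counts, so a single KEEP resets the discard streak.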
HeshamHM28
6274dca2d5 feat: enhance Java optimization flow with extended sessions, Pareto tracking, and MCP visibility
Add iterative optimization capabilities inspired by Kimi K2.6: thread topology & spin-wait
strategies, allocation profiling, cross-function scope, behavioral equivalence verification,
Pareto frontier tracking with chart generation, extended session protocol (10-15+ hours),
session interruption detection/recovery via hooks, and MCP endpoint visibility so users
can follow the profiling pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 16:17:45 +03:00
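For readers unfamiliar with Pareto frontier tracking, a minimal sketch follows. The commit does not say which metrics are tracked, so this assumes two metrics that are both minimized (for example latency and allocation rate). The function name `pareto_frontier` is hypothetical.

```python
def pareto_frontier(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Return the points not dominated by any other point.

    Assumes both metrics are "lower is better".
    """
    frontier = []
    best_second = float("inf")
    # Sort by the first metric (ties broken by the second), then sweep:
    # a point survives only if it strictly improves the second metric.
    for first, second in sorted(set(points)):
        if second < best_second:
            frontier.append((first, second))
            best_second = second
    return frontier
```

A point that is dropped here is one for which some other experiment was at least as good on both metrics.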
HeshamHM28
f4101615c2 Enhance Java Experiment Loop Documentation and Benchmarking Guidelines
- Added mandatory checks for strategy plans in various experiment loops to ensure proper execution of assigned strategies.
- Updated target print statements to include strategy identifiers for better tracking of experiments.
- Emphasized the importance of workflow-level JMH comparisons in all experiments, not just after KEEPs, to ensure comprehensive performance evaluation.
- Clarified the necessity of comparing original and optimized code in every experiment to inform discard decisions accurately.
- Introduced guidelines for creating workflow-level benchmarks to capture full code paths and JIT behavior.
- Revised documentation to highlight the authoritative nature of workflow-level benchmarks over micro-benchmarks.
2026-04-27 20:50:20 +03:00
HeshamHM28
d9b8d0d89a Enhance JMH Benchmarking Process and Documentation
- Added a new JMH runner script (`jmh-runner.sh`) for automated benchmarking in Java, including options for baseline capture, GC profiling, and result comparison.
- Updated experiment loop documentation to include mandatory baseline performance capture before code changes, emphasizing the importance of capturing performance metrics for accurate comparisons.
- Revised micro-benchmarking steps to incorporate the new JMH runner and ensure consistent usage of GC profiling across benchmarks.
- Improved end-to-end benchmarking instructions to utilize the JMH runner for authoritative measurements, ensuring results are recorded alongside micro-benchmark results.
- Enhanced JSON result parsing in documentation to utilize `jq` for more efficient extraction of benchmark metrics.
- Streamlined the experiment loop base documentation to clarify the steps for capturing original outputs and performance baselines, reinforcing the need for correctness verification before proceeding with optimizations.
2026-04-16 16:45:18 +02:00
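The baseline-versus-optimized comparison this commit describes can be illustrated with JMH's JSON output (produced with `-rf json`), in which each entry carries a `benchmark` name and a `primaryMetric.score`. The sketch below is in Python rather than `jq` and is not the contents of `jmh-runner.sh`. The helper names `load_scores` and `percent_change` are hypothetical.

```python
import json


def load_scores(jmh_json: str) -> dict[str, float]:
    """Map each benchmark name to its primaryMetric.score from JMH JSON text."""
    return {
        entry["benchmark"]: entry["primaryMetric"]["score"]
        for entry in json.loads(jmh_json)
    }


def percent_change(
    baseline: dict[str, float], candidate: dict[str, float]
) -> dict[str, float]:
    """Percent change in score for benchmarks present in both runs."""
    return {
        name: (candidate[name] - baseline[name]) / baseline[name] * 100.0
        for name in baseline
        if name in candidate
    }
```

Whether a negative change is an improvement depends on the benchmark mode: it is for average-time results, but not for throughput results.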
HeshamHM28
e49e5b499f Add I/O & Serialization Optimization and Worker Pools guides for Java
- Introduced a comprehensive guide on I/O & Serialization Optimization, covering data format choices, serialization libraries, buffer management, and common antipatterns.
- Added a detailed guide on Worker Pools and Process Management, focusing on CPU detection in containers, executor pool sizing, batch processing strategies, and lifecycle management.
2026-04-16 16:23:44 +02:00
HeshamHM28
03dace8461 Enhance Java agent documentation with detailed experiment loop additions for async, CPU, memory, and structure optimization, including JMH benchmarks, reasoning checklists, and verification protocols.
2026-04-16 16:11:47 +02:00
HeshamHM28
d7518276d0 Add experiment loop documentation for async, data structures, memory, and structure domains in Java
2026-04-16 15:34:01 +02:00
HeshamHM28
b638fb4570 Add Java-specific E2E and micro-benchmarking documentation and profiling script
- Introduced `e2e-benchmarks.md` for Java E2E benchmarking guidelines, including JMH detection, workflows, and fallback strategies.
- Created `micro-benchmark.md` detailing JMH micro-benchmarking practices, including benchmark design, execution, and result interpretation.
- Added `unified-profiling-script.sh` for comprehensive CPU, memory, and GC profiling using JFR, enhancing profiling capabilities for Java applications.
2026-04-16 15:11:55 +02:00
HeshamHM28
7eca2f5ace Enhance benchmarking documentation with detailed JMH design guidelines and references
2026-04-16 13:40:11 +02:00
HeshamHM28
37336064cb Enhance algorithmic optimization reference with detailed patterns and usage guidelines
2026-04-16 13:27:03 +02:00
HeshamHM28
aeafcd408e Enhance data structure optimization guide with expanded decision framework and performance traps
2026-04-16 13:14:32 +02:00
HeshamHM28
07dfc144e8 Enhance memory optimization guide with leak detection strategies and antipatterns
2026-04-16 13:11:10 +02:00
HeshamHM28
718cc3393e Enhance concurrency guide with thread pool sizing formulas and best practices for executor isolation
2026-04-16 13:06:27 +02:00
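The formulas added to the concurrency guide are not reproduced in this log. As a point of reference only, one widely cited sizing rule (from "Java Concurrency in Practice") is threads = cores × target utilization × (1 + wait time / compute time). The sketch below assumes that rule, and the name `pool_size` is hypothetical.

```python
import math


def pool_size(
    cores: int,
    target_utilization: float,
    wait_time: float,
    compute_time: float,
) -> int:
    """Thread count under the assumed rule: cores * U * (1 + W / C).

    wait_time and compute_time only need to share a unit; their ratio is used.
    """
    raw = cores * target_utilization * (1 + wait_time / compute_time)
    # Never return fewer than one thread.
    return max(1, math.ceil(raw))
```

For CPU-bound work (wait time near zero) this collapses to roughly one thread per core, while I/O-heavy work yields a proportionally larger pool.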
HeshamHM28
e8eca2f9f3 Enhance performance with loop optimization patterns and caching strategies
2026-04-16 13:02:57 +02:00
Kevin Turcios
7e00007569
Improve deep optimizer: profiling script + failure modes + dist fix (#24)
* Exclude dev docs from plugin dist builds

README.md, ARCHITECTURE.md, and ROADMAP.md are development docs that
shouldn't ship in the assembled plugin distributions.

* Improve deep optimizer: fix profiling script, add failure mode awareness

Profiling script: Accept source root and command as CLI args instead of
hardcoding `src` and requiring manual `# === RUN TARGET HERE ===` edits.
The agent now copies the script from references and runs it with the
project's actual source root and test command.

Failure modes: Wire failure-modes.md into the on-demand reference table
and stuck recovery checklist so the agent consults it when workflows
break (deadlocks, silent failures, context loss, stale results).

* Fix ruff lint errors in unified profiling script

Refactor main() into parse_args(), profile_command(), and
report_results() to fix C901 (complexity) and PLR0915 (too many
statements). Also fix S306 (mktemp → NamedTemporaryFile), PLW1510
(explicit check=False), and add noqa for intentional os.path usage
(PTH112) and subprocess with CLI args (S603).
2026-04-15 04:11:52 -05:00
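The shape of the refactor described in this commit (a `main()` split into `parse_args()`, `profile_command()`, and `report_results()`, with `NamedTemporaryFile` and an explicit `check=False`) might look like the sketch below. It is not the repository's script: the argument names, return values, and the use of wall-clock timing in place of a real profiler are all assumptions made for illustration.

```python
import argparse
import subprocess
import sys
import tempfile
import time


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Take the source root and the target command from the CLI."""
    parser = argparse.ArgumentParser(description="Illustrative profiling wrapper.")
    parser.add_argument("source_root", help="project source root, e.g. src")
    parser.add_argument("command", nargs=argparse.REMAINDER, help="command to run")
    return parser.parse_args(argv)


def profile_command(command: list[str]) -> tuple[int, float, str]:
    """Run the command; return (exit code, wall seconds, captured stdout)."""
    # NamedTemporaryFile replaces the insecure mktemp pattern (ruff S306).
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".log") as log:
        start = time.perf_counter()
        # check=False is explicit (ruff PLW1510): a failing target is
        # reported through its exit code rather than raised.
        completed = subprocess.run(command, stdout=log, check=False)
        elapsed = time.perf_counter() - start
        log.seek(0)
        return completed.returncode, elapsed, log.read()


def report_results(source_root: str, returncode: int, elapsed: float) -> str:
    """Format a one-line summary of the run."""
    return f"{source_root}: exit={returncode} wall={elapsed:.3f}s"


def main(argv: list[str]) -> int:
    args = parse_args(argv)
    returncode, elapsed, _output = profile_command(args.command)
    print(report_results(args.source_root, returncode, elapsed))
    return returncode
```

Splitting the work this way keeps each function short enough to stay under complexity and statement-count lint limits, and lets each step be tested without running the others.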
Kevin Turcios
33faedf427
Add Unstructured report, rewrite statusline, format evals/scripts (#20)
* Add Unstructured engagement report as uv workspace member

Three-tier Plotly Dash app (Executive Brief, Engineering Team, Full
Detail) with data in JSON, theme constants in theme.py, and Dash
production improvements (Google Fonts, clientside callbacks, meta tags).

Also: add .playwright-mcp/ to .gitignore, add reports/* ruff overrides,
remove tracked .codeflash/observability/read-tracker.

* Rewrite statusline to derive context from git state

Detects active area from changed files (reports, packages, plugin,
.codeflash, case-studies, evals), falls back to branch name convention
(perf/*, feat/*, fix/*), shows dirty indicator. Uses whoami for
cross-platform user detection.

* Add pre-push lint rule to commit guidelines

* Exclude .codeflash/ from ruff linting

Benchmark and profiling scripts in .codeflash/ are scratch work, not
package source. Excluding them prevents CI failures from ad-hoc scripts.

* Run ruff format across packages, scripts, evals, and plugin refs

* Fix github-app async test failures in CI

Add asyncio_mode = "auto" to root pytest config so async tests
are detected when running from the repo root via uv run pytest packages/.
2026-04-15 03:06:16 -05:00
Kevin Turcios
361bb899e2
Move Go overlay to plugin/languages/go/ (#13)
* Move Go plugin overlay from languages/go/ to plugin/languages/go/

Aligns Go with the Java/Python/JavaScript convention where all language
overlays live under plugin/languages/<lang>/. The Makefile already
discovers from plugin/languages/* so Go is now included in builds.

* Remove accidental read-tracker changes

* Ignore .codeflash/observability/ in gitignore
2026-04-14 19:14:57 -05:00
mashraf-222
270cb56cee
Feat/java language support (#12)
* Add Java/Kotlin detection to top-level language router

Adds pom.xml, build.gradle, build.gradle.kts, settings.gradle, and
settings.gradle.kts as markers that route to the codeflash-java router.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add Java/Kotlin agent definitions for all optimization domains

10 agents covering the full optimization pipeline:
- codeflash-java: router/team lead for domain detection
- codeflash-java-setup: environment detection (build tool, JDK, profiling tools)
- codeflash-java-deep: cross-domain optimizer (default)
- codeflash-java-cpu: data structures, algorithms, JIT deopt, JMH benchmarks
- codeflash-java-memory: heap/GC tuning, escape analysis, leak detection
- codeflash-java-async: virtual threads, lock contention, CompletableFuture
- codeflash-java-structure: class loading, JPMS, startup time, circular deps
- codeflash-java-scan: quick cross-domain diagnosis via JFR/jdeps/GC logs
- codeflash-java-ci: GitHub webhook handler for Java PRs
- codeflash-java-pr-prep: JMH benchmarks and PR body templates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add Java domain reference guides for all optimization domains

6 guides covering deep domain knowledge for agent consumption:
- data-structures: collection selection, autoboxing, JIT patterns, sorting
- memory: JVM heap layout, GC algorithms and tuning, escape analysis, leaks
- async: virtual threads, structured concurrency, lock hierarchy, contention
- structure: class loading, JPMS, CDS/AppCDS, ServiceLoader, Spring startup
- database: JPA N+1, HikariCP, pagination, batch operations, EXPLAIN plans
- native: JNI, Panama FFM API, GraalVM native-image, Vector API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add Java optimization skills: session launcher and JFR profiling

- codeflash-optimize: session launcher with start/resume/status/scan/review
- jfr-profiling: quick-action JFR profiling in cpu/alloc/wall modes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Slim Java agents to match Go's concise ~175-line pattern

Move inline code examples, antipattern encyclopedias, JMH templates,
and deep-dive sections from agent prompts into reference guides.
Agents now contain only: target tables, one-liner antipatterns,
reasoning checklists, profiling commands, and keep/discard trees.

Line counts (before → after):
  cpu:       636 → 181
  memory:    878 → 193
  async:     578 → 165
  structure: 532 → 167
  deep:      507 → 186
  scan:      440 → 163
  Average:   595 → 176 (vs Go's 175)

Adds to data-structures/guide.md:
  - Collection contract traps table
  - Reflection → MethodHandle migration pattern
  - JMH benchmark template

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix Makefile build: use rsync merge and portable sed -i

Two bugs in the build target:
1. cp -R created nested dirs (agents/agents/, references/references/)
   instead of merging language overlay into shared base. Fix: rsync -a.
2. sed -i '' is macOS-only; fails silently on Linux. Fix: sed -i.bak
   (works on both macOS and Linux), then delete .bak files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add HANDOFF.md session lifecycle to Java agents

Java agents could read HANDOFF.md on resume but never wrote or
updated it. A session that hit plateau would lose all context —
what was tried, what worked, why it stopped, what to do next.

Changes:
- Deep agent: init HANDOFF.md on fresh start, record after each
  experiment, write Stop Reason + learnings.md on session end
- Domain agents (CPU, memory, async, structure): record to
  HANDOFF.md after each keep/discard, write session-end state
- Handoff template: make language-agnostic (was Python-specific),
  add Session status, Strategy & Decisions, and Stop Reason fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Close 11 gaps between Java and Python plugins

Add missing sections to Java deep agent: experiment loop depth (12 steps),
library boundary breaking, Phase 0 environment setup, CI mode, pre-submit
review, adversarial review, team orchestration, cross-domain results schema,
and structured progress reporting.

Add polymorphic dispatch safety to CPU agent and data-structures guide.
Add diff hygiene to CPU agent. Add native reference to router.

Create two new reference files: library-replacement.md (Guava/Commons/
Jackson/Joda replacement tables) and team-orchestration.md (full dispatch
and merge protocol).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 18:49:41 -05:00
Kevin Turcios
3b59d97647 squash
2026-04-13 14:12:17 -05:00