Blocks session exit when the LLM hasn't proven its optimization with
real JMH benchmarks, hasn't tried enough techniques, or has strategies
remaining. Caps at 30 blocks to prevent infinite loops.
The agent was settling for easy wins and quitting targets after 1-2 failed
attempts. Now enforces:
- Minimum 10 attempts per target before skipping (15+ for high-impact, 20+ for ML)
- After a KEEP, try 3+ more approaches to find the maximum (not just the first success)
- 10-category exploration ladder forces fundamentally different techniques each attempt
- Plateau requires 10+ consecutive discards across 5+ categories (was 3 discards)
- Exploration mindset section: first principles thinking, novel ideas, step-function improvements
- Add mandatory strategy-plan checks to the experiment loops so assigned strategies are actually executed.
- Include strategy identifiers in target print statements for easier experiment tracking.
- Require workflow-level JMH comparisons in every experiment, not just after KEEPs.
- Compare original and optimized code in every experiment so discard decisions are based on measurements.
- Add guidelines for creating workflow-level benchmarks that capture full code paths and JIT behavior.
- Document that workflow-level benchmarks are authoritative over micro-benchmarks.
- Add a JMH runner script (`jmh-runner.sh`) for automated Java benchmarking, with options for baseline capture, GC profiling, and result comparison.
- Make baseline performance capture mandatory before any code change in the experiment loop docs.
- Update the micro-benchmarking steps to use the new JMH runner and apply GC profiling consistently.
- Update end-to-end benchmarking instructions to use the JMH runner for authoritative measurements, recorded alongside micro-benchmark results.
- Use `jq` in the documentation's JSON result parsing for faster extraction of benchmark metrics.
- Streamline the experiment-loop base docs: capture original outputs and performance baselines, and verify correctness before optimizing.
- Add an I/O & Serialization Optimization guide covering data format choices, serialization libraries, buffer management, and common antipatterns.
- Add a Worker Pools and Process Management guide covering CPU detection in containers, executor pool sizing, batch processing, and lifecycle management.
- Add `e2e-benchmarks.md` with Java E2E benchmarking guidelines: JMH detection, workflows, and fallback strategies.
- Add `micro-benchmark.md` covering JMH benchmark design, execution, and result interpretation.
- Add `unified-profiling-script.sh` for CPU, memory, and GC profiling via JFR.
- Rewrite executive summary to reference his PR #1465 lockfile fix and
existing tooling (Renovate, Anchore, Chainguard)
- Reorder findings by category priority (supply chain > container > CI/CD)
to lead with what matters most to the audience
- Add animated parallelogram background matching codeflash.ai aesthetic
- 6 research-backed UX changes: severity icons (WCAG 1.4.1), title-first
cards (F-pattern), loss-framed 85% CTA, distinct status colors, card
opacity for figure-ground separation
- Correct SEC-021 from 67% to 97% mutable Action pins per VM verification
(only 2 of 96 SHA-pinned in core-product)
- Add talking-points-lawrence.md with profile, pain points, pitch strategy
Split the 39-finding wall into tabbed views matching the engagement
report pattern: Summary, Critical & High (21), Medium & Low (18),
and By Category with both category and repository breakdowns.
- Add build_jpc_view() with clean standalone layout at /jpc for JPC
(no tabs, no hero — just the document that "stands on its own")
- Add URL routing via dcc.Location: / serves full report, /jpc serves summary (see the routing sketch after this list)
- Add methodology notes to exec view (How This Was Tested annotations)
- Add methodology notes to detail view (7-entry "why" card)
- Enrich team view Memory + Standalone vs. Cumulative explanations
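A minimal sketch of the dcc.Location routing above; `build_jpc_view()` is the new builder from this change, while `build_full_report()` is a stand-in name for the existing tabbed layout:

```python
from dash import Dash, Input, Output, dcc, html

app = Dash(__name__)
app.layout = html.Div([dcc.Location(id="url"), html.Div(id="page-content")])

def build_full_report():
    return html.Div("full tabbed report")       # placeholder for the real layout

def build_jpc_view():
    return html.Div("standalone JPC summary")   # placeholder for the real layout

@app.callback(Output("page-content", "children"), Input("url", "pathname"))
def route(pathname):
    # "/" (and anything unrecognized) gets the full report; "/jpc" gets the summary
    return build_jpc_view() if pathname == "/jpc" else build_full_report()
```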
Team view:
- Add Engineering Impact Summary at top (4 metrics: memory, density,
latency, idle vCPU) with pointer to sections below
- Remove Production Context card (redundant with Impact Summary)
- Trim memory table to only metrics not shown in chart (RSS per
request, K8s allocation) — chart already shows pre/post/delta
- Fix "10-page scan" → "10-page scanned document" in methodology
Detail view:
- Add intro callout explaining this is the raw data backing the
other two views
Reorder based on persuasion research (Three-Talk Model, Prospect
Theory, Kotter):
1. "The Engagement" — collaborative shared context (team talk)
2. "What This Enables" — loss-framed enablement: 9.2x pod density,
41 idle vCPUs now available, -12.9% latency for agentic API
3. "The Results" — before/after proof of execution
4. Infrastructure Cost Impact (anchored on $100K/mo)
5. Workload Profiles + Methodology (credibility)
6. Delivered + Proposed Next Engagements
Key shift: lead with what the work unlocks (feature velocity,
platform capacity, API speed) rather than the technical achievement
(memory reduction). Cost savings is proof of execution, not the
headline.
The 1p/10p/16p benchmark rationale belongs in the exec view — JPC
needs to understand that page count != workload before seeing the
numbers. Added "Benchmark Workload Profiles" section before "How This
Was Tested" with the three profiles and the data punchline (#1505 at
-32.6% on 1 page vs -7.4% on 16 pages).
The 1p/10p/16p column headers weren't self-explanatory. Added a
"Benchmark Workload Profiles" card above the latency table in the
Detail view explaining that each document tests a distinct workload
shape (table-dense, scanned, mixed), not just different page counts.
Also added annotation below the table calling out that #1505 has 4x
the impact on the 1-page doc vs. the 16-page doc — letting the data
demonstrate that per-document cost depends on content, not page count.
- Reframe Future Engagements → Proposed Next Engagements based on
Crag meeting: lead with Platform API speed/stability, add
Infrastructure Cost Discovery ($100K/mo), remove Codeflash product
pitch
- Add Broader Context callout after cost section (core-product = ~10%
of total Azure spend)
- Fix Knative terminology throughout: "Knative pods" → "pods with a
1-CPU resource request" (CFS quota, not Knative config)
- Fix CPU detection description: three-tier logic (cgroup v2 cpu.max →
  sched_getaffinity → os.cpu_count, take minimum); see the sketch after
  this list
- Clarify jemalloc is opt-in (MALLOC_IMPL=jemalloc), 1-CPU serial OCR
only; multi-CPU pods should use glibc default due to ~50 MB/process
arena overhead
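A minimal sketch of that three-tier detection, with illustrative helper names (the report describes the logic; this is not the project's code):

```python
import os

def detect_cpu_budget() -> int:
    candidates = [os.cpu_count() or 1]                     # tier 3: host CPU count

    try:                                                   # tier 1: cgroup v2 quota
        quota, period = open("/sys/fs/cgroup/cpu.max").read().split()
        if quota != "max":
            candidates.append(max(1, int(int(quota) / int(period))))
    except (OSError, ValueError):
        pass

    if hasattr(os, "sched_getaffinity"):                   # tier 2: CPU affinity mask
        candidates.append(len(os.sched_getaffinity(0)))

    return min(candidates)                                 # take the minimum of all signals
```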
Standalone security report (security_report.py) covering 6 supply chain
and build pipeline findings from the performance engagement. Add infra
cost section to exec view showing $10K → $1.1K/mo projection based on
D48s_v5 node packing at 4 GB vs 32 GB per pod.
* Update engagement report: add logos, grid theme, scope to core-product
- Add Codeflash x Unstructured logo lockup in hero and footer
- Apply roadmap grid pattern (48px, 5% opacity) and zinc-900 background
- Update cards to rounded-2xl with semi-transparent zinc-900/50 bg
- Remove all platform-libs, CI/CD, and security audit sections
- Remove stacked optimizations PR #1500 from open PRs
- Update data to latest FastAPI endpoint measurements
- Filter PR tables to core-product only
* Add methodology section to team view, fix DataTable type safety
Add benchmark environment, measurement protocol, and production
context cards to the top of the Engineering Team view. Split
TABLE_STYLE into individually typed constants (TABLE_HEADER,
TABLE_CELL, TABLE_DATA, TABLE_DATA_CONDITIONAL, TABLE_WRAP) so
DataTable kwargs pass ty and mypy strict checks.
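A sketch of the split, with assumed values and plain `dict[str, Any]` typing; the real constants live in the report module:

```python
from typing import Any

TABLE_HEADER: dict[str, Any] = {"backgroundColor": "#18181b", "fontWeight": "600"}
TABLE_CELL: dict[str, Any] = {"padding": "8px 12px", "border": "none"}
TABLE_DATA: dict[str, Any] = {"backgroundColor": "transparent"}
TABLE_DATA_CONDITIONAL: list[dict[str, Any]] = [
    {"if": {"row_index": "odd"}, "backgroundColor": "rgba(255, 255, 255, 0.03)"},
]
TABLE_WRAP: dict[str, Any] = {"overflowX": "auto"}

# Each constant now matches the type DataTable expects for its kwarg:
# DataTable(style_header=TABLE_HEADER, style_cell=TABLE_CELL,
#           style_data=TABLE_DATA, style_data_conditional=TABLE_DATA_CONDITIONAL)
# wrapped in html.Div(..., style=TABLE_WRAP).
```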
* Add engagement report screenshot assets
* Add PRs from unstructured, unstructured-inference, unstructured-od-models
Expand report scope beyond core-product: 14 new merged PRs and 2 new
open PRs across 3 additional repos. Update PR counts (24 merged, 5 in
progress), add Repo column to detail view tables, update subtitle and
meta description.
* Make PR numbers clickable links in detail view tables
Use DataTable markdown columns with link_target=_blank so PR numbers
link to their GitHub PRs. Add REPO_BASES mapping for per-repo URL
resolution. Override default purple link color with blue (#60a5fa)
to stay readable on the dark background.
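A sketch of the approach, using an illustrative repo, URL, and PR number (the real REPO_BASES maps each report repo to its GitHub base URL):

```python
from dash import dash_table

REPO_BASES = {"core-product": "https://github.com/example-org/core-product"}  # illustrative

def pr_link(repo: str, pr_number: int) -> str:
    return f"[#{pr_number}]({REPO_BASES[repo]}/pull/{pr_number})"

table = dash_table.DataTable(
    columns=[
        {"name": "PR", "id": "pr", "presentation": "markdown"},  # render cell as markdown
        {"name": "Repo", "id": "repo"},
    ],
    data=[{"pr": pr_link("core-product", 1234), "repo": "core-product"}],  # sample row
    markdown_options={"link_target": "_blank"},  # open PR links in a new tab
)
```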
* Add Future Engagements section with notes panels to exec view
Prominent banner heading, four numbered cards (CI/CD, Security, Runtime,
Product Integration) each with a right-hand Notes panel for discussion
points. Refactored _next_card helper to accept optional notes parameter.
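A sketch of the refactored helper; the signature and styling are assumptions, and only the optional notes parameter is the point:

```python
from dash import html

def _next_card(number: int, title: str, body: str, notes: list[str] | None = None):
    main = html.Div([html.H4(f"{number}. {title}"), html.P(body)], style={"flex": "2"})
    if notes is None:
        return html.Div([main], className="next-card")
    notes_panel = html.Div(
        [html.H5("Notes"), html.Ul([html.Li(n) for n in notes])],
        style={"flex": "1"},
    )
    return html.Div([main, notes_panel], className="next-card",
                    style={"display": "flex", "gap": "16px"})
```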
* Exclude dev docs from plugin dist builds
README.md, ARCHITECTURE.md, and ROADMAP.md are development docs that
shouldn't ship in the assembled plugin distributions.
* Improve deep optimizer: fix profiling script, add failure mode awareness
Profiling script: Accept source root and command as CLI args instead of
hardcoding `src` and requiring manual `# === RUN TARGET HERE ===` edits.
The agent now copies the script from references and runs it with the
project's actual source root and test command.
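Roughly what the new CLI looks like (argument names are illustrative, not the script's exact interface):

```python
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Profile a command against a source root")
    parser.add_argument("source_root", help="package root to attribute samples to, e.g. src/")
    parser.add_argument("command", nargs=argparse.REMAINDER,
                        help="command to run under the profiler, e.g. pytest -x tests/")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    # previously both of these were hardcoded in the script body
    print(f"profiling {' '.join(args.command)!r} with source root {args.source_root!r}")
```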
Failure modes: Wire failure-modes.md into the on-demand reference table
and stuck recovery checklist so the agent consults it when workflows
break (deadlocks, silent failures, context loss, stale results).
* Fix ruff lint errors in unified profiling script
Refactor main() into parse_args(), profile_command(), and
report_results() to fix C901 (complexity) and PLR0915 (too many
statements). Also fix S306 (mktemp → NamedTemporaryFile), PLW1510
(explicit check=False), and add noqa for intentional os.path usage
(PTH112) and subprocess with CLI args (S603).
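The S306 and PLW1510 fixes follow the standard-library patterns, roughly:

```python
import subprocess
import sys
import tempfile

# S306: mktemp() returns a predictable path without creating the file;
# NamedTemporaryFile creates it securely up front.
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as tmp:
    out_path = tmp.name

# PLW1510: state the intent explicitly instead of relying on the default.
subprocess.run([sys.executable, "--version"], check=False)
```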
Remove .codeflash/ from ruff extend-exclude, add per-file ignores
for .codeflash/, scripts/, evals/, and plugin/ (benchmark/script
patterns like print, eval, magic values). Remove shebangs. Widen
pre-commit hooks to check the full repo.
* Add Unstructured engagement report as uv workspace member
Three-tier Plotly Dash app (Executive Brief, Engineering Team, Full
Detail) with data in JSON, theme constants in theme.py, and Dash
production improvements (Google Fonts, clientside callbacks, meta tags).
Also: add .playwright-mcp/ to .gitignore, add reports/* ruff overrides,
remove tracked .codeflash/observability/read-tracker.
* Rewrite statusline to derive context from git state
Detects active area from changed files (reports, packages, plugin,
.codeflash, case-studies, evals), falls back to branch name convention
(perf/*, feat/*, fix/*), shows dirty indicator. Uses whoami for
cross-platform user detection.
* Add pre-push lint rule to commit guidelines
* Exclude .codeflash/ from ruff linting
Benchmark and profiling scripts in .codeflash/ are scratch work, not
package source. Excluding them prevents CI failures from ad-hoc scripts.
* Run ruff format across packages, scripts, evals, and plugin refs
* Fix github-app async test failures in CI
Add asyncio_mode = "auto" to root pytest config so async tests
are detected when running from the repo root via uv run pytest packages/.
* Fix mypy errors and apply ruff formatting across packages
Fix ast.FunctionDef calls missing type_params for Python 3.12+,
correct type: ignore error codes in _comparator and _plugin, and
run ruff format on all package source and test files.
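For context, PEP 695 (Python 3.12) added a `type_params` field to `ast.FunctionDef`, so nodes built by hand now need it. A minimal illustration, not the package's actual code:

```python
import ast

func = ast.FunctionDef(
    name="noop",
    args=ast.arguments(posonlyargs=[], args=[], vararg=None, kwonlyargs=[],
                       kw_defaults=[], kwarg=None, defaults=[]),
    body=[ast.Pass()],
    decorator_list=[],
    returns=None,
    type_comment=None,
    type_params=[],  # required field on 3.12+
)
print(ast.unparse(ast.fix_missing_locations(ast.Module(body=[func], type_ignores=[]))))
```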
* Switch CI to prek for lint/typecheck checks
Use j178/prek-action for consistent lint+typecheck (ruff check,
ruff format, interrogate, mypy) matching local pre-commit config.
Keep test as a separate parallel job for test-env support.
Replace packages-ci.yml and github-app-tests.yml with a single
ci.yml that calls the shared ci-python-uv reusable workflow.
Lint, typecheck, and test run as parallel jobs. Version check
stays local (needs fetch-depth: 0 + PR-only conditional).
Replace placeholder text ("No optimizations applied yet", empty PR table) with:
- CAS lz4 compression results (7-18x on realistic ML payloads)
- Upstream PR status (Netflix/metaflow#3090, open)
- Open questions on dependency management and forward compat
- Methodology, remaining targets, and lessons learned
- Rename case-studies/pypa/ → case-studies/python/ to match .codeflash/ convention
- Add case-studies/netflix/metaflow/summary.md (7-18x lz4 vs gzip)
- Add case-studies/unstructured/core-product/summary.md (14.6% latency, 2.1 GB memory)
- Update main README results table with all five case studies
Move vector search benchmarks out of main results into a Lessons Learned
section. The 3.7x-14.2x numbers were real but on a non-bottleneck —
maintainer confirmed model API calls and SQL dominate real latency.
Results section now only shows legitimate wins: import time (1.16x),
indexing pipeline (1.14-1.16x), and query batching (2.10-2.62x).
Add team member dimension to case study paths so multiple contributors
can track optimization data independently. Derives member from
git config user.name in session-start hooks.
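Roughly how the member dimension can be derived (a sketch only; the actual hook may be a shell script and may normalize names differently):

```python
import re
import subprocess

def member_dir() -> str:
    name = subprocess.run(["git", "config", "user.name"],
                          capture_output=True, text=True, check=False).stdout.strip()
    # e.g. "Jane Doe" -> "jane-doe"; fall back when user.name is unset
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "unknown"
```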
- Move all case studies under .codeflash/krrt7/
- Rename pypa/pip → python/pip (org grouping)
- Update session-start hooks, docs, scripts, and references