66 commits

Author SHA1 Message Date
Kevin Turcios
a4276d658a Refine engagement report and case study for executive review
- Hero metrics: -89% cost, -52% peak memory, flat scaling, -12.9% latency
- Add lightspeed canvas animation via assets/lightspeed.js for Plotly Cloud
- Add platform-libs CI/CD migration to timeline (Phase 1b) with PR links
- Update next-engagement card with POC branch and PR references
- Replace RSS with peak memory in user-facing copy
- Add flat memory scaling to case study results table
2026-04-16 17:51:54 -05:00
Kevin Turcios
380bd59503 Add iterative-discovery narrative and missing findings across all reports
Weave "optimizations reveal deeper issues" framing into engagement report
executive summary, case study, and optimization README. Add O(N²) text
extraction fix, per-request RSS creep (24→17 MB), and memray profiling
data that were previously undocumented.
2026-04-16 15:02:39 -05:00
Kevin Turcios
3c705d4e2d Rewrite Unstructured case study for public-facing clarity
Apply research-backed case study structure: headline anchoring on
biggest numbers, customer-as-hero framing, loss aversion, narrative
arc, methodology for developer credibility. Collapse PR inventory
to category summary, ~1,100 words in optimal range.
2026-04-16 14:40:05 -05:00
Kevin Turcios
6d05aea09c Revamp engagement report layout and timeline for executive clarity
- Move Infrastructure Cost Impact above hero metrics and tab toggle
- Extract shared above-fold content into _above_fold_content() for /jpc parity
- Replace plotly Gantt chart with pure-HTML vertical timeline
- Fix cross-browser flex layout (explicit flex: 1 1 0%, minWidth: 0)
- Remove redundant "The Results" and "How This Was Tested" sections
- Rename Engineering Team → Engineering Details
- Rename Peak RSS → Peak Memory Usage
- Update timeline dates: 1-week buffer after Phase 1, cascade phases
- Rename section headers: Vertical Optimization Roadmap, Proposed Next Engagement
2026-04-16 14:31:32 -05:00
Kevin Turcios
aa259b4652 Update uv.lock for security audit app dependencies 2026-04-16 06:19:28 -05:00
Kevin Turcios
3e63326876 Add standalone security audit app for Plotly Cloud deployment
Separate deployment at https://19727fbf-a6a0-45ac-968f-680035ab6b3b.plotly.app
with its own pyproject.toml, lockfile, and plotly-cloud.toml config.
2026-04-16 06:18:33 -05:00
Kevin Turcios
514c1e28c9 Tailor security report for Lawrence, add UX improvements and talking points
- Rewrite executive summary to reference his PR #1465 lockfile fix and
  existing tooling (Renovate, Anchore, Chainguard)
- Reorder findings by category priority (supply chain > container > CI/CD)
  to lead with what matters most to the audience
- Add animated parallelogram background matching codeflash.ai aesthetic
- 6 research-backed UX changes: severity icons (WCAG 1.4.1), title-first
  cards (F-pattern), loss-framed 85% CTA, distinct status colors, card
  opacity for figure-ground separation
- Correct SEC-021 from 67% to 97% mutable Action pins per VM verification
  (only 2 of 96 SHA-pinned in core-product)
- Add talking-points-lawrence.md with profile, pain points, pitch strategy
2026-04-16 06:01:52 -05:00
Kevin Turcios
8c42f27eed Add 4-tab navigation to security audit report
Split the 39-finding wall into tabbed views matching the engagement
report pattern: Summary, Critical & High (21), Medium & Low (18),
and By Category with both category and repository breakdowns.
2026-04-16 05:05:32 -05:00
Kevin Turcios
3dc58775e3 Consolidate report into 4-tab view and clean up for production
- Replace Executive Brief with JPC Summary as default tab (Executive Summary)
- Add Timeline as 4th tab; standalone /jpc and /timeline routes preserved
- Remove dead code: build_exec_view, make_k8s_chart, unused latency vars
- Extract _logo_lockup helper, _TAB_BTN_STYLE constants to reduce duplication
- Use app.layout as function, env-configurable debug/port, update docstring
2026-04-16 04:48:16 -05:00
Kevin Turcios
c22c5babd1 Organize screenshots by date and session
- 2026-04-15: exec restructure, team view, engagements
- 2026-04-16-methodology: methodology notes across all views
- 2026-04-16-jpc: standalone JPC summary and route verification
- 2026-04-16-timeline: timeline iterations (reordering, date fixes, chart tuning)
2026-04-16 03:49:15 -05:00
Kevin Turcios
c3e7dba47b Add report screenshots to reports/unstructured/screenshots/ 2026-04-16 03:48:14 -05:00
Kevin Turcios
b20c05a799 Add /timeline route with proposed engagement roadmap
- Gantt chart with 5 phases: Core-Product (completed), DevEx & CI/CD,
  Platform API, Security Hardening (concurrent with DevEx), Cost Discovery
- Phase detail cards with duration, dates, deliverables, dependencies
- DevEx as Phase 2 (POC already done, sets up faster CI for Phase 3)
- Security runs concurrent with Phase 2 (uv workspace enables lockfile)
- Investment summary with ~5 month total timeline
- Fixed x-axis range and removed rangeslider for clean proportional bars
2026-04-16 03:46:50 -05:00
Kevin Turcios
90091ccc12 Add /jpc standalone summary route and methodology notes
- Add build_jpc_view() with clean standalone layout at /jpc for JPC
  (no tabs, no hero — just the document that "stands on its own")
- Add URL routing via dcc.Location: / serves full report, /jpc serves summary
- Add methodology notes to exec view (How This Was Tested annotations)
- Add methodology notes to detail view (7-entry "why" card)
- Enrich team view Memory + Standalone vs. Cumulative explanations
2026-04-16 03:07:33 -05:00
Kevin Turcios
2da186d4df Apply learnings to team + detail views, remove redundancy
Team view:
- Add Engineering Impact Summary at top (4 metrics: memory, density,
  latency, idle vCPU) with pointer to sections below
- Remove Production Context card (redundant with Impact Summary)
- Trim memory table to only metrics not shown in chart (RSS per
  request, K8s allocation) — chart already shows pre/post/delta
- Fix "10-page scan" → "10-page scanned document" in methodology

Detail view:
- Add intro callout explaining this is the raw data backing the
  other two views
2026-04-16 02:46:01 -05:00
Kevin Turcios
c1b603afc4 Fix technical terminology in exec brief
- "CFS quota" → "1-CPU limit" (CFS is implementation detail, too
  technical for exec audience)
- "jemalloc" → "jemalloc, opt-in for 1-CPU pods" (missed instance)
- "requests 1 CPU / 32 GB RAM resource requests" → "per pod" (double
  "requests" was grammatically broken)
- "10-page scan" → "10-page scanned document" (consistent with
  workload profiles section)
2026-04-16 02:41:17 -05:00
Kevin Turcios
2c3aad4325 Restructure exec view: enablement-first flow for JPC audience
Reorder based on persuasion research (Three-Talk Model, Prospect
Theory, Kotter):

1. "The Engagement" — collaborative shared context (team talk)
2. "What This Enables" — loss-framed enablement: 9.2x pod density,
   41 idle vCPUs now available, -12.9% latency for agentic API
3. "The Results" — before/after proof of execution
4. Infrastructure Cost Impact (anchored on $100K/mo)
5. Workload Profiles + Methodology (credibility)
6. Delivered + Proposed Next Engagements

Key shift: lead with what the work unlocks (feature velocity,
platform capacity, API speed) rather than the technical achievement
(memory reduction). Cost savings is proof of execution, not the
headline.
2026-04-16 02:36:29 -05:00
Kevin Turcios
6143c38d78 Move workload profile explanations into Executive Brief
The 1p/10p/16p benchmark rationale belongs in the exec view — JPC
needs to understand that page count != workload before seeing the
numbers. Added "Benchmark Workload Profiles" section before "How This
Was Tested" with the three profiles and the data punchline (#1505 at
-32.6% on 1 page vs -7.4% on 16 pages).
2026-04-16 02:32:35 -05:00
Kevin Turcios
eeebf6eec2 Add workload profile explanations to latency benchmark table
The 1p/10p/16p column headers weren't self-explanatory. Added a
"Benchmark Workload Profiles" card above the latency table in the
Detail view explaining that each document tests a distinct workload
shape (table-dense, scanned, mixed), not just different page counts.

Also added annotation below the table calling out that #1505 has 4x
the impact on the 1-page doc vs. the 16-page doc — letting the data
demonstrate that per-document cost depends on content, not page count.
2026-04-16 02:27:00 -05:00
Kevin Turcios
ddb4cf8258 Update engagement report: reframe for JPC audience, fix technical inaccuracies
- Reframe Future Engagements → Proposed Next Engagements based on
  Crag meeting: lead with Platform API speed/stability, add
  Infrastructure Cost Discovery ($100K/mo), remove Codeflash product
  pitch
- Add Broader Context callout after cost section (core-product = ~10%
  of total Azure spend)
- Fix Knative terminology throughout: "Knative pods" → "pods with a
  1-CPU resource request" (CFS quota, not Knative config)
- Fix CPU detection description: three-tier logic (cgroup v2 cpu.max →
  sched_getaffinity → os.cpu_count, take minimum)
- Clarify jemalloc is opt-in (MALLOC_IMPL=jemalloc), 1-CPU serial OCR
  only; multi-CPU pods should use glibc default due to ~50 MB/process
  arena overhead
2026-04-16 02:11:27 -05:00
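
Note on ddb4cf8258: the three-tier CPU detection described above (cgroup v2 cpu.max, then sched_getaffinity, then os.cpu_count, taking the minimum) could look roughly like the sketch below. The function names and the default cgroup path are assumptions made for illustration; the report's actual implementation is not shown in this log.

```python
import os
from pathlib import Path


def cpus_from_cgroup_v2(path="/sys/fs/cgroup/cpu.max"):
    """Return the CPU limit implied by a cgroup v2 cpu.max file, or None."""
    try:
        quota, period = Path(path).read_text().split()
        if quota == "max":  # "max" means no quota is set
            return None
        # quota and period are microseconds; round up, never below 1 CPU
        return max(1, -(-int(quota) // int(period)))
    except (OSError, ValueError, ZeroDivisionError):
        return None  # file missing, unreadable, or malformed


def cpus_from_affinity():
    """Return the number of CPUs this process may run on, or None."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:  # sched_getaffinity is not available everywhere
        return None


def effective_cpu_count(cgroup_path="/sys/fs/cgroup/cpu.max"):
    """Take the minimum of every tier that produced an answer."""
    candidates = [
        cpus_from_cgroup_v2(cgroup_path),
        cpus_from_affinity(),
        os.cpu_count(),
    ]
    known = [count for count in candidates if count]
    return min(known) if known else 1
```

Taking the minimum means a container limited to one CPU is treated as a 1-CPU environment even when the host exposes many cores.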
Kevin Turcios
9102d14a00 continue 2026-04-16 02:00:33 -05:00
Kevin Turcios
e65b8a3564 Add security audit report and infrastructure cost analysis
Standalone security report (security_report.py) covering 6 supply chain
and build pipeline findings from the performance engagement. Add infra
cost section to exec view showing $10K → $1.1K/mo projection based on
D48s_v5 node packing at 4 GB vs 32 GB per pod.
2026-04-15 18:22:07 -05:00
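
Note on e65b8a3564: the arithmetic behind the projection can be checked with a few lines. The dollar figures and per-pod memory sizes come from the commit message; treating the memory ratio as the packing gain is a simplifying assumption, not the report's stated method.

```python
# Figures quoted in the commit message (USD per month, GB per pod).
before_monthly = 10_000
after_monthly = 1_100
memory_before_gb = 32
memory_after_gb = 4

# Relative cost reduction implied by the projection.
cost_reduction_pct = round((1 - after_monthly / before_monthly) * 100)

# Assumption: if memory is the binding constraint, pods per node scale
# with the inverse of the per-pod memory request.
memory_packing_gain = memory_before_gb / memory_after_gb

print(f"cost reduction: {cost_reduction_pct}%")
print(f"memory-only packing gain: {memory_packing_gain:.0f}x")
```

The resulting percentage lines up with the "-89% cost" hero metric in a4276d658a.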
Kevin Turcios
f8281a24a0 Update engagement_report.py 2026-04-15 13:27:43 -05:00
Kevin Turcios
49a7d586d4 Update engagement_report.py 2026-04-15 13:26:04 -05:00
Kevin Turcios
87a906e704
Update Unstructured engagement report (#25)
* Update engagement report: add logos, grid theme, scope to core-product

- Add Codeflash x Unstructured logo lockup in hero and footer
- Apply roadmap grid pattern (48px, 5% opacity) and zinc-900 background
- Update cards to rounded-2xl with semi-transparent zinc-900/50 bg
- Remove all platform-libs, CI/CD, and security audit sections
- Remove stacked optimizations PR #1500 from open PRs
- Update data to latest FastAPI endpoint measurements
- Filter PR tables to core-product only

* Add methodology section to team view, fix DataTable type safety

Add benchmark environment, measurement protocol, and production
context cards to the top of the Engineering Team view. Split
TABLE_STYLE into individually typed constants (TABLE_HEADER,
TABLE_CELL, TABLE_DATA, TABLE_DATA_CONDITIONAL, TABLE_WRAP) so
DataTable kwargs pass ty and mypy strict checks.

* Add engagement report screenshot assets

* Add PRs from unstructured, unstructured-inference, unstructured-od-models

Expand report scope beyond core-product: 14 new merged PRs and 2 new
open PRs across 3 additional repos. Update PR counts (24 merged, 5 in
progress), add Repo column to detail view tables, update subtitle and
meta description.

* Make PR numbers clickable links in detail view tables

Use DataTable markdown columns with link_target=_blank so PR numbers
link to their GitHub PRs. Add REPO_BASES mapping for per-repo URL
resolution. Override default purple link color with blue (#60a5fa)
to stay readable on the dark background.
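
A minimal sketch of the clickable-PR setup described above, written as plain dictionaries so it runs without Dash installed. The base URLs, row fields, and helper names are hypothetical. `presentation: "markdown"` and `markdown_options` with `link_target` are Dash DataTable options, though the report's real column definitions are not shown here.

```python
# Hypothetical base URLs; the real REPO_BASES values are not shown in this log.
REPO_BASES = {
    "core-product": "https://github.com/example-org/core-product",
    "unstructured": "https://github.com/example-org/unstructured",
}


def pr_link(repo, number):
    """Render a PR number as a markdown link to its pull request page."""
    return f"[#{number}]({REPO_BASES[repo]}/pull/{number})"


def build_table_kwargs(rows):
    """Build keyword arguments suitable for dash_table.DataTable(**kwargs)."""
    data = [
        {"pr": pr_link(row["repo"], row["number"]),
         "repo": row["repo"],
         "title": row["title"]}
        for row in rows
    ]
    columns = [
        # Markdown presentation lets the cell render as a link.
        {"name": "PR", "id": "pr", "presentation": "markdown"},
        {"name": "Repo", "id": "repo"},
        {"name": "Title", "id": "title"},
    ]
    return {
        "data": data,
        "columns": columns,
        # Open links in a new tab instead of replacing the report.
        "markdown_options": {"link_target": "_blank"},
    }


kwargs = build_table_kwargs(
    [{"repo": "core-product", "number": 1505, "title": "example row"}]
)
```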

* main

* Add Future Engagements section with notes panels to exec view

Prominent banner heading, four numbered cards (CI/CD, Security, Runtime,
Product Integration) each with a right-hand Notes panel for discussion
points. Refactored _next_card helper to accept optional notes parameter.
2026-04-15 13:11:28 -05:00
Kevin Turcios
7e00007569
Improve deep optimizer: profiling script + failure modes + dist fix (#24)
* Exclude dev docs from plugin dist builds

README.md, ARCHITECTURE.md, and ROADMAP.md are development docs that
shouldn't ship in the assembled plugin distributions.

* Improve deep optimizer: fix profiling script, add failure mode awareness

Profiling script: Accept source root and command as CLI args instead of
hardcoding `src` and requiring manual `# === RUN TARGET HERE ===` edits.
The agent now copies the script from references and runs it with the
project's actual source root and test command.

Failure modes: Wire failure-modes.md into the on-demand reference table
and stuck recovery checklist so the agent consults it when workflows
break (deadlocks, silent failures, context loss, stale results).

* Fix ruff lint errors in unified profiling script

Refactor main() into parse_args(), profile_command(), and
report_results() to fix C901 (complexity) and PLR0915 (too many
statements). Also fix S306 (mktemp → NamedTemporaryFile), PLW1510
(explicit check=False), and add noqa for intentional os.path usage
(PTH112) and subprocess with CLI args (S603).
2026-04-15 04:11:52 -05:00
Kevin Turcios
20f6c59f05
Lint and format entire repo, not just packages (#23)
Remove .codeflash/ from ruff extend-exclude, add per-file ignores
for .codeflash/, scripts/, evals/, and plugin/ (benchmark/script
patterns like print, eval, magic values). Remove shebangs. Widen
pre-commit hooks to check the full repo.
2026-04-15 03:16:15 -05:00
Kevin Turcios
33faedf427
Add Unstructured report, rewrite statusline, format evals/scripts (#20)
* Add Unstructured engagement report as uv workspace member

Three-tier Plotly Dash app (Executive Brief, Engineering Team, Full
Detail) with data in JSON, theme constants in theme.py, and Dash
production improvements (Google Fonts, clientside callbacks, meta tags).

Also: add .playwright-mcp/ to .gitignore, add reports/* ruff overrides,
remove tracked .codeflash/observability/read-tracker.

* Rewrite statusline to derive context from git state

Detects active area from changed files (reports, packages, plugin,
.codeflash, case-studies, evals), falls back to branch name convention
(perf/*, feat/*, fix/*), shows dirty indicator. Uses whoami for
cross-platform user detection.
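
The detection order described above (changed files first, branch prefix as fallback, plus a dirty indicator) might be sketched as follows. The function name, the tie-breaking rule, and the output format are assumptions. Only the area names and branch prefixes come from the commit message.

```python
from collections import Counter

# Areas and branch prefixes named in the commit message.
AREAS = ("reports", "packages", "plugin", ".codeflash", "case-studies", "evals")
BRANCH_PREFIXES = ("perf", "feat", "fix")


def detect_context(changed_files, branch):
    """Return a short status label such as 'reports*' or 'perf'."""
    # Count changed files by their top-level directory.
    counts = Counter(path.split("/", 1)[0] for path in changed_files)
    # Pick the busiest known area; ties resolve in AREAS order.
    area = max(AREAS, key=lambda name: counts[name])
    if counts[area] == 0:
        # No known area touched: fall back to the branch naming convention.
        prefix = branch.split("/", 1)[0]
        area = prefix if "/" in branch and prefix in BRANCH_PREFIXES else branch
    dirty = "*" if changed_files else ""  # any changed file marks the tree dirty
    return f"{area}{dirty}"
```

In a real statusline the inputs would come from git (for example the output of `git status --porcelain` and the current branch name); they are passed in here so the logic stays testable.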

* Add pre-push lint rule to commit guidelines

* Exclude .codeflash/ from ruff linting

Benchmark and profiling scripts in .codeflash/ are scratch work, not
package source. Excluding them prevents CI failures from ad-hoc scripts.

* Run ruff format across packages, scripts, evals, and plugin refs

* Fix github-app async test failures in CI

Add asyncio_mode = "auto" to root pytest config so async tests
are detected when running from the repo root via uv run pytest packages/.
2026-04-15 03:06:16 -05:00
Kevin Turcios
2caaf6af7c
Fix CI: mypy errors, ruff formatting, switch to prek (#22)
* Fix mypy errors and apply ruff formatting across packages

Fix ast.FunctionDef calls missing type_params for Python 3.12+,
correct type: ignore error codes in _comparator and _plugin, and
run ruff format on all package source and test files.
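
For context on the type_params fix: Python 3.12 added a type_params field to ast.FunctionDef, and type stubs for 3.12+ expect it. The sketch below shows one way to build a function node that sets the field only where it exists. The helper names are hypothetical, and passing fields through a dict is a runtime illustration rather than the typed call style the commit would need to satisfy mypy.

```python
import ast
import sys


def build_function(name, body):
    """Build an ast.FunctionDef that is valid on older and newer Pythons."""
    fields = dict(
        name=name,
        args=ast.arguments(
            posonlyargs=[], args=[], vararg=None,
            kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[],
        ),
        body=body,
        decorator_list=[],
        returns=None,
        type_comment=None,
    )
    if sys.version_info >= (3, 12):
        # The field only exists on 3.12+; an empty list means "not generic".
        fields["type_params"] = []
    return ast.FunctionDef(**fields)


def compile_function(name, body):
    """Compile the generated function and return the callable."""
    module = ast.Module(body=[build_function(name, body)], type_ignores=[])
    ast.fix_missing_locations(module)  # fill in line/column info
    namespace = {}
    exec(compile(module, "<generated>", "exec"), namespace)
    return namespace[name]


answer = compile_function("answer", [ast.Return(value=ast.Constant(value=42))])
```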

* Switch CI to prek for lint/typecheck checks

Use j178/prek-action for consistent lint+typecheck (ruff check,
ruff format, interrogate, mypy) matching local pre-commit config.
Keep test as a separate parallel job for test-env support.
2026-04-15 02:52:47 -05:00
Kevin Turcios
a1710f7f92
Adopt shared CI workflow (#21)
Replace packages-ci.yml and github-app-tests.yml with a single
ci.yml that calls the shared ci-python-uv reusable workflow.
Lint, typecheck, and test run as parallel jobs. Version check
stays local (needs fetch-depth: 0 + PR-only conditional).
2026-04-15 02:36:17 -05:00
Kevin Turcios
7d86202524
Update metaflow README with actual results and PR status (#19)
Replace placeholder text ("No optimizations applied yet", empty PR table) with:
- CAS lz4 compression results (7-18x on realistic ML payloads)
- Upstream PR status (Netflix/metaflow#3090, open)
- Open questions on dependency management and forward compat
- Methodology, remaining targets, and lessons learned
2026-04-14 23:41:55 -05:00
Kevin Turcios
1734199e85
Add metaflow and core-product case studies, rename pypa to python (#18)
- Rename case-studies/pypa/ → case-studies/python/ to match .codeflash/ convention
- Add case-studies/netflix/metaflow/summary.md (7-18x lz4 vs gzip)
- Add case-studies/unstructured/core-product/summary.md (14.6% latency, 2.1 GB memory)
- Update main README results table with all five case studies
2026-04-14 23:31:49 -05:00
Kevin Turcios
09ba9b44b2
Add typeagent-py case study (#17)
- Add case-studies/microsoft/typeagent/summary.md with results, lessons
  learned (failed vector search experiment, maintainer alignment), and
  takeaways for codeflash
- Update upstream PR statuses: #235 merged, #236 closed (rejected),
  #232 blocked on #230
- Add typeagent to main README results table
2026-04-14 23:25:29 -05:00
Kevin Turcios
6dd3b02168
Restructure typeagent README: separate failed vector search experiment (#16)
Move vector search benchmarks out of main results into a Lessons Learned
section. The 3.7x-14.2x numbers were real but on a non-bottleneck —
maintainer confirmed model API calls and SQL dominate real latency.

Results section now only shows legitimate wins: import time (1.16x),
indexing pipeline (1.14-1.16x), and query batching (2.10-2.62x).
2026-04-14 23:21:53 -05:00
Kevin Turcios
cc29a27289
Migrate .codeflash/ to {teammember}/{org}/{project}/ format (#15)
Add team member dimension to case study paths so multiple contributors
can track optimization data independently. Derives member from
git config user.name in session-start hooks.

- Move all case studies under .codeflash/krrt7/
- Rename pypa/pip → python/pip (org grouping)
- Update session-start hooks, docs, scripts, and references
2026-04-14 23:04:34 -05:00
Kevin Turcios
4a65f17bfb
Set up CODEOWNERS for Go and Java language overlays (#14) 2026-04-14 19:18:05 -05:00
Kevin Turcios
361bb899e2
Move Go overlay to plugin/languages/go/ (#13)
* Move Go plugin overlay from languages/go/ to plugin/languages/go/

Aligns Go with the Java/Python/JavaScript convention where all language
overlays live under plugin/languages/<lang>/. The Makefile already
discovers from plugin/languages/* so Go is now included in builds.

* Remove accidental read-tracker changes

* Ignore .codeflash/observability/ in gitignore
2026-04-14 19:14:57 -05:00
m-ali-24
044b2f190a
[FEAT] golang agents (#11)
* go base

* missing javascript

---------

Co-authored-by: ali <--global>
2026-04-14 18:55:36 -05:00
mashraf-222
270cb56cee
Feat/java language support (#12)
* Add Java/Kotlin detection to top-level language router

Adds pom.xml, build.gradle, build.gradle.kts, settings.gradle, and
settings.gradle.kts as markers that route to the codeflash-java router.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add Java/Kotlin agent definitions for all optimization domains

10 agents covering the full optimization pipeline:
- codeflash-java: router/team lead for domain detection
- codeflash-java-setup: environment detection (build tool, JDK, profiling tools)
- codeflash-java-deep: cross-domain optimizer (default)
- codeflash-java-cpu: data structures, algorithms, JIT deopt, JMH benchmarks
- codeflash-java-memory: heap/GC tuning, escape analysis, leak detection
- codeflash-java-async: virtual threads, lock contention, CompletableFuture
- codeflash-java-structure: class loading, JPMS, startup time, circular deps
- codeflash-java-scan: quick cross-domain diagnosis via JFR/jdeps/GC logs
- codeflash-java-ci: GitHub webhook handler for Java PRs
- codeflash-java-pr-prep: JMH benchmarks and PR body templates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add Java domain reference guides for all optimization domains

6 guides covering deep domain knowledge for agent consumption:
- data-structures: collection selection, autoboxing, JIT patterns, sorting
- memory: JVM heap layout, GC algorithms and tuning, escape analysis, leaks
- async: virtual threads, structured concurrency, lock hierarchy, contention
- structure: class loading, JPMS, CDS/AppCDS, ServiceLoader, Spring startup
- database: JPA N+1, HikariCP, pagination, batch operations, EXPLAIN plans
- native: JNI, Panama FFM API, GraalVM native-image, Vector API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add Java optimization skills: session launcher and JFR profiling

- codeflash-optimize: session launcher with start/resume/status/scan/review
- jfr-profiling: quick-action JFR profiling in cpu/alloc/wall modes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Slim Java agents to match Go's concise ~175-line pattern

Move inline code examples, antipattern encyclopedias, JMH templates,
and deep-dive sections from agent prompts into reference guides.
Agents now contain only: target tables, one-liner antipatterns,
reasoning checklists, profiling commands, and keep/discard trees.

Line counts (before → after):
  cpu:       636 → 181
  memory:    878 → 193
  async:     578 → 165
  structure: 532 → 167
  deep:      507 → 186
  scan:      440 → 163
  Average:   595 → 176 (vs Go's 175)

Adds to data-structures/guide.md:
  - Collection contract traps table
  - Reflection → MethodHandle migration pattern
  - JMH benchmark template

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix Makefile build: use rsync merge and portable sed -i

Two bugs in the build target:
1. cp -R created nested dirs (agents/agents/, references/references/)
   instead of merging language overlay into shared base. Fix: rsync -a.
2. sed -i '' is macOS-only; fails silently on Linux. Fix: sed -i.bak
   (works on both macOS and Linux), then delete .bak files.
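
The two fixes above are shell changes (rsync -a and sed -i.bak). As an illustration of the same two ideas in Python, not of the Makefile itself, merging an overlay into an existing directory and editing a file in place portably could look like this. The helper names are hypothetical.

```python
import shutil
from pathlib import Path


def merge_overlay(overlay: Path, base: Path) -> None:
    """Merge overlay's contents into base instead of nesting overlay inside it."""
    # dirs_exist_ok=True (Python 3.8+) merges into directories that already exist.
    shutil.copytree(overlay, base, dirs_exist_ok=True)


def replace_in_file(path: Path, old: str, new: str) -> None:
    """Rewrite a file in place without relying on platform-specific sed flags."""
    path.write_text(path.read_text().replace(old, new))
```

The merge semantics are the point: files from the language overlay land next to the shared base files rather than in a nested agents/agents/ directory.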

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add HANDOFF.md session lifecycle to Java agents

Java agents could read HANDOFF.md on resume but never wrote or
updated it. A session that hit plateau would lose all context —
what was tried, what worked, why it stopped, what to do next.

Changes:
- Deep agent: init HANDOFF.md on fresh start, record after each
  experiment, write Stop Reason + learnings.md on session end
- Domain agents (CPU, memory, async, structure): record to
  HANDOFF.md after each keep/discard, write session-end state
- Handoff template: make language-agnostic (was Python-specific),
  add Session status, Strategy & Decisions, and Stop Reason fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Close 11 gaps between Java and Python plugins

Add missing sections to Java deep agent: experiment loop depth (12 steps),
library boundary breaking, Phase 0 environment setup, CI mode, pre-submit
review, adversarial review, team orchestration, cross-domain results schema,
and structured progress reporting.

Add polymorphic dispatch safety to CPU agent and data-structures guide.
Add diff hygiene to CPU agent. Add native reference to router.

Create two new reference files: library-replacement.md (Guava/Commons/
Jackson/Joda replacement tables) and team-orchestration.md (full dispatch
and merge protocol).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 18:49:41 -05:00
Kevin Turcios
043bf45415 Ignore *.lprof and *.prof binary files, update read-tracker 2026-04-14 18:42:38 -05:00
Kevin Turcios
9830b7b4a1 Track .codeflash/ data: unignore observability and add krrt7/odoo case study 2026-04-14 18:40:08 -05:00
Kevin Turcios
3b59d97647 squash 2026-04-13 14:12:17 -05:00
Kevin Turcios
cee3987d7b cleanup 2026-04-06 05:58:13 -05:00
Kevin Turcios
ebb9658dfd Merge main-teammate branch 2026-04-03 17:36:50 -05:00
Kevin Turcios
0cda0d907c fix: align marketplace version with plugin.json and recursive .DS_Store ignore
- marketplace.json metadata.version 1.0.0 → 0.1.0 to match plugin.json
- .gitignore .DS_Store → **/.DS_Store for nested directories
2026-03-27 11:39:34 -05:00
Kevin Turcios
7fab0082c0
Merge pull request #6 from codeflash-ai/feat/tool-configs
feat: improve skill and eval system
2026-03-27 11:31:30 -05:00
Kevin Turcios
37efa524d7 feat: improve skill, eval system, and tessl config
- Optimize codeflash-optimize SKILL.md (review score 17% → 98%, eval 87% → 100%)
  - Fix frontmatter (allowed-tools format, argument-hint under metadata)
  - Lead description with concrete actions, explicit agent launch parameters
- Add multi-run variance detection to eval system (--runs N flag)
  - score.py aggregate command: min/max/avg/stddev per criterion, flaky detection
  - check-regression.sh defaults to 3 runs for reliable regression detection
- Add per-criterion regression tracking to baseline-scores.json (v3)
  - Reports exactly which criteria regressed, not just total score drops
- Rename evals/ → codeflash-evals/ to avoid tessl directory conflicts
- Switch tessl to managed mode, gitignore vendored tiles and symlinks
2026-03-27 11:30:17 -05:00
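
Note on 37efa524d7: a rough sketch of what per-criterion aggregation across runs might involve. The data shape, function name, and flakiness rule (any nonzero spread) are assumptions. The actual score.py is not shown in this log.

```python
from statistics import mean, pstdev


def aggregate_runs(runs, flaky_threshold=0.0):
    """Summarize per-criterion scores across several eval runs.

    runs is a list of dicts mapping criterion name to score (assumed shape).
    """
    criteria = sorted({name for run in runs for name in run})
    summary = {}
    for name in criteria:
        scores = [run[name] for run in runs if name in run]
        # Population standard deviation; a single run has no spread.
        spread = pstdev(scores) if len(scores) > 1 else 0.0
        summary[name] = {
            "min": min(scores),
            "max": max(scores),
            "avg": mean(scores),
            "stddev": spread,
            # A criterion whose score varies between runs is marked flaky.
            "flaky": spread > flaky_threshold,
        }
    return summary


summary = aggregate_runs([
    {"launches_agent": 1.0, "cites_profile": 1.0},
    {"launches_agent": 1.0, "cites_profile": 0.0},
    {"launches_agent": 1.0, "cites_profile": 1.0},
])
```

Running the suite several times before comparing against a baseline helps separate real regressions from criteria that simply vary from run to run.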
Kevin Turcios
999e08fb5e
Merge pull request #5 from codeflash-ai/fix/session-analysis-improvements
fix: session-analysis improvements from 89 real-world sessions
2026-03-27 10:17:44 -05:00
Kevin Turcios
61c393e7ed ci: add actions:read permission for CI status checks
The claude-code-action MCP server requires 'actions: read' to enable
CI status check functionality. Without it, the server is skipped with
a warning.
2026-03-27 10:16:16 -05:00
Kevin Turcios
24ffa83bbf merge: resolve conflicts with main (guard, git history, stuck recovery)
Merge origin/main which added guard commands, git history review step,
stuck state recovery, batched setup questions, and config audit steps.

Resolved 5 conflicts by keeping both:
- Our git-add-specific-files + pre-commit rules applied to the new
  renumbered commit steps (15 instead of 12, etc.)
- Upstream's Record, Config audit, Guard steps preserved
- Router keeps both AUTONOMOUS MODE and batch-questions rules
- Router start steps merged: our branch verification + multi-repo
  detection integrated into upstream's batched-questions flow
2026-03-27 10:15:10 -05:00
Kevin Turcios
ce02fdee29 fix: add .codeflash/ gitignore and session cleanup workflow
- Setup agent now ensures .codeflash/ is in .gitignore before writing
  session state files (prevents accidental commits of profiling artifacts)
- Router agent gets a Cleanup section: preserves learnings.md and
  results.tsv across sessions, deletes transient files (HANDOFF.md,
  setup.md, conventions.md, bench scripts), removes agent-memory dir
2026-03-27 10:09:51 -05:00