Commit graph

18 commits

Author SHA1 Message Date
Kevin Turcios
e7fdae0db6 Add blackbox benchmark VM infra
D2s_v5 (non-burstable, 2 vCPU, 8 GB) with cloud-init provisioning,
CPU-pinned benchmarks, and A/B comparison scripts.
2026-04-29 03:12:19 -05:00
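
A minimal sketch of what a CPU-pinned A/B comparison like the one in e7fdae0db6 might look like. The function names (`pin_to_one_cpu`, `median_runtime`, `compare_ab`) are hypothetical and are not taken from the repository's scripts; pinning uses `os.sched_setaffinity`, which is Linux-only, so the sketch falls back to a no-op elsewhere.

```python
import os
import statistics
import time


def pin_to_one_cpu():
    """Pin this process to one allowed CPU; return its id, or None if unsupported."""
    # sched_setaffinity only exists on Linux, so skip pinning on other platforms.
    if not hasattr(os, "sched_setaffinity"):
        return None
    cpu = min(os.sched_getaffinity(0))  # pick a CPU we are already allowed to use
    try:
        os.sched_setaffinity(0, {cpu})
    except OSError:
        return None
    return cpu


def median_runtime(func, repeats=5):
    """Median wall-clock seconds over `repeats` calls of `func`."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_ab(baseline, candidate, repeats=5):
    """Return median timings plus the baseline/candidate speedup ratio."""
    a = median_runtime(baseline, repeats)
    b = median_runtime(candidate, repeats)
    speedup = a / b if b > 0 else float("inf")
    return {"baseline_s": a, "candidate_s": b, "speedup": speedup}
```

Calling `pin_to_one_cpu()` once before `compare_ab(old_impl, new_impl)` keeps both variants on the same core, which is one way to reduce scheduler noise on a small non-burstable VM.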
Kevin Turcios
ffadf16147
chore: add standup dashboard with CI audit integration (#36)
Dash app at .codeflash/standups/ for weekly eng meetings. Pulls live PR data across 4 org repos, renders markdown standup notes, integrates CI audit report with corrected billing numbers from real GitHub API data. Deployed to Plotly Cloud.
2026-04-23 18:52:33 -05:00
Kevin Turcios
3ee9c22c8e
fix: resolve all ruff lint errors across repo (#38)
* fix: resolve all ruff lint errors across repo

Auto-fixed 31 errors (unused imports, formatting, simplifications).
Manually fixed 14 remaining:
- EXE001: removed shebangs from non-executable bench scripts
- C417: replaced map(lambda) with generator expression
- C901/PLR0915: extracted _write_and_instrument_tests from generate_ai_tests
- C901/PLR0912: extracted _parse_toml_addopts and _ini_section_name from modify_addopts
- RUF001/RUF002: replaced ambiguous Unicode chars (en dash, multiplication sign)
- FBT002: made boolean params keyword-only in report functions
- E402: moved `import re` to top of file in security reports

* fix: resolve pre-existing mypy errors across packages

- _testgen.py: annotate `generated` as `str` to avoid no-any-return
- _test_runner.py: use str() for TimeoutExpired stdout/stderr (bytes|str),
  remove unused type: ignore on proc.kill()
- _candidate_eval.py: annotate `speedup` as `float` to avoid no-any-return
  from lazy-loaded performance_gain
2026-04-23 10:22:42 -05:00
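
For illustration, two of the manual fixes listed in 3ee9c22c8e (C417 and FBT002) can be sketched as follows. The function names here are hypothetical stand-ins, not the repository's actual report or bench code.

```python
def sum_squares_before(values):
    # C417: ruff flags map() with a lambda as an unnecessary map.
    return sum(map(lambda v: v * v, values))


def sum_squares_after(values):
    # Same result, written as a generator expression.
    return sum(v * v for v in values)


def render_report(title, *, verbose=False):
    # FBT002: the bare `*` makes `verbose` keyword-only, so call sites
    # must spell out verbose=True instead of passing a bare positional bool.
    suffix = " (verbose)" if verbose else ""
    return f"{title}{suffix}"
```

With the keyword-only signature, `render_report("CI audit", True)` raises `TypeError`, while `render_report("CI audit", verbose=True)` works.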
Kevin Turcios
c492164fbf Add codeflash org CI audit case study and interactive Dash report
Case study in .codeflash/krrt7/codeflash-ai/ci-audit/ with README,
status, and raw data (fork activity, PRs merged).

Interactive Dash report in reports/codeflash-ci-audit/ with two tabs:
Executive Summary (hero metrics, cost impact charts, before/after) and
Full Detail (fork breakdown, findings table, PR inventory, methodology).

Key numbers: 71% fewer workflow runs, ~$12K/yr in Enterprise overage
savings, 200+ forks disabled, 11 PRs merged across 2 repos.
2026-04-23 03:56:04 -05:00
Kevin Turcios
0901db9fee Update coveragepy status after E2E validation session
2026-04-21 21:19:24 -05:00
Kevin Turcios
edfdd231e0 Use attrs fork with deferred inspect import
Point attrs dependency at local fork (KRRT7/attrs perf/defer-inspect-import)
which defers the ~12ms inspect import until first class build. Temporary
override until upstream merges python-attrs/attrs#1547.

Also adds attrs optimization case study data (VM infra, status).
2026-04-21 02:27:50 -05:00
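
The deferred-import technique named in edfdd231e0 can be sketched generically as below. This is not the code from the attrs fork; `describe_signature` is a hypothetical helper that only shows the pattern of moving an import from module level into the function that first needs it.

```python
def describe_signature(func):
    # Importing inside the function, rather than at module top, means the
    # import cost is paid on the first call instead of at package import time.
    # Later calls hit the sys.modules cache, so the import is effectively free.
    import inspect

    return str(inspect.signature(func))
```

The trade-off is that the first call becomes slightly slower, which is usually acceptable when many importers never reach that code path.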
Kevin Turcios
b42417532d Add optimization project scaffolding for plotly/plotly.py
2026-04-16 23:57:06 -05:00
Kevin Turcios
380bd59503 Add iterative-discovery narrative and missing findings across all reports
Weave "optimizations reveal deeper issues" framing into engagement report
executive summary, case study, and optimization README. Add O(N²) text
extraction fix, per-request RSS creep (24→17 MB), and memray profiling
data that were previously undocumented.
2026-04-16 15:02:39 -05:00
Kevin Turcios
20f6c59f05
Lint and format entire repo, not just packages (#23)
Remove .codeflash/ from ruff extend-exclude, add per-file ignores
for .codeflash/, scripts/, evals/, and plugin/ (benchmark/script
patterns like print, eval, magic values). Remove shebangs. Widen
pre-commit hooks to check the full repo.
2026-04-15 03:16:15 -05:00
Kevin Turcios
33faedf427
Add Unstructured report, rewrite statusline, format evals/scripts (#20)
* Add Unstructured engagement report as uv workspace member

Three-tier Plotly Dash app (Executive Brief, Engineering Team, Full
Detail) with data in JSON, theme constants in theme.py, and Dash
production improvements (Google Fonts, clientside callbacks, meta tags).

Also: add .playwright-mcp/ to .gitignore, add reports/* ruff overrides,
remove tracked .codeflash/observability/read-tracker.

* Rewrite statusline to derive context from git state

Detects active area from changed files (reports, packages, plugin,
.codeflash, case-studies, evals), falls back to branch name convention
(perf/*, feat/*, fix/*), shows dirty indicator. Uses whoami for
cross-platform user detection.

* Add pre-push lint rule to commit guidelines

* Exclude .codeflash/ from ruff linting

Benchmark and profiling scripts in .codeflash/ are scratch work, not
package source. Excluding them prevents CI failures from ad-hoc scripts.

* Run ruff format across packages, scripts, evals, and plugin refs

* Fix github-app async test failures in CI

Add asyncio_mode = "auto" to root pytest config so async tests
are detected when running from the repo root via uv run pytest packages/.
2026-04-15 03:06:16 -05:00
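
The statusline logic described in 33faedf427 (area from changed files, then branch-name fallback, plus a dirty indicator) might look roughly like this. It is a sketch under assumptions: the first-match priority, the `*` dirty marker, and the output format are invented for illustration, and the real script may differ.

```python
# Area names and branch prefixes are taken from the commit message.
AREAS = ("reports", "packages", "plugin", ".codeflash", "case-studies", "evals")
BRANCH_PREFIXES = ("perf/", "feat/", "fix/")


def detect_area(changed_files, branch):
    """Return the active area, or None if neither source gives a hint."""
    # First, look for a known top-level directory among the changed paths.
    for path in changed_files:
        top = path.split("/", 1)[0]
        if top in AREAS:
            return top
    # Otherwise fall back to the branch naming convention.
    for prefix in BRANCH_PREFIXES:
        if branch.startswith(prefix):
            return prefix.rstrip("/")
    return None


def statusline(user, changed_files, branch):
    """Format a one-line status; '*' marks a dirty working tree (assumed format)."""
    area = detect_area(changed_files, branch) or branch
    dirty = "*" if changed_files else ""
    return f"{user} {area}{dirty}"
```

In practice `changed_files` would come from something like `git status --porcelain` and `user` from `whoami`; both are passed in here so the logic stays testable without a repository.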
Kevin Turcios
7d86202524
Update metaflow README with actual results and PR status (#19)
Replace placeholder text ("No optimizations applied yet", empty PR table) with:
- CAS lz4 compression results (7-18x on realistic ML payloads)
- Upstream PR status (Netflix/metaflow#3090, open)
- Open questions on dependency management and forward compat
- Methodology, remaining targets, and lessons learned
2026-04-14 23:41:55 -05:00
Kevin Turcios
09ba9b44b2
Add typeagent-py case study (#17)
- Add case-studies/microsoft/typeagent/summary.md with results, lessons
  learned (failed vector search experiment, maintainer alignment), and
  takeaways for codeflash
- Update upstream PR statuses: #235 merged, #236 closed (rejected),
  #232 blocked on #230
- Add typeagent to main README results table
2026-04-14 23:25:29 -05:00
Kevin Turcios
6dd3b02168
Restructure typeagent README: separate failed vector search experiment (#16)
Move vector search benchmarks out of main results into a Lessons Learned
section. The 3.7x-14.2x numbers were real but on a non-bottleneck —
maintainer confirmed model API calls and SQL dominate real latency.

Results section now only shows legitimate wins: import time (1.16x),
indexing pipeline (1.14-1.16x), and query batching (2.10-2.62x).
2026-04-14 23:21:53 -05:00
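
Why a real 3.7x-14.2x local speedup can still be a non-win follows from Amdahl's law. The sketch below uses a purely hypothetical 5% share of end-to-end latency for vector search; that fraction is an assumption for illustration, not a measurement from the typeagent work.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: end-to-end speedup when `fraction` of the runtime
    becomes `local_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical share of request latency spent in vector search (assumed).
ASSUMED_FRACTION = 0.05

for local in (3.7, 14.2):
    total = overall_speedup(ASSUMED_FRACTION, local)
    print(f"{local}x local -> {total:.3f}x overall")
```

Under that assumption the end-to-end gain stays small regardless of the local speedup, which matches the maintainer's point that model API calls and SQL dominate real latency.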
Kevin Turcios
cc29a27289
Migrate .codeflash/ to {teammember}/{org}/{project}/ format (#15)
Add team member dimension to case study paths so multiple contributors
can track optimization data independently. Derives member from
git config user.name in session-start hooks.

- Move all case studies under .codeflash/krrt7/
- Rename pypa/pip → python/pip (org grouping)
- Update session-start hooks, docs, scripts, and references
2026-04-14 23:04:34 -05:00
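
The `{teammember}/{org}/{project}/` layout from cc29a27289 can be sketched as a small path helper. Both function names are hypothetical, and how the hooks map `git config user.name` to a directory name such as `krrt7` is not stated in the commit, so no normalization is attempted here.

```python
import subprocess
from pathlib import Path


def case_study_dir(member, org, project, root=".codeflash"):
    """Build the per-member case study path: root/member/org/project."""
    return Path(root) / member / org / project


def git_user_name():
    """Read user.name from git config; returns '' if unset or git fails."""
    result = subprocess.run(
        ["git", "config", "user.name"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()
```

For example, `case_study_dir("krrt7", "python", "pip")` yields `.codeflash/krrt7/python/pip`, matching the renamed pip location mentioned in the commit.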
m-ali-24
044b2f190a
[FEAT] golang agents (#11)
* go base

* missing javascript

---------

Co-authored-by: ali <--global>
2026-04-14 18:55:36 -05:00
Kevin Turcios
043bf45415 Ignore *.lprof and *.prof binary files, update read-tracker
2026-04-14 18:42:38 -05:00
Kevin Turcios
9830b7b4a1 Track .codeflash/ data: unignore observability and add krrt7/odoo case study
2026-04-14 18:40:08 -05:00
Kevin Turcios
3b59d97647 squash
2026-04-13 14:12:17 -05:00