Commit graph

19 commits

Author SHA1 Message Date
Kevin Turcios
0ad5e60523
Add blackbox package: session flight recorder with HTMX dashboard (#39)
* feat(blackbox): add package with models, CLI, and HTMX dashboard

* test(blackbox): add comprehensive test coverage for dashboard

* feat(blackbox): cache session scanning via watcher invalidation

* docs(blackbox): add README and use fastapi[standard] for dev server

* refactor(blackbox): extract presentation logic into formatter classes

* refactor(blackbox): extract classify_error helpers

* feat(blackbox): wire analytics into session detail view

Show token usage, tool breakdowns, and session stats in a
collapsible panel when viewing a session.

* feat(blackbox): add codeflash plugin detection

Detect codeflash agent names, skills, and commands in transcripts.
Surface language, optimization domain, and capability badges in
the analytics panel.

* refactor(blackbox): remove underscore prefixes from internal functions

* chore: add ty python-version to root pyproject.toml

* chore(blackbox): fix lint errors in test files

* style(blackbox): apply ruff formatting to analytics

* feat(blackbox): add Playwright E2E tests for dashboard

Refactor app.py to expose create_app() factory accepting a projects_dir
override, enabling tests to run against fixture data instead of the real
~/.claude/projects/ directory. Routes now read projects_dir from
app.state instead of the module-level constant.

Add 26 Playwright tests across 5 files covering dashboard loading,
session list, session detail with filters and analytics, sidebar
collapse/localStorage persistence, and SSE log streaming. All tests
pass on chromium, firefox, and webkit (78 total).

CI gets a new e2e-blackbox job with a browser matrix strategy running
all three engines in parallel, conditional on blackbox path changes,
with trace upload on failure.

* fix(ci): sync only blackbox package in e2e job

* fix(ci): exclude e2e tests from unit test job

The test job doesn't install Playwright browsers, so e2e tests error
when pytest collects them. Ignore tests/e2e/ directories in the test
job — those are handled by the dedicated e2e-blackbox job.
2026-04-28 19:58:43 -05:00
Kevin Turcios
72a8610fcf Add vulture to dev dependencies for dead code detection 2026-04-23 04:57:32 -05:00
Kevin Turcios
0316d9822a Add testcontainers[postgres] to workspace dev dependencies
The 12 DB integration tests in codeflash-api need testcontainers to spin
up a real PostgreSQL instance via Docker. Was already declared in the
package's own dev deps but missing from the root workspace.
2026-04-23 03:52:07 -05:00
Kevin Turcios
bf8707695f Add codeflash-api to workspace dev dependencies
The package was a workspace member but not listed in the root dev
group, so its tests couldn't import codeflash_api when running
from the monorepo root.
2026-04-23 03:38:30 -05:00
Kevin Turcios
a292698a1d Add pytest-cov to dev dependencies 2026-04-23 00:41:32 -05:00
Kevin Turcios
4b219907fd Implement POST /ai/testgen endpoint with full generation pipeline
Port test generation from Django reference: prompt templates (Jinja2
with model-type-aware formatting), LLM call orchestration with
even/odd model selection, AST-based code validation with regex
fallback, preamble repair, and ellipsis detection. Instrumentation
and postprocessing are deferred — all four response fields return
the same validated code for now.
2026-04-22 22:11:04 -05:00
Kevin Turcios
6abcc8daa3 Add testgen review and repair endpoints
Port /ai/testgen_review and /ai/testgen_repair from Django reference.
Review: parallel LLM calls per test source, auto-flags behavioral
failures, parses JSON verdicts. Repair: Jinja2 prompt templates,
syntax-error retry loop, Python code extraction and validation.

Schemas: TestgenReviewRequest/Response, TestRepairRequest/Response,
CoverageDetails, FunctionVerdict, TestSourceWithFailures.

23 tests covering: coverage context building, verdict parsing,
syntax error detection, endpoint success/error/retry/language paths,
and the model validator for python_version resolution.
2026-04-22 20:35:39 -05:00
Kevin Turcios
1d70d65914 Wire observability recording into LLM client
Add fire-and-forget background task manager (background.py) and
LLM call recording (recording.py). Every LLMClient.call now records
trace_id, model, latency, tokens, cost, and errors via fire-and-forget.
drain() awaits pending tasks on shutdown. Currently logs only —
database persistence deferred until llm_calls table is wired.
2026-04-22 20:30:10 -05:00
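Note on 1d70d65914: the fire-and-forget pattern with a drain() step on shutdown can be sketched roughly as below. This is a minimal illustration using only asyncio, not the contents of background.py or recording.py. The names BackgroundTasks, fire, and record_call are assumptions, and the recorded fields are a subset of those listed in the commit message.

```python
import asyncio
from typing import Any, Coroutine


class BackgroundTasks:
    """Hypothetical fire-and-forget task manager (not the real background.py)."""

    def __init__(self) -> None:
        # Strong references keep tasks from being garbage-collected mid-flight.
        self._tasks: set = set()

    def fire(self, coro: Coroutine[Any, Any, Any]) -> None:
        """Schedule coro on the running loop without awaiting it."""
        task = asyncio.get_running_loop().create_task(coro)
        self._tasks.add(task)
        # Drop the reference once the task finishes.
        task.add_done_callback(self._tasks.discard)

    async def drain(self) -> None:
        """Wait for every pending task; exceptions are collected, not raised."""
        while self._tasks:
            await asyncio.gather(*list(self._tasks), return_exceptions=True)


async def record_call(log: list, trace_id: str, model: str, latency_ms: float) -> None:
    """Stand-in recorder: appends to a list instead of logging or writing to a DB."""
    await asyncio.sleep(0)  # yield once, as real I/O would
    log.append({"trace_id": trace_id, "model": model, "latency_ms": latency_ms})


async def demo() -> list:
    log: list = []
    background = BackgroundTasks()
    # The caller does not wait for recording to finish.
    background.fire(record_call(log, "trace-1", "model-a", 12.5))
    background.fire(record_call(log, "trace-2", "model-b", 30.0))
    # On shutdown, drain() makes sure nothing is lost.
    await background.drain()
    return log
```

Holding the tasks in a set matters because the event loop keeps only weak references to tasks, so an unreferenced task may be collected before it completes.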
Kevin Turcios
6c04324e25 Add optimize endpoint: context, pipeline, router, prompt templates
Faithful port of the Python optimization pipeline from Django aiservice:
- schemas.py: Pydantic request/response models (OptimizeRequest, OptimizeResponse)
- _markdown.py: markdown code block extraction, splitting, grouping
- _context.py: BaseOptimizerContext with Single/Multi variants for
  prompt assembly, LLM response extraction, and postprocessing
- _pipeline.py: parallel LLM orchestration with model distribution
  (GPT-5-mini + Claude Sonnet 4.5), diversity via line profiler toggling
- _router.py: POST /ai/optimize with auth, rate limiting, usage tracking
- 11 prompt templates copied verbatim from Django reference
- LLM client wired into app lifespan
2026-04-21 22:16:22 -05:00
Kevin Turcios
3e62f502e7 Add language layer: CST utils, validator, postprocessing pipeline
Faithful port of Python language utilities from Django aiservice:
- _cst_utils.py: depth tracking, import extraction, definition removal,
  ellipsis detection, expression evaluation, module path helpers
- _validator.py: dual ast+libcst syntax validation, parse-or-none
- _postprocess.py: full optimization postprocessing pipeline including
  dedup, equality check, docstring restoration, comment cleaning,
  forward reference fixing, ellipsis filtering, isort
2026-04-21 22:04:39 -05:00
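Note on 3e62f502e7: a "parse-or-none" helper of the kind mentioned for _validator.py might look like the sketch below. It covers only the standard-library ast half. The libcst half of the dual validation is omitted, and the function names are assumptions rather than the package's actual API.

```python
import ast
from typing import Optional


def parse_or_none(source: str) -> Optional[ast.Module]:
    """Return the parsed module, or None if the source is not valid Python."""
    try:
        return ast.parse(source)
    except (SyntaxError, ValueError):
        # ValueError covers inputs such as embedded null bytes on older Pythons.
        return None


def is_valid_python(source: str) -> bool:
    """Hypothetical convenience wrapper around parse_or_none."""
    return parse_or_none(source) is not None
```

Returning None instead of raising lets callers treat invalid LLM output as an ordinary branch rather than an exceptional one.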
Kevin Turcios
5c6b82050a Add diff layer: SEARCH/REPLACE and V4A patch application
Faithfully ported from Django aiservice. V4A uses 3-tier fuzzy
context matching (exact/rstrip/strip) with EOF penalties and scope
markers. Per-file lint ignores for ported complexity.
2026-04-21 21:55:28 -05:00
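Note on 5c6b82050a: the three matching tiers (exact, rstrip, strip) can be illustrated with the sketch below. It is not the ported implementation: EOF penalties and scope markers are left out, and the function name and the tier numbering in the return value are assumptions made for illustration.

```python
from typing import Callable, Optional


def find_context(
    lines: list, context: list, start: int = 0
) -> Optional[tuple]:
    """Return (index, tier) for the first match of context in lines, or None.

    tier 0 = exact match, 1 = match after rstrip(), 2 = match after strip().
    Stricter tiers are tried over the whole file before looser ones.
    """
    if not context:
        return start, 0
    normalizers: list[Callable[[str], str]] = [lambda s: s, str.rstrip, str.strip]
    width = len(context)
    for tier, normalize in enumerate(normalizers):
        wanted = [normalize(s) for s in context]
        for i in range(start, len(lines) - width + 1):
            if [normalize(s) for s in lines[i:i + width]] == wanted:
                return i, tier
    return None


source = ["def f():", "    x = 1  ", "\treturn x"]
exact = find_context(source, ["def f():"])        # identical text
trailing = find_context(source, ["    x = 1"])    # differs in trailing spaces
indent = find_context(source, ["return x"])       # differs in leading whitespace
missing = find_context(source, ["y = 2"])         # not present at any tier
```

The tier returned alongside the index lets a caller prefer the least fuzzy match when several candidate locations exist.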
Kevin Turcios
69714f410f Scaffold codeflash-api package with app factory, config, and healthcheck
FastAPI app factory with lifespan, CORS, optional Sentry. Pydantic-settings
config for all env vars. Full directory structure for all 15 endpoints per
the architecture doc. Workspace integration: ruff src paths, isort, pytest
testpaths, per-file ignores. aiohttp for production, httpx for test client.
2026-04-21 21:28:59 -05:00
Kevin Turcios
d25d7bdad4 Point attrs source at GitHub release wheel for portability
Replace local path override with wheel URL from KRRT7/attrs release
so team members and CI get the optimized attrs on uv sync.
2026-04-21 02:35:45 -05:00
Kevin Turcios
edfdd231e0 Use attrs fork with deferred inspect import
Point attrs dependency at local fork (KRRT7/attrs perf/defer-inspect-import)
which defers the ~12ms inspect import until first class build. Temporary
override until upstream merges python-attrs/attrs#1547.

Also adds attrs optimization case study data (VM infra, status).
2026-04-21 02:27:50 -05:00
Kevin Turcios
87a906e704
Update Unstructured engagement report (#25)
* Update engagement report: add logos, grid theme, scope to core-product

- Add Codeflash x Unstructured logo lockup in hero and footer
- Apply roadmap grid pattern (48px, 5% opacity) and zinc-900 background
- Update cards to rounded-2xl with semi-transparent zinc-900/50 bg
- Remove all platform-libs, CI/CD, and security audit sections
- Remove stacked optimizations PR #1500 from open PRs
- Update data to latest FastAPI endpoint measurements
- Filter PR tables to core-product only

* Add methodology section to team view, fix DataTable type safety

Add benchmark environment, measurement protocol, and production
context cards to the top of the Engineering Team view. Split
TABLE_STYLE into individually typed constants (TABLE_HEADER,
TABLE_CELL, TABLE_DATA, TABLE_DATA_CONDITIONAL, TABLE_WRAP) so
DataTable kwargs pass ty and mypy strict checks.

* Add engagement report screenshot assets

* Add PRs from unstructured, unstructured-inference, unstructured-od-models

Expand report scope beyond core-product: 14 new merged PRs and 2 new
open PRs across 3 additional repos. Update PR counts (24 merged, 5 in
progress), add Repo column to detail view tables, update subtitle and
meta description.

* Make PR numbers clickable links in detail view tables

Use DataTable markdown columns with link_target=_blank so PR numbers
link to their GitHub PRs. Add REPO_BASES mapping for per-repo URL
resolution. Override default purple link color with blue (#60a5fa)
to stay readable on the dark background.

* main

* Add Future Engagements section with notes panels to exec view

Prominent banner heading, four numbered cards (CI/CD, Security, Runtime,
Product Integration) each with a right-hand Notes panel for discussion
points. Refactored _next_card helper to accept optional notes parameter.
2026-04-15 13:11:28 -05:00
Kevin Turcios
20f6c59f05
Lint and format entire repo, not just packages (#23)
Remove .codeflash/ from ruff extend-exclude, add per-file ignores
for .codeflash/, scripts/, evals/, and plugin/ (benchmark/script
patterns like print, eval, magic values). Remove shebangs. Widen
pre-commit hooks to check the full repo.
2026-04-15 03:16:15 -05:00
Kevin Turcios
33faedf427
Add Unstructured report, rewrite statusline, format evals/scripts (#20)
* Add Unstructured engagement report as uv workspace member

Three-tier Plotly Dash app (Executive Brief, Engineering Team, Full
Detail) with data in JSON, theme constants in theme.py, and Dash
production improvements (Google Fonts, clientside callbacks, meta tags).

Also: add .playwright-mcp/ to .gitignore, add reports/* ruff overrides,
remove tracked .codeflash/observability/read-tracker.

* Rewrite statusline to derive context from git state

Detects active area from changed files (reports, packages, plugin,
.codeflash, case-studies, evals), falls back to branch name convention
(perf/*, feat/*, fix/*), shows dirty indicator. Uses whoami for
cross-platform user detection.

* Add pre-push lint rule to commit guidelines

* Exclude .codeflash/ from ruff linting

Benchmark and profiling scripts in .codeflash/ are scratch work, not
package source. Excluding them prevents CI failures from ad-hoc scripts.

* Run ruff format across packages, scripts, evals, and plugin refs

* Fix github-app async test failures in CI

Add asyncio_mode = "auto" to root pytest config so async tests
are detected when running from the repo root via uv run pytest packages/.
2026-04-15 03:06:16 -05:00
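Note on 33faedf427: the statusline logic described above (area from changed files, then branch-name convention, plus a dirty indicator) can be sketched as follows. The area and prefix lists come from the commit message. Everything else is an assumption: that areas are top-level directories, that the most-touched area wins, the function names, and the output format. The real statusline reads git state and whoami directly, whereas this sketch takes them as arguments.

```python
from typing import Optional

# Areas and branch prefixes as listed in the commit message.
AREAS = ("reports", "packages", "plugin", ".codeflash", "case-studies", "evals")
BRANCH_PREFIXES = ("perf", "feat", "fix")


def detect_context(changed_files: list, branch: str) -> Optional[str]:
    """Pick the active area from changed paths, else fall back to the branch prefix."""
    counts: dict = {}
    for path in changed_files:
        top = path.split("/", 1)[0]
        if top in AREAS:
            counts[top] = counts.get(top, 0) + 1
    if counts:
        # Assumption: the area with the most changed files wins.
        return max(counts, key=lambda area: counts[area])
    prefix, sep, _rest = branch.partition("/")
    if sep and prefix in BRANCH_PREFIXES:
        return prefix
    return None


def statusline(user: str, changed_files: list, branch: str) -> str:
    """Hypothetical output format: '<user> <context>' with '*' when dirty."""
    context = detect_context(changed_files, branch) or branch
    dirty = "*" if changed_files else ""
    return f"{user} {context}{dirty}"
```

For example, statusline("kevin", ["evals/run.py"], "main") yields "kevin evals*" under these assumptions.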
Kevin Turcios
3b59d97647 squash 2026-04-13 14:12:17 -05:00
Kevin Turcios
ebb9658dfd Merge main-teammate branch 2026-04-03 17:36:50 -05:00