codeflash-agent

Author	SHA1	Message	Date
Kevin Turcios	0ad5e60523	Add blackbox package: session flight recorder with HTMX dashboard (#39) * feat(blackbox): add package with models, CLI, and HTMX dashboard * test(blackbox): add comprehensive test coverage for dashboard * feat(blackbox): cache session scanning via watcher invalidation * docs(blackbox): add README and use fastapi[standard] for dev server * refactor(blackbox): extract presentation logic into formatter classes * refactor(blackbox): extract classify_error helpers * feat(blackbox): wire analytics into session detail view Show token usage, tool breakdowns, and session stats in a collapsible panel when viewing a session. * feat(blackbox): add codeflash plugin detection Detect codeflash agent names, skills, and commands in transcripts. Surface language, optimization domain, and capability badges in the analytics panel. * refactor(blackbox): remove underscore prefixes from internal functions * chore: add ty python-version to root pyproject.toml * chore(blackbox): fix lint errors in test files * style(blackbox): apply ruff formatting to analytics * feat(blackbox): add Playwright E2E tests for dashboard Refactor app.py to expose create_app() factory accepting a projects_dir override, enabling tests to run against fixture data instead of the real ~/.claude/projects/ directory. Routes now read projects_dir from app.state instead of the module-level constant. Add 26 Playwright tests across 5 files covering dashboard loading, session list, session detail with filters and analytics, sidebar collapse/localStorage persistence, and SSE log streaming. All tests pass on chromium, firefox, and webkit (78 total). CI gets a new e2e-blackbox job with a browser matrix strategy running all three engines in parallel, conditional on blackbox path changes, with trace upload on failure. * fix(ci): sync only blackbox package in e2e job * fix(ci): exclude e2e tests from unit test job The test job doesn't install Playwright browsers, so e2e tests error when pytest collects them. Ignore tests/e2e/ directories in the test job; those are handled by the dedicated e2e-blackbox job.	2026-04-28 19:58:43 -05:00
Kevin Turcios	72a8610fcf	Add vulture to dev dependencies for dead code detection	2026-04-23 04:57:32 -05:00
Kevin Turcios	0316d9822a	Add testcontainers[postgres] to workspace dev dependencies The 12 DB integration tests in codeflash-api need testcontainers to spin up a real PostgreSQL instance via Docker. Was already declared in the package's own dev deps but missing from the root workspace.	2026-04-23 03:52:07 -05:00
Kevin Turcios	bf8707695f	Add codeflash-api to workspace dev dependencies The package was a workspace member but not listed in the root dev group, so its tests couldn't import codeflash_api when running from the monorepo root.	2026-04-23 03:38:30 -05:00
Kevin Turcios	a292698a1d	Add pytest-cov to dev dependencies	2026-04-23 00:41:32 -05:00
Kevin Turcios	4b219907fd	Implement POST /ai/testgen endpoint with full generation pipeline Port test generation from Django reference: prompt templates (Jinja2 with model-type-aware formatting), LLM call orchestration with even/odd model selection, AST-based code validation with regex fallback, preamble repair, and ellipsis detection. Instrumentation and postprocessing are deferred — all four response fields return the same validated code for now.	2026-04-22 22:11:04 -05:00
Kevin Turcios	6abcc8daa3	Add testgen review and repair endpoints Port /ai/testgen_review and /ai/testgen_repair from Django reference. Review: parallel LLM calls per test source, auto-flags behavioral failures, parses JSON verdicts. Repair: Jinja2 prompt templates, syntax-error retry loop, Python code extraction and validation. Schemas: TestgenReviewRequest/Response, TestRepairRequest/Response, CoverageDetails, FunctionVerdict, TestSourceWithFailures. 23 tests covering: coverage context building, verdict parsing, syntax error detection, endpoint success/error/retry/language paths, and the model validator for python_version resolution.	2026-04-22 20:35:39 -05:00
Kevin Turcios	1d70d65914	Wire observability recording into LLM client Add fire-and-forget background task manager (background.py) and LLM call recording (recording.py). Every LLMClient.call now records trace_id, model, latency, tokens, cost, and errors via fire-and-forget. drain() awaits pending tasks on shutdown. Currently logs only — database persistence deferred until llm_calls table is wired.	2026-04-22 20:30:10 -05:00
Kevin Turcios	6c04324e25	Add optimize endpoint: context, pipeline, router, prompt templates Faithful port of the Python optimization pipeline from Django aiservice: - schemas.py: Pydantic request/response models (OptimizeRequest, OptimizeResponse) - _markdown.py: markdown code block extraction, splitting, grouping - _context.py: BaseOptimizerContext with Single/Multi variants for prompt assembly, LLM response extraction, and postprocessing - _pipeline.py: parallel LLM orchestration with model distribution (GPT-5-mini + Claude Sonnet 4.5), diversity via line profiler toggling - _router.py: POST /ai/optimize with auth, rate limiting, usage tracking - 11 prompt templates copied verbatim from Django reference - LLM client wired into app lifespan	2026-04-21 22:16:22 -05:00
Kevin Turcios	3e62f502e7	Add language layer: CST utils, validator, postprocessing pipeline Faithful port of Python language utilities from Django aiservice: - _cst_utils.py: depth tracking, import extraction, definition removal, ellipsis detection, expression evaluation, module path helpers - _validator.py: dual ast+libcst syntax validation, parse-or-none - _postprocess.py: full optimization postprocessing pipeline including dedup, equality check, docstring restoration, comment cleaning, forward reference fixing, ellipsis filtering, isort	2026-04-21 22:04:39 -05:00
Kevin Turcios	5c6b82050a	Add diff layer: SEARCH/REPLACE and V4A patch application Faithfully ported from Django aiservice. V4A uses 3-tier fuzzy context matching (exact/rstrip/strip) with EOF penalties and scope markers. Per-file lint ignores for ported complexity.	2026-04-21 21:55:28 -05:00
Kevin Turcios	69714f410f	Scaffold codeflash-api package with app factory, config, and healthcheck FastAPI app factory with lifespan, CORS, optional Sentry. Pydantic-settings config for all env vars. Full directory structure for all 15 endpoints per the architecture doc. Workspace integration: ruff src paths, isort, pytest testpaths, per-file ignores. aiohttp for production, httpx for test client.	2026-04-21 21:28:59 -05:00
Kevin Turcios	d25d7bdad4	Point attrs source at GitHub release wheel for portability Replace local path override with wheel URL from KRRT7/attrs release so team members and CI get the optimized attrs on uv sync.	2026-04-21 02:35:45 -05:00
Kevin Turcios	edfdd231e0	Use attrs fork with deferred inspect import Point attrs dependency at local fork (KRRT7/attrs perf/defer-inspect-import) which defers the ~12ms inspect import until first class build. Temporary override until upstream merges python-attrs/attrs#1547. Also adds attrs optimization case study data (VM infra, status).	2026-04-21 02:27:50 -05:00
Kevin Turcios	87a906e704	Update Unstructured engagement report (#25) * Update engagement report: add logos, grid theme, scope to core-product - Add Codeflash x Unstructured logo lockup in hero and footer - Apply roadmap grid pattern (48px, 5% opacity) and zinc-900 background - Update cards to rounded-2xl with semi-transparent zinc-900/50 bg - Remove all platform-libs, CI/CD, and security audit sections - Remove stacked optimizations PR #1500 from open PRs - Update data to latest FastAPI endpoint measurements - Filter PR tables to core-product only * Add methodology section to team view, fix DataTable type safety Add benchmark environment, measurement protocol, and production context cards to the top of the Engineering Team view. Split TABLE_STYLE into individually typed constants (TABLE_HEADER, TABLE_CELL, TABLE_DATA, TABLE_DATA_CONDITIONAL, TABLE_WRAP) so DataTable kwargs pass ty and mypy strict checks. * Add engagement report screenshot assets * Add PRs from unstructured, unstructured-inference, unstructured-od-models Expand report scope beyond core-product: 14 new merged PRs and 2 new open PRs across 3 additional repos. Update PR counts (24 merged, 5 in progress), add Repo column to detail view tables, update subtitle and meta description. * Make PR numbers clickable links in detail view tables Use DataTable markdown columns with link_target=_blank so PR numbers link to their GitHub PRs. Add REPO_BASES mapping for per-repo URL resolution. Override default purple link color with blue (#60a5fa) to stay readable on the dark background. * main * Add Future Engagements section with notes panels to exec view Prominent banner heading, four numbered cards (CI/CD, Security, Runtime, Product Integration) each with a right-hand Notes panel for discussion points. Refactored _next_card helper to accept optional notes parameter.	2026-04-15 13:11:28 -05:00
Kevin Turcios	20f6c59f05	Lint and format entire repo, not just packages (#23) Remove .codeflash/ from ruff extend-exclude, add per-file ignores for .codeflash/, scripts/, evals/, and plugin/ (benchmark/script patterns like print, eval, magic values). Remove shebangs. Widen pre-commit hooks to check the full repo.	2026-04-15 03:16:15 -05:00
Kevin Turcios	33faedf427	Add Unstructured report, rewrite statusline, format evals/scripts (#20) * Add Unstructured engagement report as uv workspace member Three-tier Plotly Dash app (Executive Brief, Engineering Team, Full Detail) with data in JSON, theme constants in theme.py, and Dash production improvements (Google Fonts, clientside callbacks, meta tags). Also: add .playwright-mcp/ to .gitignore, add reports/* ruff overrides, remove tracked .codeflash/observability/read-tracker. * Rewrite statusline to derive context from git state Detects active area from changed files (reports, packages, plugin, .codeflash, case-studies, evals), falls back to branch name convention (perf/, feat/, fix/), shows dirty indicator. Uses whoami for cross-platform user detection. Add pre-push lint rule to commit guidelines * Exclude .codeflash/ from ruff linting Benchmark and profiling scripts in .codeflash/ are scratch work, not package source. Excluding them prevents CI failures from ad-hoc scripts. * Run ruff format across packages, scripts, evals, and plugin refs * Fix github-app async test failures in CI Add asyncio_mode = "auto" to root pytest config so async tests are detected when running from the repo root via uv run pytest packages/.	2026-04-15 03:06:16 -05:00
Kevin Turcios	3b59d97647	squash	2026-04-13 14:12:17 -05:00
Kevin Turcios	ebb9658dfd	Merge main-teammate branch	2026-04-03 17:36:50 -05:00
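
Commit 5c6b82050a mentions "3-tier fuzzy context matching (exact/rstrip/strip)" for V4A patches. The repository code is not shown in this log, so the following is only a minimal sketch of what such tiered matching might look like. The function name `find_context`, its return shape, and the tier numbering are hypothetical; the EOF penalties and scope markers named in the commit are deliberately left out.

```python
from typing import Callable


def find_context(
    lines: list[str], context: list[str], start: int = 0
) -> tuple[int, int]:
    """Locate `context` inside `lines`, trying stricter comparisons first.

    Returns (index, tier): tier 0 is an exact match, 1 ignores trailing
    whitespace, 2 ignores leading and trailing whitespace.
    Returns (-1, -1) when no tier matches. (Hypothetical helper.)
    """
    if not context:
        return start, 0

    # Each tier normalizes lines a little more aggressively than the last.
    tiers: list[Callable[[str], str]] = [lambda s: s, str.rstrip, str.strip]

    for tier, normalize in enumerate(tiers):
        wanted = [normalize(line) for line in context]
        # Slide a window of len(context) over the file from `start`.
        for i in range(start, len(lines) - len(context) + 1):
            window = [normalize(line) for line in lines[i : i + len(context)]]
            if window == wanted:
                return i, tier
    return -1, -1
```

Trying every position at the exact tier before falling back to a looser tier means a whitespace-insensitive hit is only accepted when no stricter match exists anywhere after `start`; a caller could use the returned tier as a rough confidence score.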

19 commits
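
Commit 1d70d65914 describes a fire-and-forget background task manager whose drain() awaits pending tasks on shutdown. As an illustration only, the pattern can be sketched with plain asyncio as below. The class name `BackgroundTasks`, the `fire` method, and the `record_call` coroutine are hypothetical stand-ins, not the contents of background.py or recording.py.

```python
import asyncio
from collections.abc import Coroutine
from typing import Any


class BackgroundTasks:
    """Hypothetical fire-and-forget manager; not the repository's API."""

    def __init__(self) -> None:
        # Strong references keep pending tasks from being garbage collected.
        self._tasks: set[asyncio.Task[Any]] = set()

    def fire(self, coro: Coroutine[Any, Any, Any]) -> None:
        """Schedule `coro` on the running loop without awaiting it."""
        task = asyncio.get_running_loop().create_task(coro)
        self._tasks.add(task)
        # Forget the task as soon as it finishes.
        task.add_done_callback(self._tasks.discard)

    async def drain(self) -> None:
        """Wait for every pending task, e.g. during application shutdown."""
        while self._tasks:
            # return_exceptions=True so one failed task does not abort the drain.
            await asyncio.gather(*list(self._tasks), return_exceptions=True)


async def record_call(log: list[str], model: str, latency_ms: float) -> None:
    """Stand-in for a recording step; here it only appends to a list."""
    await asyncio.sleep(0)
    log.append(f"{model} {latency_ms:.0f}ms")


async def main() -> list[str]:
    log: list[str] = []
    tasks = BackgroundTasks()
    tasks.fire(record_call(log, "model-a", 12.0))
    tasks.fire(record_call(log, "model-b", 34.0))
    await tasks.drain()
    return log
```

Running `asyncio.run(main())` returns the log once both recordings have completed. Holding the tasks in a set matters because the event loop keeps only weak references to tasks, so an unreferenced fire-and-forget task could otherwise be collected before it finishes.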