codeflash-agent

mirror of https://github.com/codeflash-ai/codeflash-agent.git synced 2026-05-04 18:25:19 +00:00

Author	SHA1	Message	Date
Kevin Turcios	eaa83207a0	Fix blackbox dev server command and add repo-root instructions (#48) fastapi[standard] extras aren't resolved at workspace level, so `fastapi dev` fails without `--package blackbox`. Add working commands for both repo root (uvicorn) and package directory (fastapi dev).	2026-04-29 04:18:39 -05:00
Kevin Turcios	6ddc97a575	perf(models): convert prefix/skill/command tuples to frozensets (#47) * perf(models): convert prefix/skill/command tuples to frozensets Use frozenset for CODEFLASH_AGENT_PREFIXES, CODEFLASH_SKILLS, and CODEFLASH_COMMANDS for O(1) membership testing. * style: format frozenset literals per ruff * Add blackbox benchmark VM infra D2s_v5 (non-burstable, 2 vCPU, 8 GB) with cloud-init provisioning, CPU-pinned benchmarks, and A/B comparison scripts. --------- Co-authored-by: codeflash[bot] <codeflash[bot]@users.noreply.github.com>	2026-04-29 03:22:50 -05:00
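A minimal sketch of the frozenset change in #47, assuming the constants hold plain strings. Their actual contents are not in the log, so the values below are hypothetical. Membership tests on a frozenset are average-case O(1) hash lookups, whereas `in` on a tuple scans linearly. One caveat worth checking for a constant named `..._PREFIXES`: `str.startswith()` accepts only a `str` or a tuple of `str`, so prefix matching against a frozenset needs a tuple copy.

```python
# Hypothetical stand-ins: the real members of these constants are not in the log.
CODEFLASH_SKILLS = frozenset({"codeflash-optimize", "codeflash-profile"})
CODEFLASH_AGENT_PREFIXES = frozenset({"codeflash-"})

# str.startswith() rejects frozensets, so keep a tuple copy for prefix checks.
_AGENT_PREFIX_TUPLE = tuple(CODEFLASH_AGENT_PREFIXES)


def is_codeflash_skill(name: str) -> bool:
    # Exact-match membership: average O(1) hash lookup on a frozenset.
    return name in CODEFLASH_SKILLS


def has_codeflash_prefix(agent_name: str) -> bool:
    # Prefix matching still needs startswith(); it accepts a str or a tuple of str.
    return agent_name.startswith(_AGENT_PREFIX_TUPLE)
```

Exact-name checks benefit directly from the frozenset. Prefix checks keep linear cost in the number of prefixes, which is usually tiny.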
Kevin Turcios	e4fcbb5b83	perf(rendering): cache entry.level in render_log_html (#46) * perf(rendering): cache entry.level and is_thinking in render_log_html Avoid repeated attribute access and redundant strip() calls by caching level and is_thinking as local variables. * fix: remove stale noqa and use set membership test The refactoring reduced complexity below C901/PLR0912 thresholds, making the noqa directive unused. Use `in` set test per PLR1714. * Add blackbox benchmark VM infra D2s_v5 (non-burstable, 2 vCPU, 8 GB) with cloud-init provisioning, CPU-pinned benchmarks, and A/B comparison scripts. --------- Co-authored-by: codeflash[bot] <codeflash[bot]@users.noreply.github.com>	2026-04-29 03:22:47 -05:00
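A sketch of the caching pattern described in #46. It assumes a hypothetical `LogEntry` with `level` and `message` fields, hypothetical level names, and hypothetical markup; the real model and output of `render_log_html` are not shown here. Each attribute is read once into a local, `strip()` runs once per entry, and a chained `level == "warning" or level == "error"` comparison becomes a set membership test, which is what ruff's PLR1714 suggests.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class LogEntry:
    # Hypothetical shape; the real entry model is not shown in the log.
    level: str
    message: str


# Hypothetical level names that get extra highlighting.
_HIGHLIGHT_LEVELS = {"warning", "error"}


def render_log_html(entries: list[LogEntry]) -> str:
    parts: list[str] = []
    append = parts.append  # bind the method once outside the loop
    for entry in entries:
        level = entry.level  # read the attribute once per entry
        text = entry.message.strip()  # strip once, reuse below
        is_thinking = level == "thinking"
        css = "thinking" if is_thinking else level
        if level in _HIGHLIGHT_LEVELS:  # set membership instead of chained ==
            css += " highlight"
        append(f'<div class="log {css}">{escape(text)}</div>')
    return "\n".join(parts)
```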
Kevin Turcios	41edcf06e1	perf(transcript): cache fromisoformat, single-pass parsing (#45) * perf(transcript): cache fromisoformat, local json.loads, single-pass parsing Move datetime import to module level with cached fromisoformat, use bytes split for JSONL, inline tool_result iteration in parse_user_entry, promote decode_project_name filter set to module-level frozenset. * fix: use splitlines for JSONL parsing split("\n") leaves \r on lines from Windows-originated JSONL files, which can cause json.loads failures. splitlines() handles all line ending variants. * fix: add noqa C901 for inlined parse_user_entry The tool_result iteration was inlined for single-pass performance, which pushes complexity above the C901 threshold. * Add blackbox benchmark VM infra D2s_v5 (non-burstable, 2 vCPU, 8 GB) with cloud-init provisioning, CPU-pinned benchmarks, and A/B comparison scripts. --------- Co-authored-by: codeflash[bot] <codeflash[bot]@users.noreply.github.com>	2026-04-29 03:22:44 -05:00
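A sketch of single-pass JSONL parsing in the spirit of #45. It assumes each line is a JSON object with an optional ISO 8601 `timestamp` string; the transcript schema itself is not in the log. `bytes.splitlines()` splits on `\n`, `\r\n` and `\r`, so lines from Windows-originated files carry no stray `\r`, and `json.loads` accepts `bytes` directly. `_loads` and `_fromisoformat` are hypothetical module-level bindings standing in for the commit's "local json.loads" and "cached fromisoformat".

```python
from __future__ import annotations

import json
from datetime import datetime

# Module-level bindings, looked up once instead of on every line.
_loads = json.loads
_fromisoformat = datetime.fromisoformat


def parse_jsonl(data: bytes) -> list[dict]:
    """Parse JSONL bytes in one pass, converting ISO 8601 timestamps."""
    entries: list[dict] = []
    for raw in data.splitlines():  # splits on \n, \r\n and \r
        if not raw.strip():  # skip blank lines
            continue
        entry = _loads(raw)  # json.loads accepts UTF-8 bytes directly
        ts = entry.get("timestamp")
        if isinstance(ts, str):
            # Note: before Python 3.11, fromisoformat() rejects a trailing "Z".
            entry["timestamp"] = _fromisoformat(ts)
        entries.append(entry)
    return entries
```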
Kevin Turcios	1ff2a76152	perf(analytics): use rfind and local json.loads (#44) * perf(analytics): use rfind and local json.loads for hot paths Replace Path().suffix with string rfind for extension extraction, use local json.loads binding and bytes split for JSONL parsing. * fix: use splitlines and preserve extensionless file behavior split("\n") mishandles \r\n line endings. The early return on extensionless files changed behavior vs the original Path().suffix which returned "" and fell through. Use splitlines() and let extensionless files fall through with lang=None. * style: use ternary for extensionless file check per SIM108 * Add blackbox benchmark VM infra D2s_v5 (non-burstable, 2 vCPU, 8 GB) with cloud-init provisioning, CPU-pinned benchmarks, and A/B comparison scripts. --------- Co-authored-by: codeflash[bot] <codeflash[bot]@users.noreply.github.com>	2026-04-29 03:22:42 -05:00
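A sketch of replacing `Path().suffix` with `rfind`, assuming POSIX-style `/` separators and a hypothetical `EXT_TO_LANG` table. The real helper and mapping are not in the log. The dot-position check mirrors how `PurePath.suffix` treats dotfiles such as `.bashrc` and names without a dot. Extensionless files fall through with `lang=None` instead of returning early, as the fix in #44 describes.

```python
from __future__ import annotations

# Hypothetical mapping; the real extension table is not shown in the log.
EXT_TO_LANG = {".py": "python", ".js": "javascript", ".ts": "typescript"}


def file_suffix(path: str) -> str:
    """Return the extension like Path(path).suffix, without building a Path."""
    name = path[path.rfind("/") + 1 :]  # last component; rfind gives -1 if no "/"
    dot = name.rfind(".")
    # No dot, a leading dot only (".bashrc"), or a trailing dot: no suffix.
    return name[dot:] if 0 < dot < len(name) - 1 else ""


def detect_lang(path: str) -> str | None:
    ext = file_suffix(path)
    # Ternary (SIM108): extensionless files fall through with lang=None.
    lang = EXT_TO_LANG.get(ext) if ext else None
    return lang
```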
Kevin Turcios	0ad5e60523	Add blackbox package: session flight recorder with HTMX dashboard (#39) * feat(blackbox): add package with models, CLI, and HTMX dashboard * test(blackbox): add comprehensive test coverage for dashboard * feat(blackbox): cache session scanning via watcher invalidation * docs(blackbox): add README and use fastapi[standard] for dev server * refactor(blackbox): extract presentation logic into formatter classes * refactor(blackbox): extract classify_error helpers * feat(blackbox): wire analytics into session detail view Show token usage, tool breakdowns, and session stats in a collapsible panel when viewing a session. * feat(blackbox): add codeflash plugin detection Detect codeflash agent names, skills, and commands in transcripts. Surface language, optimization domain, and capability badges in the analytics panel. * refactor(blackbox): remove underscore prefixes from internal functions * chore: add ty python-version to root pyproject.toml * chore(blackbox): fix lint errors in test files * style(blackbox): apply ruff formatting to analytics * feat(blackbox): add Playwright E2E tests for dashboard Refactor app.py to expose create_app() factory accepting a projects_dir override, enabling tests to run against fixture data instead of the real ~/.claude/projects/ directory. Routes now read projects_dir from app.state instead of the module-level constant. Add 26 Playwright tests across 5 files covering dashboard loading, session list, session detail with filters and analytics, sidebar collapse/localStorage persistence, and SSE log streaming. All tests pass on chromium, firefox, and webkit (78 total). CI gets a new e2e-blackbox job with a browser matrix strategy running all three engines in parallel, conditional on blackbox path changes, with trace upload on failure. * fix(ci): sync only blackbox package in e2e job * fix(ci): exclude e2e tests from unit test job The test job doesn't install Playwright browsers, so e2e tests error when pytest collects them. Ignore tests/e2e/ directories in the test job — those are handled by the dedicated e2e-blackbox job.	2026-04-28 19:58:43 -05:00
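The log says only that session scanning is cached and invalidated by a watcher. The sketch below shows one way that could look, using a hypothetical `SessionScanCache` class and an assumed `<projects_dir>/<project>/<session>.jsonl` layout. The watcher itself is left out: anything that calls `invalidate()` on a file change would do. The first `sessions()` call scans the directory, and later calls reuse the result until the cache is invalidated. Taking `projects_dir` as a parameter, rather than reading a module-level constant, matches the `create_app()` override described in #39 and lets tests point it at fixture data.

```python
from __future__ import annotations

import threading
from pathlib import Path


class SessionScanCache:
    """Hypothetical cache: scan once, rescan only after invalidate()."""

    def __init__(self, projects_dir: Path) -> None:
        self.projects_dir = projects_dir
        self._lock = threading.Lock()  # a watcher callback may run on another thread
        self._sessions: list[Path] | None = None
        self.scan_count = 0  # exposed for testing: how many real scans ran

    def invalidate(self) -> None:
        # Intended to be called by a file watcher when projects_dir changes.
        with self._lock:
            self._sessions = None

    def sessions(self) -> list[Path]:
        with self._lock:
            if self._sessions is None:
                self.scan_count += 1
                # Assumed layout: <projects_dir>/<project>/<session>.jsonl
                self._sessions = sorted(self.projects_dir.glob("*/*.jsonl"))
            return list(self._sessions)  # copy so callers can't mutate the cache
```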

6 commits