Commit graph

6 commits

Author SHA1 Message Date
Kevin Turcios
eaa83207a0
Fix blackbox dev server command and add repo-root instructions (#48)
fastapi[standard] extras aren't resolved at workspace level, so
`fastapi dev` fails without `--package blackbox`. Add working
commands for both repo root (uvicorn) and package directory (fastapi dev).
2026-04-29 04:18:39 -05:00
Kevin Turcios
6ddc97a575
perf(models): convert prefix/skill/command tuples to frozensets (#47)
* perf(models): convert prefix/skill/command tuples to frozensets

Use frozenset for CODEFLASH_AGENT_PREFIXES, CODEFLASH_SKILLS,
and CODEFLASH_COMMANDS for O(1) membership testing.

* style: format frozenset literals per ruff

* Add blackbox benchmark VM infra

D2s_v5 (non-burstable, 2 vCPU, 8 GB) with cloud-init provisioning,
CPU-pinned benchmarks, and A/B comparison scripts.

---------

Co-authored-by: codeflash[bot] <codeflash[bot]@users.noreply.github.com>
2026-04-29 03:22:50 -05:00
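The membership-testing change described in #47 can be illustrated with a minimal sketch. The constant's contents and the helper name `is_codeflash_skill` are hypothetical; the log does not show the real values.

```python
# Hypothetical values: the real contents of CODEFLASH_SKILLS are not shown in the log.
CODEFLASH_SKILLS_TUPLE = ("optimize", "benchmark", "profile")

# A frozenset is immutable like a tuple, but `in` uses a hash lookup
# (average O(1)) instead of a left-to-right scan (O(n)).
CODEFLASH_SKILLS = frozenset(CODEFLASH_SKILLS_TUPLE)


def is_codeflash_skill(name: str) -> bool:
    """Return True if `name` is one of the known skill names."""
    return name in CODEFLASH_SKILLS
```

For a handful of entries the difference is small; the gain matters mostly when the check runs once per transcript entry on a hot path.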
Kevin Turcios
e4fcbb5b83
perf(rendering): cache entry.level in render_log_html (#46)
* perf(rendering): cache entry.level and is_thinking in render_log_html

Avoid repeated attribute access and redundant strip() calls
by caching level and is_thinking as local variables.

* fix: remove stale noqa and use set membership test

The refactoring reduced complexity below C901/PLR0912 thresholds,
making the noqa directive unused. Use `in` set test per PLR1714.

* Add blackbox benchmark VM infra

D2s_v5 (non-burstable, 2 vCPU, 8 GB) with cloud-init provisioning,
CPU-pinned benchmarks, and A/B comparison scripts.

---------

Co-authored-by: codeflash[bot] <codeflash[bot]@users.noreply.github.com>
2026-04-29 03:22:47 -05:00
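A rough sketch of the local-variable caching and set-membership test mentioned in #46. The `LogEntry` shape, the level names, and the generated markup are all assumptions made for illustration; only the function name `render_log_html` comes from the log.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class LogEntry:
    # Hypothetical entry shape; the real model is not shown in the log.
    level: str
    text: str


def render_log_html(entries):
    parts = []
    for entry in entries:
        # Read the attribute once and reuse the local binding below.
        level = entry.level
        is_thinking = level == "thinking"
        # Set membership instead of chained `==` comparisons (ruff PLR1714).
        css = "muted" if level in {"debug", "thinking"} else level
        body = escape(entry.text.strip())  # strip() runs once per entry
        if is_thinking:
            body = f"<em>{body}</em>"
        parts.append(f'<div class="log {css}">{body}</div>')
    return "\n".join(parts)
```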
Kevin Turcios
41edcf06e1
perf(transcript): cache fromisoformat, single-pass parsing (#45)
* perf(transcript): cache fromisoformat, local json.loads, single-pass parsing

Move the datetime import to module level with a cached fromisoformat,
use a bytes split for JSONL, inline the tool_result iteration in
parse_user_entry, and promote the decode_project_name filter set to a
module-level frozenset.

* fix: use splitlines for JSONL parsing

split("\n") leaves \r on lines from Windows-originated JSONL files,
which can cause json.loads failures. splitlines() handles all line
ending variants.

* fix: add noqa C901 for inlined parse_user_entry

The tool_result iteration was inlined for single-pass performance,
which pushes complexity above the C901 threshold.

* Add blackbox benchmark VM infra

D2s_v5 (non-burstable, 2 vCPU, 8 GB) with cloud-init provisioning,
CPU-pinned benchmarks, and A/B comparison scripts.

---------

Co-authored-by: codeflash[bot] <codeflash[bot]@users.noreply.github.com>
2026-04-29 03:22:44 -05:00
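The difference between the two splitting strategies in #45 can be seen in a small sketch. It assumes the JSONL data is handled as bytes, as the commit message suggests; `parse_jsonl` and the sample payload are hypothetical.

```python
import json

# Sample JSONL with Windows line endings (hypothetical data).
raw = b'{"a": 1}\r\n{"b": 2}\r\n'

# split(b"\n") keeps the trailing \r on each line and yields an empty
# final element after the last newline.
naive_lines = raw.split(b"\n")

# bytes.splitlines() splits on \n, \r, and \r\n and drops the terminators.
clean_lines = raw.splitlines()


def parse_jsonl(data: bytes) -> list:
    """Parse JSONL bytes into a list of objects, skipping blank lines."""
    loads = json.loads  # local binding avoids a repeated attribute lookup
    return [loads(line) for line in data.splitlines() if line.strip()]
```

Note that `str.splitlines()` also splits on additional separators such as U+2028, so the sketch stays with bytes to keep the line boundaries limited to `\n`, `\r`, and `\r\n`.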
Kevin Turcios
1ff2a76152
perf(analytics): use rfind and local json.loads (#44)
* perf(analytics): use rfind and local json.loads for hot paths

Replace Path().suffix with string rfind for extension extraction, and
use a local json.loads binding and bytes split for JSONL parsing.

* fix: use splitlines and preserve extensionless file behavior

split("\n") mishandles \r\n line endings. The early return on
extensionless files changed behavior compared with the original
Path().suffix, which returned "" and fell through. Use splitlines()
and let extensionless files fall through with lang=None.

* style: use ternary for extensionless file check per SIM108

* Add blackbox benchmark VM infra

D2s_v5 (non-burstable, 2 vCPU, 8 GB) with cloud-init provisioning,
CPU-pinned benchmarks, and A/B comparison scripts.

---------

Co-authored-by: codeflash[bot] <codeflash[bot]@users.noreply.github.com>
2026-04-29 03:22:42 -05:00
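A minimal sketch of the rfind-based extension lookup from #44, including the fall-through for extensionless files. The `EXT_TO_LANG` mapping and the `detect_lang` name are hypothetical, and the slice is not a drop-in replacement for `Path().suffix` in every edge case (a dot in a parent directory name or a dotfile such as `.bashrc` behaves differently); here those cases simply miss the mapping.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping; the real table is not shown in the log.
EXT_TO_LANG = {".py": "python", ".ts": "typescript"}


def detect_lang(file_path: str) -> Optional[str]:
    """Return a language for the file's extension, or None if unknown."""
    dot = file_path.rfind(".")
    # Ternary form (ruff SIM108): extensionless paths get "" and fall
    # through to the lookup instead of returning early.
    ext = file_path[dot:] if dot != -1 else ""
    return EXT_TO_LANG.get(ext)
```

Keeping the empty-string fall-through mirrors what `Path("Makefile").suffix` returns, so callers see `lang=None` rather than a skipped file.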
Kevin Turcios
0ad5e60523
Add blackbox package: session flight recorder with HTMX dashboard (#39)
* feat(blackbox): add package with models, CLI, and HTMX dashboard

* test(blackbox): add comprehensive test coverage for dashboard

* feat(blackbox): cache session scanning via watcher invalidation

* docs(blackbox): add README and use fastapi[standard] for dev server

* refactor(blackbox): extract presentation logic into formatter classes

* refactor(blackbox): extract classify_error helpers

* feat(blackbox): wire analytics into session detail view

Show token usage, tool breakdowns, and session stats in a
collapsible panel when viewing a session.

* feat(blackbox): add codeflash plugin detection

Detect codeflash agent names, skills, and commands in transcripts.
Surface language, optimization domain, and capability badges in
the analytics panel.

* refactor(blackbox): remove underscore prefixes from internal functions

* chore: add ty python-version to root pyproject.toml

* chore(blackbox): fix lint errors in test files

* style(blackbox): apply ruff formatting to analytics

* feat(blackbox): add Playwright E2E tests for dashboard

Refactor app.py to expose create_app() factory accepting a projects_dir
override, enabling tests to run against fixture data instead of the real
~/.claude/projects/ directory. Routes now read projects_dir from
app.state instead of the module-level constant.

Add 26 Playwright tests across 5 files covering dashboard loading,
session list, session detail with filters and analytics, sidebar
collapse/localStorage persistence, and SSE log streaming. All tests
pass on chromium, firefox, and webkit (78 total).

CI gets a new e2e-blackbox job with a browser matrix strategy running
all three engines in parallel, conditional on blackbox path changes,
with trace upload on failure.

* fix(ci): sync only blackbox package in e2e job

* fix(ci): exclude e2e tests from unit test job

The test job doesn't install Playwright browsers, so e2e tests error
when pytest collects them. Ignore tests/e2e/ directories in the test
job; those are handled by the dedicated e2e-blackbox job.
2026-04-28 19:58:43 -05:00