fastapi[standard] extras aren't resolved at the workspace level, so
`fastapi dev` fails without `--package blackbox`. Add working
commands for both the repo root (uvicorn) and the package directory (`fastapi dev`).
* perf(models): convert prefix/skill/command tuples to frozensets
Use frozenset for CODEFLASH_AGENT_PREFIXES, CODEFLASH_SKILLS,
and CODEFLASH_COMMANDS for O(1) membership testing.
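A minimal sketch of the tuple-to-frozenset change, assuming placeholder
values (the real contents of these constants are not shown in this log,
and `is_codeflash_skill` is a hypothetical helper):

```python
# Hypothetical values; the real constants live in the blackbox models.
CODEFLASH_SKILLS_TUPLE = ("optimize", "benchmark", "profile")

# frozenset gives hash-based lookup (average O(1)) and stays immutable,
# so it is a drop-in replacement for a constant tuple used only with `in`.
CODEFLASH_SKILLS = frozenset(CODEFLASH_SKILLS_TUPLE)


def is_codeflash_skill(name: str) -> bool:
    """Return True if `name` is one of the known skill names."""
    return name in CODEFLASH_SKILLS


found = is_codeflash_skill("benchmark")
missing = is_codeflash_skill("unknown")
```

For collections this small the gain per lookup is modest; it matters
mainly when the check runs once per transcript entry.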
* style: format frozenset literals per ruff
* Add blackbox benchmark VM infra
D2s_v5 (non-burstable, 2 vCPU, 8 GB) with cloud-init provisioning,
CPU-pinned benchmarks, and A/B comparison scripts.
---------
Co-authored-by: codeflash[bot] <codeflash[bot]@users.noreply.github.com>
* perf(rendering): cache entry.level and is_thinking in render_log_html
Avoid repeated attribute access and redundant strip() calls
by caching level and is_thinking as local variables.
* fix: remove stale noqa and use set membership test
The refactoring reduced complexity below the C901/PLR0912 thresholds,
making the noqa directive unused. Use an `in` set membership test per PLR1714.
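For illustration, the kind of rewrite PLR1714 asks for looks roughly like
this. The function name and level strings are hypothetical, not taken from
render_log_html:

```python
def is_problem_level(level: str) -> bool:
    """Return True for log levels that should be highlighted."""
    # Before (flagged by PLR1714, repeated equality comparison):
    #     return level == "error" or level == "warning"
    # After: a single membership test against a set literal.
    return level in {"error", "warning"}


highlight_error = is_problem_level("error")
highlight_info = is_problem_level("info")
```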
---------
Co-authored-by: codeflash[bot] <codeflash[bot]@users.noreply.github.com>
* perf(transcript): cache fromisoformat, local json.loads, single-pass parsing
Move datetime import to module level with cached fromisoformat,
use bytes split for JSONL, inline tool_result iteration in parse_user_entry,
promote decode_project_name filter set to module-level frozenset.
* fix: use splitlines for JSONL parsing
split("\n") leaves \r on lines from Windows-originated JSONL files,
which can cause json.loads failures. splitlines() handles all line
ending variants.
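A small sketch of the difference, using a made-up two-record JSONL payload
with Windows line endings (shown on str for brevity; the commit itself
works on the file contents read from disk):

```python
import json

# Hypothetical JSONL content written with \r\n line endings.
raw = '{"id": 1}\r\n{"id": 2}\r\n'

# split("\n") keeps the trailing \r on each line and yields a final
# empty string after the last newline.
split_lines = raw.split("\n")

# splitlines() strips \n, \r\n, and \r, and yields no trailing empty item.
clean_lines = raw.splitlines()

# Skip blank lines defensively before decoding each record.
records = [json.loads(line) for line in clean_lines if line.strip()]
```

One caveat: `str.splitlines()` also breaks on separators such as U+2028,
while `bytes.splitlines()` only breaks on \n, \r, and \r\n.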
* fix: add noqa C901 for inlined parse_user_entry
The tool_result iteration was inlined for single-pass performance,
which pushes complexity above the C901 threshold.
---------
Co-authored-by: codeflash[bot] <codeflash[bot]@users.noreply.github.com>
* perf(analytics): use rfind and local json.loads for hot paths
Replace Path().suffix with string rfind for extension extraction,
use local json.loads binding and bytes split for JSONL parsing.
* fix: use splitlines and preserve extensionless file behavior
split("\n") mishandles \r\n line endings. The early return on
extensionless files changed behavior vs the original Path().suffix
which returned "" and fell through. Use splitlines() and let
extensionless files fall through with lang=None.
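The fall-through behavior can be sketched as follows. `LANG_BY_EXT` and
`detect_lang` are hypothetical names, and the mapping is a placeholder
rather than the real table in the analytics module:

```python
from __future__ import annotations

# Hypothetical extension-to-language table.
LANG_BY_EXT = {".py": "python", ".ts": "typescript"}


def detect_lang(path: str) -> str | None:
    """Map a file path to a language name, or None if unknown."""
    dot = path.rfind(".")
    # No dot at all: use "" as the extension, mirroring what
    # Path(path).suffix returns, instead of returning early.
    ext = path[dot:] if dot != -1 else ""
    # Unknown or empty extensions fall through with lang=None.
    return LANG_BY_EXT.get(ext)


py_lang = detect_lang("src/app.py")
no_ext_lang = detect_lang("Makefile")
```

Plain `rfind` is not a full substitute for `Path().suffix`: for a dotfile
such as `.bashrc`, or a dotted directory like `conf.d/Makefile`, the
extracted string differs, though the table lookup still yields None here.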
* style: use ternary for extensionless file check per SIM108
---------
Co-authored-by: codeflash[bot] <codeflash[bot]@users.noreply.github.com>
* feat(blackbox): add package with models, CLI, and HTMX dashboard
* test(blackbox): add comprehensive test coverage for dashboard
* feat(blackbox): cache session scanning via watcher invalidation
* docs(blackbox): add README and use fastapi[standard] for dev server
* refactor(blackbox): extract presentation logic into formatter classes
* refactor(blackbox): extract classify_error helpers
* feat(blackbox): wire analytics into session detail view
Show token usage, tool breakdowns, and session stats in a
collapsible panel when viewing a session.
* feat(blackbox): add codeflash plugin detection
Detect codeflash agent names, skills, and commands in transcripts.
Surface language, optimization domain, and capability badges in
the analytics panel.
* refactor(blackbox): remove underscore prefixes from internal functions
* chore: add ty python-version to root pyproject.toml
* chore(blackbox): fix lint errors in test files
* style(blackbox): apply ruff formatting to analytics
* feat(blackbox): add Playwright E2E tests for dashboard
Refactor app.py to expose create_app() factory accepting a projects_dir
override, enabling tests to run against fixture data instead of the real
~/.claude/projects/ directory. Routes now read projects_dir from
app.state instead of the module-level constant.
Add 26 Playwright tests across 5 files covering dashboard loading,
session list, session detail with filters and analytics, sidebar
collapse/localStorage persistence, and SSE log streaming. All tests
pass on chromium, firefox, and webkit (78 total).
CI gets a new e2e-blackbox job with a browser matrix strategy running
all three engines in parallel, conditional on blackbox path changes,
with trace upload on failure.
* fix(ci): sync only blackbox package in e2e job
* fix(ci): exclude e2e tests from unit test job
The test job doesn't install Playwright browsers, so e2e tests error
when pytest collects them. Ignore tests/e2e/ directories in the test
job; those are handled by the dedicated e2e-blackbox job.