# blackbox
A flight data recorder for AI coding agent sessions.
## Why "blackbox"?
Aircraft carry black boxes (flight data recorders) that silently capture everything during a flight, then become invaluable when you need to understand what happened. This package does the same for AI coding agent sessions: it watches, records, and lets you replay what the agent did, how it spent tokens, where it got stuck, and whether the session achieved its goal.
Currently supports Claude Code. Codex and Gemini support is planned.
## What it does
**Dashboard** -- a local HTMX web UI for browsing session transcripts in real time.

- Sidebar with all sessions from `~/.claude/projects/`, sorted by recency
- Live session detection via filesystem watching (green dot indicator)
- Streaming log view with filter presets (all, compact, important, errors)
- Tool call previews, error highlighting, user message formatting
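The filter presets above can be sketched as simple predicates over parsed transcript entries. This is an illustrative sketch only: the names (`PRESETS`, `filter_entries`) and the entry-dict shape are assumptions, not the actual `rendering.py` API.

```python
# Hypothetical preset predicates; the real rendering.py may classify
# entries differently. Each preset maps to a keep/drop decision.
PRESETS = {
    "all": lambda e: True,
    "errors": lambda e: e.get("is_error", False),
    "important": lambda e: e.get("is_error", False)
    or e.get("type") in {"user", "tool_result"},
    "compact": lambda e: e.get("type") != "progress",
}


def filter_entries(entries, preset="all"):
    """Keep only the transcript entries matching the given preset."""
    keep = PRESETS[preset]
    return [e for e in entries if keep(e)]
```

A preset then becomes a single query-string parameter on the log route, and switching presets re-renders the same entry list through a different predicate.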
**Analytics models** -- structured data types for session-level metrics, weekly trends, project breakdowns, and recommendations. These feed into the analysis pipeline (in progress) that will produce session digests and surface patterns across sessions.
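A session-level metrics model might look like the sketch below. The package's `models.py` uses attrs frozen classes; stdlib dataclasses are shown here as a portable stand-in, and the field names are assumptions rather than the real schema.

```python
from dataclasses import dataclass, field


# Illustrative stand-in for one of the frozen domain models. Frozen means
# instances are immutable after construction, like attrs' @frozen.
@dataclass(frozen=True)
class SessionStats:
    session_id: str
    input_tokens: int = 0
    output_tokens: int = 0
    tool_calls: dict = field(default_factory=dict)  # tool name -> call count

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens
```

Freezing the models keeps analytics pure: a scan produces immutable snapshots, so caching and re-rendering never mutate shared state.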
## Usage

```sh
blackbox serve             # open dashboard at http://localhost:7100
blackbox serve --port 8080 # custom port
blackbox serve --no-open   # don't auto-open browser
```
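The flags above map naturally onto a small sub-command parser. This is a hedged sketch using stdlib `argparse`; the actual `cli.py` may use a different framework, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of the `blackbox serve` interface.
    parser = argparse.ArgumentParser(prog="blackbox")
    sub = parser.add_subparsers(dest="command", required=True)
    serve = sub.add_parser("serve", help="run the dashboard")
    serve.add_argument("--port", type=int, default=7100)
    serve.add_argument("--no-open", action="store_true",
                       help="don't auto-open the browser")
    return parser
```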
## Package structure

```
src/blackbox/
    cli.py            # CLI entry point (serve command)
    models.py         # all domain models (attrs frozen classes)
    dashboard/
        app.py        # FastAPI instance + lifespan
        routes.py     # API endpoints + SSE log streaming
        rendering.py  # HTML rendering, filtering, formatting
        transcript.py # JSONL transcript parser + session scanner
        watcher.py    # Watchdog-based live session detection + cache
        templates/    # Jinja2 templates (Tailwind + HTMX)
```
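The watcher-plus-cache pairing can be sketched as a scan cache that the filesystem event handler invalidates. The class and method names below are assumptions for illustration, not the actual `watcher.py` API.

```python
class SessionCache:
    """Cache expensive project scans until a watcher event dirties them."""

    def __init__(self, scan_fn):
        self._scan_fn = scan_fn  # expensive transcript scan per project
        self._cache = {}         # project path -> scan result

    def get(self, project: str):
        # Serve the cached scan; only rescan after an invalidation.
        if project not in self._cache:
            self._cache[project] = self._scan_fn(project)
        return self._cache[project]

    def invalidate(self, project: str) -> None:
        # Called from the watchdog event handler when a .jsonl changes.
        self._cache.pop(project, None)
```

With this shape, the dashboard pays the full scan cost only when a transcript actually changed, which is what keeps the sidebar cheap to refresh.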
## Development

```sh
uv sync
uv run fastapi dev src/blackbox/dashboard/app.py  # hot reload on :8000
uv run pytest tests/ -v
uv run ruff check src/ tests/
```