# blackbox
A flight data recorder for AI coding agent sessions.
## Why "blackbox"?
Aircraft carry black boxes (flight data recorders) that silently capture everything during a flight, then become invaluable when you need to understand what happened. This package does the same for AI coding agent sessions: it watches, records, and lets you replay what the agent did, how it spent tokens, where it got stuck, and whether the session achieved its goal.
Currently supports Claude Code. Codex and Gemini support is planned.
## What it does
**Dashboard** -- a local HTMX web UI for browsing session transcripts in real time.

- Sidebar with all sessions from `~/.claude/projects/`, sorted by recency
- Live session detection via filesystem watching (green dot indicator)
- Streaming log view with filter presets (all, compact, important, errors)
- Tool call previews, error highlighting, user message formatting
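The filter presets above can be sketched as simple predicates over parsed transcript entries. This is an illustrative sketch only: the names (`PRESETS`, `filter_entries`) and the entry-dict shape are assumptions, not the actual `rendering.py` API.

```python
# Hypothetical preset predicates; the real rendering.py may classify
# entries differently. Each preset maps to a keep/drop decision.
PRESETS = {
    "all": lambda e: True,
    "errors": lambda e: e.get("is_error", False),
    "important": lambda e: e.get("is_error", False)
    or e.get("type") in {"user", "tool_result"},
    "compact": lambda e: e.get("type") != "progress",
}


def filter_entries(entries, preset="all"):
    """Keep only the transcript entries matching the given preset."""
    keep = PRESETS[preset]
    return [e for e in entries if keep(e)]
```

A preset then becomes a single query-string parameter on the log route, and switching presets re-renders the same entry list through a different predicate.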
**Analytics models** -- structured data types for session-level metrics, weekly trends, project breakdowns, and recommendations. These feed into the analysis pipeline (in progress) that will produce session digests and surface patterns across sessions.
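A session-level metrics model might look like the sketch below. The package's `models.py` uses attrs frozen classes; stdlib dataclasses are shown here as a portable stand-in, and the field names are assumptions rather than the real schema.

```python
from dataclasses import dataclass, field


# Illustrative stand-in for one of the frozen domain models. Frozen means
# instances are immutable after construction, like attrs' @frozen.
@dataclass(frozen=True)
class SessionStats:
    session_id: str
    input_tokens: int = 0
    output_tokens: int = 0
    tool_calls: dict = field(default_factory=dict)  # tool name -> call count

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens
```

Freezing the models keeps analytics pure: a scan produces immutable snapshots, so caching and re-rendering never mutate shared state.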
## Usage

```sh
blackbox serve             # open dashboard at http://localhost:7100
blackbox serve --port 8080 # custom port
blackbox serve --no-open   # don't auto-open browser
```
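The flags above map naturally onto a small sub-command parser. This is a hedged sketch using stdlib `argparse`; the actual `cli.py` may use a different framework, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of the `blackbox serve` interface.
    parser = argparse.ArgumentParser(prog="blackbox")
    sub = parser.add_subparsers(dest="command", required=True)
    serve = sub.add_parser("serve", help="run the dashboard")
    serve.add_argument("--port", type=int, default=7100)
    serve.add_argument("--no-open", action="store_true",
                       help="don't auto-open the browser")
    return parser
```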
## Package structure

```
src/blackbox/
    cli.py            # CLI entry point (serve command)
    models.py         # all domain models (attrs frozen classes)
    dashboard/
        app.py        # FastAPI instance + lifespan
        routes.py     # API endpoints + SSE log streaming
        rendering.py  # HTML rendering, filtering, formatting
        transcript.py # JSONL transcript parser + session scanner
        watcher.py    # Watchdog-based live session detection + cache
        templates/    # Jinja2 templates (Tailwind + HTMX)
```
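The watcher-plus-cache pairing can be sketched as a scan cache that the filesystem event handler invalidates. The class and method names below are assumptions for illustration, not the actual `watcher.py` API.

```python
class SessionCache:
    """Cache expensive project scans until a watcher event dirties them."""

    def __init__(self, scan_fn):
        self._scan_fn = scan_fn  # expensive transcript scan per project
        self._cache = {}         # project path -> scan result

    def get(self, project: str):
        # Serve the cached scan; only rescan after an invalidation.
        if project not in self._cache:
            self._cache[project] = self._scan_fn(project)
        return self._cache[project]

    def invalidate(self, project: str) -> None:
        # Called from the watchdog event handler when a .jsonl changes.
        self._cache.pop(project, None)
```

With this shape, the dashboard pays the full scan cost only when a transcript actually changed, which is what keeps the sidebar cheap to refresh.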
## Development

```sh
uv sync
uv run fastapi dev src/blackbox/dashboard/app.py  # hot reload on :8000
uv run pytest tests/ -v
uv run ruff check src/ tests/
```