Add blackbox package: session flight recorder with HTMX dashboard (#39)
* feat(blackbox): add package with models, CLI, and HTMX dashboard
* test(blackbox): add comprehensive test coverage for dashboard
* feat(blackbox): cache session scanning via watcher invalidation
* docs(blackbox): add README and use fastapi[standard] for dev server
* refactor(blackbox): extract presentation logic into formatter classes
* refactor(blackbox): extract classify_error helpers
* feat(blackbox): wire analytics into session detail view
Show token usage, tool breakdowns, and session stats in a
collapsible panel when viewing a session.
* feat(blackbox): add codeflash plugin detection
Detect codeflash agent names, skills, and commands in transcripts.
Surface language, optimization domain, and capability badges in
the analytics panel.
* refactor(blackbox): remove underscore prefixes from internal functions
* chore: add ty python-version to root pyproject.toml
* chore(blackbox): fix lint errors in test files
* style(blackbox): apply ruff formatting to analytics
* feat(blackbox): add Playwright E2E tests for dashboard
Refactor app.py to expose create_app() factory accepting a projects_dir
override, enabling tests to run against fixture data instead of the real
~/.claude/projects/ directory. Routes now read projects_dir from
app.state instead of the module-level constant.
Add 26 Playwright tests across 5 files covering dashboard loading,
session list, session detail with filters and analytics, sidebar
collapse/localStorage persistence, and SSE log streaming. All tests
pass on chromium, firefox, and webkit (78 total).
CI gets a new e2e-blackbox job with a browser matrix strategy running
all three engines in parallel, conditional on blackbox path changes,
with trace upload on failure.
* fix(ci): sync only blackbox package in e2e job
* fix(ci): exclude e2e tests from unit test job
The test job doesn't install Playwright browsers, so e2e tests error
when pytest collects them. Ignore tests/e2e/ directories in the test
job — those are handled by the dedicated e2e-blackbox job.
2026-04-29 00:58:43 +00:00
# blackbox

A flight data recorder for AI coding agent sessions.

## Why "blackbox"?
Aircraft carry black boxes (flight data recorders) that silently capture
everything during a flight, then become invaluable when you need to
understand what happened. This package does the same for AI coding agent
sessions: it watches, records, and lets you replay what the agent did,
how it spent tokens, where it got stuck, and whether the session achieved
its goal.

Currently supports Claude Code. Codex and Gemini support is planned.

## What it does
**Dashboard** -- a local HTMX web UI for browsing session transcripts
in real time.

- Sidebar with all sessions from `~/.claude/projects/`, sorted by recency
- Live session detection via filesystem watching (green dot indicator)
- Streaming log view with filter presets (all, compact, important, errors)
- Tool call previews, error highlighting, user message formatting

**Analytics models** -- structured data types for session-level metrics,
weekly trends, project breakdowns, and recommendations. These feed into
the analysis pipeline (in progress) that will produce session digests
and surface patterns across sessions.
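
The filter presets listed for the streaming log view (all, compact, important, errors) can be pictured as named predicates over log entries. The sketch below is illustrative only: the entry shape, the `kind` values, and what each preset keeps are assumptions made for this example, not the rules used in `rendering.py`.

```python
from typing import Callable

# Hypothetical entry shape: {"kind": "...", "text": "..."}.
# The "kind" values used here are assumptions for illustration.
Entry = dict[str, str]

# Each preset name maps to a predicate deciding whether an entry is shown.
# What "compact" and "important" keep is a guess, not the package's rule.
PRESETS: dict[str, Callable[[Entry], bool]] = {
    "all": lambda e: True,
    "compact": lambda e: e["kind"] != "tool_call",
    "important": lambda e: e["kind"] in {"user", "error"},
    "errors": lambda e: e["kind"] == "error",
}


def apply_preset(entries: list[Entry], preset: str) -> list[Entry]:
    """Return the entries that the named preset lets through."""
    try:
        keep = PRESETS[preset]
    except KeyError:
        raise ValueError(f"unknown preset: {preset!r}") from None
    return [e for e in entries if keep(e)]
```

Keeping the presets in a single table means adding one is a one-line change, and an unknown name fails loudly instead of silently showing everything.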
## Usage
```bash
blackbox serve              # open dashboard at http://localhost:7100
blackbox serve --port 8080  # custom port
blackbox serve --no-open    # don't auto-open browser
```
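
For readers curious how a `serve` command with these two flags could be wired up, here is a minimal sketch using only the standard library's `argparse`. It is not the code in `cli.py`, which may well use a different CLI library. `build_parser`, `dashboard_url`, and `open_browser` are hypothetical names, and the default port is taken from the URL shown in the usage block.

```python
import argparse

DEFAULT_PORT = 7100  # taken from the URL in the usage block


def build_parser() -> argparse.ArgumentParser:
    """Build a parser accepting `serve [--port N] [--no-open]` (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="blackbox")
    sub = parser.add_subparsers(dest="command", required=True)
    serve = sub.add_parser("serve", help="start the local dashboard")
    serve.add_argument("--port", type=int, default=DEFAULT_PORT, help="port to listen on")
    # store_false makes open_browser default to True and flip to False with --no-open.
    serve.add_argument(
        "--no-open",
        dest="open_browser",
        action="store_false",
        help="don't auto-open the browser",
    )
    return parser


def dashboard_url(port: int) -> str:
    """URL a browser would be pointed at for the given port."""
    return f"http://localhost:{port}"
```

With this parser, `build_parser().parse_args(["serve", "--no-open"])` yields `port=7100` and `open_browser=False`.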
## Package structure
```
src/blackbox/
  cli.py              # CLI entry point (serve command)
  models.py           # All domain models (attrs frozen classes)
  dashboard/
    app.py            # FastAPI instance + lifespan
    routes.py         # API endpoints + SSE log streaming
    rendering.py      # HTML rendering, filtering, formatting
    transcript.py     # JSONL transcript parser + session scanner
    watcher.py        # Watchdog-based live session detection + cache
    templates/        # Jinja2 templates (Tailwind + HTMX)
```
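
As a rough picture of what "JSONL transcript parser + session scanner" involves, the sketch below reads one JSON object per line and lists transcript files newest first. It assumes one session per `*.jsonl` file somewhere under the projects directory. The names `parse_transcript` and `scan_sessions` are made up for this example, and the real `transcript.py` may differ.

```python
import json
from pathlib import Path


def parse_transcript(path: Path) -> list[dict]:
    """Parse one JSONL file, skipping blank and malformed lines."""
    entries = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # A session still being written may end in a partial line.
                continue
            if isinstance(record, dict):
                entries.append(record)
    return entries


def scan_sessions(projects_dir: Path) -> list[Path]:
    """Return every *.jsonl file under projects_dir, most recently modified first."""
    return sorted(
        projects_dir.rglob("*.jsonl"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
```

Sorting by modification time is one simple way to get the "sorted by recency" sidebar order. Skipping unparseable lines keeps a live, half-written transcript from breaking the view.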
## Development
2026-04-29 09:18:39 +00:00

From the repo root:
```bash
uv sync
uv run --package blackbox uvicorn blackbox.dashboard.app:app --reload  # hot reload on :8000
uv run pytest packages/blackbox/tests/ -v
uv run ruff check packages/blackbox/
```
From `packages/blackbox/`:
```bash
uv run --package blackbox fastapi dev src/blackbox/dashboard/app.py  # hot reload on :8000
uv run pytest tests/ -v
uv run ruff check src/ tests/
```