Add codeflash-api architecture docs and project-scoped rules

CLAUDE.md with full package structure, layer boundaries, endpoint map,
implementation order, business logic audit, and design decisions.

Rules: architecture (layer boundaries, model conventions), testing
(coverage requirements, mocking strategy), porting (reference files,
what to port vs skip).
Kevin Turcios 2026-04-21 21:16:32 -05:00
parent 2221de0a71
commit e34873fb82
4 changed files with 375 additions and 0 deletions

@@ -0,0 +1,30 @@
# Architecture Rules
## Layer boundaries
Four layers, strict downward dependency:
1. **HTTP** (routers, schemas) — Pydantic models, FastAPI deps
2. **Pipeline** (context, pipeline) — attrs domain models, orchestration
3. **Language** (optimizer, postprocess, validator) — libcst, tree-sitter
4. **Infrastructure** (llm, db, observability) — asyncpg, openai, anthropic
Never import upward. Routers never import from each other. Language modules never import from routers.
## Models
- **Pydantic** for request/response schemas only (files named `schemas.py`)
- **attrs** for everything else — `@frozen` by default, `@define` when mutation is needed
- Never mix: a Pydantic model should not contain attrs classes or vice versa. Convert at the boundary.
## Dependencies
Use FastAPI `Depends()` for cross-cutting concerns (auth, rate limiting, usage tracking). Never use ASGI middleware for request-specific logic — middleware is for truly global concerns (CORS, healthcheck short-circuit).
## Async
Everything is async. No sync database calls, no sync LLM calls, no `run_in_executor` unless wrapping a library that has no async API (like libcst parsing).
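For the one sanctioned exception, the sync call can be off-loaded so it never blocks the event loop. In this sketch `ast.parse` stands in for the actual libcst call, and `asyncio.to_thread` is the modern spelling of `run_in_executor`:

```python
import ast
import asyncio


def _parse_sync(code: str) -> ast.Module:
    # Stand-in for a CPU-bound sync parser with no async API,
    # e.g. libcst.parse_module in the real code.
    return ast.parse(code)


async def parse_module(code: str) -> ast.Module:
    # Runs the sync parser on the default thread-pool executor.
    return await asyncio.to_thread(_parse_sync, code)
```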
## No ORM
Raw SQL via asyncpg. Queries live in `db/_queries.py`. Table schemas documented as attrs classes in `db/models.py` but never used as ORM models.

@@ -0,0 +1,45 @@
# Porting Rules
## Source of truth
The Django aiservice at `~/Desktop/work/cf_org/codeflash-internal/django/aiservice/` is the reference implementation. When porting:
1. Read the Django source first
2. Copy the logic, adapt the framework glue
3. Don't "improve" the logic during the port — that introduces subtle divergence
4. Improvements come in follow-up commits after the port passes tests
## Key files to port from (by priority)
### P0 — Core pipeline
- `aiservice/llm.py` → `llm/_client.py` (dual-provider abstraction, cost tracking)
- `core/languages/python/optimizer/optimizer.py` → `optimize/_pipeline.py`
- `core/languages/python/optimizer/context_utils/optimizer_context.py` → `optimize/_context.py`
- `core/languages/python/optimizer/postprocess.py` → `languages/python/_postprocess.py`
- `core/languages/python/cst_utils.py` → `languages/python/_cst_utils.py`
- `core/languages/python/optimizer/diff_patches_utils/` → `diff/`
- `core/languages/python/code_repair/` → `repair/`
### P1 — Supporting endpoints
- `core/languages/python/optimizer/refinement.py` → `refinement/`
- `core/languages/python/adaptive_optimizer/` → `adaptive/`
- `core/shared/ranker/ranker.py` → `ranking/`
- `core/shared/testgen_router.py` → `testgen/`
- `core/languages/python/explanations/` → `explain/`
- `core/languages/python/optimization_review/` → `review/`
### P2 — Secondary languages
- `core/languages/java/` → `languages/java/`
- `core/languages/js_ts/` → `languages/js_ts/`
## What NOT to port
- Django ORM models → replaced by asyncpg queries
- Django middleware classes → replaced by FastAPI dependencies
- `manage.py`, `wsgi.py`, `asgi.py` → replaced by FastAPI app factory
- `settings.py` → replaced by pydantic-settings `_config.py`
- Django `urls.py` / NinjaAPI router mounting → replaced by FastAPI router includes
## Prompt templates
Copy `.md` prompt files verbatim. They are not Django-specific. Place them in `languages/<lang>/prompts/` mirroring the Django structure.
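A possible loader for those templates; `load_prompt`, `render_prompt`, and the `prompts/` path are hypothetical names sketching the "load at startup, plain `.format()`" approach, not the real module:

```python
from functools import lru_cache
from pathlib import Path

# Hypothetical location mirroring the Django layout.
PROMPTS_DIR = Path("languages/python/prompts")


@lru_cache(maxsize=None)
def load_prompt(name: str) -> str:
    # Read each .md template once; cached for the process lifetime.
    return (PROMPTS_DIR / f"{name}.md").read_text(encoding="utf-8")


def render_prompt(template: str, **values: str) -> str:
    # Plain str.format substitution — no Jinja unless a template needs it.
    return template.format(**values)
```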

@@ -0,0 +1,48 @@
# Testing Rules
## Coverage requirement
Every module must have a corresponding test file. No exceptions. The Django version broke in E2E because key modules had zero tests — we don't repeat that mistake.
## What to test
For each module, test:
1. **Happy path** — the normal case works
2. **Error paths** — malformed input, missing fields, invalid values
3. **Edge cases** — empty strings, None values, boundary conditions
4. **Provider differences** — OpenAI vs Anthropic response formats differ
## LLM mocking
Never call real LLM APIs in tests. Use `unittest.mock.AsyncMock` or pytest fixtures that return canned responses. Test the parsing and orchestration, not the LLM.
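A minimal example of the pattern, with a hypothetical `summarize` coroutine standing in for real orchestration code:

```python
import asyncio
from unittest.mock import AsyncMock


async def summarize(llm, code: str) -> str:
    # Hypothetical orchestration under test: it only needs an object
    # with an async `complete(prompt)` method, so a canned mock suffices.
    response = await llm.complete(f"Summarize: {code}")
    return response.strip()


def test_summarize_parses_canned_response():
    llm = AsyncMock()
    llm.complete.return_value = "  uses a loop  "  # canned LLM output
    result = asyncio.run(summarize(llm, "def f(): ..."))
    assert result == "uses a loop"      # parsing logic is what we test
    llm.complete.assert_awaited_once()  # orchestration made one call
```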
## Database mocking
Unit tests mock the DB layer. Integration tests use a real test database (or testcontainers). Keep these separate — unit tests must be fast (<5 seconds total).
## Diff/patch tests
The diff application code (V4A, search/replace) is pure logic with no I/O. Test it exhaustively:
- Malformed blocks, partial matches, empty input
- Multi-hunk patches, fuzzy matching at all fuzz levels
- EOF handling, context line matching
- File path extraction from markers
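The test style can be illustrated with a deliberately tiny stand-in applier — this is not the real ~194-line parser, just the shape of the exhaustive edge-case tests:

```python
def apply_search_replace(source: str, search: str, replace: str):
    """Illustrative stand-in: return patched source, or None on no match."""
    if not search or search not in source:
        return None
    return source.replace(search, replace, 1)


def test_search_replace_edge_cases():
    # Happy path
    assert apply_search_replace("a = 1\n", "a = 1", "a = 2") == "a = 2\n"
    # Missing match must fail cleanly, not crash
    assert apply_search_replace("a = 1\n", "b = 9", "b = 8") is None
    # Empty search block is malformed input
    assert apply_search_replace("a = 1\n", "", "x") is None
    # Only the first occurrence is replaced
    assert apply_search_replace("x\nx\n", "x", "y") == "y\nx\n"
```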
## Validation tests
Test every language validator with:
- Valid code (parses successfully)
- Syntax errors (returns failure, doesn't crash)
- Empty input, None input
- Unicode, unusual whitespace
## FastAPI test client
Use `httpx.AsyncClient` with `httpx.ASGITransport` for router tests. Don't start a real server.
```python
@pytest.fixture
async def client(app):  # requires pytest-asyncio (e.g. asyncio_mode = "auto")
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as c:
        yield c
```

@@ -0,0 +1,252 @@
# codeflash-api
FastAPI AI service — ground-up rewrite of the Django aiservice from `codeflash-internal/django/aiservice/`.
## Reference implementation
The Django version at `~/Desktop/work/cf_org/codeflash-internal/django/aiservice/` is the reference. Port logic faithfully from it — don't reimplement from scratch. When in doubt, read the Django source.
## Architecture
### Package structure
```
src/codeflash_api/
├── __init__.py
├── __main__.py # uvicorn entry: python -m codeflash_api
├── _app.py # FastAPI app factory, lifespan, middleware wiring
├── _config.py # Settings via pydantic-settings (env vars)
├── auth/ # Authentication & authorization
│ ├── _keys.py # API key hashing (SHA-384), lookup
│ ├── _deps.py # FastAPI dependencies: get_current_user, require_auth
│ ├── _rate_limit.py # Per-user per-endpoint rate limiting
│ ├── _usage.py # Subscription quota tracking
│ └── models.py # User, Organization, Subscription, APIKey (attrs)
├── llm/ # LLM provider abstraction
│ ├── _client.py # Async LLM client (OpenAI + Anthropic behind one interface)
│ ├── _models.py # Model definitions, cost rates, provider config (attrs)
│ ├── _cost.py # Cost calculation (cache-aware, per-provider)
│ └── _retry.py # Transient error classification, retry policy
├── db/ # Database layer
│ ├── _engine.py # asyncpg pool creation, lifespan management
│ ├── _queries.py # Raw SQL queries (no ORM)
│ └── models.py # Table schemas as attrs classes (not ORM models)
├── observability/ # LLM call recording, error tracking
│ ├── _recording.py # Fire-and-forget async recording
│ └── models.py # LLMCall, OptimizationError (attrs)
├── optimize/ # Core optimization pipeline
│ ├── _router.py # POST /ai/optimize — language dispatch
│ ├── _pipeline.py # Parallel LLM orchestration, model distribution
│ ├── _context.py # Prompt assembly (system + user), runtime context
│ ├── _line_profiler.py # POST /ai/optimize-line-profiler
│ └── schemas.py # Request/response Pydantic models
├── repair/ # Code repair pipeline
│ ├── _router.py # POST /ai/code_repair
│ ├── _context.py # Repair prompt assembly, test diff formatting
│ └── schemas.py
├── refinement/ # Refinement pipeline
│ ├── _router.py # POST /ai/refinement
│ ├── _context.py # Refinement prompt assembly
│ └── schemas.py
├── adaptive/ # Adaptive optimization
│ ├── _router.py # POST /ai/adaptive_optimize
│ └── schemas.py
├── testgen/ # Test generation
│ ├── _router.py # POST /ai/testgen
│ ├── _review_router.py # POST /ai/testgen_review, /ai/testgen_repair
│ └── schemas.py
├── ranking/ # Candidate ranking
│ ├── _router.py # POST /ai/rank
│ └── schemas.py
├── explain/ # Explanation generation
│ ├── _router.py # POST /ai/explain
│ └── schemas.py
├── review/ # Optimization review
│ ├── _router.py # POST /ai/optimization_review
│ └── schemas.py
├── jit/ # JIT rewrite
│ ├── _router.py # POST /ai/rewrite_jit
│ └── schemas.py
├── workflow/ # Workflow generation
│ ├── _router.py # POST /ai/workflow-gen
│ └── schemas.py
├── logging/ # Feature logging
│ ├── _router.py # POST /ai/log_features
│ └── schemas.py
├── languages/ # Language-specific logic
│ ├── python/
│ │ ├── _optimizer.py # Python optimization handler
│ │ ├── _postprocess.py # Dedup, validation, cleanup (~520 lines)
│ │ ├── _cst_utils.py # libcst utilities (~450 lines)
│ │ ├── _validator.py # Python syntax validation (libcst + ast)
│ │ └── prompts/ # .md prompt templates
│ ├── java/
│ │ ├── _optimizer.py
│ │ ├── _validator.py # tree-sitter-java
│ │ └── prompts/
│ └── js_ts/
│ ├── _optimizer.py
│ ├── _validator.py # tree-sitter-javascript/typescript
│ └── prompts/
└── diff/ # Diff patch application
├── _base.py # Diff ABC, DiffMethod enum
├── _search_replace.py # SEARCH/REPLACE block parser (~194 lines)
└── _v4a.py # V4A unified diff with fuzzy matching (~380 lines)
```
### Layer boundaries
```
HTTP layer (routers, schemas) ← Pydantic models, FastAPI deps
Pipeline layer (context, pipeline) ← attrs domain models, orchestration
Language layer (optimizer, postprocess, validator) ← libcst, tree-sitter
Infrastructure (llm, db, observability) ← asyncpg, openai, anthropic
```
Dependency direction is strictly downward. Routers never import from each other. Language modules never import from routers. Infrastructure never imports from pipeline.
### Endpoints (15 total)
| Endpoint | Method | Module |
|---|---|---|
| `/healthcheck` | GET | `_app.py` |
| `/ai/optimize` | POST | `optimize/_router.py` |
| `/ai/optimize-line-profiler` | POST | `optimize/_line_profiler.py` |
| `/ai/refinement` | POST | `refinement/_router.py` |
| `/ai/code_repair` | POST | `repair/_router.py` |
| `/ai/adaptive_optimize` | POST | `adaptive/_router.py` |
| `/ai/testgen` | POST | `testgen/_router.py` |
| `/ai/testgen_review` | POST | `testgen/_review_router.py` |
| `/ai/testgen_repair` | POST | `testgen/_review_router.py` |
| `/ai/rank` | POST | `ranking/_router.py` |
| `/ai/explain` | POST | `explain/_router.py` |
| `/ai/optimization_review` | POST | `review/_router.py` |
| `/ai/rewrite_jit` | POST | `jit/_router.py` |
| `/ai/workflow-gen` | POST | `workflow/_router.py` |
| `/ai/log_features` | POST | `logging/_router.py` |
### Key design decisions
1. **Pydantic for API boundary only.** Request/response schemas are Pydantic v2 models. All internal domain models use attrs (`@frozen` by default, `@define` when mutation is needed).
2. **No ORM.** asyncpg with raw SQL. The service has 6 simple tables with no complex relations. Raw queries are faster, easier to test, no framework dependency.
3. **FastAPI dependencies for middleware chain.** Auth, rate limiting, and usage tracking are composable `Depends()`, not middleware classes:
```python
@router.post("/ai/optimize")
async def optimize(
request: OptimizeSchema,
user: AuthenticatedUser = Depends(require_auth),
_rate: None = Depends(check_rate_limit),
_usage: None = Depends(track_usage),
) -> OptimizeResponse:
```
4. **LLM client as singleton with lifespan management.** Created in app lifespan, injected via dependency. Handles event loop safety (stale connection detection).
5. **Prompt templates as files on disk.** Same `.md` files from Django version, loaded at startup, cached. Plain `.format()` substitution unless a template actually needs Jinja.
6. **Fire-and-forget background tasks** via managed `asyncio.TaskGroup` (same pattern as Django `background.py`).
7. **Language dispatch as a registry.** Register handlers at import time, look up by language string. Extensible without touching the router.
### Business logic volume (from Django version audit)
| Component | Logic Lines | Complexity | Port Priority |
|---|---|---|---|
| Postprocessing pipeline | ~520 | High | P0 — correctness-critical |
| CST utilities | ~450 | High | P0 — correctness-critical |
| V4A diff patching | ~380 | High | P0 — correctness-critical |
| Optimizer context/prompts | ~350 | Medium | P0 — core pipeline |
| Python optimizer pipeline | ~280 | Medium | P0 — core pipeline |
| JS/TS optimizer | ~450 | Medium | P1 — after Python works |
| Java optimizer | ~260 | Medium | P1 — after Python works |
| Search & replace diff | ~194 | Low | P0 — used by repair |
| LLM client | ~150 | Medium | P0 — everything depends on it |
| Observability | ~65 | Low | P1 — nice to have early |
| Language dispatch | ~27 | Thin | P0 — trivial |
Total: ~3,130 lines of business logic to port.
### Implementation order
1. **Scaffold** — pyproject.toml, app factory, config, healthcheck, rules files
2. **Auth layer** — key hashing, deps, rate limiting, usage tracking
3. **LLM layer** — client abstraction, cost calculation, retry
4. **DB layer** — asyncpg pool, queries, recording
5. **Diff layer** — search/replace, V4A (pure logic, most testable)
6. **Language layer** — validators, CST utils, postprocessing (pure logic)
7. **Optimize endpoint** — context, pipeline, router (integration point)
8. **Remaining endpoints** — repair, refinement, adaptive, testgen, rank, explain, review, jit, workflow, log_features
9. **Integration tests** — full request→LLM→response with mocked providers
Steps 2-6 are pure logic with no framework coupling — can be developed and tested in parallel.
### Database tables
| Table | Key Fields | Purpose |
|---|---|---|
| `cf_api_keys` | key (hashed), suffix, user_id, tier, organization_id | API key auth |
| `users` | user_id (pk), github_username, email | User identity |
| `organizations` | id (pk), name, github_org_id, privacy_mode | Org management |
| `subscriptions` | user_id (unique), plan_type, optimizations_used, limits | Billing/quotas |
| `llm_calls` | trace_id, call_type, model_name, tokens, cost, latency_ms | Observability |
| `optimization_errors` | trace_id, error_type, severity, stack_trace | Error tracking |
| `optimization_features` | trace_id, user_id, original_code, speedup_ratio | Feature logging |
### External dependencies
| Service | Purpose | Client |
|---|---|---|
| Azure OpenAI | LLM provider (GPT-5-mini) | `openai.AsyncAzureOpenAI` |
| Anthropic Bedrock | LLM provider (Claude Sonnet 4.5) | `anthropic.AsyncAnthropicBedrock` |
| PostgreSQL | Primary database | asyncpg |
| PostHog | Product analytics | posthog |
| Sentry | Error tracking | sentry-sdk |
### Environment variables
| Variable | Required | Purpose |
|---|---|---|
| `DATABASE_URL` | Yes | PostgreSQL connection string |
| `AZURE_OPENAI_API_KEY` | Yes | Azure OpenAI auth |
| `AWS_ACCESS_KEY_ID` | Yes | Anthropic Bedrock auth |
| `AWS_SECRET_ACCESS_KEY` | Yes | Anthropic Bedrock auth |
| `SECRET_KEY` | Yes | Token signing |
| `ENVIRONMENT` | No | `production` or `development` |
| `RATE_LIMIT_WINDOW_MS` | No | Rate limit window (default: 60000) |
| `RATE_LIMIT_MAX` | No | Max requests per window (default: 40) |
| `SENTRY_DSN` | No | Sentry error reporting |
| `POSTHOG_API_KEY` | No | PostHog analytics |
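A `_config.py` sketch with pydantic-settings, assuming field names that mirror the table above (pydantic-settings matches environment variables case-insensitively by default):

```python
# _config.py sketch — field names are assumptions derived from the table.
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Required — startup fails fast if these env vars are missing.
    database_url: str
    azure_openai_api_key: str
    aws_access_key_id: str
    aws_secret_access_key: str
    secret_key: str
    # Optional, with the documented defaults.
    environment: str = "development"
    rate_limit_window_ms: int = 60000
    rate_limit_max: int = 40
    sentry_dsn: str | None = None
    posthog_api_key: str | None = None


settings = Settings()  # reads and validates env vars at import time
```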
## Verification
```bash
uv run pytest packages/codeflash-api/tests/ -v # Unit tests
uv run ruff check src/ tests/ # Lint
uv run mypy src/ # Type check
uv run uvicorn codeflash_api:app --reload # Dev server
```