Add codeflash-api architecture docs and project-scoped rules

CLAUDE.md with full package structure, layer boundaries, endpoint map,
implementation order, business logic audit, and design decisions.

Rules: architecture (layer boundaries, model conventions), testing
(coverage requirements, mocking strategy), porting (reference files,
what to port vs skip).
Kevin Turcios 2026-04-21 21:16:32 -05:00
parent 2221de0a71
commit e34873fb82
4 changed files with 375 additions and 0 deletions

@@ -0,0 +1,30 @@
# Architecture Rules
## Layer boundaries
Four layers, strict downward dependency:
1. **HTTP** (routers, schemas) — Pydantic models, FastAPI deps
2. **Pipeline** (context, pipeline) — attrs domain models, orchestration
3. **Language** (optimizer, postprocess, validator) — libcst, tree-sitter
4. **Infrastructure** (llm, db, observability) — asyncpg, openai, anthropic
Never import upward. Routers never import from each other. Language modules never import from routers.
## Models
- **Pydantic** for request/response schemas only (files named `schemas.py`)
- **attrs** for everything else — `@frozen` by default, `@define` when mutation is needed
- Never mix: a Pydantic model should not contain attrs classes or vice versa. Convert at the boundary.
## Dependencies
Use FastAPI `Depends()` for cross-cutting concerns (auth, rate limiting, usage tracking). Never use ASGI middleware for request-specific logic — middleware is for truly global concerns (CORS, healthcheck short-circuit).
## Async
Everything is async. No sync database calls, no sync LLM calls, no `run_in_executor` unless wrapping a library that has no async API (like libcst parsing).
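For the one sanctioned exception, the sync call can be off-loaded so it never blocks the event loop. In this sketch `ast.parse` stands in for the actual libcst call, and `asyncio.to_thread` is the modern spelling of `run_in_executor`:

```python
import ast
import asyncio


def _parse_sync(code: str) -> ast.Module:
    # Stand-in for a CPU-bound sync parser with no async API,
    # e.g. libcst.parse_module in the real code.
    return ast.parse(code)


async def parse_module(code: str) -> ast.Module:
    # Runs the sync parser on the default thread-pool executor.
    return await asyncio.to_thread(_parse_sync, code)
```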
## No ORM
Raw SQL via asyncpg. Queries live in `db/_queries.py`. Table schemas documented as attrs classes in `db/models.py` but never used as ORM models.

@@ -0,0 +1,45 @@
# Porting Rules
## Source of truth
The Django aiservice at `~/Desktop/work/cf_org/codeflash-internal/django/aiservice/` is the reference implementation. When porting:
1. Read the Django source first
2. Copy the logic, adapt the framework glue
3. Don't "improve" the logic during the port — that introduces subtle divergence
4. Improvements come in follow-up commits after the port passes tests
## Key files to port from (by priority)
### P0 — Core pipeline
- `aiservice/llm.py` → `llm/_client.py` (dual-provider abstraction, cost tracking)
- `core/languages/python/optimizer/optimizer.py` → `optimize/_pipeline.py`
- `core/languages/python/optimizer/context_utils/optimizer_context.py` → `optimize/_context.py`
- `core/languages/python/optimizer/postprocess.py` → `languages/python/_postprocess.py`
- `core/languages/python/cst_utils.py` → `languages/python/_cst_utils.py`
- `core/languages/python/optimizer/diff_patches_utils/` → `diff/`
- `core/languages/python/code_repair/` → `repair/`
### P1 — Supporting endpoints
- `core/languages/python/optimizer/refinement.py` → `refinement/`
- `core/languages/python/adaptive_optimizer/` → `adaptive/`
- `core/shared/ranker/ranker.py` → `ranking/`
- `core/shared/testgen_router.py` → `testgen/`
- `core/languages/python/explanations/` → `explain/`
- `core/languages/python/optimization_review/` → `review/`
### P2 — Secondary languages
- `core/languages/java/` → `languages/java/`
- `core/languages/js_ts/` → `languages/js_ts/`
## What NOT to port
- Django ORM models → replaced by asyncpg queries
- Django middleware classes → replaced by FastAPI dependencies
- `manage.py`, `wsgi.py`, `asgi.py` → replaced by FastAPI app factory
- `settings.py` → replaced by pydantic-settings `_config.py`
- Django `urls.py` / NinjaAPI router mounting → replaced by FastAPI router includes
## Prompt templates
Copy `.md` prompt files verbatim. They are not Django-specific. Place them in `languages/<lang>/prompts/` mirroring the Django structure.
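A possible loader for those templates; `load_prompt`, `render_prompt`, and the `prompts/` path are hypothetical names sketching the "load at startup, plain `.format()`" approach, not the real module:

```python
from functools import lru_cache
from pathlib import Path

# Hypothetical location mirroring the Django layout.
PROMPTS_DIR = Path("languages/python/prompts")


@lru_cache(maxsize=None)
def load_prompt(name: str) -> str:
    # Read each .md template once; cached for the process lifetime.
    return (PROMPTS_DIR / f"{name}.md").read_text(encoding="utf-8")


def render_prompt(template: str, **values: str) -> str:
    # Plain str.format substitution — no Jinja unless a template needs it.
    return template.format(**values)
```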

@@ -0,0 +1,48 @@
# Testing Rules
## Coverage requirement
Every module must have a corresponding test file. No exceptions. The Django version broke in E2E because key modules had zero tests — we don't repeat that mistake.
## What to test
For each module, test:
1. **Happy path** — the normal case works
2. **Error paths** — malformed input, missing fields, invalid values
3. **Edge cases** — empty strings, None values, boundary conditions
4. **Provider differences** — OpenAI vs Anthropic response formats differ
## LLM mocking
Never call real LLM APIs in tests. Use `unittest.mock.AsyncMock` or pytest fixtures that return canned responses. Test the parsing and orchestration, not the LLM.
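A minimal example of the pattern, with a hypothetical `summarize` coroutine standing in for real orchestration code:

```python
import asyncio
from unittest.mock import AsyncMock


async def summarize(llm, code: str) -> str:
    # Hypothetical orchestration under test: it only needs an object
    # with an async `complete(prompt)` method, so a canned mock suffices.
    response = await llm.complete(f"Summarize: {code}")
    return response.strip()


def test_summarize_parses_canned_response():
    llm = AsyncMock()
    llm.complete.return_value = "  uses a loop  "  # canned LLM output
    result = asyncio.run(summarize(llm, "def f(): ..."))
    assert result == "uses a loop"      # parsing logic is what we test
    llm.complete.assert_awaited_once()  # orchestration made one call
```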
## Database mocking
Unit tests mock the DB layer. Integration tests use a real test database (or testcontainers). Keep these separate — unit tests must be fast (<5 seconds total).
## Diff/patch tests
The diff application code (V4A, search/replace) is pure logic with no I/O. Test it exhaustively:
- Malformed blocks, partial matches, empty input
- Multi-hunk patches, fuzzy matching at all fuzz levels
- EOF handling, context line matching
- File path extraction from markers
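The test style can be illustrated with a deliberately tiny stand-in applier — this is not the real ~194-line parser, just the shape of the exhaustive edge-case tests:

```python
def apply_search_replace(source: str, search: str, replace: str):
    """Illustrative stand-in: return patched source, or None on no match."""
    if not search or search not in source:
        return None
    return source.replace(search, replace, 1)


def test_search_replace_edge_cases():
    # Happy path
    assert apply_search_replace("a = 1\n", "a = 1", "a = 2") == "a = 2\n"
    # Missing match must fail cleanly, not crash
    assert apply_search_replace("a = 1\n", "b = 9", "b = 8") is None
    # Empty search block is malformed input
    assert apply_search_replace("a = 1\n", "", "x") is None
    # Only the first occurrence is replaced
    assert apply_search_replace("x\nx\n", "x", "y") == "y\nx\n"
```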
## Validation tests
Test every language validator with:
- Valid code (parses successfully)
- Syntax errors (returns failure, doesn't crash)
- Empty input, None input
- Unicode, unusual whitespace
## FastAPI test client
Use `httpx.AsyncClient` with `httpx.ASGITransport` for router tests. Don't start a real server.
```python
@pytest.fixture
async def client(app):  # requires pytest-asyncio (e.g. asyncio_mode = "auto")
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as c:
        yield c
```

@@ -0,0 +1,252 @@
# codeflash-api
FastAPI AI service — ground-up rewrite of the Django aiservice from `codeflash-internal/django/aiservice/`.
## Reference implementation
The Django version at `~/Desktop/work/cf_org/codeflash-internal/django/aiservice/` is the reference. Port logic faithfully from it — don't reimplement from scratch. When in doubt, read the Django source.
## Architecture
### Package structure
```
src/codeflash_api/
├── __init__.py
├── __main__.py # uvicorn entry: python -m codeflash_api
├── _app.py # FastAPI app factory, lifespan, middleware wiring
├── _config.py # Settings via pydantic-settings (env vars)
├── auth/ # Authentication & authorization
│ ├── _keys.py # API key hashing (SHA-384), lookup
│ ├── _deps.py # FastAPI dependencies: get_current_user, require_auth
│ ├── _rate_limit.py # Per-user per-endpoint rate limiting
│ ├── _usage.py # Subscription quota tracking
│ └── models.py # User, Organization, Subscription, APIKey (attrs)
├── llm/ # LLM provider abstraction
│ ├── _client.py # Async LLM client (OpenAI + Anthropic behind one interface)
│ ├── _models.py # Model definitions, cost rates, provider config (attrs)
│ ├── _cost.py # Cost calculation (cache-aware, per-provider)
│ └── _retry.py # Transient error classification, retry policy
├── db/ # Database layer
│ ├── _engine.py # asyncpg pool creation, lifespan management
│ ├── _queries.py # Raw SQL queries (no ORM)
│ └── models.py # Table schemas as attrs classes (not ORM models)
├── observability/ # LLM call recording, error tracking
│ ├── _recording.py # Fire-and-forget async recording
│ └── models.py # LLMCall, OptimizationError (attrs)
├── optimize/ # Core optimization pipeline
│ ├── _router.py # POST /ai/optimize — language dispatch
│ ├── _pipeline.py # Parallel LLM orchestration, model distribution
│ ├── _context.py # Prompt assembly (system + user), runtime context
│ ├── _line_profiler.py # POST /ai/optimize-line-profiler
│ └── schemas.py # Request/response Pydantic models
├── repair/ # Code repair pipeline
│ ├── _router.py # POST /ai/code_repair
│ ├── _context.py # Repair prompt assembly, test diff formatting
│ └── schemas.py
├── refinement/ # Refinement pipeline
│ ├── _router.py # POST /ai/refinement
│ ├── _context.py # Refinement prompt assembly
│ └── schemas.py
├── adaptive/ # Adaptive optimization
│ ├── _router.py # POST /ai/adaptive_optimize
│ └── schemas.py
├── testgen/ # Test generation
│ ├── _router.py # POST /ai/testgen
│ ├── _review_router.py # POST /ai/testgen_review, /ai/testgen_repair
│ └── schemas.py
├── ranking/ # Candidate ranking
│ ├── _router.py # POST /ai/rank
│ └── schemas.py
├── explain/ # Explanation generation
│ ├── _router.py # POST /ai/explain
│ └── schemas.py
├── review/ # Optimization review
│ ├── _router.py # POST /ai/optimization_review
│ └── schemas.py
├── jit/ # JIT rewrite
│ ├── _router.py # POST /ai/rewrite_jit
│ └── schemas.py
├── workflow/ # Workflow generation
│ ├── _router.py # POST /ai/workflow-gen
│ └── schemas.py
├── logging/ # Feature logging
│ ├── _router.py # POST /ai/log_features
│ └── schemas.py
├── languages/ # Language-specific logic
│ ├── python/
│ │ ├── _optimizer.py # Python optimization handler
│ │ ├── _postprocess.py # Dedup, validation, cleanup (~520 lines)
│ │ ├── _cst_utils.py # libcst utilities (~450 lines)
│ │ ├── _validator.py # Python syntax validation (libcst + ast)
│ │ └── prompts/ # .md prompt templates
│ ├── java/
│ │ ├── _optimizer.py
│ │ ├── _validator.py # tree-sitter-java
│ │ └── prompts/
│ └── js_ts/
│ ├── _optimizer.py
│ ├── _validator.py # tree-sitter-javascript/typescript
│ └── prompts/
└── diff/ # Diff patch application
├── _base.py # Diff ABC, DiffMethod enum
├── _search_replace.py # SEARCH/REPLACE block parser (~194 lines)
└── _v4a.py # V4A unified diff with fuzzy matching (~380 lines)
```
### Layer boundaries
```
HTTP layer (routers, schemas) ← Pydantic models, FastAPI deps
Pipeline layer (context, pipeline) ← attrs domain models, orchestration
Language layer (optimizer, postprocess, validator) ← libcst, tree-sitter
Infrastructure (llm, db, observability) ← asyncpg, openai, anthropic
```
Dependency direction is strictly downward. Routers never import from each other. Language modules never import from routers. Infrastructure never imports from pipeline.
### Endpoints (15 total)
| Endpoint | Method | Module |
|---|---|---|
| `/healthcheck` | GET | `_app.py` |
| `/ai/optimize` | POST | `optimize/_router.py` |
| `/ai/optimize-line-profiler` | POST | `optimize/_line_profiler.py` |
| `/ai/refinement` | POST | `refinement/_router.py` |
| `/ai/code_repair` | POST | `repair/_router.py` |
| `/ai/adaptive_optimize` | POST | `adaptive/_router.py` |
| `/ai/testgen` | POST | `testgen/_router.py` |
| `/ai/testgen_review` | POST | `testgen/_review_router.py` |
| `/ai/testgen_repair` | POST | `testgen/_review_router.py` |
| `/ai/rank` | POST | `ranking/_router.py` |
| `/ai/explain` | POST | `explain/_router.py` |
| `/ai/optimization_review` | POST | `review/_router.py` |
| `/ai/rewrite_jit` | POST | `jit/_router.py` |
| `/ai/workflow-gen` | POST | `workflow/_router.py` |
| `/ai/log_features` | POST | `logging/_router.py` |
### Key design decisions
1. **Pydantic for API boundary only.** Request/response schemas are Pydantic v2 models. All internal domain models use attrs (`@frozen` by default, `@define` when mutation is needed).
2. **No ORM.** asyncpg with raw SQL. The service has 6 simple tables with no complex relations. Raw queries are faster, easier to test, no framework dependency.
3. **FastAPI dependencies for middleware chain.** Auth, rate limiting, and usage tracking are composable `Depends()`, not middleware classes:
```python
@router.post("/ai/optimize")
async def optimize(
request: OptimizeSchema,
user: AuthenticatedUser = Depends(require_auth),
_rate: None = Depends(check_rate_limit),
_usage: None = Depends(track_usage),
) -> OptimizeResponse:
```
4. **LLM client as singleton with lifespan management.** Created in app lifespan, injected via dependency. Handles event loop safety (stale connection detection).
5. **Prompt templates as files on disk.** Same `.md` files from Django version, loaded at startup, cached. Plain `.format()` substitution unless a template actually needs Jinja.
6. **Fire-and-forget background tasks** via managed `asyncio.TaskGroup` (same pattern as Django `background.py`).
7. **Language dispatch as a registry.** Register handlers at import time, look up by language string. Extensible without touching the router.
### Business logic volume (from Django version audit)
| Component | Logic Lines | Complexity | Port Priority |
|---|---|---|---|
| Postprocessing pipeline | ~520 | High | P0 — correctness-critical |
| CST utilities | ~450 | High | P0 — correctness-critical |
| V4A diff patching | ~380 | High | P0 — correctness-critical |
| Optimizer context/prompts | ~350 | Medium | P0 — core pipeline |
| Python optimizer pipeline | ~280 | Medium | P0 — core pipeline |
| JS/TS optimizer | ~450 | Medium | P1 — after Python works |
| Java optimizer | ~260 | Medium | P1 — after Python works |
| Search & replace diff | ~194 | Low | P0 — used by repair |
| LLM client | ~150 | Medium | P0 — everything depends on it |
| Observability | ~65 | Low | P1 — nice to have early |
| Language dispatch | ~27 | Thin | P0 — trivial |
Total: ~3,130 lines of business logic to port.
### Implementation order
1. **Scaffold** — pyproject.toml, app factory, config, healthcheck, rules files
2. **Auth layer** — key hashing, deps, rate limiting, usage tracking
3. **LLM layer** — client abstraction, cost calculation, retry
4. **DB layer** — asyncpg pool, queries, recording
5. **Diff layer** — search/replace, V4A (pure logic, most testable)
6. **Language layer** — validators, CST utils, postprocessing (pure logic)
7. **Optimize endpoint** — context, pipeline, router (integration point)
8. **Remaining endpoints** — repair, refinement, adaptive, testgen, rank, explain, review, jit, workflow, log_features
9. **Integration tests** — full request→LLM→response with mocked providers
Steps 2-6 are pure logic with no framework coupling — can be developed and tested in parallel.
### Database tables
| Table | Key Fields | Purpose |
|---|---|---|
| `cf_api_keys` | key (hashed), suffix, user_id, tier, organization_id | API key auth |
| `users` | user_id (pk), github_username, email | User identity |
| `organizations` | id (pk), name, github_org_id, privacy_mode | Org management |
| `subscriptions` | user_id (unique), plan_type, optimizations_used, limits | Billing/quotas |
| `llm_calls` | trace_id, call_type, model_name, tokens, cost, latency_ms | Observability |
| `optimization_errors` | trace_id, error_type, severity, stack_trace | Error tracking |
| `optimization_features` | trace_id, user_id, original_code, speedup_ratio | Feature logging |
### External dependencies
| Service | Purpose | Client |
|---|---|---|
| Azure OpenAI | LLM provider (GPT-5-mini) | `openai.AsyncAzureOpenAI` |
| Anthropic Bedrock | LLM provider (Claude Sonnet 4.5) | `anthropic.AsyncAnthropicBedrock` |
| PostgreSQL | Primary database | asyncpg |
| PostHog | Product analytics | posthog |
| Sentry | Error tracking | sentry-sdk |
### Environment variables
| Variable | Required | Purpose |
|---|---|---|
| `DATABASE_URL` | Yes | PostgreSQL connection string |
| `AZURE_OPENAI_API_KEY` | Yes | Azure OpenAI auth |
| `AWS_ACCESS_KEY_ID` | Yes | Anthropic Bedrock auth |
| `AWS_SECRET_ACCESS_KEY` | Yes | Anthropic Bedrock auth |
| `SECRET_KEY` | Yes | Token signing |
| `ENVIRONMENT` | No | `production` or `development` |
| `RATE_LIMIT_WINDOW_MS` | No | Rate limit window (default: 60000) |
| `RATE_LIMIT_MAX` | No | Max requests per window (default: 40) |
| `SENTRY_DSN` | No | Sentry error reporting |
| `POSTHOG_API_KEY` | No | PostHog analytics |
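A `_config.py` sketch with pydantic-settings, assuming field names that mirror the table above (pydantic-settings matches environment variables case-insensitively by default):

```python
# _config.py sketch — field names are assumptions derived from the table.
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Required — startup fails fast if these env vars are missing.
    database_url: str
    azure_openai_api_key: str
    aws_access_key_id: str
    aws_secret_access_key: str
    secret_key: str
    # Optional, with the documented defaults.
    environment: str = "development"
    rate_limit_window_ms: int = 60000
    rate_limit_max: int = 40
    sentry_dsn: str | None = None
    posthog_api_key: str | None = None


settings = Settings()  # reads and validates env vars at import time
```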
## Verification
```bash
uv run pytest packages/codeflash-api/tests/ -v # Unit tests
uv run ruff check src/ tests/ # Lint
uv run mypy src/ # Type check
uv run uvicorn codeflash_api:app --reload # Dev server
```