Add codeflash-api architecture docs and project-scoped rules
CLAUDE.md with full package structure, layer boundaries, endpoint map, implementation order, business logic audit, and design decisions. Rules: architecture (layer boundaries, model conventions), testing (coverage requirements, mocking strategy), porting (reference files, what to port vs skip).
parent 2221de0a71
commit e34873fb82
4 changed files with 375 additions and 0 deletions
30
packages/codeflash-api/.claude/rules/architecture.md
Normal file
@@ -0,0 +1,30 @@
# Architecture Rules

## Layer boundaries

Four layers, strict downward dependency:

1. **HTTP** (routers, schemas): Pydantic models, FastAPI deps
2. **Pipeline** (context, pipeline): attrs domain models, orchestration
3. **Language** (optimizer, postprocess, validator): libcst, tree-sitter
4. **Infrastructure** (llm, db, observability): asyncpg, openai, anthropic

Never import upward. Routers never import from each other. Language modules never import from routers.

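The "never import upward" rule can be checked mechanically. The sketch below is a hypothetical helper, not part of the package: it assumes layers can be inferred from module paths (infrastructure packages `llm`, `db`, `observability`; anything under `languages`; module names `_router`/`schemas` and `_context`/`_pipeline`), and it only inspects absolute imports. It does not check the "routers never import from each other" rule.

```python
import ast
from typing import List, Optional

# Assumed classification, derived from the four layers listed above.
INFRA_PACKAGES = {"llm", "db", "observability"}
HTTP_MODULES = {"_router", "schemas"}
PIPELINE_MODULES = {"_context", "_pipeline"}


def layer_of(module: str) -> Optional[int]:
    """Map a dotted module path to its layer (1 = HTTP ... 4 = Infrastructure)."""
    parts = module.split(".")
    if parts[0] != "codeflash_api" or len(parts) < 2:
        return None  # stdlib or third-party import: out of scope
    if parts[1] in INFRA_PACKAGES:
        return 4
    if parts[1] == "languages":
        return 3
    if parts[-1] in HTTP_MODULES:
        return 1
    if parts[-1] in PIPELINE_MODULES:
        return 2
    return None


def upward_imports(module: str, source: str) -> List[str]:
    """Return imported modules that sit in a higher layer than `module`."""
    own = layer_of(module)
    if own is None:
        return []
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Cover both `from pkg.mod import name` and `from pkg import mod`.
            names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue
        for name in names:
            target = layer_of(name)
            if target is not None and target < own:
                violations.append(name)
    return violations
```

A check like this could run over every file in `src/` from a unit test, failing the build when the returned list is non-empty.
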
## Models

- **Pydantic** for request/response schemas only (files named `schemas.py`)
- **attrs** for everything else: `@frozen` by default, `@define` only when mutation is needed
- Never mix: a Pydantic model should not contain attrs classes or vice versa. Convert at the boundary.

## Dependencies

Use FastAPI `Depends()` for cross-cutting concerns (auth, rate limiting, usage tracking). Never use ASGI middleware for request-specific logic; middleware is for truly global concerns (CORS, healthcheck short-circuit).

## Async

Everything is async. No sync database calls, no sync LLM calls, and no `run_in_executor` unless it wraps a library that has no async API (like libcst parsing).

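The one permitted use of `run_in_executor` can be sketched as follows. To stay dependency-free, the example uses the standard library's `ast.parse` as a stand-in for a synchronous parser such as libcst; the function names are hypothetical.

```python
import ast
import asyncio
from functools import partial


def parse_sync(source: str) -> ast.Module:
    """Stand-in for a synchronous parser that offers no async API."""
    return ast.parse(source)


async def parse_async(source: str) -> ast.Module:
    # Hand the blocking call to the default thread pool and await the result,
    # so the coroutine does not block the event loop while parsing.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, partial(parse_sync, source))


async def count_statements() -> int:
    trees = await asyncio.gather(
        parse_async("x = 1"),
        parse_async("y = 2\nz = 3"),
    )
    return sum(len(tree.body) for tree in trees)
```

Wrapping the call in one small async function keeps the executor usage in a single place, so callers in the pipeline layer only ever see an awaitable.
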
## No ORM

Raw SQL via asyncpg. Queries live in `db/_queries.py`. Table schemas are documented as attrs classes in `db/models.py` but are never used as ORM models.
45
packages/codeflash-api/.claude/rules/porting.md
Normal file
@@ -0,0 +1,45 @@
# Porting Rules

## Source of truth

The Django aiservice at `~/Desktop/work/cf_org/codeflash-internal/django/aiservice/` is the reference implementation. When porting:

1. Read the Django source first
2. Copy the logic, adapt the framework glue
3. Don't "improve" the logic during the port; that introduces subtle divergence
4. Improvements come in follow-up commits after the port passes tests

## Key files to port from (by priority)

### P0: Core pipeline

- `aiservice/llm.py` → `llm/_client.py` (dual-provider abstraction, cost tracking)
- `core/languages/python/optimizer/optimizer.py` → `optimize/_pipeline.py`
- `core/languages/python/optimizer/context_utils/optimizer_context.py` → `optimize/_context.py`
- `core/languages/python/optimizer/postprocess.py` → `languages/python/_postprocess.py`
- `core/languages/python/cst_utils.py` → `languages/python/_cst_utils.py`
- `core/languages/python/optimizer/diff_patches_utils/` → `diff/`
- `core/languages/python/code_repair/` → `repair/`

### P1: Supporting endpoints

- `core/languages/python/optimizer/refinement.py` → `refinement/`
- `core/languages/python/adaptive_optimizer/` → `adaptive/`
- `core/shared/ranker/ranker.py` → `ranking/`
- `core/shared/testgen_router.py` → `testgen/`
- `core/languages/python/explanations/` → `explain/`
- `core/languages/python/optimization_review/` → `review/`

### P2: Secondary languages

- `core/languages/java/` → `languages/java/`
- `core/languages/js_ts/` → `languages/js_ts/`

## What NOT to port

- Django ORM models → replaced by asyncpg queries
- Django middleware classes → replaced by FastAPI dependencies
- `manage.py`, `wsgi.py`, `asgi.py` → replaced by FastAPI app factory
- `settings.py` → replaced by pydantic-settings `_config.py`
- Django `urls.py` / NinjaAPI router mounting → replaced by FastAPI router includes

## Prompt templates

Copy `.md` prompt files verbatim. They are not Django-specific. Place them in `languages/<lang>/prompts/`, mirroring the Django structure.
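Once the `.md` files are in place, loading them could look roughly like the sketch below. It assumes one template per file, read once and cached, with plain `str.format()` substitution as described in the design decisions in `CLAUDE.md`; the function names and the `prompts_dir` argument are hypothetical.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_prompt(prompts_dir: Path, name: str) -> str:
    """Read `<prompts_dir>/<name>.md` once and cache the raw template text."""
    return (prompts_dir / f"{name}.md").read_text(encoding="utf-8")


def render_prompt(prompts_dir: Path, name: str, **values: str) -> str:
    # str.format() treats every "{" as a placeholder, so a template that
    # contains literal braces (for example, embedded code samples) would need
    # "{{" escaping or a different substitution mechanism.
    return load_prompt(prompts_dir, name).format(**values)
```

Because the files are copied verbatim, it is worth checking each template for literal braces before relying on `.format()`.
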
48
packages/codeflash-api/.claude/rules/testing.md
Normal file
@@ -0,0 +1,48 @@
# Testing Rules

## Coverage requirement

Every module must have a corresponding test file. No exceptions. The Django version broke in E2E because key modules had zero tests; we don't repeat that mistake.

## What to test

For each module, test:

1. **Happy path**: the normal case works
2. **Error paths**: malformed input, missing fields, invalid values
3. **Edge cases**: empty strings, None values, boundary conditions
4. **Provider differences**: OpenAI vs Anthropic response formats differ

## LLM mocking

Never call real LLM APIs in tests. Use `unittest.mock.AsyncMock` or pytest fixtures that return canned responses. Test the parsing and orchestration, not the LLM.

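A minimal sketch of this pattern with `AsyncMock`. The `complete` method, the JSON reply shape, and `optimize_with_llm` are assumptions made for illustration; the real client interface is not shown in this document.

```python
import asyncio
import json
from unittest.mock import AsyncMock


async def optimize_with_llm(llm, source: str) -> str:
    """Hypothetical orchestration step: call the client, parse its JSON reply."""
    reply = await llm.complete(prompt=f"Optimize:\n{source}")
    return json.loads(reply)["code"].strip()


def test_parses_canned_reply() -> None:
    llm = AsyncMock()
    llm.complete.return_value = '{"code": "x = 1\\n"}'  # canned response
    assert asyncio.run(optimize_with_llm(llm, "x = 1 + 0")) == "x = 1"
    llm.complete.assert_awaited_once()


def test_malformed_reply_raises() -> None:
    llm = AsyncMock()
    llm.complete.return_value = "not json"
    try:
        asyncio.run(optimize_with_llm(llm, "x = 1"))
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return
    raise AssertionError("expected a parse error")
```

The tests exercise only the parsing and the call itself; no network access or provider SDK is involved.
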
## Database mocking

Unit tests mock the DB layer. Integration tests use a real test database (or testcontainers). Keep these separate: unit tests must be fast (<5 seconds total).

## Diff/patch tests

The diff application code (V4A, search/replace) is pure logic with no I/O. Test it exhaustively:

- Malformed blocks, partial matches, empty input
- Multi-hunk patches, fuzzy matching at all fuzz levels
- EOF handling, context line matching
- File path extraction from markers

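To show the shape such tests can take, here is a deliberately small search/replace applier with no I/O. The block markers (`<<<<<<< SEARCH`, `=======`, `>>>>>>> REPLACE`) are an assumed format, and the function is a simplified stand-in rather than the ported implementation: it does exact matching only, with no fuzzy levels.

```python
import re

# Assumed block format; the real parser may accept other markers.
BLOCK_RE = re.compile(
    r"<<<<<<< SEARCH\n(?P<search>.*?)\n=======\n(?P<replace>.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)


def apply_search_replace(source: str, patch: str) -> str:
    """Apply each block once, in order; raise ValueError on any problem."""
    blocks = list(BLOCK_RE.finditer(patch))
    if not blocks:
        raise ValueError("no well-formed SEARCH/REPLACE block found")
    for block in blocks:
        search, replace = block["search"], block["replace"]
        if search not in source:
            raise ValueError(f"search text not found: {search!r}")
        source = source.replace(search, replace, 1)
    return source


def test_happy_path() -> None:
    patch = "<<<<<<< SEARCH\nreturn a + 0\n=======\nreturn a\n>>>>>>> REPLACE"
    before = "def f(a):\n    return a + 0\n"
    assert apply_search_replace(before, patch) == "def f(a):\n    return a\n"


def test_bad_input_raises() -> None:
    no_match = "<<<<<<< SEARCH\nmissing\n=======\nx\n>>>>>>> REPLACE"
    for bad_patch in ["", "<<<<<<< SEARCH\nx = 1\n", no_match]:
        try:
            apply_search_replace("x = 1\n", bad_patch)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad_patch!r}")
```

Because the function is pure, each case in the list above (malformed block, partial match, empty input) becomes a one-line table entry in a parametrized test.
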
## Validation tests

Test every language validator with:

- Valid code (parses successfully)
- Syntax errors (returns failure, doesn't crash)
- Empty input, None input
- Unicode, unusual whitespace

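The "returns failure, doesn't crash" contract can be illustrated with a standard-library-only sketch. `ValidationResult` and `validate_python` are hypothetical names; the sketch uses `ast` alone and a dataclass (rather than attrs and libcst) to stay self-contained, and treating empty input as a failure is an assumption.

```python
import ast
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValidationResult:
    ok: bool
    error: Optional[str] = None


def validate_python(source: Optional[str]) -> ValidationResult:
    """Return a result object for any input instead of raising."""
    if source is None:
        return ValidationResult(False, "input is None")
    if not source.strip():
        return ValidationResult(False, "input is empty")
    try:
        ast.parse(source)
    except (SyntaxError, ValueError) as exc:
        # ValueError covers inputs such as null bytes on older Python versions.
        return ValidationResult(False, str(exc))
    return ValidationResult(True)
```

Each bullet in the list above then maps to a single assertion against the returned `ok` flag.
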
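## FastAPI test client

Use `httpx.AsyncClient` with `httpx.ASGITransport(app=app)` for router tests. Don't start a real server.

```python
import httpx
import pytest_asyncio


# Assumes pytest-asyncio; a plain @pytest.fixture works only with asyncio_mode = "auto".
@pytest_asyncio.fixture
async def client(app):
    # Requests are routed to the ASGI app in-process; no server is started.
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as c:
        yield c
```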
252
packages/codeflash-api/CLAUDE.md
Normal file
@@ -0,0 +1,252 @@
# codeflash-api

FastAPI AI service: a ground-up rewrite of the Django aiservice from `codeflash-internal/django/aiservice/`.

## Reference implementation

The Django version at `~/Desktop/work/cf_org/codeflash-internal/django/aiservice/` is the reference. Port logic faithfully from it; don't reimplement from scratch. When in doubt, read the Django source.

## Architecture

### Package structure

```
src/codeflash_api/
├── __init__.py
├── __main__.py                 # uvicorn entry: python -m codeflash_api
├── _app.py                     # FastAPI app factory, lifespan, middleware wiring
├── _config.py                  # Settings via pydantic-settings (env vars)
│
├── auth/                       # Authentication & authorization
│   ├── _keys.py                # API key hashing (SHA-384), lookup
│   ├── _deps.py                # FastAPI dependencies: get_current_user, require_auth
│   ├── _rate_limit.py          # Per-user per-endpoint rate limiting
│   ├── _usage.py               # Subscription quota tracking
│   └── models.py               # User, Organization, Subscription, APIKey (attrs)
│
├── llm/                        # LLM provider abstraction
│   ├── _client.py              # Async LLM client (OpenAI + Anthropic behind one interface)
│   ├── _models.py              # Model definitions, cost rates, provider config (attrs)
│   ├── _cost.py                # Cost calculation (cache-aware, per-provider)
│   └── _retry.py               # Transient error classification, retry policy
│
├── db/                         # Database layer
│   ├── _engine.py              # asyncpg pool creation, lifespan management
│   ├── _queries.py             # Raw SQL queries (no ORM)
│   └── models.py               # Table schemas as attrs classes (not ORM models)
│
├── observability/              # LLM call recording, error tracking
│   ├── _recording.py           # Fire-and-forget async recording
│   └── models.py               # LLMCall, OptimizationError (attrs)
│
├── optimize/                   # Core optimization pipeline
│   ├── _router.py              # POST /ai/optimize (language dispatch)
│   ├── _pipeline.py            # Parallel LLM orchestration, model distribution
│   ├── _context.py             # Prompt assembly (system + user), runtime context
│   ├── _line_profiler.py       # POST /ai/optimize-line-profiler
│   └── schemas.py              # Request/response Pydantic models
│
├── repair/                     # Code repair pipeline
│   ├── _router.py              # POST /ai/code_repair
│   ├── _context.py             # Repair prompt assembly, test diff formatting
│   └── schemas.py
│
├── refinement/                 # Refinement pipeline
│   ├── _router.py              # POST /ai/refinement
│   ├── _context.py             # Refinement prompt assembly
│   └── schemas.py
│
├── adaptive/                   # Adaptive optimization
│   ├── _router.py              # POST /ai/adaptive_optimize
│   └── schemas.py
│
├── testgen/                    # Test generation
│   ├── _router.py              # POST /ai/testgen
│   ├── _review_router.py       # POST /ai/testgen_review, /ai/testgen_repair
│   └── schemas.py
│
├── ranking/                    # Candidate ranking
│   ├── _router.py              # POST /ai/rank
│   └── schemas.py
│
├── explain/                    # Explanation generation
│   ├── _router.py              # POST /ai/explain
│   └── schemas.py
│
├── review/                     # Optimization review
│   ├── _router.py              # POST /ai/optimization_review
│   └── schemas.py
│
├── jit/                        # JIT rewrite
│   ├── _router.py              # POST /ai/rewrite_jit
│   └── schemas.py
│
├── workflow/                   # Workflow generation
│   ├── _router.py              # POST /ai/workflow-gen
│   └── schemas.py
│
├── logging/                    # Feature logging
│   ├── _router.py              # POST /ai/log_features
│   └── schemas.py
│
├── languages/                  # Language-specific logic
│   ├── python/
│   │   ├── _optimizer.py       # Python optimization handler
│   │   ├── _postprocess.py     # Dedup, validation, cleanup (~520 lines)
│   │   ├── _cst_utils.py       # libcst utilities (~450 lines)
│   │   ├── _validator.py       # Python syntax validation (libcst + ast)
│   │   └── prompts/            # .md prompt templates
│   ├── java/
│   │   ├── _optimizer.py
│   │   ├── _validator.py       # tree-sitter-java
│   │   └── prompts/
│   └── js_ts/
│       ├── _optimizer.py
│       ├── _validator.py       # tree-sitter-javascript/typescript
│       └── prompts/
│
└── diff/                       # Diff patch application
    ├── _base.py                # Diff ABC, DiffMethod enum
    ├── _search_replace.py      # SEARCH/REPLACE block parser (~194 lines)
    └── _v4a.py                 # V4A unified diff with fuzzy matching (~380 lines)
```

### Layer boundaries

```
HTTP layer (routers, schemas)                        ← Pydantic models, FastAPI deps
        │
        ▼
Pipeline layer (context, pipeline)                   ← attrs domain models, orchestration
        │
        ▼
Language layer (optimizer, postprocess, validator)   ← libcst, tree-sitter
        │
        ▼
Infrastructure (llm, db, observability)              ← asyncpg, openai, anthropic
```

Dependency direction is strictly downward. Routers never import from each other. Language modules never import from routers. Infrastructure never imports from pipeline.

### Endpoints (15 total)

| Endpoint | Method | Module |
|---|---|---|
| `/healthcheck` | GET | `_app.py` |
| `/ai/optimize` | POST | `optimize/_router.py` |
| `/ai/optimize-line-profiler` | POST | `optimize/_line_profiler.py` |
| `/ai/refinement` | POST | `refinement/_router.py` |
| `/ai/code_repair` | POST | `repair/_router.py` |
| `/ai/adaptive_optimize` | POST | `adaptive/_router.py` |
| `/ai/testgen` | POST | `testgen/_router.py` |
| `/ai/testgen_review` | POST | `testgen/_review_router.py` |
| `/ai/testgen_repair` | POST | `testgen/_review_router.py` |
| `/ai/rank` | POST | `ranking/_router.py` |
| `/ai/explain` | POST | `explain/_router.py` |
| `/ai/optimization_review` | POST | `review/_router.py` |
| `/ai/rewrite_jit` | POST | `jit/_router.py` |
| `/ai/workflow-gen` | POST | `workflow/_router.py` |
| `/ai/log_features` | POST | `logging/_router.py` |

### Key design decisions

1. **Pydantic for API boundary only.** Request/response schemas are Pydantic v2 models. All internal domain models use attrs (`@frozen` by default, `@define` when mutation is needed).

2. **No ORM.** asyncpg with raw SQL. The service has 7 simple tables with no complex relations. Raw queries are faster, easier to test, and carry no framework dependency.

3. **FastAPI dependencies for middleware chain.** Auth, rate limiting, and usage tracking are composable `Depends()`, not middleware classes:
   ```python
   @router.post("/ai/optimize")
   async def optimize(
       request: OptimizeSchema,
       user: AuthenticatedUser = Depends(require_auth),
       _rate: None = Depends(check_rate_limit),
       _usage: None = Depends(track_usage),
   ) -> OptimizeResponse:
       ...  # handler body omitted
   ```

4. **LLM client as singleton with lifespan management.** Created in app lifespan, injected via dependency. Handles event loop safety (stale connection detection).

5. **Prompt templates as files on disk.** Same `.md` files as the Django version, loaded at startup and cached. Plain `.format()` substitution unless a template actually needs Jinja.

6. **Fire-and-forget background tasks** via managed `asyncio.TaskGroup` (same pattern as Django `background.py`).

7. **Language dispatch as a registry.** Register handlers at import time, look up by language string. Extensible without touching the router.

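Decision 7 can be sketched in a few lines. The decorator name `register`, the handler signature, and the placeholder handler are assumptions for illustration; the real handlers take richer inputs.

```python
from typing import Awaitable, Callable, Dict

# Hypothetical handler signature: source code in, optimized code out.
Handler = Callable[[str], Awaitable[str]]
_HANDLERS: Dict[str, Handler] = {}


def register(language: str) -> Callable[[Handler], Handler]:
    """Decorator that records a handler under a language string at import time."""
    def decorator(handler: Handler) -> Handler:
        if language in _HANDLERS:
            raise ValueError(f"handler already registered for {language!r}")
        _HANDLERS[language] = handler
        return handler
    return decorator


def get_handler(language: str) -> Handler:
    """Look up a handler; unknown languages raise LookupError."""
    try:
        return _HANDLERS[language]
    except KeyError:
        raise LookupError(f"unsupported language: {language!r}") from None


@register("python")
async def optimize_python(source: str) -> str:
    return source  # placeholder body for the sketch
```

With this shape the router only calls `get_handler(request_language)`, so adding a language means adding a module that registers itself, not editing the router.
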
### Business logic volume (from Django version audit)

| Component | Logic Lines | Complexity | Port Priority |
|---|---|---|---|
| Postprocessing pipeline | ~520 | High | P0: correctness-critical |
| CST utilities | ~450 | High | P0: correctness-critical |
| V4A diff patching | ~380 | High | P0: correctness-critical |
| Optimizer context/prompts | ~350 | Medium | P0: core pipeline |
| Python optimizer pipeline | ~280 | Medium | P0: core pipeline |
| JS/TS optimizer | ~450 | Medium | P1: after Python works |
| Java optimizer | ~260 | Medium | P1: after Python works |
| Search & replace diff | ~194 | Low | P0: used by repair |
| LLM client | ~150 | Medium | P0: everything depends on it |
| Observability | ~65 | Low | P1: nice to have early |
| Language dispatch | ~27 | Thin | P0: trivial |

Total: ~3,130 lines of business logic to port.

### Implementation order

1. **Scaffold**: pyproject.toml, app factory, config, healthcheck, rules files
2. **Auth layer**: key hashing, deps, rate limiting, usage tracking
3. **LLM layer**: client abstraction, cost calculation, retry
4. **DB layer**: asyncpg pool, queries, recording
5. **Diff layer**: search/replace, V4A (pure logic, most testable)
6. **Language layer**: validators, CST utils, postprocessing (pure logic)
7. **Optimize endpoint**: context, pipeline, router (integration point)
8. **Remaining endpoints**: repair, refinement, adaptive, testgen, rank, explain, review, jit, workflow, log_features
9. **Integration tests**: full request→LLM→response with mocked providers

Steps 2-6 are pure logic with no framework coupling, so they can be developed and tested in parallel.

### Database tables

| Table | Key Fields | Purpose |
|---|---|---|
| `cf_api_keys` | key (hashed), suffix, user_id, tier, organization_id | API key auth |
| `users` | user_id (pk), github_username, email | User identity |
| `organizations` | id (pk), name, github_org_id, privacy_mode | Org management |
| `subscriptions` | user_id (unique), plan_type, optimizations_used, limits | Billing/quotas |
| `llm_calls` | trace_id, call_type, model_name, tokens, cost, latency_ms | Observability |
| `optimization_errors` | trace_id, error_type, severity, stack_trace | Error tracking |
| `optimization_features` | trace_id, user_id, original_code, speedup_ratio | Feature logging |

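The `cf_api_keys` row stores a hashed key plus a suffix. A minimal sketch of how those two values might be derived with SHA-384 follows; the hex encoding, the four-character suffix length, and the function names are assumptions, so the Django source remains the authority on the exact format.

```python
import hashlib
import hmac


def hash_api_key(raw_key: str) -> str:
    """SHA-384 digest of the raw key, hex-encoded (encoding is an assumption)."""
    return hashlib.sha384(raw_key.encode("utf-8")).hexdigest()


def key_suffix(raw_key: str, length: int = 4) -> str:
    """Trailing characters kept in clear text so a user can recognise a key."""
    return raw_key[-length:]


def verify_api_key(raw_key: str, stored_hash: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(hash_api_key(raw_key), stored_hash)
```

Lookup then becomes a single indexed query on the hash column, and the raw key never needs to be stored.
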
### External dependencies

| Service | Purpose | Client |
|---|---|---|
| Azure OpenAI | LLM provider (GPT-5-mini) | `openai.AsyncAzureOpenAI` |
| Anthropic Bedrock | LLM provider (Claude Sonnet 4.5) | `anthropic.AsyncAnthropicBedrock` |
| PostgreSQL | Primary database | asyncpg |
| PostHog | Product analytics | posthog |
| Sentry | Error tracking | sentry-sdk |

### Environment variables

| Variable | Required | Purpose |
|---|---|---|
| `DATABASE_URL` | Yes | PostgreSQL connection string |
| `AZURE_OPENAI_API_KEY` | Yes | Azure OpenAI auth |
| `AWS_ACCESS_KEY_ID` | Yes | Anthropic Bedrock auth |
| `AWS_SECRET_ACCESS_KEY` | Yes | Anthropic Bedrock auth |
| `SECRET_KEY` | Yes | Token signing |
| `ENVIRONMENT` | No | `production` or `development` |
| `RATE_LIMIT_WINDOW_MS` | No | Rate limit window (default: 60000) |
| `RATE_LIMIT_MAX` | No | Max requests per window (default: 40) |
| `SENTRY_DSN` | No | Sentry error reporting |
| `POSTHOG_API_KEY` | No | PostHog analytics |

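To make the two rate-limit variables concrete, here is a sketch of a per-user, per-endpoint sliding-window limiter that reads them with the defaults from the table. It is an in-memory, single-process illustration under assumed names (`SlidingWindowLimiter`, `allow`); how the Django version stores its counters is not described here.

```python
import os
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

WINDOW_MS = int(os.environ.get("RATE_LIMIT_WINDOW_MS", "60000"))
MAX_REQUESTS = int(os.environ.get("RATE_LIMIT_MAX", "40"))


class SlidingWindowLimiter:
    """Tracks request timestamps per (user, endpoint) pair."""

    def __init__(self, window_ms: int = WINDOW_MS, max_requests: int = MAX_REQUESTS) -> None:
        self.window_s = window_ms / 1000
        self.max_requests = max_requests
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, endpoint: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[(user_id, endpoint)]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Accepting `now` as a parameter keeps the limiter testable without sleeping; a FastAPI dependency would call `allow()` and raise an HTTP 429 when it returns `False`.
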
## Verification

```bash
uv run pytest packages/codeflash-api/tests/ -v    # Unit tests
uv run ruff check src/ tests/                     # Lint
uv run mypy src/                                  # Type check
uv run uvicorn codeflash_api:app --reload         # Dev server
```