# codeflash-api
FastAPI AI service — ground-up rewrite of the Django aiservice from codeflash-internal/django/aiservice/.
## Reference implementation
The Django version at ~/Desktop/work/cf_org/codeflash-internal/django/aiservice/ is the reference. Port logic faithfully from it — don't reimplement from scratch. When in doubt, read the Django source.
## Architecture

### Package structure
```
src/codeflash_api/
├── __init__.py
├── __main__.py # uvicorn entry: python -m codeflash_api
├── _app.py # FastAPI app factory, lifespan, middleware wiring
├── _config.py # Settings via pydantic-settings (env vars)
│
├── auth/ # Authentication & authorization
│ ├── _keys.py # API key hashing (SHA-384), lookup
│ ├── _deps.py # FastAPI dependencies: require_auth, check_rate_limit, track_usage
│ └── models.py # User, Organization, Subscription, APIKey (attrs)
│
├── llm/ # LLM provider abstraction
│ ├── _client.py # Async LLM client (OpenAI + Anthropic behind one interface)
│ ├── _models.py # Model definitions, cost rates, provider config (attrs)
│ ├── _cost.py # Cost calculation (cache-aware, per-provider)
│ └── _retry.py # Transient error classification, retry policy
│
├── db/ # Database layer
│ ├── _engine.py # asyncpg pool creation, lifespan management
│ ├── _queries.py # Raw SQL queries (no ORM)
│ └── models.py # Table schemas as attrs classes (not ORM models)
│
├── observability/ # LLM call recording, error tracking
│ ├── _background.py # Fire-and-forget background task management
│ └── _recording.py # LLM call and error recording (log-only, DB deferred)
│
├── optimize/ # Core optimization pipeline
│ ├── _router.py # POST /ai/optimize — language dispatch
│ ├── _pipeline.py # Parallel LLM orchestration, model distribution
│ ├── _context.py # Prompt assembly (system + user), runtime context
│ └── schemas.py # Request/response Pydantic models
│
├── repair/ # Code repair pipeline
│ ├── _router.py # POST /ai/code_repair
│ ├── _context.py # Repair prompt assembly, test diff formatting
│ └── schemas.py
│
├── refinement/ # Refinement pipeline
│ ├── _router.py # POST /ai/refinement
│ ├── _context.py # Refinement prompt assembly
│ └── schemas.py
│
├── adaptive/ # Adaptive optimization
│ ├── _router.py # POST /ai/adaptive_optimize
│ └── schemas.py
│
├── testgen/ # Test generation
│ ├── _router.py # POST /ai/testgen
│ ├── _review_router.py # POST /ai/testgen_review, /ai/testgen_repair
│ └── schemas.py
│
├── ranking/ # Candidate ranking
│ ├── _router.py # POST /ai/rank
│ └── schemas.py
│
├── explain/ # Explanation generation
│ ├── _router.py # POST /ai/explain
│ └── schemas.py
│
├── review/ # Optimization review
│ ├── _router.py # POST /ai/optimization_review
│ └── schemas.py
│
├── jit/ # JIT rewrite
│ ├── _router.py # POST /ai/rewrite_jit
│ └── schemas.py
│
├── workflow/ # Workflow generation
│ ├── _router.py # POST /ai/workflow-gen
│ └── schemas.py
│
├── logging/ # Feature logging
│ ├── _router.py # POST /ai/log_features
│ └── schemas.py
│
├── languages/ # Language-specific logic
│ ├── python/
│ │ ├── _postprocess.py # Dedup, validation, cleanup (~830 lines)
│ │ ├── _cst_utils.py # libcst utilities (~470 lines)
│ │ ├── _validator.py # Python syntax validation (libcst + ast)
│ │ ├── _markdown.py # Markdown code block extraction
│ │ ├── _xml.py # XML tag extraction for Anthropic responses
│ │ └── prompts/ # .md prompt templates
│ ├── java/ # Stub — P2, deferred
│ └── js_ts/ # Stub — P2, deferred
│
└── diff/ # Diff patch application
    ├── _base.py # Diff ABC, DiffMethod enum
    ├── _search_replace.py # SEARCH/REPLACE block parser (~194 lines)
    └── _v4a.py # V4A unified diff with fuzzy matching (~380 lines)
```
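To make the diff layer concrete, here is a hypothetical sketch of a SEARCH/REPLACE block parser in the spirit of `diff/_search_replace.py`. The conflict-marker format, `SearchReplace` type, and function names are illustrative assumptions, not the real module's API:

```python
import re
from typing import NamedTuple


class SearchReplace(NamedTuple):
    search: str
    replace: str


# Assumed marker format; the real parser's grammar may differ.
_BLOCK = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)


def parse_blocks(text: str) -> list[SearchReplace]:
    """Extract every SEARCH/REPLACE pair from an LLM response."""
    return [SearchReplace(s, r) for s, r in _BLOCK.findall(text)]


def apply_blocks(source: str, blocks: list[SearchReplace]) -> str:
    """Apply each block by exact substring match (first occurrence only)."""
    for block in blocks:
        if block.search not in source:
            raise ValueError(f"search text not found: {block.search!r}")
        source = source.replace(block.search, block.replace, 1)
    return source
```

The V4A variant adds fuzzy matching on top of this exact-match strategy, which is what makes it the larger of the two modules.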
### Layer boundaries
```
HTTP layer (routers, schemas)           ← Pydantic models, FastAPI deps
        │
        ▼
Pipeline layer (context, pipeline)      ← attrs domain models, orchestration
        │
        ▼
Language layer (optimizer, postprocess, validator) ← libcst, tree-sitter
        │
        ▼
Infrastructure (llm, db, observability) ← asyncpg, openai, anthropic
```
Dependency direction is strictly downward. Routers never import from each other. Language modules never import from routers. Infrastructure never imports from pipeline.
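One way to mechanize this rule would be an import-linter layers contract in pyproject.toml. This is an illustrative sketch, assuming import-linter were added as a dev dependency; it is not part of the project today, and the module groupings shown are a simplification:

```toml
[tool.importlinter]
root_package = "codeflash_api"

[[tool.importlinter.contracts]]
name = "Strict downward layering"
type = "layers"
layers = [
    "codeflash_api.optimize | codeflash_api.repair | codeflash_api.refinement",
    "codeflash_api.languages",
    "codeflash_api.llm | codeflash_api.db | codeflash_api.observability",
]
```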
## Endpoints (14 total)
| Endpoint | Method | Module |
|---|---|---|
| /healthcheck | GET | _app.py |
| /ai/optimize | POST | optimize/_router.py |
| /ai/refinement | POST | refinement/_router.py |
| /ai/code_repair | POST | repair/_router.py |
| /ai/adaptive_optimize | POST | adaptive/_router.py |
| /ai/testgen | POST | testgen/_router.py |
| /ai/testgen_review | POST | testgen/_review_router.py |
| /ai/testgen_repair | POST | testgen/_review_router.py |
| /ai/rank | POST | ranking/_router.py |
| /ai/explain | POST | explain/_router.py |
| /ai/optimization_review | POST | review/_router.py |
| /ai/rewrite_jit | POST | jit/_router.py |
| /ai/workflow-gen | POST | workflow/_router.py |
| /ai/log_features | POST | logging/_router.py (stub — DB upsert deferred) |
## Key design decisions
- **Pydantic for the API boundary only.** Request/response schemas are Pydantic v2 models. All internal domain models use attrs (frozen by default, `define` when mutation is needed).

- **No ORM.** asyncpg with raw SQL. The service has only a few simple tables with no complex relations. Raw queries are faster, easier to test, and carry no framework dependency.

- **FastAPI dependencies for the middleware chain.** Auth, rate limiting, and usage tracking are composable `Depends()` dependencies, not middleware classes:

  ```python
  @router.post("/ai/optimize")
  async def optimize(
      request: OptimizeSchema,
      user: Annotated[AuthenticatedUser, Depends(require_auth)],
      _rate: Annotated[None, Depends(check_rate_limit)],
      _usage: Annotated[None, Depends(track_usage)],
  ) -> OptimizeResponse:
      ...
  ```

- **LLM client as a singleton with lifespan management.** Created in the app lifespan, injected via a dependency. Handles event-loop safety (stale connection detection).

- **Prompt templates as files on disk.** The same `.md` files as the Django version, loaded at startup and cached. Plain `.format()` substitution unless a template actually needs Jinja.

- **Fire-and-forget background tasks** via a managed `asyncio.TaskGroup` (same pattern as Django's `background.py`).

- **Language dispatch as a registry.** Handlers register at import time and are looked up by language string. Extensible without touching the router.
## Business logic volume (from Django version audit)
| Component | Logic Lines | Complexity | Port Priority |
|---|---|---|---|
| Postprocessing pipeline | ~520 | High | P0 — correctness-critical |
| CST utilities | ~450 | High | P0 — correctness-critical |
| V4A diff patching | ~380 | High | P0 — correctness-critical |
| Optimizer context/prompts | ~350 | Medium | P0 — core pipeline |
| Python optimizer pipeline | ~280 | Medium | P0 — core pipeline |
| JS/TS optimizer | ~450 | Medium | P1 — after Python works |
| Java optimizer | ~260 | Medium | P1 — after Python works |
| Search & replace diff | ~194 | Low | P0 — used by repair |
| LLM client | ~150 | Medium | P0 — everything depends on it |
| Observability | ~65 | Low | P1 — nice to have early |
| Language dispatch | ~27 | Thin | P0 — trivial |
Total: ~3,130 lines of business logic to port.
## Implementation order
1. Scaffold — pyproject.toml, app factory, config, healthcheck, rules files ✅
2. Auth layer — key hashing, deps, rate limiting, usage tracking ✅
3. LLM layer — client abstraction, cost calculation, retry ✅
4. DB layer — asyncpg pool, queries, recording ✅
5. Diff layer — search/replace, V4A (pure logic, most testable) ✅
6. Language layer — validators, CST utils, postprocessing (pure logic) ✅
7. Optimize endpoint — context, pipeline, router (integration point) ✅
8. Remaining endpoints — repair, refinement, adaptive, testgen, rank, explain, review, jit, workflow, log_features ✅
9. Integration tests — full request→LLM→response with mocked providers ✅
All 9 steps complete. 13/13 `/ai` endpoints implemented (12 full, 1 stub: `/ai/log_features`, awaiting DB wiring for the `optimization_features` table).
## Deferred work
- `/ai/optimize-line-profiler` endpoint — nearly identical to `/ai/optimize`, not yet implemented.
- Testgen postprocessing — CST transformation pipeline (remove helpers, unused defs, cap loops/tensors, add imports, remove asserts).
- DB persistence for log_features — asyncpg upsert for the optimization_features table.
- Observability DB persistence — `_recording.py` is log-only; asyncpg writes for the `llm_calls` and `optimization_errors` tables are deferred.
- JS/TS and Java language layers — P2, after the Python pipeline is production-validated.
- CI pipeline — GitHub Actions for lint, test, type check, deploy.
Testgen instrumentation (behavior/perf AST transformations) moved to client side — the API returns raw validated code only.
## Database tables
| Table | Key Fields | Purpose |
|---|---|---|
| cf_api_keys | key (hashed), suffix, user_id, tier, organization_id | API key auth |
| users | user_id (pk), github_username, email | User identity |
| organizations | id (pk), name, github_org_id, privacy_mode | Org management |
| subscriptions | user_id (unique), plan_type, optimizations_used, limits | Billing/quotas |
| llm_calls | trace_id, call_type, model_name, tokens, cost, latency_ms | Observability |
| optimization_errors | trace_id, error_type, severity, stack_trace | Error tracking |
| optimization_features | trace_id, user_id, original_code, speedup_ratio | Feature logging |
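Since `cf_api_keys` stores only the SHA-384 hash of each key, authentication hashes the presented key and looks the digest up. The sketch below is hypothetical: the query text, column names beyond those listed above, and function names are assumptions, with the real logic living in `auth/_keys.py` and `db/_queries.py`:

```python
import hashlib


def hash_api_key(raw_key: str) -> str:
    """SHA-384 hex digest of a presented API key; only this hash is stored."""
    return hashlib.sha384(raw_key.encode()).hexdigest()


# Assumed query shape against the cf_api_keys table described above.
LOOKUP_SQL = """
    SELECT user_id, tier, organization_id
    FROM cf_api_keys
    WHERE key = $1
"""


async def lookup_key(pool, raw_key: str):
    """Fetch the key row via an asyncpg pool; returns None for unknown keys."""
    async with pool.acquire() as conn:
        return await conn.fetchrow(LOOKUP_SQL, hash_api_key(raw_key))
```

Storing only the digest means a database leak never exposes usable keys; the `suffix` column presumably exists so users can recognize their keys in a dashboard.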
## External dependencies
| Service | Purpose | Client |
|---|---|---|
| Azure OpenAI | LLM provider (GPT-5-mini) | openai.AsyncAzureOpenAI |
| Anthropic Bedrock | LLM provider (Claude Sonnet 4.5) | anthropic.AsyncAnthropicBedrock |
| PostgreSQL | Primary database | asyncpg |
| PostHog | Product analytics | posthog |
| Sentry | Error tracking | sentry-sdk |
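Both LLM providers sit behind one interface in `llm/_client.py`. A minimal sketch of that idea, with an assumed `Completion` shape and method signature (the real client's surface differs), plus the kind of fake provider the mocked-provider integration tests would use:

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Completion:
    text: str
    input_tokens: int
    output_tokens: int


class LLMProvider(Protocol):
    """Assumed common surface that the openai.AsyncAzureOpenAI and
    anthropic.AsyncAnthropicBedrock adapters would both satisfy."""
    async def complete(self, system: str, user: str, model: str) -> Completion: ...


class FakeProvider:
    """Test stand-in: echoes the prompt instead of calling a real API."""
    async def complete(self, system: str, user: str, model: str) -> Completion:
        return Completion(
            text=f"echo:{user}",
            input_tokens=len(user.split()),
            output_tokens=1,
        )


demo = asyncio.run(FakeProvider().complete("You are terse.", "hello world", "gpt-5-mini"))
```

Because pipelines depend only on the protocol, swapping Azure for Bedrock (or a fake in tests) requires no changes above the infrastructure layer.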
## Environment variables
| Variable | Required | Purpose |
|---|---|---|
| DATABASE_URL | Yes | PostgreSQL connection string |
| AZURE_OPENAI_API_KEY | Yes | Azure OpenAI auth |
| AWS_ACCESS_KEY_ID | Yes | Anthropic Bedrock auth |
| AWS_SECRET_ACCESS_KEY | Yes | Anthropic Bedrock auth |
| SECRET_KEY | Yes | Token signing |
| ENVIRONMENT | No | production or development |
| RATE_LIMIT_WINDOW_MS | No | Rate limit window (default: 60000) |
| RATE_LIMIT_MAX | No | Max requests per window (default: 40) |
| SENTRY_DSN | No | Sentry error reporting |
| POSTHOG_API_KEY | No | PostHog analytics |
## Verification

```sh
uv run pytest packages/codeflash-api/tests/ -v  # Unit tests
uv run ruff check src/ tests/                   # Lint
uv run mypy src/                                # Type check
uv run uvicorn codeflash_api:app --reload       # Dev server
```