# codeflash-api
FastAPI AI service — ground-up rewrite of the Django aiservice from codeflash-internal/django/aiservice/.
## Reference implementation
The Django version at ~/Desktop/work/cf_org/codeflash-internal/django/aiservice/ is the reference. Port logic faithfully from it — don't reimplement from scratch. When in doubt, read the Django source.
## Architecture

### Package structure
```
src/codeflash_api/
├── __init__.py
├── __main__.py # uvicorn entry: python -m codeflash_api
├── _app.py # FastAPI app factory, lifespan, middleware wiring
├── _config.py # Settings via pydantic-settings (env vars)
│
├── auth/ # Authentication & authorization
│ ├── _keys.py # API key hashing (SHA-384), lookup
│ ├── _deps.py # FastAPI dependencies: require_auth, check_rate_limit, track_usage
│ └── models.py # User, Organization, Subscription, APIKey (attrs)
│
├── llm/ # LLM provider abstraction
│ ├── _client.py # Async LLM client (OpenAI + Anthropic behind one interface)
│ ├── _models.py # Model definitions, cost rates, provider config (attrs)
│ ├── _cost.py # Cost calculation (cache-aware, per-provider)
│ └── _retry.py # Transient error classification, retry policy
│
├── db/ # Database layer
│ ├── _engine.py # asyncpg pool creation, lifespan management
│ ├── _queries.py # Raw SQL queries (no ORM)
│ └── models.py # Table schemas as attrs classes (not ORM models)
│
├── observability/ # LLM call recording, error tracking
│ ├── _background.py # Fire-and-forget background task management
│ └── _recording.py # LLM call and error recording (log-only, DB deferred)
│
├── optimize/ # Core optimization pipeline
│ ├── _router.py # POST /ai/optimize — language dispatch
│ ├── _pipeline.py # Parallel LLM orchestration, model distribution
│ ├── _context.py # Prompt assembly (system + user), runtime context
│ └── schemas.py # Request/response Pydantic models
│
├── repair/ # Code repair pipeline
│ ├── _router.py # POST /ai/code_repair
│ ├── _context.py # Repair prompt assembly, test diff formatting
│ └── schemas.py
│
├── refinement/ # Refinement pipeline
│ ├── _router.py # POST /ai/refinement
│ ├── _context.py # Refinement prompt assembly
│ └── schemas.py
│
├── adaptive/ # Adaptive optimization
│ ├── _router.py # POST /ai/adaptive_optimize
│ └── schemas.py
│
├── testgen/ # Test generation
│ ├── _router.py # POST /ai/testgen
│ ├── _review_router.py # POST /ai/testgen_review, /ai/testgen_repair
│ └── schemas.py
│
├── ranking/ # Candidate ranking
│ ├── _router.py # POST /ai/rank
│ └── schemas.py
│
├── explain/ # Explanation generation
│ ├── _router.py # POST /ai/explain
│ └── schemas.py
│
├── review/ # Optimization review
│ ├── _router.py # POST /ai/optimization_review
│ └── schemas.py
│
├── jit/ # JIT rewrite
│ ├── _router.py # POST /ai/rewrite_jit
│ └── schemas.py
│
├── workflow/ # Workflow generation
│ ├── _router.py # POST /ai/workflow-gen
│ └── schemas.py
│
├── logging/ # Feature logging
│ ├── _router.py # POST /ai/log_features
│ └── schemas.py
│
├── languages/ # Language-specific logic
│ ├── python/
│ │ ├── _postprocess.py # Dedup, validation, cleanup (~830 lines)
│ │ ├── _cst_utils.py # libcst utilities (~470 lines)
│ │ ├── _validator.py # Python syntax validation (libcst + ast)
│ │ ├── _markdown.py # Markdown code block extraction
│ │ ├── _xml.py # XML tag extraction for Anthropic responses
│ │ └── prompts/ # .md prompt templates
│ ├── java/ # Stub — P2, deferred
│ └── js_ts/ # Stub — P2, deferred
│
└── diff/ # Diff patch application
    ├── _base.py # Diff ABC, DiffMethod enum
    ├── _search_replace.py # SEARCH/REPLACE block parser (~194 lines)
    └── _v4a.py # V4A unified diff with fuzzy matching (~380 lines)
```
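To make the diff layer concrete, here is a hypothetical sketch of a SEARCH/REPLACE block parser in the spirit of `diff/_search_replace.py`. The conflict-marker format, `SearchReplace` type, and function names are illustrative assumptions, not the real module's API:

```python
import re
from typing import NamedTuple


class SearchReplace(NamedTuple):
    search: str
    replace: str


# Assumed marker format; the real parser's grammar may differ.
_BLOCK = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)


def parse_blocks(text: str) -> list[SearchReplace]:
    """Extract every SEARCH/REPLACE pair from an LLM response."""
    return [SearchReplace(s, r) for s, r in _BLOCK.findall(text)]


def apply_blocks(source: str, blocks: list[SearchReplace]) -> str:
    """Apply each block by exact substring match (first occurrence only)."""
    for block in blocks:
        if block.search not in source:
            raise ValueError(f"search text not found: {block.search!r}")
        source = source.replace(block.search, block.replace, 1)
    return source
```

The V4A variant adds fuzzy matching on top of this exact-match strategy, which is what makes it the larger of the two modules.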
### Layer boundaries
```
HTTP layer (routers, schemas)           ← Pydantic models, FastAPI deps
        │
        ▼
Pipeline layer (context, pipeline)      ← attrs domain models, orchestration
        │
        ▼
Language layer (optimizer, postprocess, validator) ← libcst, tree-sitter
        │
        ▼
Infrastructure (llm, db, observability) ← asyncpg, openai, anthropic
```
Dependency direction is strictly downward. Routers never import from each other. Language modules never import from routers. Infrastructure never imports from pipeline.
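One way to mechanize this rule would be an import-linter layers contract in pyproject.toml. This is an illustrative sketch, assuming import-linter were added as a dev dependency; it is not part of the project today, and the module groupings shown are a simplification:

```toml
[tool.importlinter]
root_package = "codeflash_api"

[[tool.importlinter.contracts]]
name = "Strict downward layering"
type = "layers"
layers = [
    "codeflash_api.optimize | codeflash_api.repair | codeflash_api.refinement",
    "codeflash_api.languages",
    "codeflash_api.llm | codeflash_api.db | codeflash_api.observability",
]
```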
## Endpoints (14 total)
| Endpoint | Method | Module |
|---|---|---|
| /healthcheck | GET | _app.py |
| /ai/optimize | POST | optimize/_router.py |
| /ai/refinement | POST | refinement/_router.py |
| /ai/code_repair | POST | repair/_router.py |
| /ai/adaptive_optimize | POST | adaptive/_router.py |
| /ai/testgen | POST | testgen/_router.py |
| /ai/testgen_review | POST | testgen/_review_router.py |
| /ai/testgen_repair | POST | testgen/_review_router.py |
| /ai/rank | POST | ranking/_router.py |
| /ai/explain | POST | explain/_router.py |
| /ai/optimization_review | POST | review/_router.py |
| /ai/rewrite_jit | POST | jit/_router.py |
| /ai/workflow-gen | POST | workflow/_router.py |
| /ai/log_features | POST | logging/_router.py (stub — DB upsert deferred) |
## Key design decisions
- **Pydantic for the API boundary only.** Request/response schemas are Pydantic v2 models. All internal domain models use attrs (frozen by default, `define` when mutation is needed).

- **No ORM.** asyncpg with raw SQL. The service has only a few simple tables with no complex relations. Raw queries are faster, easier to test, and carry no framework dependency.

- **FastAPI dependencies for the middleware chain.** Auth, rate limiting, and usage tracking are composable `Depends()` dependencies, not middleware classes:

  ```python
  @router.post("/ai/optimize")
  async def optimize(
      request: OptimizeSchema,
      user: Annotated[AuthenticatedUser, Depends(require_auth)],
      _rate: Annotated[None, Depends(check_rate_limit)],
      _usage: Annotated[None, Depends(track_usage)],
  ) -> OptimizeResponse:
      ...
  ```

- **LLM client as a singleton with lifespan management.** Created in the app lifespan, injected via a dependency. Handles event-loop safety (stale connection detection).

- **Prompt templates as files on disk.** The same `.md` files as the Django version, loaded at startup and cached. Plain `.format()` substitution unless a template actually needs Jinja.

- **Fire-and-forget background tasks** via a managed `asyncio.TaskGroup` (same pattern as Django's `background.py`).

- **Language dispatch as a registry.** Handlers register at import time and are looked up by language string. Extensible without touching the router.
## Business logic volume (from Django version audit)
| Component | Logic Lines | Complexity | Port Priority |
|---|---|---|---|
| Postprocessing pipeline | ~520 | High | P0 — correctness-critical |
| CST utilities | ~450 | High | P0 — correctness-critical |
| V4A diff patching | ~380 | High | P0 — correctness-critical |
| Optimizer context/prompts | ~350 | Medium | P0 — core pipeline |
| Python optimizer pipeline | ~280 | Medium | P0 — core pipeline |
| JS/TS optimizer | ~450 | Medium | P1 — after Python works |
| Java optimizer | ~260 | Medium | P1 — after Python works |
| Search & replace diff | ~194 | Low | P0 — used by repair |
| LLM client | ~150 | Medium | P0 — everything depends on it |
| Observability | ~65 | Low | P1 — nice to have early |
| Language dispatch | ~27 | Thin | P0 — trivial |
Total: ~3,130 lines of business logic to port.
## Implementation order
1. Scaffold — pyproject.toml, app factory, config, healthcheck, rules files ✅
2. Auth layer — key hashing, deps, rate limiting, usage tracking ✅
3. LLM layer — client abstraction, cost calculation, retry ✅
4. DB layer — asyncpg pool, queries, recording ✅
5. Diff layer — search/replace, V4A (pure logic, most testable) ✅
6. Language layer — validators, CST utils, postprocessing (pure logic) ✅
7. Optimize endpoint — context, pipeline, router (integration point) ✅
8. Remaining endpoints — repair, refinement, adaptive, testgen, rank, explain, review, jit, workflow, log_features ✅
9. Integration tests — full request→LLM→response with mocked providers ✅
All 9 steps complete. 13/13 `/ai` endpoints implemented (12 full, 1 stub: `/ai/log_features`, awaiting DB wiring for the `optimization_features` table).
## Deferred work
- `/ai/optimize-line-profiler` endpoint — nearly identical to `/ai/optimize`, not yet implemented.
- Testgen postprocessing — CST transformation pipeline (remove helpers, unused defs, cap loops/tensors, add imports, remove asserts).
- DB persistence for log_features — asyncpg upsert for the optimization_features table.
- Observability DB persistence — `_recording.py` is log-only; asyncpg writes for the `llm_calls` and `optimization_errors` tables are deferred.
- JS/TS and Java language layers — P2, after the Python pipeline is production-validated.
- CI pipeline — GitHub Actions for lint, test, type check, deploy.
Testgen instrumentation (behavior/perf AST transformations) moved to client side — the API returns raw validated code only.
## Database tables
| Table | Key Fields | Purpose |
|---|---|---|
| cf_api_keys | key (hashed), suffix, user_id, tier, organization_id | API key auth |
| users | user_id (pk), github_username, email | User identity |
| organizations | id (pk), name, github_org_id, privacy_mode | Org management |
| subscriptions | user_id (unique), plan_type, optimizations_used, limits | Billing/quotas |
| llm_calls | trace_id, call_type, model_name, tokens, cost, latency_ms | Observability |
| optimization_errors | trace_id, error_type, severity, stack_trace | Error tracking |
| optimization_features | trace_id, user_id, original_code, speedup_ratio | Feature logging |
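Since `cf_api_keys` stores only the SHA-384 hash of each key, authentication hashes the presented key and looks the digest up. The sketch below is hypothetical: the query text, column names beyond those listed above, and function names are assumptions, with the real logic living in `auth/_keys.py` and `db/_queries.py`:

```python
import hashlib


def hash_api_key(raw_key: str) -> str:
    """SHA-384 hex digest of a presented API key; only this hash is stored."""
    return hashlib.sha384(raw_key.encode()).hexdigest()


# Assumed query shape against the cf_api_keys table described above.
LOOKUP_SQL = """
    SELECT user_id, tier, organization_id
    FROM cf_api_keys
    WHERE key = $1
"""


async def lookup_key(pool, raw_key: str):
    """Fetch the key row via an asyncpg pool; returns None for unknown keys."""
    async with pool.acquire() as conn:
        return await conn.fetchrow(LOOKUP_SQL, hash_api_key(raw_key))
```

Storing only the digest means a database leak never exposes usable keys; the `suffix` column presumably exists so users can recognize their keys in a dashboard.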
## External dependencies
| Service | Purpose | Client |
|---|---|---|
| Azure OpenAI | LLM provider (GPT-5-mini) | openai.AsyncAzureOpenAI |
| Anthropic Bedrock | LLM provider (Claude Sonnet 4.5) | anthropic.AsyncAnthropicBedrock |
| PostgreSQL | Primary database | asyncpg |
| PostHog | Product analytics | posthog |
| Sentry | Error tracking | sentry-sdk |
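Both LLM providers sit behind one interface in `llm/_client.py`. A minimal sketch of that idea, with an assumed `Completion` shape and method signature (the real client's surface differs), plus the kind of fake provider the mocked-provider integration tests would use:

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Completion:
    text: str
    input_tokens: int
    output_tokens: int


class LLMProvider(Protocol):
    """Assumed common surface that the openai.AsyncAzureOpenAI and
    anthropic.AsyncAnthropicBedrock adapters would both satisfy."""
    async def complete(self, system: str, user: str, model: str) -> Completion: ...


class FakeProvider:
    """Test stand-in: echoes the prompt instead of calling a real API."""
    async def complete(self, system: str, user: str, model: str) -> Completion:
        return Completion(
            text=f"echo:{user}",
            input_tokens=len(user.split()),
            output_tokens=1,
        )


demo = asyncio.run(FakeProvider().complete("You are terse.", "hello world", "gpt-5-mini"))
```

Because pipelines depend only on the protocol, swapping Azure for Bedrock (or a fake in tests) requires no changes above the infrastructure layer.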
## Environment variables
| Variable | Required | Purpose |
|---|---|---|
| DATABASE_URL | Yes | PostgreSQL connection string |
| AZURE_OPENAI_API_KEY | Yes | Azure OpenAI auth |
| AWS_ACCESS_KEY_ID | Yes | Anthropic Bedrock auth |
| AWS_SECRET_ACCESS_KEY | Yes | Anthropic Bedrock auth |
| SECRET_KEY | Yes | Token signing |
| ENVIRONMENT | No | production or development |
| RATE_LIMIT_WINDOW_MS | No | Rate limit window (default: 60000) |
| RATE_LIMIT_MAX | No | Max requests per window (default: 40) |
| SENTRY_DSN | No | Sentry error reporting |
| POSTHOG_API_KEY | No | PostHog analytics |
## Verification

```sh
uv run pytest packages/codeflash-api/tests/ -v  # Unit tests
uv run ruff check src/ tests/                   # Lint
uv run mypy src/                                # Type check
uv run uvicorn codeflash_api:app --reload       # Dev server
```