codeflash-agent/packages/codeflash-api/CLAUDE.md
Kevin Turcios fb76024cfb Fix CLAUDE.md accuracy: remove nonexistent files, update patterns
- Remove _line_profiler.py, observability/models.py, _optimizer.py,
  _rate_limit.py, _usage.py from tree (never created)
- Add _background.py, _markdown.py, _xml.py that actually exist
- Mark java/ and js_ts/ as stubs
- Update endpoint count from 15 to 14, note log_features stub
- Fix Depends() example to use Annotated[] pattern
- Add deferred items: optimize-line-profiler, observability DB writes
2026-04-22 23:40:01 -05:00

codeflash-api

FastAPI AI service — ground-up rewrite of the Django aiservice from codeflash-internal/django/aiservice/.

Reference implementation

The Django version at ~/Desktop/work/cf_org/codeflash-internal/django/aiservice/ is the reference. Port logic faithfully from it — don't reimplement from scratch. When in doubt, read the Django source.

Architecture

Package structure

src/codeflash_api/
├── __init__.py
├── __main__.py               # uvicorn entry: python -m codeflash_api
├── _app.py                   # FastAPI app factory, lifespan, middleware wiring
├── _config.py                # Settings via pydantic-settings (env vars)
│
├── auth/                     # Authentication & authorization
│   ├── _keys.py              # API key hashing (SHA-384), lookup
│   ├── _deps.py              # FastAPI dependencies: require_auth, check_rate_limit, track_usage
│   └── models.py             # User, Organization, Subscription, APIKey (attrs)
│
├── llm/                      # LLM provider abstraction
│   ├── _client.py            # Async LLM client (OpenAI + Anthropic behind one interface)
│   ├── _models.py            # Model definitions, cost rates, provider config (attrs)
│   ├── _cost.py              # Cost calculation (cache-aware, per-provider)
│   └── _retry.py             # Transient error classification, retry policy
│
├── db/                       # Database layer
│   ├── _engine.py            # asyncpg pool creation, lifespan management
│   ├── _queries.py           # Raw SQL queries (no ORM)
│   └── models.py             # Table schemas as attrs classes (not ORM models)
│
├── observability/            # LLM call recording, error tracking
│   ├── _background.py        # Fire-and-forget background task management
│   └── _recording.py         # LLM call and error recording (log-only, DB deferred)
│
├── optimize/                 # Core optimization pipeline
│   ├── _router.py            # POST /ai/optimize — language dispatch
│   ├── _pipeline.py          # Parallel LLM orchestration, model distribution
│   ├── _context.py           # Prompt assembly (system + user), runtime context
│   └── schemas.py            # Request/response Pydantic models
│
├── repair/                   # Code repair pipeline
│   ├── _router.py            # POST /ai/code_repair
│   ├── _context.py           # Repair prompt assembly, test diff formatting
│   └── schemas.py
│
├── refinement/               # Refinement pipeline
│   ├── _router.py            # POST /ai/refinement
│   ├── _context.py           # Refinement prompt assembly
│   └── schemas.py
│
├── adaptive/                 # Adaptive optimization
│   ├── _router.py            # POST /ai/adaptive_optimize
│   └── schemas.py
│
├── testgen/                  # Test generation
│   ├── _router.py            # POST /ai/testgen
│   ├── _review_router.py     # POST /ai/testgen_review, /ai/testgen_repair
│   └── schemas.py
│
├── ranking/                  # Candidate ranking
│   ├── _router.py            # POST /ai/rank
│   └── schemas.py
│
├── explain/                  # Explanation generation
│   ├── _router.py            # POST /ai/explain
│   └── schemas.py
│
├── review/                   # Optimization review
│   ├── _router.py            # POST /ai/optimization_review
│   └── schemas.py
│
├── jit/                      # JIT rewrite
│   ├── _router.py            # POST /ai/rewrite_jit
│   └── schemas.py
│
├── workflow/                 # Workflow generation
│   ├── _router.py            # POST /ai/workflow-gen
│   └── schemas.py
│
├── logging/                  # Feature logging
│   ├── _router.py            # POST /ai/log_features
│   └── schemas.py
│
├── languages/                # Language-specific logic
│   ├── python/
│   │   ├── _postprocess.py   # Dedup, validation, cleanup (~830 lines)
│   │   ├── _cst_utils.py     # libcst utilities (~470 lines)
│   │   ├── _validator.py     # Python syntax validation (libcst + ast)
│   │   ├── _markdown.py      # Markdown code block extraction
│   │   ├── _xml.py           # XML tag extraction for Anthropic responses
│   │   └── prompts/          # .md prompt templates
│   ├── java/                 # Stub — P2, deferred
│   └── js_ts/                # Stub — P2, deferred
│
└── diff/                     # Diff patch application
    ├── _base.py              # Diff ABC, DiffMethod enum
    ├── _search_replace.py    # SEARCH/REPLACE block parser (~194 lines)
    └── _v4a.py               # V4A unified diff with fuzzy matching (~380 lines)
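
The SHA-384 key handling that the tree attributes to auth/_keys.py might look roughly like the sketch below. Only the algorithm (SHA-384) and the suffix column come from this document; the function names hash_api_key and key_suffix are illustrative.

```python
import hashlib

def hash_api_key(raw_key: str) -> str:
    # SHA-384 hex digest for storage and lookup; the algorithm comes from
    # the tree above, everything else here is an assumption.
    return hashlib.sha384(raw_key.encode("utf-8")).hexdigest()

def key_suffix(raw_key: str, n: int = 4) -> str:
    # Short display suffix -- cf_api_keys keeps a `suffix` column so a key
    # can be recognized without the service ever storing the plaintext.
    return raw_key[-n:]
```

Hashing on lookup means the database only ever sees digests, so a leaked cf_api_keys table does not expose usable credentials.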

Layer boundaries

HTTP layer (routers, schemas)     ← Pydantic models, FastAPI deps
    │
    ▼
Pipeline layer (context, pipeline) ← attrs domain models, orchestration
    │
    ▼
Language layer (optimizer, postprocess, validator) ← libcst, tree-sitter
    │
    ▼
Infrastructure (llm, db, observability) ← asyncpg, openai, anthropic

Dependency direction is strictly downward. Routers never import from each other. Language modules never import from routers. Infrastructure never imports from pipeline.
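
A layering rule like this can be machine-checked. The sketch below is a hypothetical test helper, not part of the codebase: it parses a module's source and flags imports that point from a lower layer to a higher one. The FORBIDDEN map is an illustrative subset of the rule, not the real configuration.

```python
import ast

# Illustrative subset: infrastructure packages must not import pipeline packages.
FORBIDDEN = {
    "llm": {"optimize", "repair", "refinement"},
    "db": {"optimize", "repair", "refinement"},
}

def layering_violations(module: str, source: str) -> list[str]:
    """Return the names of banned imports found in `source` for `module`."""
    layer = module.split(".")[0]
    banned = FORBIDDEN.get(layer, set())
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] in banned:
                found.append(node.module)
        elif isinstance(node, ast.Import):
            found.extend(a.name for a in node.names
                         if a.name.split(".")[0] in banned)
    return found
```

A real check would walk src/codeflash_api/ and run this over every file; tools like import-linter implement the same idea declaratively.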

Endpoints (14 total)

Endpoint Method Module
/healthcheck GET _app.py
/ai/optimize POST optimize/_router.py
/ai/refinement POST refinement/_router.py
/ai/code_repair POST repair/_router.py
/ai/adaptive_optimize POST adaptive/_router.py
/ai/testgen POST testgen/_router.py
/ai/testgen_review POST testgen/_review_router.py
/ai/testgen_repair POST testgen/_review_router.py
/ai/rank POST ranking/_router.py
/ai/explain POST explain/_router.py
/ai/optimization_review POST review/_router.py
/ai/rewrite_jit POST jit/_router.py
/ai/workflow-gen POST workflow/_router.py
/ai/log_features POST logging/_router.py (stub — DB upsert deferred)

Key design decisions

  1. Pydantic for API boundary only. Request/response schemas are Pydantic v2 models. All internal domain models use attrs (@frozen by default; mutable @define only where mutation is needed).

  2. No ORM. asyncpg with raw SQL. The service has seven simple tables with no complex relations. Raw queries are faster, easier to test, and carry no framework dependency.

  3. FastAPI dependencies for middleware chain. Auth, rate limiting, and usage tracking are composable Depends(), not middleware classes:

    @router.post("/ai/optimize")
    async def optimize(
        request: OptimizeSchema,
        user: Annotated[AuthenticatedUser, Depends(require_auth)],
        _rate: Annotated[None, Depends(check_rate_limit)],
        _usage: Annotated[None, Depends(track_usage)],
    ) -> OptimizeResponse:
        ...
  4. LLM client as singleton with lifespan management. Created in app lifespan, injected via dependency. Handles event loop safety (stale connection detection).

  5. Prompt templates as files on disk. Same .md files from Django version, loaded at startup, cached. Plain .format() substitution unless a template actually needs Jinja.

  6. Fire-and-forget background tasks via managed asyncio.TaskGroup (same pattern as Django background.py).

  7. Language dispatch as a registry. Register handlers at import time, look up by language string. Extensible without touching the router.
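
The registry in item 7 can be sketched in a few lines. This is a minimal illustration of the pattern described above, not the actual module; the decorator and function names are assumptions.

```python
from typing import Callable

_HANDLERS: dict[str, Callable[[str], str]] = {}

def register(language: str) -> Callable:
    """Decorator that registers a handler at import time."""
    def decorator(handler: Callable[[str], str]) -> Callable[[str], str]:
        _HANDLERS[language] = handler
        return handler
    return decorator

def dispatch(language: str) -> Callable[[str], str]:
    """Look up a handler by language string; the router never changes."""
    try:
        return _HANDLERS[language]
    except KeyError:
        raise ValueError(f"unsupported language: {language}") from None

@register("python")
def optimize_python(code: str) -> str:
    return code  # placeholder; the real handler runs the Python pipeline
```

Adding a new language is then just a new module with an @register("java")-style decorator that gets imported at startup.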

Business logic volume (from Django version audit)

Component Logic Lines Complexity Port Priority
Postprocessing pipeline ~520 High P0 — correctness-critical
CST utilities ~450 High P0 — correctness-critical
V4A diff patching ~380 High P0 — correctness-critical
Optimizer context/prompts ~350 Medium P0 — core pipeline
Python optimizer pipeline ~280 Medium P0 — core pipeline
JS/TS optimizer ~450 Medium P1 — after Python works
Java optimizer ~260 Medium P1 — after Python works
Search & replace diff ~194 Low P0 — used by repair
LLM client ~150 Medium P0 — everything depends on it
Observability ~65 Low P1 — nice to have early
Language dispatch ~27 Thin P0 — trivial

Total: ~3,130 lines of business logic to port.

Implementation order

  1. Scaffold — pyproject.toml, app factory, config, healthcheck, rules files
  2. Auth layer — key hashing, deps, rate limiting, usage tracking
  3. LLM layer — client abstraction, cost calculation, retry
  4. DB layer — asyncpg pool, queries, recording
  5. Diff layer — search/replace, V4A (pure logic, most testable)
  6. Language layer — validators, CST utils, postprocessing (pure logic)
  7. Optimize endpoint — context, pipeline, router (integration point)
  8. Remaining endpoints — repair, refinement, adaptive, testgen, rank, explain, review, jit, workflow, log_features
  9. Integration tests — full request→LLM→response with mocked providers
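
The mocked-provider pattern in step 9 reduces to: swap the real LLM client for a stub, then assert on what the pipeline does with its output. The sketch below is stdlib-only and hypothetical — run_pipeline stands in for the real orchestration in optimize/_pipeline.py.

```python
import asyncio
from unittest.mock import AsyncMock

async def run_pipeline(client, prompt: str) -> str:
    # Stand-in for the pipeline layer: one awaited provider call,
    # then post-processing (here, just a strip).
    completion = await client.complete(prompt)
    return completion.strip()

def test_pipeline_with_mocked_provider():
    client = AsyncMock()
    client.complete.return_value = "  optimized code  "
    result = asyncio.run(run_pipeline(client, "optimize this"))
    assert result == "optimized code"
    client.complete.assert_awaited_once_with("optimize this")
```

Because the client is injected via dependency (design decision 4), tests can override it without patching module globals.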

All 9 steps complete. 13/13 /ai/ endpoints implemented (12 full, 1 stub: /ai/log_features awaiting DB wiring for the optimization_features table), plus /healthcheck.

Deferred work

  • /ai/optimize-line-profiler endpoint — Nearly identical to /ai/optimize, not yet implemented.
  • Testgen postprocessing — CST transformation pipeline (remove helpers, unused defs, cap loops/tensors, add imports, remove asserts).
  • DB persistence for log_features — asyncpg upsert for optimization_features table.
  • Observability DB persistence — _recording.py is log-only; asyncpg writes for the llm_calls and optimization_errors tables are deferred.
  • JS/TS and Java language layers — P2, after Python pipeline is production-validated.
  • CI pipeline — GitHub Actions for lint, test, type check, deploy.

Testgen instrumentation (behavior/perf AST transformations) moved to client side — the API returns raw validated code only.

Database tables

Table Key Fields Purpose
cf_api_keys key (hashed), suffix, user_id, tier, organization_id API key auth
users user_id (pk), github_username, email User identity
organizations id (pk), name, github_org_id, privacy_mode Org management
subscriptions user_id (unique), plan_type, optimizations_used, limits Billing/quotas
llm_calls trace_id, call_type, model_name, tokens, cost, latency_ms Observability
optimization_errors trace_id, error_type, severity, stack_trace Error tracking
optimization_features trace_id, user_id, original_code, speedup_ratio Feature logging
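
In the no-ORM style of db/_queries.py, a lookup against cf_api_keys might read like this. The SQL text and helper name are assumptions based on the column summary above; only the table and the asyncpg choice come from this document.

```python
# asyncpg uses $1-style positional parameters, not %s.
LOOKUP_API_KEY = """
    SELECT user_id, tier, organization_id
    FROM cf_api_keys
    WHERE key = $1
"""

async def lookup_api_key(conn, hashed_key: str):
    # `conn` is an asyncpg connection from the pool in db/_engine.py;
    # fetchrow returns a single record, or None if the key is unknown.
    return await conn.fetchrow(LOOKUP_API_KEY, hashed_key)
```

Keeping queries as module-level constants makes them greppable and trivially unit-testable against a mocked connection.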

External dependencies

Service Purpose Client
Azure OpenAI LLM provider (GPT-5-mini) openai.AsyncAzureOpenAI
Anthropic Bedrock LLM provider (Claude Sonnet 4.5) anthropic.AsyncAnthropicBedrock
PostgreSQL Primary database asyncpg
PostHog Product analytics posthog
Sentry Error tracking sentry-sdk

Environment variables

Variable Required Purpose
DATABASE_URL Yes PostgreSQL connection string
AZURE_OPENAI_API_KEY Yes Azure OpenAI auth
AWS_ACCESS_KEY_ID Yes Anthropic Bedrock auth
AWS_SECRET_ACCESS_KEY Yes Anthropic Bedrock auth
SECRET_KEY Yes Token signing
ENVIRONMENT No production or development
RATE_LIMIT_WINDOW_MS No Rate limit window (default: 60000)
RATE_LIMIT_MAX No Max requests per window (default: 40)
SENTRY_DSN No Sentry error reporting
POSTHOG_API_KEY No PostHog analytics
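
A stdlib stand-in for how _config.py might map these variables (the real module uses pydantic-settings, per the tree above; the field names and from_env helper are illustrative, and the defaults come from this table):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    database_url: str
    environment: str = "development"
    rate_limit_window_ms: int = 60000   # default per the table above
    rate_limit_max: int = 40            # default per the table above

    @classmethod
    def from_env(cls, env=os.environ) -> "Settings":
        # Required variables raise KeyError if missing; optional ones
        # fall back to their documented defaults.
        return cls(
            database_url=env["DATABASE_URL"],
            environment=env.get("ENVIRONMENT", "development"),
            rate_limit_window_ms=int(env.get("RATE_LIMIT_WINDOW_MS", "60000")),
            rate_limit_max=int(env.get("RATE_LIMIT_MAX", "40")),
        )
```

With pydantic-settings the same mapping is declarative (field names are upper-cased to env vars automatically), but the failure mode is identical: missing required variables fail at startup, not mid-request.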

Verification

uv run pytest packages/codeflash-api/tests/ -v    # Unit tests
uv run ruff check src/ tests/                      # Lint
uv run mypy src/                                   # Type check
uv run uvicorn codeflash_api:app --reload          # Dev server