codeflash-internal/tiles/codeflash-internal-docs/evals/scenario-5/task.md
2026-02-14 22:25:30 -05:00


Task: Build an LLM Client Wrapper with Provider-Specific Handling

Context

The codeflash-internal aiservice uses a unified LLM abstraction in aiservice/llm.py. All LLM calls go through a single call_llm() function that handles both OpenAI (via Azure) and Anthropic (via Foundry) providers. Each provider has different client setup, message handling, and response parsing.

The test generation pipeline also makes LLM calls; in addition, it performs framework detection (to decide whether GPU synchronization is needed in timing blocks) and Jinja2-based prompt construction.

Task

  1. Write an LLM dataclass (using pydantic_dataclass) with fields:

    • name: str -- deployment name
    • max_tokens: int -- max context window
    • model_type: Literal["openai", "anthropic", "google"]
    • input_cost: float -- USD per 1M tokens
    • cached_input_cost: float -- USD per 1M cached tokens
    • output_cost: float -- USD per 1M tokens
  2. Define concrete model instances:

    • OpenAI_GPT_5_Mini with pricing: $0.25 input, $0.03 cached, $2.00 output
    • Anthropic_Claude_Sonnet_4_5 with pricing: $3.00 input, $0.00 cached, $15.00 output
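Steps 1 and 2 above might be sketched as follows. This sketch uses the stdlib dataclass for self-containment; the real module would use pydantic's dataclass (pydantic.dataclasses.dataclass), which additionally validates field types. The deployment names and context-window sizes are illustrative assumptions, not values from the spec; only the pricing fields come from the task.

```python
from dataclasses import dataclass
from typing import Literal


# In the real module: from pydantic.dataclasses import dataclass as pydantic_dataclass
@dataclass
class LLM:
    name: str                 # deployment name
    max_tokens: int           # max context window
    model_type: Literal["openai", "anthropic", "google"]
    input_cost: float         # USD per 1M input tokens
    cached_input_cost: float  # USD per 1M cached input tokens
    output_cost: float        # USD per 1M output tokens


OpenAI_GPT_5_Mini = LLM(
    name="gpt-5-mini",    # hypothetical deployment name
    max_tokens=400_000,   # illustrative context window
    model_type="openai",
    input_cost=0.25,
    cached_input_cost=0.03,
    output_cost=2.00,
)

Anthropic_Claude_Sonnet_4_5 = LLM(
    name="claude-sonnet-4-5",  # hypothetical deployment name
    max_tokens=200_000,        # illustrative context window
    model_type="anthropic",
    input_cost=3.00,
    cached_input_cost=0.00,
    output_cost=15.00,
)
```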
  3. Write an async def call_llm() function that:

    • Accepts llm: LLM, messages: list[dict], call_type: str, trace_id: str, max_tokens: int, and optional user_id: str
    • For OpenAI: calls client.chat.completions.create(). If the model is GPT-5-mini, passes the max_completion_tokens parameter; otherwise passes max_tokens
    • For Anthropic: extracts the system prompt from the messages list and passes it separately via the system= kwarg. Concatenates text blocks from the response
    • Records every call to the database via record_llm_call() in a finally block (including trace_id, call_type, model, cost, latency)
    • Returns an LLMResponse with content: str, usage: LLMUsage, and raw_response
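A minimal sketch of the provider branching in step 3. Two deviations from the spec are assumptions made to keep the sketch self-contained and testable: the provider client is passed in as a parameter (the real implementation presumably constructs Azure/Foundry clients internally), and record_llm_call is stubbed rather than writing to the database. The OpenAI and Anthropic call shapes follow their public SDKs.

```python
import time
from dataclasses import dataclass
from typing import Any


@dataclass
class LLMUsage:
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class LLMResponse:
    content: str
    usage: LLMUsage
    raw_response: Any = None


def record_llm_call(**kwargs):
    """Stub: the real version persists the call record to the database."""


async def call_llm(llm, client, messages, call_type, trace_id, max_tokens, user_id=None):
    start = time.perf_counter()
    response = None
    try:
        if llm.model_type == "openai":
            # GPT-5-mini expects max_completion_tokens instead of max_tokens
            token_kwarg = "max_completion_tokens" if "gpt-5-mini" in llm.name else "max_tokens"
            raw = await client.chat.completions.create(
                model=llm.name, messages=messages, **{token_kwarg: max_tokens}
            )
            content = raw.choices[0].message.content
            usage = LLMUsage(raw.usage.prompt_tokens, raw.usage.completion_tokens)
        elif llm.model_type == "anthropic":
            # Anthropic takes the system prompt as a separate kwarg, not a message
            system = "\n".join(m["content"] for m in messages if m["role"] == "system")
            chat = [m for m in messages if m["role"] != "system"]
            raw = await client.messages.create(
                model=llm.name, system=system, messages=chat, max_tokens=max_tokens
            )
            # Concatenate the text blocks of the response
            content = "".join(b.text for b in raw.content if getattr(b, "type", "text") == "text")
            usage = LLMUsage(raw.usage.input_tokens, raw.usage.output_tokens)
        else:
            raise ValueError(f"unsupported provider: {llm.model_type}")
        response = LLMResponse(content=content, usage=usage, raw_response=raw)
        return response
    finally:
        # Record every call, even on failure, for observability
        latency = time.perf_counter() - start
        usage = response.usage if response else LLMUsage()
        cost = (usage.input_tokens * llm.input_cost + usage.output_tokens * llm.output_cost) / 1e6
        record_llm_call(trace_id=trace_id, call_type=call_type, model=llm.name,
                        cost=cost, latency=latency, user_id=user_id)
```

Putting the recording logic in finally (rather than after the return) guarantees the database row is written even when the provider call raises, which is what the observability requirement asks for.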
  4. Write a detect_frameworks_from_code(code: str) -> set[str] function that:

    • Parses import statements to identify ML frameworks: PyTorch, TensorFlow, JAX
    • Detects both direct imports and aliased imports (e.g. import torch as t)
    • Returns a set of framework names found
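Step 4 can be sketched with the stdlib ast module, which handles both plain and aliased imports (import torch as t still has alias.name == "torch"). The module-to-framework mapping below is an illustrative assumption.

```python
import ast

# Top-level module name -> framework label (illustrative mapping)
_FRAMEWORK_MODULES = {"torch": "PyTorch", "tensorflow": "TensorFlow", "jax": "JAX"}


def detect_frameworks_from_code(code: str) -> set[str]:
    found: set[str] = set()
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return found  # unparsable code: report nothing rather than raise
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import torch as t` -> alias.name is still "torch"
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # `from jax import numpy` -> node.module is "jax"
            modules = [node.module]
        else:
            continue
        for mod in modules:
            root = mod.split(".")[0]  # `tensorflow.keras` -> `tensorflow`
            if root in _FRAMEWORK_MODULES:
                found.add(_FRAMEWORK_MODULES[root])
    return found
```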

Expected Outputs

  • A Python module with the LLM dataclass, model instances, call_llm function, and framework detection
  • The call_llm function must handle both providers with their specific quirks
  • record_llm_call must be in a finally block for observability