codeflash-internal/tiles/codeflash-internal-docs/evals/scenario-1/task.md

Task: Implement a Model Distribution Calculator

Context

You are working on the codeflash-internal aiservice backend. The optimization pipeline distributes LLM calls across OpenAI and Anthropic models in parallel. The distribution logic lives in core/shared/optimizer_config.py.

You need to write a Python function that replicates the model distribution logic, a second function that computes the per-provider LLM cost from token counts, and a third that combines them to estimate the total cost of an optimization run.

Task

  1. Write a function get_model_distribution(n_candidates: int, max_calls: int) -> list[tuple[str, int]] that:

    • Takes the number of requested candidates and the maximum allowed parallel calls
    • Computes total = min(n_candidates, max_calls)
    • Splits between OpenAI and Anthropic using the formula: claude_calls = (total - 1) // 2, gpt_calls = total - claude_calls
    • Returns a list of (model_name, call_count) tuples, using "openai" and "anthropic" as model names
  2. Write a function calculate_optimization_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int, provider: str) -> float that:

    • Computes the cost in USD given token counts
    • For the "openai" provider: cached_input_tokens is a subset of input_tokens, so non-cached = input_tokens - cached_input_tokens. Use GPT-5-mini pricing: $0.25 input, $0.03 cached input, $2.00 output per 1M tokens.
    • For the "anthropic" provider: cached_input_tokens is additive to input_tokens (they are separate). Use Claude Sonnet 4.5 pricing: $3.00 input, $15.00 output per 1M tokens (no cached discount).
  3. Write a function estimate_full_run_cost(n_candidates: int, avg_input_tokens: int, avg_output_tokens: int, avg_cached_tokens: int) -> float that:

    • Uses get_model_distribution with MAX_OPTIMIZER_CALLS = 6
    • For each provider's call count, calculates cost using calculate_optimization_cost
    • Returns total estimated cost
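A minimal sketch of all three functions, following the formulas and pricing given above (constant names and the treatment of Anthropic cached tokens at the input rate are assumptions consistent with "no cached discount"):

```python
MAX_OPTIMIZER_CALLS = 6  # default cap from the spec

# Pricing in USD per 1M tokens, as stated in the task description
OPENAI_INPUT, OPENAI_CACHED, OPENAI_OUTPUT = 0.25, 0.03, 2.00
ANTHROPIC_INPUT, ANTHROPIC_OUTPUT = 3.00, 15.00


def get_model_distribution(n_candidates: int, max_calls: int) -> list[tuple[str, int]]:
    """Split calls between OpenAI and Anthropic using the spec's formula."""
    total = min(n_candidates, max_calls)
    claude_calls = (total - 1) // 2
    gpt_calls = total - claude_calls  # OpenAI gets the remainder (the larger share)
    return [("openai", gpt_calls), ("anthropic", claude_calls)]


def calculate_optimization_cost(input_tokens: int, output_tokens: int,
                                cached_input_tokens: int, provider: str) -> float:
    """Cost in USD for one call's token usage under the given provider."""
    per_million = 1_000_000
    if provider == "openai":
        # Cached tokens are a subset of input_tokens, billed at the cached rate
        non_cached = input_tokens - cached_input_tokens
        return (non_cached * OPENAI_INPUT
                + cached_input_tokens * OPENAI_CACHED
                + output_tokens * OPENAI_OUTPUT) / per_million
    if provider == "anthropic":
        # Cached tokens are additive to input_tokens; with no cached discount,
        # they are assumed here to be billed at the regular input rate
        return ((input_tokens + cached_input_tokens) * ANTHROPIC_INPUT
                + output_tokens * ANTHROPIC_OUTPUT) / per_million
    raise ValueError(f"unknown provider: {provider}")


def estimate_full_run_cost(n_candidates: int, avg_input_tokens: int,
                           avg_output_tokens: int, avg_cached_tokens: int) -> float:
    """Sum the per-call cost over each provider's share of the distribution."""
    return sum(
        calls * calculate_optimization_cost(avg_input_tokens, avg_output_tokens,
                                            avg_cached_tokens, provider)
        for provider, calls in get_model_distribution(n_candidates, MAX_OPTIMIZER_CALLS)
    )
```

For example, `estimate_full_run_cost(5, 1_000_000, 0, 0)` yields 3 × $0.25 + 2 × $3.00 = $6.75.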

Expected Outputs

  • A Python module with all three functions
  • The distribution for n_candidates=5, max_calls=6 should produce 3 OpenAI + 2 Anthropic calls
  • The distribution for n_candidates=6, max_calls=6 should produce 4 OpenAI + 2 Anthropic calls
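These two expected distributions can be checked directly against the formula from step 1 (the loop below is a standalone sanity check, not part of the deliverable module):

```python
MAX_CALLS = 6

for n_candidates, expected in [(5, (3, 2)), (6, (4, 2))]:
    total = min(n_candidates, MAX_CALLS)
    claude_calls = (total - 1) // 2   # Anthropic share
    gpt_calls = total - claude_calls  # OpenAI share (remainder)
    assert (gpt_calls, claude_calls) == expected
    print(f"n_candidates={n_candidates}: {gpt_calls} OpenAI + {claude_calls} Anthropic")
```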