codeflash-internal/tiles/codeflash-internal-docs/evals/scenario-3/task.md
2026-02-14 22:25:30 -05:00

Task: Implement an LLM Response Parser for Optimization Candidates

Context

The codeflash-internal optimization pipeline receives LLM responses as markdown text containing code blocks with file path annotations. The context extraction system in optimizer_context.py parses these responses to extract optimized code candidates. The format uses annotated markdown code blocks like:

```python:path/to/file.py
# optimized code here
```

After extraction, candidates go through postprocessing: AST-based deduplication (using ast.parse() + ast.dump()) and equality checking against the original code.
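As a rough illustration of why ast.parse() + ast.dump() works for deduplication: ast.dump() omits line and column attributes by default, and comments never reach the AST, so two sources that differ only in whitespace or comments produce the same dump string. The helper name ast_fingerprint below is hypothetical and not taken from optimizer_context.py.

```python
import ast


def ast_fingerprint(source: str) -> str:
    """Return a string that ignores formatting and comments (hypothetical helper)."""
    # ast.dump() leaves out lineno/col_offset unless include_attributes=True.
    return ast.dump(ast.parse(source))


same_a = "def f(x):\n    return x + 1\n"
same_b = "def f( x ):\n    # add one\n    return x+1\n"
different = "def f(x):\n    return x + 2\n"

print(ast_fingerprint(same_a) == ast_fingerprint(same_b))
print(ast_fingerprint(same_a) == ast_fingerprint(different))
```

Note that docstrings are part of the AST, so two candidates that differ only in a docstring would still count as distinct under this comparison.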

Task

  1. Write a function extract_code_blocks(llm_response: str) -> list[dict] that:

    • Parses markdown code blocks from an LLM response string
    • Handles the file-path-annotated format: ```python:path/to/file.py
    • Returns a list of dicts, each with keys: "code" (str), "file_path" (str or None), "language" (str)
    • Handles both annotated (with file path) and plain code blocks
  2. Write a function deduplicate_candidates(candidates: list[str], original_code: str) -> list[str] that:

    • Removes duplicate candidates using AST-based comparison (ast.parse() + ast.dump())
    • Filters out candidates that are identical to the original code (equality check)
    • Returns only unique, non-original candidates
    • Handles SyntaxError gracefully: keeps candidates that fail to parse, since they may use syntax that ast.parse() in the running interpreter does not accept
  3. Write a function validate_python_code(code: str) -> bool that:

    • Checks if the code is syntactically valid Python
    • Returns True if ast.parse() succeeds, False otherwise
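One possible shape for extract_code_blocks from item 1 is sketched below. It assumes fences are exactly three backticks at the start of a line and that the file path, when present, follows the first colon in the info string. The regex and the decision to return the code body verbatim (including its trailing newline) are illustrative choices, not the behaviour of optimizer_context.py.

```python
import re

# Opening fence, optional info string, lazily matched body, closing fence.
# Assumption: fences start at column 0 and use exactly three backticks.
_FENCE_RE = re.compile(
    r"^`{3}(?P<info>[^\n`]*)\n(?P<code>.*?)^`{3}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_code_blocks(llm_response: str) -> list[dict]:
    """Extract plain and path-annotated code blocks from a markdown string."""
    blocks = []
    for match in _FENCE_RE.finditer(llm_response):
        # "python:path/to/file.py" -> ("python", ":", "path/to/file.py")
        language, _, path = match.group("info").strip().partition(":")
        blocks.append(
            {
                "code": match.group("code"),
                "file_path": path.strip() or None,  # None for plain blocks
                "language": language.strip(),  # "" when no language is given
            }
        )
    return blocks


FENCE = "`" * 3  # built at runtime so this example nests safely in markdown
response = (
    f"Here is the change:\n{FENCE}python:src/a.py\nx = 1\n{FENCE}\n"
    f"And a helper:\n{FENCE}python\ny = 2\n{FENCE}\n"
)
for block in extract_code_blocks(response):
    print(block["language"], block["file_path"], repr(block["code"]))
```

Splitting on the first colon only means a path that itself contains a colon stays intact in file_path.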

Expected Outputs

  • A Python module with all three functions
  • extract_code_blocks should correctly parse multi-block responses
  • deduplicate_candidates should use AST normalization, not string comparison
  • The module should import only ast and re from the standard library
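The deduplication and validation requirements could be met along the following lines, using only ast. This is a sketch under stated assumptions: the _fingerprint helper is hypothetical, first occurrences win when duplicates appear, and every unparseable candidate is kept as-is (even if it repeats another unparseable string), which is one reading of the SyntaxError rule above.

```python
import ast


def validate_python_code(code: str) -> bool:
    """Return True if ast.parse() accepts the code, False on SyntaxError."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


def _fingerprint(code: str):
    """Return the AST dump of code, or None if it does not parse (hypothetical helper)."""
    try:
        return ast.dump(ast.parse(code))
    except SyntaxError:
        return None


def deduplicate_candidates(candidates: list[str], original_code: str) -> list[str]:
    """Drop candidates whose AST matches the original or an earlier candidate."""
    original_fp = _fingerprint(original_code)
    seen = set()
    unique = []
    for candidate in candidates:
        fp = _fingerprint(candidate)
        if fp is None:
            # Assumption: unparseable candidates are always kept, never compared.
            unique.append(candidate)
            continue
        if fp == original_fp or fp in seen:
            continue
        seen.add(fp)
        unique.append(candidate)
    return unique


original = "def f(x):\n    return x + 1\n"
candidates = [
    "def f( x ):\n    return x+1\n",            # same AST as the original
    "def f(x):\n    return 1 + x\n",            # new candidate
    "def f(x):\n    # swapped\n    return 1 + x\n",  # duplicate of the previous one
    "def f(:\n",                                 # does not parse, kept
]
print(deduplicate_candidates(candidates, original))
```

Because the comparison is on ast.dump() output, semantically equivalent rewrites such as x + 1 versus 1 + x are still treated as different candidates.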