# Task: Implement an LLM Response Parser for Optimization Candidates

## Context

The codeflash-internal optimization pipeline receives LLM responses as markdown text containing code blocks with file path annotations. The context extraction system in `optimizer_context.py` parses these responses to extract optimized code candidates.

The format uses annotated markdown code blocks like:

```
\`\`\`python:path/to/file.py
# optimized code here
\`\`\`
```

After extraction, candidates go through postprocessing: AST-based deduplication (using `ast.parse()` + `ast.dump()`) and equality checking against the original code.

## Task

1. Write a function `extract_code_blocks(llm_response: str) -> list[dict]` that:
   - Parses markdown code blocks from an LLM response string
   - Handles the file-path-annotated format: `` ```python:path/to/file.py ``
   - Returns a list of dicts, each with keys: `"code"` (str), `"file_path"` (str or None), `"language"` (str)
   - Handles both annotated (with file path) and plain code blocks

2. Write a function `deduplicate_candidates(candidates: list[str], original_code: str) -> list[str]` that:
   - Removes duplicate candidates using AST-based comparison (`ast.parse()` + `ast.dump()`)
   - Filters out candidates that are identical to the original code (equality check)
   - Returns only unique, non-original candidates
   - Handles `SyntaxError` gracefully (keep candidates that fail to parse, as they might use features beyond basic AST)

3. Write a function `validate_python_code(code: str) -> bool` that:
   - Checks if the code is syntactically valid Python
   - Returns True if `ast.parse()` succeeds, False otherwise

## Expected Outputs

- A Python module with all three functions
- `extract_code_blocks` should correctly parse multi-block responses
- `deduplicate_candidates` should use AST normalization, not string comparison
- The module should import only `ast` and `re` from the standard library
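
One way the three functions described above could fit together is sketched below. This is a minimal illustration, not the pipeline's actual implementation. The helper names `_FENCE_RE` and `_ast_key` are hypothetical, and several behaviors are assumptions the task leaves open: a fence with no language tag yields `"language": ""`, a candidate whose AST matches the original is treated as identical to it, and unparseable candidates are kept but deduplicated by exact text.

```python
import ast
import re

# Hypothetical pattern: an opening fence with an optional language and an
# optional ":path" annotation, a lazily matched body, and a closing fence
# that sits alone on its line.
_FENCE_RE = re.compile(
    r"^```([^\s:`]*)(?::([^\n`]*))?[ \t]*\n(.*?)^```[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_code_blocks(llm_response: str) -> list[dict]:
    """Return one dict per fenced block, in order of appearance."""
    blocks = []
    for match in _FENCE_RE.finditer(llm_response):
        language, file_path, code = match.groups()
        file_path = file_path.strip() if file_path else ""
        blocks.append(
            {
                # Assumption: trailing newlines before the closing fence are dropped.
                "code": code.rstrip("\n"),
                "file_path": file_path or None,
                # Assumption: a plain fence with no tag gives an empty string.
                "language": language,
            }
        )
    return blocks


def _ast_key(code: str):
    """Return the ast.dump() string for code, or None if it does not parse."""
    try:
        # ast.dump() omits line and column attributes by default, so
        # comments and blank lines do not affect the key.
        return ast.dump(ast.parse(code))
    except SyntaxError:
        return None


def deduplicate_candidates(candidates: list[str], original_code: str) -> list[str]:
    """Keep the first occurrence of each AST-distinct, non-original candidate."""
    original_key = _ast_key(original_code)
    seen_keys = set()
    seen_unparseable = set()
    unique = []
    for candidate in candidates:
        if candidate == original_code:
            continue
        key = _ast_key(candidate)
        if key is None:
            # Unparseable candidates are kept; exact-text dedup is an assumption.
            if candidate not in seen_unparseable:
                seen_unparseable.add(candidate)
                unique.append(candidate)
            continue
        # Assumption: AST equality with the original counts as "identical".
        if key == original_key or key in seen_keys:
            continue
        seen_keys.add(key)
        unique.append(candidate)
    return unique


def validate_python_code(code: str) -> bool:
    """Return True if ast.parse() accepts the code, False on SyntaxError."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True
```

In this sketch a caller would pass each block's `"code"` value from `extract_code_blocks` into `deduplicate_candidates`. Comparing `ast.dump()` strings means two candidates that differ only in comments or blank lines collapse into one, which a plain string comparison would miss.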