mirror of
https://github.com/codeflash-ai/codeflash-internal.git
synced 2026-05-04 18:25:18 +00:00
Task: Implement an LLM Response Parser for Optimization Candidates
Context
The codeflash-internal optimization pipeline receives LLM responses as markdown text containing code blocks with file path annotations. The context extraction system in optimizer_context.py parses these responses to extract optimized code candidates. The format uses annotated markdown code blocks like:
\`\`\`python:path/to/file.py
# optimized code here
\`\`\`
After extraction, candidates go through postprocessing: AST-based deduplication (using ast.parse() + ast.dump()) and equality checking against the original code.
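As a small illustration of why ast.parse() followed by ast.dump() can serve as a normalization step, the sketch below compares snippets that differ only in comments, blank lines, and redundant parentheses. The helper name ast_fingerprint is hypothetical and is not taken from optimizer_context.py. The sketch assumes the default ast.dump() behaviour, which omits line and column attributes.

```python
import ast


def ast_fingerprint(code: str) -> str:
    """Return a structural fingerprint of the code (hypothetical helper)."""
    # ast.parse() drops comments and formatting; ast.dump() serializes the
    # tree and, by default, leaves out line/column attributes.
    return ast.dump(ast.parse(code))


a = "def f(x):\n    return x + 1\n"
b = "def f(x):  # add one\n\n    return (x + 1)\n"  # same logic, different layout
c = "def f(x):\n    return x + 2\n"                  # different logic

same = ast_fingerprint(a) == ast_fingerprint(b)
different = ast_fingerprint(a) != ast_fingerprint(c)
```

Two candidates whose fingerprints match are structurally the same program, even when a plain string comparison would treat them as distinct.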
Task
- Write a function extract_code_blocks(llm_response: str) -> list[dict] that:
  - Parses markdown code blocks from an LLM response string
  - Handles the file-path-annotated format: \`\`\`python:path/to/file.py
  - Returns a list of dicts, each with keys: "code" (str), "file_path" (str or None), "language" (str)
  - Handles both annotated (with file path) and plain code blocks
- Write a function deduplicate_candidates(candidates: list[str], original_code: str) -> list[str] that:
  - Removes duplicate candidates using AST-based comparison (ast.parse() + ast.dump())
  - Filters out candidates that are identical to the original code (equality check)
  - Returns only unique, non-original candidates
  - Handles SyntaxError gracefully (keep candidates that fail to parse, as they might use features beyond basic AST)
- Write a function validate_python_code(code: str) -> bool that:
  - Checks if the code is syntactically valid Python
  - Returns True if ast.parse() succeeds, False otherwise
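One possible shape for the three functions is sketched below. It is an illustration under stated assumptions, not the actual code in optimizer_context.py. The regular expression, the helper _ast_key, and the sample inputs are all hypothetical. The sketch also assumes that an unlabeled block gets an empty string as its language, that a candidate whose AST matches the original counts as identical to it, and that every unparseable candidate is kept as-is.

```python
import ast
import re

# Opening fence with an optional "language[:path]" header, the body, and a
# closing fence on its own line. `{3} avoids a literal fence in this example.
_FENCE_RE = re.compile(r"^`{3}([^\n`]*)\n(.*?)^`{3}[ \t]*$", re.DOTALL | re.MULTILINE)


def extract_code_blocks(llm_response: str) -> list[dict]:
    """Extract fenced code blocks, with or without a file path annotation."""
    blocks = []
    for match in _FENCE_RE.finditer(llm_response):
        header, body = match.group(1).strip(), match.group(2)
        # Split "python:path/to/file.py" at the first colon only.
        language, _, path = header.partition(":")
        blocks.append({
            "code": body,
            "file_path": path.strip() or None,  # None for plain blocks
            "language": language.strip(),       # "" if unlabeled (assumption)
        })
    return blocks


def _ast_key(code: str):
    """Hypothetical helper: normalized AST dump, or None if parsing fails."""
    try:
        return ast.dump(ast.parse(code))
    except SyntaxError:
        return None


def deduplicate_candidates(candidates: list[str], original_code: str) -> list[str]:
    """Keep the first occurrence of each structurally unique, non-original candidate."""
    seen = set()
    original_key = _ast_key(original_code)
    if original_key is not None:
        seen.add(original_key)  # assumption: AST-equal to the original is "identical"
    unique = []
    for candidate in candidates:
        if candidate == original_code:
            continue  # plain equality check against the original
        key = _ast_key(candidate)
        if key is None:
            unique.append(candidate)  # fails to parse: keep it
            continue
        if key in seen:
            continue  # structural duplicate
        seen.add(key)
        unique.append(candidate)
    return unique


def validate_python_code(code: str) -> bool:
    """Return True if ast.parse() accepts the code, False otherwise."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        # ValueError is caught as a precaution; some Python versions raise it
        # for inputs such as source containing null bytes.
        return False
    return True


# Usage with a made-up response; the fence is built at runtime.
FENCE = "`" * 3
response = (
    "Here is a candidate:\n"
    f"{FENCE}python:path/to/file.py\nx = 1\n{FENCE}\n"
    "And a plain block:\n"
    f"{FENCE}python\ny = 2\n{FENCE}\n"
)
blocks = extract_code_blocks(response)
kept = deduplicate_candidates(["x = 1", "x=1  # same", "x = 2", "x = (", "x = 2"], "x = 1")
```

In this sketch the captured code keeps its trailing newline. Whether to strip it is a design choice the task text leaves open.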
Expected Outputs
- A Python module with all three functions
- extract_code_blocks should correctly parse multi-block responses
- deduplicate_candidates should use AST normalization, not string comparison
- The module should import only ast and re from the standard library